An Anatomy of Vision-Language-Action Models:
From Modules to Milestones and Challenges

  • Chao Xu(a,b)
  • Suyu Zhang(a,b)
  • Yang Liu(a,b,c)
  • Baigui Sun(a,b)
  • Weihong Chen(a,b)
  • Bo Xu(a,b)
  • Qi Liu(a,b)
  • Juncheng Wang(d)
  • Shujun Wang(d)
  • Shan Luo(c)
  • Jan Peters(e)
  • Athanasios V. Vasilakos(f)
  • Stefanos Zafeiriou(g)
  • Jiankang Deng(g)
  • (a) IROOTECH TECHNOLOGY
    (b) Wolf 1069 b Lab, Sany Group
    (c) Department of Engineering, King's College London
    (d) The Hong Kong Polytechnic University
    (e) Computer Science Department, Technische Universität Darmstadt
    (f) Department of ICT and Center for AI Research, University of Agder (UiA)
    (g) Department of Computing, Imperial College London

Latest Papers (Weekly Update)

Recently added or newly published VLA papers (Updated on: —)
⭐ Starred papers are submitted by authors who contacted us. We warmly welcome submissions and encourage researchers to share their latest results.

Overview

Figure: Taxonomy of VLA challenges.

Abstract

Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. The field is growing explosively, with new models and datasets appearing constantly, which makes keeping pace both exciting and challenging. This survey offers a clear and structured guide to the VLA landscape. We design it to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the history through key Milestones, and then dive deep into the core Challenges that define the recent research frontier. Our main contribution is a detailed breakdown of the five biggest challenges: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Dataset and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment, all supported by the essential data infrastructure. For each challenge, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on this website.

Timeline

Figure: Timeline of VLA models, datasets, and evaluation benchmarks from 2022 to 2025.

Interactive Survey Table

Explore the curated list of VLA works. Use the filters below to focus on specific challenges, solution strategies, or evaluation settings, and click the sortable headers to re-order the rows.
⭐ Papers marked with a star are outstanding works contributed by the community.

Columns: Abbreviation | Date | Link | Challenge Tag | Sub-Challenge Tag | How to Solve | Training Type | Dataset | Evaluation
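
For readers curious how the table behaves, below is a minimal TypeScript sketch of its client-side filter-and-sort logic. The record shape mirrors the columns listed above, but the field names and the naive CSV handling are illustrative assumptions, not the site's actual implementation.

    // Minimal sketch of the table's filter/sort behavior. The field names
    // below are hypothetical; only the column meanings come from the
    // header row above.

    interface VlaWork {
      abbreviation: string;
      date: string;            // e.g. "2024-03"; ISO-style dates sort lexicographically
      link: string;
      challengeTag: string;    // one of the five survey challenges
      subChallengeTag: string;
      howToSolve: string;
      trainingType: string;
      dataset: string;
      evaluation: string;
    }

    // Naive CSV parsing: assumes no quoted commas inside fields.
    function parseCsv(text: string): VlaWork[] {
      const rows = text.trim().split("\n").slice(1); // drop the header row
      return rows.map((row) => {
        const [abbreviation, date, link, challengeTag, subChallengeTag,
               howToSolve, trainingType, dataset, evaluation] = row.split(",");
        return { abbreviation, date, link, challengeTag, subChallengeTag,
                 howToSolve, trainingType, dataset, evaluation };
      });
    }

    // Keep only rows matching a challenge filter, newest first.
    function filterAndSort(works: VlaWork[], challenge: string): VlaWork[] {
      return works
        .filter((w) => w.challengeTag === challenge)
        .sort((a, b) => b.date.localeCompare(a.date));
    }

A production table would use a proper CSV parser that handles quoted fields; the sketch only illustrates the interaction model of filtering by challenge tag and sorting by date.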

BibTeX

    @article{xu2025anatomyVLA,
      title   = {An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges},
      author  = {Xu, Chao and Zhang, Suyu and Liu, Yang and Sun, Baigui and Chen, Weihong and Xu, Bo and Liu, Qi and
                 Wang, Juncheng and Wang, Shujun and Luo, Shan and Peters, Jan and
                 Vasilakos, Athanasios V. and Zafeiriou, Stefanos and Deng, Jiankang},
      journal = {arXiv preprint arXiv:2512.11362},
      year    = {2025},
    }

Contact

If you have any questions or suggestions, please feel free to contact us via email at Email Address.



Note

This website is adapted from the website of Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications.
