
Publication

Embodied Arena: A Comprehensive, Unified, and Evolving Evaluation Platform for Embodied AI

Dominik Pfeiffer; Min Zhang; Pengyi Li; Yifu Yuan; Lingfeng Zhang; Yuecheng Liu; Peilong Han; Longxin Kou; Shaojin Ma; Jinbin Qiao; David Gamaliel Arcos Bravo; Yuening Wang; Xiao Hu; Zhanguang Zhang; Xianze Yao; Yutong Li; Han Zhao Zhang; Ying Wen; Ying-Cong Chen; Xiaodan Liang; Liang Lin; Robin Scheid; Haitham Bou-Ammar; He Wang; Huazhe Xu; Jiankang Deng; Shan Luo; Shuqiang Jiang; Wei Pan; Yang Gao; Stefanos Zafeiriou; Jan Peters; Yuzheng Zhuang; Yingxue Zhang; Yan Zheng; Hongyao Tang; Jianye Hao
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2509.15273, Pages 1-32, arXiv, 2025.

Abstract

Embodied AI has shown great promise in empowering AI models to perceive, interact with, and ultimately change the physical world. Yet compared with the rapid development of large foundation models, Embodied AI lags far behind. At the center of Embodied AI, three essential challenges emerge and become increasingly pressing: (1) the community lacks a systematic understanding of the core capabilities needed for Embodied AI, leaving research without clear objectives; (2) despite the proposal of various benchmarks for Embodied AI, there is no unified and standardized evaluation system, making cross-benchmark evaluation and comparison infeasible; (3) unlike large language models (LLMs), which are powered by web-scale data, automated and scalable acquisition methods for embodied data remain underdeveloped, posing a critical bottleneck for scaling the evaluation and training of Embodied AI models. To address these three obstacles, this paper presents Embodied Arena, a comprehensive, unified, and evolving evaluation platform and leaderboard suite for Embodied AI. First, Embodied Arena is built upon a systematic embodied capability taxonomy spanning three levels (i.e., perception, reasoning, and task execution), seven core embodied capabilities, and 25 fine-grained dimensions. This taxonomy is derived by absorbing and refining the partial categorizations of prior works; it enables unified evaluation and offers systematic objectives for frontier research. Second, Embodied Arena closes the critical gap in standardized evaluation by introducing a unified embodied evaluation system. The system is built upon a unified evaluation infrastructure supporting flexible integration of advanced benchmarks and models; it currently covers 22 diverse benchmarks across three domains (2D/3D Embodied Q&A, Navigation, and Task Planning) and 30+ advanced models from 20+ institutes worldwide.
Third, Embodied Arena is powered by a novel LLM-driven automated generation pipeline that ensures the scalability of embodied evaluation data and allows the platform to keep evolving in diversity and comprehensiveness. Building upon these three major components, Embodied Arena addresses the three essential challenges correspondingly. Moreover, Embodied Arena provides professional support for onboarding additional advanced models and embodied benchmarks, along with frequent maintenance and updates. Through comprehensive evaluation of the growing model population on evolving evaluation data, Embodied Arena publishes three types of leaderboards (i.e., Embodied Q&A, Embodied Navigation, and Embodied Task Planning) with two orthogonal views (i.e., the benchmark view and the capability view), offering a real-time overview of the embodied capabilities of advanced models. In particular, we present nine findings summarized from the evaluation results on the Embodied Arena leaderboards. These findings help to establish clear research directions and pinpoint critical research problems, thereby driving forward progress in the field of Embodied AI.
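The relationship between the two orthogonal leaderboard views can be illustrated with a small sketch: scores are reported per benchmark (benchmark view) and re-aggregated per capability from the taxonomy (capability view). All capability names, benchmark names, and numbers below are hypothetical placeholders, not the platform's actual identifiers or results.

```python
# Illustrative sketch of aggregating benchmark-view scores into a
# capability-view leaderboard. Names and numbers are made up.

# Three taxonomy levels with illustrative (not official) capabilities.
TAXONOMY = {
    "perception": ["object_recognition", "spatial_understanding"],
    "reasoning": ["commonsense_reasoning", "planning"],
    "task_execution": ["navigation", "manipulation", "instruction_following"],
}

def capability_view(benchmark_scores, benchmark_to_capability):
    """Average per-benchmark scores into per-capability scores."""
    buckets = {}
    for bench, score in benchmark_scores.items():
        cap = benchmark_to_capability[bench]
        buckets.setdefault(cap, []).append(score)
    return {cap: sum(s) / len(s) for cap, s in buckets.items()}

# Example: two hypothetical navigation benchmarks and one Q&A benchmark.
scores = {"nav_bench_a": 60, "nav_bench_b": 80, "qa_bench": 70}
mapping = {
    "nav_bench_a": "navigation",
    "nav_bench_b": "navigation",
    "qa_bench": "object_recognition",
}
print(capability_view(scores, mapping))
# {'navigation': 70.0, 'object_recognition': 70.0}
```

The benchmark view keeps `scores` as-is, so a model's strength on any single benchmark remains visible, while the capability view surfaces systematic gaps across the taxonomy.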
