Journal of Software (软件学报), 2025, No. 4

Journal of Software ISSN 1000-9825, CODEN RUXUEW    E-mail: jos@iscas.ac.cn
                 2025,36(4):1715−1757 [doi: 10.13328/j.cnki.jos.007250] [CSTR: 32375.14.jos.007250]    http://www.jos.org.cn
                 © Copyright by Institute of Software, Chinese Academy of Sciences.    Tel: +86-10-62562563



                 Survey on Object Goal Navigation for Embodied AI*

                 陈铂垒, 康嘉绪, 钟萍, 崔永正, 卢思怡, 杨昊楠, 王建新


                 (School of Computer Science and Engineering, Central South University, Changsha 410083, Hunan, China)
                 Corresponding author: ZHONG Ping, E-mail: ping.zhong@csu.edu.cn

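                 Abstract: With the continuous development of computer vision and artificial intelligence in recent years, embodied
                 artificial intelligence (embodied AI) has attracted wide attention from academia and industry in China and abroad.
                 Embodied AI emphasizes that an embodied agent should actively obtain real feedback from the physical world through
                 situated interaction with its environment, and become more intelligent by learning from that feedback. As one of the
                 concrete tasks of embodied AI, object goal navigation requires an embodied agent to search for and navigate to a
                 specified object goal (e.g., find a sink) in a previously unknown, complex, and semantically rich scene. Object goal
                 navigation has great application potential for intelligent assistants that support daily human activities, and it is
                 the foundation and prerequisite for other interaction-based embodied AI research. This paper systematically classifies
                 and reviews current work on object goal navigation. It first introduces background on environment representation and
                 autonomous visual exploration, and classifies and analyzes existing object goal navigation methods from 3 different
                 perspectives. It then introduces two categories of higher-level object rearrangement tasks and describes photorealistic
                 indoor simulation datasets, evaluation metrics, and the common training paradigm for navigation policies. Finally, it
                 compares and analyzes the performance of existing object goal navigation policies on different datasets, summarizes
                 the challenges facing the field, and offers an outlook on its future development.
                 Key words: object goal navigation; embodied AI; autonomous visual exploration; visual object rearrangement
                 CLC number: TP18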

                 Citation in Chinese: 陈铂垒, 康嘉绪, 钟萍, 崔永正, 卢思怡, 杨昊楠, 王建新. 面向具身人工智能的物体目标导航综述. 软件学报, 2025,
                 36(4): 1715–1757. http://www.jos.org.cn/1000-9825/7250.htm
                 Citation in English: Chen BL, Kang JX, Zhong P, Cui YZ, Lu SY, Yang HN, Wang JX. Survey on Object Goal Navigation for Embodied
                 AI. Ruan Jian Xue Bao/Journal of Software, 2025, 36(4): 1715–1757 (in Chinese). http://www.jos.org.cn/1000-9825/7250.htm

                 Survey on Object Goal Navigation for Embodied AI
                 CHEN Bo-Lei, KANG Jia-Xu, ZHONG Ping, CUI Yong-Zheng, LU Si-Yi, YANG Hao-Nan, WANG Jian-Xin
                 (School of Computer Science and Engineering, Central South University, Changsha 410083, China)
                 Abstract: With the continuous development of computer vision and artificial intelligence (AI) in recent years, embodied AI
                 has received widespread attention from academia and industry worldwide. Embodied AI emphasizes that an agent should
                 actively obtain real feedback from the physical world through situated interaction with its environment and become more
                 intelligent by learning from that feedback. As one of the concrete tasks of embodied AI, object goal navigation requires
                 an agent to search for and navigate to a specified object goal (e.g., find a sink) in a previously unknown, complex, and
                 semantically rich scene. Object goal navigation has great application potential in smart assistants that support daily
                 human activities, and it serves as a foundational, prerequisite task for other interaction-based embodied AI research.
                 This study systematically classifies and reviews current research on object goal navigation. First, background on
                 environment representation and autonomous visual exploration is introduced, and existing object goal navigation methods
                 are classified and analyzed from three different perspectives. Second, two categories of higher-level object
                 rearrangement tasks are introduced, followed by a description of photorealistic indoor simulation datasets, evaluation
                 metrics, and a generic training paradigm for navigation policies. Finally, the performance of existing object goal
                 navigation policies on different datasets is compared and analyzed, the challenges in this field are summarized, and
                 future directions are discussed.
                 Key words: object goal navigation; embodied AI; autonomous visual exploration; visual object rearrangement


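                 * Foundation items: National Natural Science Foundation of China (62172443); Natural Science Foundation of Hunan Province
                   (2022JJ30760); Natural Science Foundation of Changsha (kq2202107, kq2202108)
                   Received 2023-05-29; Revised 2023-10-08, 2024-05-14; Accepted 2024-07-15; Published online by JOS 2024-11-27
                   First published online on CNKI: 2024-11-28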