Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 14850–14859. [doi: 10.1109/CVPR52688.2022.01445]
[111] Luo HK, Yue A, Hong ZW, Agrawal P. Stubborn: A strong baseline for indoor object navigation. In: Proc. of the 2022 IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems. Kyoto: IEEE, 2022. 3287–3293. [doi: 10.1109/IROS47612.2022.9981646]
[112] Dang RH, Wang LY, He ZT, Su S, Tang JG, Liu CJ, Chen QJ. Search for or navigate to? Dual adaptive thinking for object navigation. In: Proc. of the 2023 IEEE/CVF Int’l Conf. on Computer Vision. Paris: IEEE, 2023. 8216–8225. [doi: 10.1109/ICCV51070.2023.00758]
[113] Zhou KW, Zheng KZ, Pryor C, et al. ESC: Exploration with soft commonsense constraints for zero-shot object navigation. In: Proc. of the 40th Int’l Conf. on Machine Learning. Honolulu: PMLR, 2023. 42829–42842.
[114] Liu JJ, Guo JF, Meng ZH, Xue JT. ReVoLT: Relational reasoning and Voronoi local graph planning for target-driven navigation. arXiv:2301.02382, 2023.
[115] Chen BL, Kang JX, Zhong P, Cui YZ, Lu SY, Liang YX, Wang JX. Think holistically, act down-to-earth: A semantic navigation strategy with continuous environmental representation and multi-step forward planning. IEEE Trans. on Circuits and Systems for Video Technology, 2024, 34(5): 3860–3875. [doi: 10.1109/TCSVT.2023.3324380]
[116] Wang S, Wu ZH, Hu XB, Lin YS, Lv K. Skill-based hierarchical reinforcement learning for target visual navigation. IEEE Trans. on Multimedia, 2023, 25: 8920–8932. [doi: 10.1109/TMM.2023.3243618]
[117] Krishna R, Zhu YK, Groth O, Johnson J, Hata K, Kravitz J, Chen S, Kalantidis Y, Li LJ, Shamma DA, Bernstein MS, Fei-Fei L. Visual genome: Connecting language and vision using crowdsourced dense image annotations. Int’l Journal of Computer Vision, 2017, 123(1): 32–73. [doi: 10.1007/s11263-016-0981-7]
[118] Gupta S, Davidson J, Levine S, Sukthankar R, Malik J. Cognitive mapping and planning for visual navigation. In: Proc. of the 2017 IEEE Conf. on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 7272–7281. [doi: 10.1109/CVPR.2017.769]
[119] Ammirato P, Poirson P, Park E, Košecká J, Berg AC. A dataset for developing and benchmarking active vision. In: Proc. of the 2017 IEEE Int’l Conf. on Robotics and Automation. Singapore: IEEE, 2017. 1378–1385. [doi: 10.1109/ICRA.2017.7989164]
[120] Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning transferable visual models from natural language supervision. In: Proc. of the 38th Int’l Conf. on Machine Learning. PMLR, 2021. 8748–8763.
[121] Khandelwal A, Weihs L, Mottaghi R, Kembhavi A. Simple but effective: CLIP embeddings for embodied AI. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 14809–14818. [doi: 10.1109/CVPR52688.2022.01441]
[122] Trabucco B, Sigurdsson GA, Piramuthu R, Sukhatme GS, Salakhutdinov R. A simple approach for visual room rearrangement: 3D mapping and semantic search. In: Proc. of the 11th Int’l Conf. on Learning Representations. Kigali: ICLR, 2023.
[123] Sarch G, Fang ZY, Harley AW, Schydlo P, Tarr MJ, Gupta S, Fragkiadaki K. TIDEE: Tidying up novel rooms using visuo-semantic commonsense priors. In: Proc. of the 17th European Conf. on Computer Vision. Tel Aviv: Springer, 2022. 480–496. [doi: 10.1007/978-3-031-19842-7_28]
[124] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, Agrawal H. HouseKeep: Tidying virtual households using commonsense reasoning. In: Proc. of the 17th European Conf. on Computer Vision. Tel Aviv: Springer, 2022. 355–373. [doi: 10.1007/978-3-031-19842-7_21]
[125] Yamauchi B. A frontier-based approach for autonomous exploration. In: Proc. of the 1997 IEEE Int’l Symp. on Computational Intelligence in Robotics and Automation: Towards New Computational Principles for Robotics and Automation. Monterey: IEEE, 1997. 146–151. [doi: 10.1109/CIRA.1997.613851]
[126] Ramakrishnan SK, Al-Halah Z, Grauman K. Occupancy anticipation for efficient exploration and navigation. In: Proc. of the 16th European Conf. on Computer Vision. Glasgow: Springer, 2020. 400–418. [doi: 10.1007/978-3-030-58558-7_24]
[127] Liu S, Okatani T. Symmetry-aware neural architecture for embodied visual exploration. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 17221–17230. [doi: 10.1109/CVPR52688.2022.01673]
[128] Georgakis G, Bucher B, Arapin A, Schmeckpeper K, Matni N, Daniilidis K. Uncertainty-driven planner for exploration and navigation. In: Proc. of the 2022 Int’l Conf. on Robotics and Automation. Philadelphia: IEEE, 2022. 11295–11302. [doi: 10.1109/ICRA46639.2022.9812423]
[129] Jiang JD, Zheng LA, Luo F, Zhang ZJ. RedNet: Residual encoder-decoder network for indoor RGB-D semantic segmentation. arXiv:1806.01054, 2018.
[130] He KM, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: Proc. of the 2017 IEEE Int’l Conf. on Computer Vision. Venice: IEEE, 2017. 2980–2988. [doi: 10.1109/ICCV.2017.322]
[131] Hu XX, Yang KL, Fei L, Wang KW. ACNet: Attention based network to exploit complementary features for RGBD semantic segmentation. In: Proc. of the 2019 IEEE Int’l Conf. on Image Processing. Taipei: IEEE, 2019. 1440–1444.