Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 14850–14859. [doi: 10.1109/CVPR52688.2022.01445]
[111] Luo HK, Yue A, Hong ZW, Agrawal P. Stubborn: A strong baseline for indoor object navigation. In: Proc. of the 2022 IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems. Kyoto: IEEE, 2022. 3287–3293. [doi: 10.1109/IROS47612.2022.9981646]
[112] Dang RH, Wang LY, He ZT, Su S, Tang JG, Liu CJ, Chen QJ. Search for or navigate to? Dual adaptive thinking for object navigation. In: Proc. of the 2023 IEEE/CVF Int’l Conf. on Computer Vision. Paris: IEEE, 2023. 8216–8225. [doi: 10.1109/ICCV51070.2023.00758]
[113] Zhou KW, Zheng KZ, Pryor C, et al. ESC: Exploration with soft commonsense constraints for zero-shot object navigation. In: Proc. of the 40th Int’l Conf. on Machine Learning. Honolulu: PMLR, 2023. 42829–42842.
[114] Liu JJ, Guo JF, Meng ZH, Xue JT. ReVoLT: Relational reasoning and Voronoi local graph planning for target-driven navigation. arXiv:2301.02382, 2023.
[115] Chen BL, Kang JX, Zhong P, Cui YZ, Lu SY, Liang YX, Wang JX. Think holistically, act down-to-earth: A semantic navigation strategy with continuous environmental representation and multi-step forward planning. IEEE Trans. on Circuits and Systems for Video Technology, 2024, 34(5): 3860–3875. [doi: 10.1109/TCSVT.2023.3324380]
[116] Wang S, Wu ZH, Hu XB, Lin YS, Lv K. Skill-based hierarchical reinforcement learning for target visual navigation. IEEE Trans. on Multimedia, 2023, 25: 8920–8932. [doi: 10.1109/TMM.2023.3243618]
[117] Krishna R, Zhu YK, Groth O, Johnson J, Hata K, Kravitz J, Chen S, Kalantidis Y, Li LJ, Shamma DA, Bernstein MS, Fei-Fei L. Visual genome: Connecting language and vision using crowdsourced dense image annotations. Int’l Journal of Computer Vision, 2017, 123(1): 32–73. [doi: 10.1007/s11263-016-0981-7]
[118] Gupta S, Davidson J, Levine S, Sukthankar R, Malik J. Cognitive mapping and planning for visual navigation. In: Proc. of the 2017 IEEE Conf. on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 7272–7281. [doi: 10.1109/CVPR.2017.769]
[119] Ammirato P, Poirson P, Park E, Košecká J, Berg AC. A dataset for developing and benchmarking active vision. In: Proc. of the 2017 IEEE Int’l Conf. on Robotics and Automation. Singapore: IEEE, 2017. 1378–1385. [doi: 10.1109/ICRA.2017.7989164]
[120] Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning transferable visual models from natural language supervision. In: Proc. of the 38th Int’l Conf. on Machine Learning. PMLR, 2021. 8748–8763.
[121] Khandelwal A, Weihs L, Mottaghi R, Kembhavi A. Simple but effective: CLIP embeddings for embodied AI. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 14809–14818. [doi: 10.1109/CVPR52688.2022.01441]
[122] Trabucco B, Sigurdsson GA, Piramuthu R, Sukhatme GS, Salakhutdinov R. A simple approach for visual room rearrangement: 3D mapping and semantic search. In: Proc. of the 11th Int’l Conf. on Learning Representations. Kigali: ICLR, 2023.
[123] Sarch G, Fang ZY, Harley AW, Schydlo P, Tarr MJ, Gupta S, Fragkiadaki K. TIDEE: Tidying up novel rooms using visuo-semantic commonsense priors. In: Proc. of the 17th European Conf. on Computer Vision. Tel Aviv: Springer, 2022. 480–496. [doi: 10.1007/978-3-031-19842-7_28]
[124] Kant Y, Ramachandran A, Yenamandra S, Gilitschenski I, Batra D, Szot A, Agrawal H. HouseKeep: Tidying virtual households using commonsense reasoning. In: Proc. of the 17th European Conf. on Computer Vision. Tel Aviv: Springer, 2022. 355–373. [doi: 10.1007/978-3-031-19842-7_21]
[125] Yamauchi B. A frontier-based approach for autonomous exploration. In: Proc. of the 1997 IEEE Int’l Symp. on Computational Intelligence in Robotics and Automation: Towards New Computational Principles for Robotics and Automation. Monterey: IEEE, 1997. 146–151. [doi: 10.1109/CIRA.1997.613851]
[126] Ramakrishnan SK, Al-Halah Z, Grauman K. Occupancy anticipation for efficient exploration and navigation. In: Proc. of the 16th European Conf. on Computer Vision. Glasgow: Springer, 2020. 400–418. [doi: 10.1007/978-3-030-58558-7_24]
[127] Liu S, Okatani T. Symmetry-aware neural architecture for embodied visual exploration. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 17221–17230. [doi: 10.1109/CVPR52688.2022.01673]
[128] Georgakis G, Bucher B, Arapin A, Schmeckpeper K, Matni N, Daniilidis K. Uncertainty-driven planner for exploration and navigation. In: Proc. of the 2022 Int’l Conf. on Robotics and Automation. Philadelphia: IEEE, 2022. 11295–11302. [doi: 10.1109/ICRA46639.2022.9812423]
[129] Jiang JD, Zheng LA, Luo F, Zhang ZJ. RedNet: Residual encoder-decoder network for indoor RGB-D semantic segmentation. arXiv:1806.01054, 2018.
[130] He KM, Gkioxari G, Dollár P, Girshick R. Mask R-CNN. In: Proc. of the 2017 IEEE Int’l Conf. on Computer Vision. Venice: IEEE, 2017. 2980–2988. [doi: 10.1109/ICCV.2017.322]
[131] Hu XX, Yang KL, Fei L, Wang KW. ACNet: Attention based network to exploit complementary features for RGBD semantic segmentation. In: Proc. of the 2019 IEEE Int’l Conf. on Image Processing. Taipei: IEEE, 2019. 1440–1444.