Page 347 - Journal of Software (《软件学报》), 2025, No. 4

Chen BL, et al.: A survey of object goal navigation for embodied artificial intelligence                                                        1753


                 [65]  Du HM, Yu X, Zheng L. VTNet: Visual Transformer network for object goal navigation. In: Proc. of the 9th Int’l Conf. on Learning
                      Representations. ICLR, 2021.
                 [66]  Fukushima R, Ota K, Kanezaki A, Sasaki Y, Yoshiyasu Y. Object memory Transformer for object goal navigation. In: Proc. of the 2022
                      Int’l Conf. on Robotics and Automation. Philadelphia: IEEE, 2022. 11288–11294. [doi: 10.1109/ICRA46639.2022.9812027]
                 [67]  Georgakis G, Schmeckpeper K, Wanchoo K, Dan S, Miltsakaki E, Roth D, Daniilidis K. Cross-modal map learning for vision and
                      language navigation. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022.
                      15439–15449. [doi: 10.1109/CVPR52688.2022.01502]
                 [68]  Henriques JF, Vedaldi A. MapNet: An allocentric spatial memory for mapping environments. In: Proc. of the 2018 IEEE/CVF Conf. on
                      Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 8476–8484. [doi: 10.1109/CVPR.2018.00884]
                 [69]  Cartillier V, Ren ZL, Jain N, Lee S, Essa I, Batra D. Semantic MapNet: Building allocentric semantic maps and representations from
                      egocentric views. In: Proc. of the 35th AAAI Conf. on Artificial Intelligence. AAAI, 2021. 964–972. [doi: 10.1609/aaai.v35i2.16180]
                 [70]  Chen PH, Ji DY, Lin KY, Zeng RH, Li TH, Tan MK, Gan C. Weakly-supervised multi-granularity map learning for vision-and-language
                      navigation. In: Proc. of the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans: Curran Associates Inc., 2022.
                      2764.
                 [71]  Xu DF, Zhu YK, Choy CB, Fei-Fei L. Scene graph generation by iterative message passing. In: Proc. of the 2017 IEEE/CVF Conf. on
                      Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017. 3097–3106. [doi: 10.1109/CVPR.2017.330]
                 [72]  Yang JW, Lu JS, Lee S, Batra D, Parikh D. Graph R-CNN for scene graph generation. In: Proc. of the 15th European Conf. on
                      Computer Vision. Munich: Springer, 2018. 690–706. [doi: 10.1007/978-3-030-01246-5_41]
                 [73]  Zellers R, Yatskar M, Thomson S, Choi Y. Neural motifs: Scene graph parsing with global context. In: Proc. of the 2018 IEEE/CVF
                      Conf. on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018. 5831–5840. [doi: 10.1109/CVPR.2018.00611]
                 [74]  Ost J, Mannan F, Thuerey N, Knodt J, Heide F. Neural scene graphs for dynamic scenes. In: Proc. of the 2021 IEEE/CVF Conf. on
                      Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 2855–2864. [doi: 10.1109/CVPR46437.2021.00288]
                 [75]  Tsai YHH, Divvala S, Morency LP, Salakhutdinov R, Farhadi A. Video relationship reasoning using gated spatio-temporal energy
                      graph. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 10416–10425.
                      [doi: 10.1109/CVPR.2019.01067]
                 [76]  Giuliari F, Skenderi G, Cristani M, Wang YM, Del Bue A. Spatial commonsense graph for object localisation in partial scenes. In: Proc.
                      of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 19496–19505. [doi: 10.1109/
                      CVPR52688.2022.01891]
                 [77]  Gao C, Chen JY, Liu S, Wang LT, Zhang Q, Wu Q. Room-and-object aware knowledge reasoning for remote embodied referring
                      expression. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021. 3063–3072.
                      [doi: 10.1109/CVPR46437.2021.00308]
                 [78]  Gadre SY, Ehsani K, Song SR, Mottaghi R. Continuous scene representations for embodied AI. In: Proc. of the 2022 IEEE/CVF Conf.
                      on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 14829–14839. [doi: 10.1109/CVPR52688.2022.01443]
                 [79]  Du YL, Gan C, Isola P. Curious representation learning for embodied intelligence. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on
                      Computer Vision. Montreal: IEEE, 2021. 10388–10397. [doi: 10.1109/ICCV48922.2021.01024]
                 [80]  Girshick R. Fast R-CNN. In: Proc. of the 2015 IEEE Int’l Conf. on Computer Vision. Santiago: IEEE, 2015. 1440–1448. [doi: 10.1109/
                      ICCV.2015.169]
                 [81]  van den Oord A, Li YZ, Vinyals O. Representation learning with contrastive predictive coding. arXiv:1807.03748, 2018.
                 [82]  Zhu H, Kapoor R, Min SY, Han W, Li JT, Geng KW, Neubig G, Bisk Y, Kembhavi A, Weihs L. EXCALIBUR: Encouraging and
                      evaluating embodied exploration. In: Proc. of the 2023 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Vancouver:
                      IEEE, 2023. 14931–14942. [doi: 10.1109/CVPR52729.2023.01434]
                 [83]  Chaplot DS, Gandhi D, Gupta S, Gupta A, Salakhutdinov R. Learning to explore using active neural SLAM. In: Proc. of the 8th Int’l
                      Conf. on Learning Representations. Addis Ababa: ICLR, 2020.
                 [84]  Bigazzi R, Cornia M, Cascianelli S, Baraldi L, Cucchiara R. Embodied agents for efficient exploration and smart scene description. In:
                      Proc. of the 2023 IEEE Int’l Conf. on Robotics and Automation (ICRA). London: IEEE, 2023. 6057–6064. [doi: 10.1109/ICRA48891.
                      2023.10160668]
                 [85]  Savinov N, Raichuk A, Vincent D, Marinier R, Pollefeys M, Lillicrap T, Gelly S. Episodic curiosity through reachability. In: Proc. of the
                      7th Int’l Conf. on Learning Representations. New Orleans: ICLR, 2019.
                 [86]  Strehl AL, Littman ML. An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and
                      System Sciences, 2008, 74(8): 1309–1331. [doi: 10.1016/j.jcss.2007.08.009]
                 [87]  Bellemare MG, Srinivasan S, Ostrovski G, Schaul T, Saxton D, Munos R. Unifying count-based exploration and intrinsic motivation. In: