Page 350 - 《软件学报》2025年第4期
P. 350

1756                                                       软件学报  2025  年第  36  卷第  4  期


                      segmentation. In: Proc. of the 2019 IEEE Int’l Conf. on Image Processing (ICIP). Taipei: IEEE, 2019. 1440–1444. [doi: 10.1109/ICIP.
                      2019.8803025]
                 [132]  Zhao HS, Shi JP, Qi XJ, Wang XG, Jia JY. Pyramid scene parsing network. In: Proc. of the 2017 IEEE Conf. on Computer Vision and
                      Pattern Recognition. Honolulu: IEEE, 2017. 6230–6239. [doi: 10.1109/CVPR.2017.660]
                 [133]  Li LH, Zhang PC, Zhang HT, Yang JW, Li CY, Zhong YW, Wang LJ, Yuan L, Zhang L, Hwang JN, Chang KW, Gao JF. Grounded
                      language-image pre-training. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE,
                      2022. 10955–10965. [doi: 10.1109/CVPR52688.2022.01069]
                 [134]  Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: Semantic image segmentation with deep convolutional nets,
                      atrous convolution, and fully connected CRFs. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834–848. [doi:
                      10.1109/TPAMI.2017.2699184]
                 [135]  Ren SQ, He KM, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Proc. of the
                      28th Int’l Conf. on Neural Information Processing Systems. Montreal: MIT Press, 2015. 91–99.
                 [136]  Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In: Proc. of the
                      16th European Conf. on Computer Vision. Glasgow: Springer, 2020. 213–229. [doi: 10.1007/978-3-030-58452-8_13]
                 [137]  Zhou GZ, Hong YC, Wu Q. NavGPT: Explicit reasoning in vision-and-language navigation with large language models. In: Proc. of the
                      38th AAAI Conf. on Artificial Intelligence. Vancouver: AAAI, 2024. 7641–7649. [doi: 10.1609/aaai.v38i7.28597]

                 [138]  Eftekhar  A,  Zeng  KH,  Duan  JF,  Farhadi  A,  Kembhavi  A,  Krishna  R.  Selective  visual  representations  improve  convergence  and
                      generalization for embodied AI. In: Proc. of the the 12th Int’l Conf. on Learning Representations. Vienna: ICLR, 2024.
                 [139]  Dwork C, McSherry F, Nissim K, Smith A. Calibrating noise to sensitivity in private data analysis. In: Proc. of the 3rd Theory of
                      Cryptography Conf. on Theory of Cryptography. New York: Springer, 2006. 265–284. [doi: 10.1007/11681878_14]
                 [140]  Shah D, Equi MR, Osiński B, Xia F, Ichter B, Levine S. Navigation with large language models: Semantic guesswork as a heuristic for
                      planning. In: Proc. of the 7th Conf. on Robot Learning. Atlanta: PMLR, 2023. 2683–2699.
                 [141]  Song CH, Sadler BM, Wu JM, Chao WL, Washington C, Su Y. LLM-Planner: Few-shot grounded planning for embodied agents with
                      large language models. In: Proc. of the 2023 IEEE/CVF Int’l Conf. on Computer Vision. Paris: IEEE, 2023. 2986–2997. [doi: 10.1109/
                      ICCV51070.2023.00280]
                 [142]  Wu PY, Mu Y, Wu BX, Hou Y, Ma J, Zhang SH, Liu C. VoroNav: Voronoi-based zero-shot object navigation with large language
                      model. arXiv:2401.02695, 2024.
                 [143]  Tsai YHH, Dhar V, Li JL, Zhang BW, Zhang J. Multimodal large language model for visual navigation. arXiv:2310.08669, 2023.
                 [144]  Xi ZH, Chen WX, Guo X, He W, Ding YW, Hong BY, Zhang M, Wang JZ, Jin SJ, Zhou EY, Zheng R, Fan XR, Wang X, Xiong LM,
                      Zhou YH, Wang WR, Jiang CH, Zou YC, Liu XY, Yin ZY, Dou SH, Weng RX, Cheng WS, Zhang Q, Qin WJ, Zheng YY, Qiu XP,
                      Huang XJ, Gui T. The rise and potential of large language model based agents: A survey. arXiv:2309.07864, 2023.
                 [145]  Vuong AD, Nguyen TT, Vu MN, Huang BR, Nguyen D, Binh HTT, Vo T, Nguyen A. HabiCrowd: A high performance simulator for
                      crowd-aware visual navigation. arXiv:2306.11377, 2023.
                 [146]  Cancelli E, Campari T, Serafini L, Chang AX, Ballan L. Exploiting proximity-aware tasks for embodied social navigation. In: Proc. of
                      the 2023 IEEE/CVF Int’l Conf. on Computer Vision. Paris: IEEE, 2023. 10923–10933. [doi: 10.1109/ICCV51070.2023.01006]
                 [147]  Chen BL, Lu SY, Zhong P, Cui YZ, Liang YX, Wang JX. SemNav-HRO: A target-driven semantic navigation strategy with human-
                      robot-object  ternary  fusion.  Engineering  Applications  of  Artificial  Intelligence,  2024,  127:  107370.  [doi:  10.1016/j.engappai.2023.
                      107370]
                 [148]  Luo Q, Sorokin M, Ha S. A few shot adaptation of visual navigation skills to new observations using meta-learning. In: Proc. of the
                      2021  IEEE  Int’l  Conf.  on  Robotics  and  Automation  (ICRA).  Xi’an:  IEEE,  2021.  13231–13237.  [doi:  10.1109/ICRA48506.2021.
                      9561056]
                 [149]  Wang T, Wu ZK, Wang DL. Visual perception generalization for vision-and-language navigation via meta-learning. IEEE Trans. on
                      Neural Networks and Learning Systems, 2023, 34(8): 5193–5199. [doi: 10.1109/TNNLS.2021.3122579]
                 [150]  Dwivedi  K,  Roig  G,  Kembhavi  A,  Mottaghi  R.  What  do  navigation  agents  learn  about  their  environment?  In:  Proc.  of  the  2022
                      IEEE/CVF  Conf.  on  Computer  Vision  and  Pattern  Recognition.  New  Orleans:  IEEE,  2022.  10266–10275.  [doi: 10.1109/CVPR
                      52688.2022.01003]
                 [151]  Yang ZJ, Majumdar A, Lee S. Behavioral analysis of vision-and-language navigation agents. In: Proc. of the 2023 IEEE/CVF Conf. on
                      Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 2574–2582. [doi: 10.1109/CVPR52729.2023.00253]
                 [152]  Driess D, Xia F, Sajjadi MSM, Lynch C, Chowdhery A, Ichter B, Wahid A, Tompson J, Vuong Q, Yu TH, Huang WL, Chebotar Y,
                      Sermanet P, Duckworth D, Levine S, Vanhoucke V, Hausman K, Toussaint M, Greff K, Zeng A, Mordatch I, Florence P. PaLM-E: An
                      embodied multimodal language model. In: Proc. of the 40th Int’l Conf. on Machine Learning. Honolulu: PMLR, 2023. 8469–8488.
   345   346   347   348   349   350   351   352   353   354   355