Page 311 - Journal of Software (软件学报), 2025, No. 9

4222                                                       Journal of Software, 2025, Vol. 36, No. 9


                     pretraining approach. In: Proc. of the 8th Int’l Conf. on Learning Representations. Addis Ababa: ICLR, 2020. 1–15.
                 [41]   Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: A simple way to prevent neural networks from overfitting.
                     The Journal of Machine Learning Research, 2014, 15(1): 1929–1958.
                 [42]   Xu K, Ba JL, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel RS, Bengio Y. Show, attend and tell: Neural image caption
                     generation with visual attention. In: Proc. of the 32nd Int’l Conf. on Machine Learning. Lille: JMLR.org, 2015.
                     2048–2057.
                 [43]   Hartigan JA, Wong MA. Algorithm AS 136: A K-means clustering algorithm. Applied Statistics, 1979, 28(1): 100–108.
                     [doi: 10.2307/2346830]
                 [44]   Kuhn HW. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 1955, 2(1–2): 83–97.
                     [doi: 10.1002/nav.3800020109]
                 [45]   Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S. Generalized intersection over union: A metric and a loss for bounding
                     box regression. In: Proc. of the 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 658–666.
                     [doi: 10.1109/CVPR.2019.00075]
                 [46]   Plummer BA, Wang LW, Cervantes CM, Caicedo JC, Hockenmaier J, Lazebnik S. Flickr30k Entities: Collecting region-to-phrase
                     correspondences for richer image-to-sentence models. In: Proc. of the 2015 IEEE Int’l Conf. on Computer Vision. Santiago: IEEE, 2015.
                     2641–2649. [doi: 10.1109/ICCV.2015.303]
                 [47]   Zhu DY, Chen J, Shen XQ, Li X, Elhoseiny M. MiniGPT-4: Enhancing vision-language understanding with advanced large language
                     models. In: Proc. of the 12th Int’l Conf. on Learning Representations. Vienna: ICLR, 2024.
                 [48]   Liu HT, Li CY, Wu QY, Lee YL. Visual instruction tuning. In: Proc. of the 37th Int’l Conf. on Neural Information Processing Systems.
                     New Orleans: Curran Associates Inc., 2023. 1516.
                 [49]   Li JN, Li DX, Savarese S, Hoi S. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language
                     models. In: Proc. of the 40th Int’l Conf. on Machine Learning. Honolulu: JMLR.org, 2023. 814.
                 [50]   Bang Y, Cahyawijaya S, Lee N, Dai WL, Su D, Wilie B, Lovenia H, Ji ZW, Yu TZ, Chung W, Do QV, Xu Y, Fung P. A multitask,
                     multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. In: Proc. of the 13th Int’l Joint Conf. on
                     Natural Language Processing and the 3rd Conf. of the Asia-Pacific Chapter of the Association for Computational Linguistics (Vol. 1:
                     Long Papers). Nusa Dua: Association for Computational Linguistics, 2023. 675–718. [doi: 10.18653/v1/2023.ijcnlp-main.45]
                 [51]   Dong QX, Li L, Dai DM, Zheng C, Ma JY, Li R, Xia HM, Xu JJ, Wu ZY, Chang BB, Sun X, Li L, Sui ZF. A survey on in-context
                     learning. In: Proc. of the 2024 Conf. on Empirical Methods in Natural Language Processing. Miami: Association for Computational
                     Linguistics, 2024. 1107–1128. [doi: 10.18653/v1/2024.emnlp-main.64]

                 Appended Chinese reference:
                 [1]   Du PF, Li XY, Gao YL. Survey of multimodal vision-language representation learning. Journal of Software (软件学报), 2021, 32(2):
                    327–348 (in Chinese). http://www.jos.org.cn/1000-9825/6125.htm [doi: 10.13328/j.cnki.jos.006125]

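                 Zhao Jianing (赵嘉宁, 2000-), male, master's student, CCF student member. His main research interest is natural language processing.

                 Luo Jiamin (罗佳敏, 1997-), female, Ph.D. student, CCF student member. Her main research interest is natural language processing.

                 Wang Jingjing (王晶晶, 1990-), male, Ph.D., associate professor, CCF professional member. His main research interest is natural
                     language processing.

                 Zhou Guodong (周国栋, 1967-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. His main research interest is
                     natural language processing.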