Page 466 - Journal of Software (《软件学报》), 2025, Issue 10


Sun Rui et al.: Text-Image Person Re-identification Method with Implicit Multi-scale Alignment and Interaction 4863

978-3-031-25072-9_42]
[43] Li JN, Li DX, Xiong CM, Hoi S. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: Proc. of the 39th Int’l Conf. on Machine Learning. 2022. 12888–12900.
[44] Bao LP, Wei LH, Zhou WG, Liu L, Xie LX, Li HQ, Tian Q. Multi-granularity matching Transformer for text-based person search. IEEE Trans. on Multimedia, 2024, 26: 4281–4293. [doi: 10.1109/TMM.2023.3321504]
[45] Kim W, Son B, Kim I. ViLT: Vision-and-language Transformer without convolution or region supervision. In: Proc. of the 38th Int’l Conf. on Machine Learning. 2021. 5583–5594.
[46] Li JN, Selvaraju RR, Gotmare AD, Joty S, Xiong CM, Hoi SCH. Align before fuse: Vision and language representation learning with momentum distillation. In: Proc. of the 35th Int’l Conf. on Neural Information Processing Systems. Curran Associates Inc., 2021. 9694–9705.

Chinese-language references:
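[2] Yang WX, Yan Y, Chen S, Zhang XK, Wang HZ. Occluded person re-identification method based on multi-scale generative adversarial network. Journal of Software, 2020, 31(7): 1943–1958 (in Chinese). http://www.jos.org.cn/1000-9825/5932.htm [doi: 10.13328/j.cnki.jos.005932]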

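Sun Rui (1976–), male, PhD, professor, CCF professional member. His main research interests are machine learning and computer vision.

Chen Long (2000–), male, master’s student. His main research interests are image information processing and computer vision.

Du Yun (1998–), female, master’s student. Her main research interests are image information processing and computer vision.

Zhang Xudong (1966–), male, PhD, professor. His main research interests are intelligent information processing and machine vision.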
