
Journal of Software, 2025, Vol. 36, No. 10, p. 4862


                     person retrieval. In: Proc. of the 29th ACM Int’l Conf. on Multimedia. ACM, 2021. 209–217. [doi: 10.1145/3474085.3475369]
                 [19]   Sarafianos N, Xu X, Kakadiaris I. Adversarial representation learning for text-to-image matching. In: Proc. of the 2019 IEEE/CVF Int’l
                     Conf. on Computer Vision (ICCV). Seoul: IEEE, 2019. 5813–5823. [doi: 10.1109/ICCV.2019.00591]
                 [20]   Gao CY, Cai GY, Jiang XY, Zheng F, Zhang J, Gong YF, Peng P, Guo XW, Sun X. Contextual non-local alignment over full-scale
                     representation for text-based person search. arXiv:2101.03036, 2021.
                 [21]   Niu K, Huang Y, Ouyang WL, Wang L. Improving description-based person re-identification by multi-granularity image-text alignments.
                     IEEE Trans. on Image Processing, 2020, 29: 5542–5556. [doi: 10.1109/TIP.2020.2984883]
                 [22]   Chen DP, Li HS, Liu XH, Shen YT, Shao J, Yuan ZJ, Wang XG. Improving deep visual representation for person re-identification by
                     global and local image-language association. In: Proc. of the 15th European Conf. on Computer Vision (ECCV). Munich: Springer, 2018.
                     56–73. [doi: 10.1007/978-3-030-01270-0_4]
                 [23]   Liu JW, Zha ZJ, Hong RC, Wang M, Zhang YD. Deep adversarial graph attention convolution network for text-based person search. In:
                     Proc. of the 27th ACM Int’l Conf. on Multimedia. Nice: ACM, 2019. 665–673. [doi: 10.1145/3343031.3350991]
                 [24]   Ding ZF, Ding CX, Shao ZY, Tao DC. Semantically self-aligned network for text-to-image part-aware person re-identification.
                     arXiv:2107.12666, 2021.
                 [25]   Chen YC, Li LJ, Yu LC, El Kholy A, Ahmed F, Gan Z, Cheng Y, Liu JJ. UNITER: Universal image-text representation learning. In:
                     Proc. of the 16th European Conf. on Computer Vision (ECCV). Glasgow: Springer, 2020. 104–120. [doi: 10.1007/978-3-030-58577-8_7]
                 [26]   Jia C, Yang YF, Xia Y, Chen YT, Parekh Z, Pham H, Le Q, Sung YH, Li Z, Duerig T. Scaling up visual and vision-language
                     representation learning with noisy text supervision. In: Proc. of the 38th Int’l Conf. on Machine Learning. 2021. 4904–4916.
                 [27]   Antol S, Agrawal A, Lu JS, Mitchell M, Batra D, Zitnick CL, Parikh D. VQA: Visual question answering. In: Proc. of the 2015 IEEE Int’l
                     Conf. on Computer Vision (ICCV). Santiago: IEEE, 2015. 2425–2433. [doi: 10.1109/ICCV.2015.279]
                 [28]   Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Krueger G, Sutskever I. Learning
                     transferable visual models from natural language supervision. In: Proc. of the 38th Int’l Conf. on Machine Learning. 2021. 8748–8763.
                 [29]   Yan SL, Dong N, Zhang LY, Tang JH. CLIP-driven fine-grained text-image person re-identification. IEEE Trans. on Image Processing,
                     2023, 32: 6032–6046. [doi: 10.1109/TIP.2023.3327924]
                 [30]   Chen WJ, Yao LL, Jin Q. Rethinking benchmarks for cross-modal image-text retrieval. In: Proc. of the 46th Int’l ACM SIGIR Conf. on
                     Research and Development in Information Retrieval. Taipei: ACM, 2023. 1241–1251. [doi: 10.1145/3539618.3591758]
                 [31]   Zhou KY, Yang JK, Loy CC, Liu ZW. Learning to prompt for vision-language models. Int’l Journal of Computer Vision, 2022, 130(9):
                     2337–2348. [doi: 10.1007/s11263-022-01653-1]
                 [32]   Sennrich R, Haddow B, Birch A. Neural machine translation of rare words with subword units. arXiv:1508.07909, 2016.
                 [33]   Wu YS, Yan ZZ, Han XG, Li GB, Zou CQ, Cui SG. LapsCore: Language-guided person search via color reasoning. In: Proc. of the 2021
                     IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021. 1604–1613. [doi: 10.1109/ICCV48922.2021.00165]
                 [34]   Yan SL, Tang H, Zhang LY, Tang JH. Image-specific information suppression and implicit local alignment for text-based person search.
                     IEEE Trans. on Neural Networks and Learning Systems, 2024, 35(12): 17973–17986. [doi: 10.1109/TNNLS.2023.3310118]
                 [35]   Wang ZJ, Zhu AC, Xue JY, Wan XL, Liu C, Wang T, Li YF. Look before you leap: Improving text-based person retrieval by learning a
                     consistent cross-modal common manifold. In: Proc. of the 30th ACM Int’l Conf. on Multimedia. Lisboa: ACM, 2022. 1984–1992.
                     [doi: 10.1145/3503161.3548166]
                 [36]   Han X, He S, Zhang L, Xiang T. Text-based person search with limited data. arXiv:2110.10807, 2021.
                 [37]   Li SP, Cao M, Zhang M. Learning semantic-aligned feature representation for text-based person search. In: Proc. of the 2022 IEEE Int’l
                     Conf. on Acoustics, Speech and Signal Processing (ICASSP). Singapore: IEEE, 2022. 2724–2728.
                     [doi: 10.1109/ICASSP43922.2022.9746846]
                 [38]   Chen YH, Zhang GQ, Lu YJ, Wang ZX, Zheng YH. TIPCB: A simple but effective part-based convolutional baseline for text-based
                     person search. Neurocomputing, 2022, 494: 171–181. [doi: 10.1016/j.neucom.2022.04.081]
                 [39]   Wang ZJ, Zhu AC, Xue JY, Wan XL, Liu C, Wang T, Li YF. CAIBC: Capturing all-round information beyond color for text-based
                     person retrieval. In: Proc. of the 30th ACM Int’l Conf. on Multimedia. Lisboa: ACM, 2022. 5314–5322. [doi: 10.1145/3503161.3548057]
                 [40]   Farooq A, Awais M, Kittler J, Khalid SS. AXM-Net: Implicit cross-modal feature alignment for person re-identification. In: Proc. of the
                     36th AAAI Conf. on Artificial Intelligence. Virtually: AAAI, 2022. 4477–4485. [doi: 10.1609/aaai.v36i4.20370]
                 [41]   Shao ZY, Zhang XY, Fang M, Lin ZF, Wang J, Ding CX. Learning granularity-unified representations for text-to-image person
                     re-identification. In: Proc. of the 30th ACM Int’l Conf. on Multimedia. Lisboa: ACM, 2022. 5566–5574. [doi: 10.1145/3503161.3548028]
                 [42]   Shu XJ, Wen W, Wu HQ, Chen KY, Song YR, Qiao RZ, Ren B, Wang X. See finer, see more: Implicit modality alignment for text-based
                     person retrieval. In: Proc. of the 2022 European Conf. on Computer Vision (ECCV). Tel Aviv: Springer, 2022. 624–641. [doi: 10.1007/