Page 458 - Journal of Software (《软件学报》), 2025, No. 10

Sun Rui et al.: Text-Image Person Re-identification with Implicit Multi-scale Alignment and Interaction                                                   4855



$$L_{\mathrm{CMPM}} = L_{\mathrm{CMPM}}^{v2t} + L_{\mathrm{CMPM}}^{t2v} \tag{11}$$
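Meanwhile, the regions attended to by different heads in the multi-head attention module can capture semantics that are redundant and overlap with one another. To fully mine the fine-grained details in images and text, we want features at different scales to focus on different information. We therefore impose a diversity constraint loss $L_{\mathrm{div}}$ on the features of different scales to avoid information redundancy, as shown in Eq. (12).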

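$$L_{\mathrm{div}} = \sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \left( \frac{f_i^{v} \cdot f_j^{v}}{\left\| f_i^{v} \right\|_2 \left\| f_j^{v} \right\|_2} + \frac{f_i^{t} \cdot f_j^{t}}{\left\| f_i^{t} \right\|_2 \left\| f_j^{t} \right\|_2} \right) \tag{12}$$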
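The diversity constraint in Eq. (12) can be illustrated with a minimal pure-Python sketch. It assumes the loss is the plain sum of pairwise cosine similarities between features of different scales, computed separately for the visual and textual branches, with no extra normalization factor. The function names `cosine` and `diversity_loss` are hypothetical, and each feature is assumed to be a non-zero vector given as a list of floats.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def diversity_loss(visual_feats, text_feats):
    """Sum cosine similarities over all ordered scale pairs (i, j) with i != j,
    for the visual features and for the textual features (sketch of Eq. (12))."""
    n = len(visual_feats)
    assert len(text_feats) == n, "both modalities need the same number of scales"
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a feature is not compared with itself
            loss += cosine(visual_feats[i], visual_feats[j])
            loss += cosine(text_feats[i], text_feats[j])
    return loss

# Mutually orthogonal scale features: nothing is shared, so the penalty is 0.
orthogonal = diversity_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Parallel scale features: fully redundant, so every pair contributes 1.
redundant = diversity_loss([[1.0, 0.0], [2.0, 0.0]], [[0.0, 3.0], [0.0, 1.0]])
```

Minimizing this quantity pushes features of different scales toward dissimilar directions, which is the redundancy-avoidance effect described in the text.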
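In addition, we adopt the identity loss $L_{\mathrm{id}}$, which groups pedestrian images or texts by identity and thus ensures identity-level matching. It explicitly considers the distance between modalities and ensures that the feature representations of the same image/text group cluster tightly in the joint embedding space. Here, $W_{\mathrm{id}}$ is a weight vector used to adjust the importance of different labels, and $GN(X)$ is the normalized image feature vector obtained through global normalization. The identity loss is expressed as: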

$$L_{\mathrm{id}}(X) = -\log\left(\mathrm{Softmax}\left(W_{\mathrm{id}} \times GN(X)\right)\right) \tag{13}$$
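As an illustration of Eq. (13), the following is a minimal sketch of a softmax-based identity loss. Several details are assumptions not stated in the text: $GN(\cdot)$ is approximated by L2 normalization, $W_{\mathrm{id}}$ is treated as a matrix with one row per identity, and the loss is read at the ground-truth identity index. The names `l2_normalize` and `identity_loss` are hypothetical.

```python
import math

def l2_normalize(x):
    """Assumed stand-in for GN(X): scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def identity_loss(weights, feature, label):
    """-log softmax(W * GN(x)) evaluated at the ground-truth identity `label`.

    weights: list of rows, one row of classifier weights per identity (assumed).
    feature: raw feature vector of one image or text.
    """
    z = l2_normalize(feature)
    logits = [sum(w * v for w, v in zip(row, z)) for row in weights]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[label]

# With all-zero weights every identity is equally likely, so the loss is log(2).
uniform = identity_loss([[0.0, 0.0], [0.0, 0.0]], [3.0, 4.0], 0)
# A feature aligned with identity 0 gives a lower loss for label 0 than label 1.
correct = identity_loss([[10.0, 0.0], [0.0, 10.0]], [1.0, 0.0], 0)
wrong = identity_loss([[10.0, 0.0], [0.0, 10.0]], [1.0, 0.0], 1)
```

Because the loss is low only when a feature scores highest for its own identity, features sharing an identity are pulled toward the same classifier direction regardless of modality.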
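Under the constraints of the cross-modal projection matching loss, the diversity loss, and the identity loss described above, we can obtain distinct, semantically aligned features from images and text. In summary, the final loss function is: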

$$L = L_{\mathrm{CMPM}} + L_{\mathrm{div}} + L_{\mathrm{id}} \tag{14}$$
3   Experimental Results and Analysis
3.1   Datasets and Evaluation Metrics
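To verify the effectiveness of the proposed method, we conduct an extensive performance evaluation on three challenging text-to-image person retrieval datasets: CUHK-PEDES, ICFG-PEDES, and RSTPReid.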
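CUHK-PEDES [6] is the first dataset dedicated to text-to-image person retrieval. As shown in Figure 6, it contains 40,206 images and 80,412 textual descriptions of 13,003 identities. Following the official data split, the training set consists of 11,003 identities, 34,054 images, and 68,108 textual descriptions. The validation set contains 3,078 images and 6,156 textual descriptions, and the test set contains 3,074 images and 6,148 textual descriptions; the validation and test sets each cover 1,000 identities.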
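The split statistics above can be cross-checked with a short sketch. It assumes that the identities of the three splits are disjoint, so that per-split counts add up to the dataset totals. The dictionary layout and the helper name `split_sums` are hypothetical.

```python
# CUHK-PEDES split statistics as quoted in the text.
splits = {
    "train": {"identities": 11003, "images": 34054, "captions": 68108},
    "val": {"identities": 1000, "images": 3078, "captions": 6156},
    "test": {"identities": 1000, "images": 3074, "captions": 6148},
}
totals = {"identities": 13003, "images": 40206, "captions": 80412}

def split_sums(splits):
    """Add up identities, images, and captions over all splits."""
    keys = ("identities", "images", "captions")
    return {k: sum(s[k] for s in splits.values()) for k in keys}

sums = split_sums(splits)
consistent = sums == totals  # True: the per-split counts match the totals

# Average number of textual descriptions per image in each split.
captions_per_image = {
    name: s["captions"] / s["images"] for name, s in splits.items()
}
```

The counts are mutually consistent, and every split averages exactly two textual descriptions per image.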

Ⅰ, left column:
  "A woman in a pink shirt, a pair of blue jean shorts and a pair of gray shoes."
  "The woman is seen from behind wearing a light colored t-shirt with a pair of dark capris, and a tan purse slung across her body from her left shoulder to her right hip."
Ⅰ, right column:
  "A woman in blue jean shorts, light colored shoes and a pink top carries a light colored shoulder bag outside."
  "A woman in a pink shirt, a pair of blue jean shorts and a pair of gray shoes."

Ⅱ, left column:
  "A lady with long black hair.Wearing a black shirt and black short pants.With tan or light colored high heels ,she is also carrying a red purse and walking next to bickes."
  "A woman in a black shirt, a pair of black pants and a pair of pink shoes."
Ⅱ, right column:
  "Female with dark hair parted down the middles, wearing upper garment that is partially white but mainly black. Black pants that end just below knees and light colored shoes."
  "A woman in a white shirt, a pair of black pants and a pair of white socks."

Ⅲ, left column:
  "A man in a white shirt with a picture on the front, a pair of gray shorts and a pair of gray shoes."
  "The pedestrian with short, dark hair walks with their left hand over their stomach. He wears a white, graphic t-shirt with gray shorts and shoes."
Ⅲ, right column:
  "The man is carrying a piece of paper in his left hand. He has black hair."
  "This person is visible from the back, they are wearing a white short sleeve tee shirt, gray Bermuda shorts and is carrying something in his left hand."

Ⅳ, left column:
  "A woman with black hair is wearing a yellow and black top, light pants, light pink purse and white sneakers."
  "A woman wearing a black shirt, a pair of blue jeans and a pair of black and white shoes."
Ⅳ, right column:
  "This woman has long dark hair. She is wearing a jacket, jeans and sneakers. She is carrying a large purse."
  "A woman wearing a white and black shirt, a pair of blue jean pants and a pair of white and black shoes."
Figure 6   Pedestrian image-text pairs from the CUHK-PEDES dataset