
软件学报 ISSN 1000-9825, CODEN RUXUEW                                        E-mail: jos@iscas.ac.cn
Journal of Software, 2024, 35(4): 1899–1913 [doi: 10.13328/j.cnki.jos.006833]  http://www.jos.org.cn
© Copyright by Institute of Software, Chinese Academy of Sciences. All rights reserved.  Tel: +86-10-62562563



RGB-D Salient Object Detection Based on Cross-modal Interactive Fusion and Global Awareness*

SUN Fu-Ming, HU Xi-Hang, WU Jing-Yu, SUN Jing, WANG Fa-Sheng

(School of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China)
Corresponding author: WANG Fa-Sheng, E-mail: wangfasheng@dlnu.edu.cn

Abstract: In recent years, RGB-D saliency detection methods have achieved better performance than RGB saliency detection models by exploiting the rich geometric structure and spatial position information in depth maps, and have therefore attracted great attention from the academic community. However, existing RGB-D detection models still need to keep improving their detection performance. The recently emerged Transformer excels at modeling global information, while the convolutional neural network (CNN) excels at extracting local details. Therefore, effectively combining the strengths of CNN and Transformer to mine both global and local information will help improve the accuracy of salient object detection. To this end, an RGB-D salient object detection method based on cross-modal interactive fusion and global awareness is proposed, which embeds a Transformer network into U-Net so that the global attention mechanism is combined with local convolution for better feature extraction. First, with the U-Net encoder-decoder structure, multi-level complementary features are efficiently extracted and decoded stage by stage to generate the saliency map. Then, a Transformer module is used to learn the global dependencies among high-level features to enhance the feature representation, and a progressive upsampling fusion strategy is adopted for its input to reduce the introduction of noise. Next, to mitigate the negative impact of low-quality depth maps, a cross-modal interactive fusion module is designed to realize cross-modal feature fusion. Finally, experimental results on five benchmark datasets show that the proposed algorithm has significant advantages over other state-of-the-art algorithms.
Keywords: salient object detection; cross-modal; global attention mechanism; RGB-D detection model
CLC number: TP391

Citation format (in Chinese): Sun FM, Hu XH, Wu JY, Sun J, Wang FS. RGB-D salient object detection based on cross-modal interactive fusion and global awareness. Ruan Jian Xue Bao/Journal of Software, 2024, 35(4): 1899–1913. http://www.jos.org.cn/1000-9825/6833.htm
Citation format (in English): Sun FM, Hu XH, Wu JY, Sun J, Wang FS. RGB-D Salient Object Detection Based on Cross-modal Interactive Fusion and Global Awareness. Ruan Jian Xue Bao/Journal of Software, 2024, 35(4): 1899–1913 (in Chinese). http://www.jos.org.cn/1000-9825/6833.htm

                 RGB-D Salient Object Detection Based on Cross-modal Interactive Fusion and Global Awareness
                 SUN Fu-Ming, HU Xi-Hang, WU Jing-Yu, SUN Jing, WANG Fa-Sheng
                 (School of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China)
Abstract: In recent years, RGB-D salient object detection methods have achieved better performance than RGB salient detection models by virtue of the rich geometric structure and spatial position information in depth maps, and have thus drawn great attention from the academic community. However, existing RGB-D detection models still face the challenge of continuously improving detection performance. The emerging Transformer is good at modeling global information, while the convolutional neural network (CNN) is good at extracting local details. Therefore, effectively combining the advantages of CNN and Transformer to mine both global and local information will help improve the accuracy of salient object detection. For this purpose, an RGB-D salient object detection method based on cross-modal interactive fusion and global awareness is proposed in this study. The Transformer network is embedded into U-Net to better extract features by combining the global attention mechanism with local convolution. First, with the help of the U-Net encoder-decoder structure, this study efficiently extracts multi-level complementary features and decodes them step by step to generate a salient feature map. Then, the Transformer module is used to learn the global dependency among high-level features to enhance the feature representation, and a progressive upsampling fusion strategy is applied to its input to reduce the introduction of noise. Moreover, to reduce the negative impact of low-quality depth maps, a cross-modal interactive fusion module is designed to realize cross-modal feature fusion. Finally, experimental results on five benchmark datasets show that the proposed algorithm has significant advantages over other state-of-the-art algorithms.
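The two core ideas in the abstract — self-attention over high-level feature tokens to capture global dependencies, and a gated fusion of RGB and depth features to suppress unreliable depth — can be illustrated with a minimal NumPy sketch. All function names and the sigmoid-gated fusion rule below are illustrative assumptions for exposition, not the paper's actual modules, whose exact design is given in the body of the article.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # tokens: (N, d) — a high-level feature map flattened to N spatial tokens.
    # Each output token attends to ALL tokens, modeling global dependencies.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)  # (N, N) attention
    return A @ V

def cross_modal_fusion(rgb_feat, depth_feat):
    # Hypothetical gating rule: down-weight depth features where they
    # disagree with RGB, mitigating low-quality depth maps.
    gate = 1.0 / (1.0 + np.exp(-(rgb_feat * depth_feat)))  # sigmoid affinity
    return rgb_feat + gate * depth_feat

rng = np.random.default_rng(0)
N, d = 16, 8                       # 4x4 feature map -> 16 tokens, 8 channels
rgb = rng.standard_normal((N, d))
depth = rng.standard_normal((N, d))

fused = cross_modal_fusion(rgb, depth)          # cross-modal interactive fusion
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = self_attention(fused, Wq, Wk, Wv)         # global awareness
print(out.shape)                                # (16, 8)
```

Note that because the attention matrix is N x N over all spatial positions, every output location aggregates information from the entire feature map, which is exactly the global modeling capability that local convolution lacks.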


* Funding: National Natural Science Foundation of China (61976042, 61972068); Liaoning Revitalization Talents Program (XLYC2007023); Support Program for Innovative Talents in Higher Education Institutions of Liaoning Province (LR2019020)
Received 2022-06-29; Revised 2022-09-01, 2022-10-10; Accepted 2022-11-01; JoS online publication 2023-06-14
CNKI first online publication 2023-06-15