Page 333 - Journal of Software (《软件学报》), 2024, Issue 4

Sun Fuming et al.: RGB-D salient object detection with cross-modal interactive fusion and global awareness    1911


                     of the 9th Int’l Conf. on Learning Representations (ICLR). OpenReview.net, 2021.
                 [52]  Yan B, Peng HW, Fu JL, Wang D, Lu HC. Learning spatio-temporal Transformer for visual tracking. In: Proc. of the 2021 IEEE/CVF Int’l
                     Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021. 10428–10437. [doi: 10.1109/ICCV48922.2021.01028]
                 [53]  Stoffl L, Vidal M, Mathis A. End-to-end trainable multi-instance pose estimation with transformers. arXiv:2103.12115, 2021.
                 [54]  Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S,
                     Uszkoreit J, Houlsby N. An image is worth 16x16 words: Transformers for image recognition at scale. In: Proc. of the 9th Int’l Conf. on
                     Learning Representations (ICLR). OpenReview.net, 2021.
                 [55]  Liu Z, Lin YT, Cao Y, Hu H, Wei YX, Zhang Z, Lin S, Guo BN. Swin Transformer: Hierarchical vision transformer using shifted
                     windows. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021. 9992–10002.
                     [doi: 10.1109/ICCV48922.2021.00986]
                 [56]  Strudel R, Garcia R, Laptev I, Schmid C. Segmenter: Transformer for semantic segmentation. In: Proc. of the 2021 IEEE/CVF Int’l Conf.
                     on Computer Vision (ICCV). Montreal: IEEE, 2021. 7242–7252. [doi: 10.1109/ICCV48922.2021.00717]
                 [57]  Xie EZ, Wang WH, Yu ZD, Anandkumar A, Alvarez JM, Luo P. SegFormer: Simple and efficient design for semantic segmentation with
                     transformers. In: Proc. of the 35th Neural Information Processing Systems (NIPS). 2021. 12077–12090.
                 [58]  Wang WH, Xie EZ, Li X, Fan DP, Song KT, Liang D, Lu T, Luo P, Shao L. Pyramid vision transformer: A versatile backbone for dense
                     prediction without convolutions. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021.
                     548–558. [doi: 10.1109/ICCV48922.2021.00061]
                 [59]  Zhu HQ, Sun X, Li YX, Ma K, Zhou SK, Zheng YF. DFTR: Depth-supervised fusion Transformer for salient object detection.
                     arXiv:2203.06429, 2022.
                 [60]  Chen JN, Lu YY, Yu QH, Luo XD, Adeli E, Wang Y, Lu L, Yuille AL, Zhou YY. TransUNet: Transformers make strong encoders for
                     medical image segmentation. arXiv:2102.04306, 2021.
                 [61]  Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Proc. of the 18th Int’l Conf.
                     on Medical Image Computing and Computer-assisted Intervention (MICCAI). Munich: Springer, 2015. 234–241.
                     [doi: 10.1007/978-3-319-24574-4_28]
                 [62]  Wang HY, Zhu YK, Adam H, Yuille A, Chen LC. MaX-DeepLab: End-to-end panoptic segmentation with mask Transformers. In: Proc.
                     of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021. 5459–5470.
                     [doi: 10.1109/CVPR46437.2021.00542]
                 [63]  Zhang YD, Liu HY, Hu Q. TransFuse: Fusing Transformers and CNNs for medical image segmentation. In: Proc. of the 24th Int’l Conf.
                     on Medical Image Computing and Computer-assisted Intervention (MICCAI). Strasbourg: Springer, 2021. 14–24.
                     [doi: 10.1007/978-3-030-87193-2_2]
                 [64]  Luo XD, Hu MH, Song T, Wang GT, Zhang ST. Semi-supervised medical image segmentation via cross teaching between CNN and
                     Transformer. In: Proc. of the 2022 Int’l Conf. on Medical Imaging with Deep Learning. Zurich: PMLR, 2022. 820–833.
                 [65]  Liu C, Yang G, Wang S, Wang HX, Zhang YH, Wang YT. TANet: Transformer-based asymmetric network for RGB-D salient object
                     detection. arXiv:2207.01172, 2022.
                 [66]  Chen X, Yan B, Zhu JW, Wang D, Yang XY, Lu HC. Transformer tracking. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision
                     and Pattern Recognition (CVPR). Nashville: IEEE, 2021. 8122–8131. [doi: 10.1109/CVPR46437.2021.00803]
                 [67]  Xie YT, Zhang JP, Shen CH, Xia Y. CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation. In: Proc. of
                     the 24th Int’l Conf. on Medical Image Computing and Computer-assisted Intervention (MICCAI). Strasbourg: Springer, 2021. 171–180.
                     [doi: 10.1007/978-3-030-87199-4_16]
                 [68]  Hou QB, Zhou DQ, Feng JS. Coordinate attention for efficient mobile network design. In: Proc. of the 2021 IEEE/CVF Conf. on
                     Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021. 13708–13717. [doi: 10.1109/CVPR46437.2021.01350]
                 [69]  Gao SH, Cheng MM, Zhao K, Zhang XY, Yang MH, Torr P. Res2Net: A new multi-scale backbone architecture. IEEE Trans. on Pattern
                     Analysis and Machine Intelligence, 2021, 43(2): 652–662. [doi: 10.1109/TPAMI.2019.2938758]
                 [70]  Ba JL, Kiros JR, Hinton GE. Layer normalization. arXiv:1607.06450, 2016.
                 [71]  Ju R, Ge L, Geng WJ, Ren TW, Wu GS. Depth saliency based on anisotropic center-surround difference. In: Proc. of the 2014 IEEE Int’l
                     Conf. on Image Processing (ICIP). Paris: IEEE, 2014. 1115–1119. [doi: 10.1109/ICIP.2014.7025222]
                 [72]  Fan DP, Lin Z, Zhang Z, Zhu ML, Cheng MM. Rethinking RGB-D salient object detection: Models, data sets, and large-scale
                     benchmarks. IEEE Trans. on Neural Networks and Learning Systems, 2021, 32(5): 2075–2089. [doi: 10.1109/TNNLS.2020.2996406]
                 [73]  Cheng YP, Fu HZ, Wei XX, Xiao JJ, Cao XC. Depth enhanced saliency detection method. In: Proc. of the 2014 Int’l Conf. on Internet
                     Multimedia Computing and Service (ICIMCS). Xiamen: ACM, 2014. 23–27. [doi: 10.1145/2632856.2632866]