Page 333 - Journal of Software (《软件学报》), 2024, No. 4

Sun FM, et al.: RGB-D salient object detection with cross-modal interactive fusion and global awareness 1911

of the 9th Int’l Conf. on Learning Representations (ICLR). OpenReview.net, 2021.
[52] Yan B, Peng HW, Fu JL, Wang D, Lu HC. Learning spatio-temporal Transformer for visual tracking. In: Proc. of the 2021 IEEE/CVF Int’l
Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021. 10428–10437. [doi: 10.1109/ICCV48922.2021.01028]
[53] Stoffl L, Vidal M, Mathis A. End-to-end trainable multi-instance pose estimation with transformers. arXiv:2103.12115, 2021.
[54] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S,
Uszkoreit J, Houlsby N. An image is worth 16x16 words: Transformers for image recognition at scale. In: Proc. of the 9th Int’l Conf. on
Learning Representations (ICLR). OpenReview.net, 2021.
[55] Liu Z, Lin YT, Cao Y, Hu H, Wei YX, Zhang Z, Lin S, Guo BN. Swin Transformer: Hierarchical vision transformer using shifted
windows. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021. 9992–10002.
[doi: 10.1109/ICCV48922.2021.00986]
[56] Strudel R, Garcia R, Laptev I, Schmid C. Segmenter: Transformer for semantic segmentation. In: Proc. of the 2021 IEEE/CVF Int’l Conf.
on Computer Vision (ICCV). Montreal: IEEE, 2021. 7242–7252. [doi: 10.1109/ICCV48922.2021.00717]
[57] Xie EZ, Wang WH, Yu ZD, Anandkumar A, Alvarez JM, Luo P. SegFormer: Simple and efficient design for semantic segmentation with
transformers. In: Proc. of the 35th Neural Information Processing Systems (NIPS). 2021. 12077–12090.
[58] Wang WH, Xie EZ, Li X, Fan DP, Song KT, Liang D, Lu T, Luo P, Shao L. Pyramid vision transformer: A versatile backbone for dense
prediction without convolutions. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision (ICCV). Montreal: IEEE, 2021.
548–558. [doi: 10.1109/ICCV48922.2021.00061]
[59] Zhu HQ, Sun X, Li YX, Ma K, Zhou SK, Zheng YF. DFTR: Depth-supervised fusion Transformer for salient object detection.
arXiv:2203.06429, 2022.
[60] Chen JN, Lu YY, Yu QH, Luo XD, Adeli E, Wang Y, Lu L, Yuille AL, Zhou YY. TransUNet: Transformers make strong encoders for
medical image segmentation. arXiv:2102.04306, 2021.
[61] Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Proc. of the 18th Int’l Conf.
on Medical Image Computing and Computer-assisted Intervention (MICCAI). Munich: Springer, 2015. 234–241.
[doi: 10.1007/978-3-319-24574-4_28]
[62] Wang HY, Zhu YK, Adam H, Yuille A, Chen LC. MaX-DeepLab: End-to-end panoptic segmentation with mask Transformers. In: Proc.
of the 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021. 5459–5470.
[doi: 10.1109/CVPR46437.2021.00542]
[63] Zhang YD, Liu HY, Hu Q. TransFuse: Fusing Transformers and CNNs for medical image segmentation. In: Proc. of the 24th Int’l Conf.
on Medical Image Computing and Computer-assisted Intervention (MICCAI). Strasbourg: Springer, 2021. 14–24.
[doi: 10.1007/978-3-030-87193-2_2]
[64] Luo XD, Hu MH, Song T, Wang GT, Zhang ST. Semi-supervised medical image segmentation via cross teaching between CNN and
Transformer. In: Proc. of the 2022 Int’l Conf. on Medical Imaging with Deep Learning. Zurich: PMLR, 2022. 820–833.
[65] Liu C, Yang G, Wang S, Wang HX, Zhang YH, Wang YT. TANet: Transformer-based asymmetric network for RGB-D salient object
detection. arXiv:2207.01172, 2022.
[66] Chen X, Yan B, Zhu JW, Wang D, Yang XY, Lu HC. Transformer tracking. In: Proc. of the 2021 IEEE/CVF Conf. on Computer Vision
and Pattern Recognition (CVPR). Nashville: IEEE, 2021. 8122–8131. [doi: 10.1109/CVPR46437.2021.00803]
[67] Xie YT, Zhang JP, Shen CH, Xia Y. CoTr: Efficiently bridging CNN and Transformer for 3D medical image segmentation. In: Proc. of
the 24th Int’l Conf. on Medical Image Computing and Computer-assisted Intervention (MICCAI). Strasbourg: Springer, 2021. 171–180.
[doi: 10.1007/978-3-030-87199-4_16]
[68] Hou QB, Zhou DQ, Feng JS. Coordinate attention for efficient mobile network design. In: Proc. of the 2021 IEEE/CVF Conf. on
Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021. 13708–13717. [doi: 10.1109/CVPR46437.2021.01350]
[69] Gao SH, Cheng MM, Zhao K, Zhang XY, Yang MH, Torr P. Res2Net: A new multi-scale backbone architecture. IEEE Trans. on Pattern
Analysis and Machine Intelligence, 2021, 43(2): 652–662. [doi: 10.1109/TPAMI.2019.2938758]
[70] Ba JL, Kiros JR, Hinton GE. Layer normalization. arXiv:1607.06450, 2016.
[71] Ju R, Ge L, Geng WJ, Ren TW, Wu GS. Depth saliency based on anisotropic center-surround difference. In: Proc. of the 2014 IEEE Int’l
Conf. on Image Processing (ICIP). Paris: IEEE, 2014. 1115–1119. [doi: 10.1109/ICIP.2014.7025222]
[72] Fan DP, Lin Z, Zhang Z, Zhu ML, Cheng MM. Rethinking RGB-D salient object detection: Models, data sets, and large-scale
benchmarks. IEEE Trans. on Neural Networks and Learning Systems, 2021, 32(5): 2075–2089. [doi: 10.1109/TNNLS.2020.2996406]
[73] Cheng YP, Fu HZ, Wei XX, Xiao JJ, Cao XC. Depth enhanced saliency detection method. In: Proc. of the 2014 Int’l Conf. on Internet
Multimedia Computing and Service (ICIMCS). Xiamen: ACM, 2014. 23–27. [doi: 10.1145/2632856.2632866]
