Page 474 - Journal of Software (Ruan Jian Xue Bao), 2025, Issue 9
Li JX, et al.: Weakly supervised semantic segmentation based on semantic modulation 4385
[8] Chang YT, Wang QS, Hung WC, Piramuthu R, Tsai YH, Yang MH. Weakly-supervised semantic segmentation via sub-category
exploration. In: Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 8988–8997. [doi:
10.1109/CVPR42600.2020.00901]
[9] Zhou BL, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proc. of the 2016 IEEE
Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 2921–2929. [doi: 10.1109/CVPR.2016.319]
[10] Chen ZW, Wang CA, Wang YB, Jiang GN, Shen YH, Tai Y, Wang CJ, Zhang W, Cao LJ. LCTR: On awakening the local continuity of
Transformer for weakly supervised object localization. In: Proc. of the 36th AAAI Conf. on Artificial Intelligence. AAAI Press, 2022.
710–718. [doi: 10.1609/aaai.v36i1.19918]
[11] Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S,
Uszkoreit J, Houlsby N. An image is worth 16x16 words: Transformers for image recognition at scale. In: Proc. of the 9th Int’l Conf. on
Learning Representations. OpenReview.net, 2021.
[12] Shi ZN, Chen HP, Zhang D, Shen XJ. Pre-training-driven multimodal boundary-aware vision Transformer. Ruan Jian Xue Bao/Journal of
Software, 2023, 34(5): 2051–2067 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6768.htm [doi: 10.13328/j.cnki.
jos.006768]
[13] Gao W, Wan F, Pan XJ, Peng ZL, Tian Q, Han ZJ, Zhou BL, Ye QX. TS-CAM: Token semantic coupled attention map for weakly
supervised object localization. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision. Montreal: IEEE, 2021. 2866–2875. [doi:
10.1109/ICCV48922.2021.00288]
[14] Xu L, Ouyang WL, Bennamoun M, Boussaid F, Xu D. Multi-class token Transformer for weakly supervised semantic segmentation. In:
Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 4300–4309. [doi: 10.1109/
CVPR52688.2022.00427]
[15] Wei YC, Feng JS, Liang XD, Cheng MM, Zhao Y, Yan SC. Object region mining with adversarial erasing: A simple classification to
semantic segmentation approach. In: Proc. of the 2017 IEEE Conf. on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017.
6488–6496. [doi: 10.1109/CVPR.2017.687]
[16] Jiang PT, Hou QB, Cao Y, Cheng MM, Wei YC, Xiong HK. Integral object mining via online attention accumulation. In: Proc. of the
2019 IEEE/CVF Int’l Conf. on Computer Vision. Seoul: IEEE, 2019. 2070–2079. [doi: 10.1109/ICCV.2019.00216]
[17] Sun GL, Wang WG, Dai JF, Van Gool L. Mining cross-image semantics for weakly supervised semantic segmentation. In: Proc. of the
16th European Conf. on Computer Vision. Glasgow: Springer, 2020. 347–365. [doi: 10.1007/978-3-030-58536-5_21]
[18] Zhang F, Gu CC, Zhang CY, Dai YC. Complementary patch for weakly supervised semantic segmentation. In: Proc. of the 2021
IEEE/CVF Int’l Conf. on Computer Vision. Montreal: IEEE, 2021. 7222–7231. [doi: 10.1109/ICCV48922.2021.00715]
[19] Jiang PT, Yang YQ, Hou QB, Wei YC. L2G: A simple local-to-global knowledge transfer framework for weakly supervised semantic
segmentation. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022.
16865–16875. [doi: 10.1109/CVPR52688.2022.01638]
[20] Qin J, Wu J, Xiao XF, Li LJ, Wang XG. Activation modulation and recalibration scheme for weakly supervised semantic segmentation.
In: Proc. of the 36th AAAI Conf. on Artificial Intelligence. AAAI Press, 2022. 2117–2125. [doi: 10.1609/aaai.v36i2.20108]
[21] Ahn J, Cho S, Kwak S. Weakly supervised learning of instance segmentation with inter-pixel relations. In: Proc. of the 2019 IEEE/CVF
Conf. on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 2209–2218. [doi: 10.1109/CVPR.2019.00231]
[22] Ru LX, Zhan YB, Yu BS, Du B. Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with
Transformers. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022.
16825–16834. [doi: 10.1109/CVPR52688.2022.01634]
[23] Ru LX, Zheng HL, Zhan YB, Du B. Token contrast for weakly-supervised semantic segmentation. In: Proc. of the 2023 IEEE/CVF Conf.
on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 3093–3102. [doi: 10.1109/CVPR52729.2023.00302]
[24] Li RW, Mai ZD, Zhang ZB, Jang J, Sanner S. TransCAM: Transformer attention-based CAM refinement for weakly supervised semantic
segmentation. Journal of Visual Communication and Image Representation, 2023, 92: 103800. [doi: 10.1016/j.jvcir.2023.103800]
[25] Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H. Training data-efficient image Transformers & distillation through
attention. In: Proc. of the 38th Int’l Conf. on Machine Learning. PMLR, 2021. 10347–10357.
[26] Wu ZF, Shen CH, van den Hengel A. Wider or deeper: Revisiting the ResNet model for visual recognition. Pattern Recognition, 2019, 90:
119–133. [doi: 10.1016/j.patcog.2019.01.006]
[27] Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The PASCAL visual object classes (VOC) challenge. Int’l Journal of
Computer Vision, 2010, 88(2): 303–338. [doi: 10.1007/s11263-009-0275-4]
[28] Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. Microsoft COCO: Common objects in context. In:
Proc. of the 13th European Conf. on Computer Vision. Zurich: Springer, 2014. 740–755. [doi: 10.1007/978-3-319-10602-1_48]
[29] Lin YQ, Chen MH, Wang WX, Wu BX, Li K, Lin BB, Liu HF, He XF. CLIP is also an efficient segmenter: A text-driven approach for
weakly supervised semantic segmentation. In: Proc. of the 2023 IEEE/CVF Conf. on Computer Vision and Pattern Recognition.