Journal of Software (Ruan Jian Xue Bao), 2025, No. 9

Li JX, et al.: Weakly-supervised semantic segmentation based on semantic modulation                                              4385


                  [8]   Chang YT, Wang QS, Hung WC, Piramuthu R, Tsai YH, Yang MH. Weakly-supervised semantic segmentation via sub-category
                     exploration. In: Proc. of the 2020 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020. 8988–8997. [doi:
                     10.1109/CVPR42600.2020.00901]
                  [9]   Zhou BL, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: Proc. of the 2016 IEEE
                     Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 2921–2929. [doi: 10.1109/CVPR.2016.319]
                 [10]   Chen ZW, Wang CA, Wang YB, Jiang GN, Shen YH, Tai Y, Wang CJ, Zhang W, Cao LJ. LCTR: On awakening the local continuity of
                      Transformer for weakly supervised object localization. In: Proc. of the 36th AAAI Conf. on Artificial Intelligence. AAAI Press, 2022.
                     710–718. [doi: 10.1609/aaai.v36i1.19918]
                  [11]   Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai XH, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S,
                     Uszkoreit J, Houlsby N. An image is worth 16x16 words: Transformers for image recognition at scale. In: Proc. of the 9th Int’l Conf. on
                     Learning Representations. OpenReview.net, 2021.
                 [12]   Shi ZN, Chen HP, Zhang D, Shen XJ. Pre-training-driven multimodal boundary-aware vision Transformer. Ruan Jian Xue Bao/Journal of
                     Software, 2023, 34(5): 2051–2067 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6768.htm [doi: 10.13328/j.cnki.
                     jos.006768]
                 [13]   Gao W, Wan F, Pan XJ, Peng ZL, Tian Q, Han ZJ, Zhou BL, Ye QX. TS-CAM: Token semantic coupled attention map for weakly
                     supervised object localization. In: Proc. of the 2021 IEEE/CVF Int’l Conf. on Computer Vision. Montreal: IEEE, 2021. 2866–2875. [doi:
                     10.1109/ICCV48922.2021.00288]
                 [14]   Xu L, Ouyang WL, Bennamoun M, Boussaid F, Xu D. Multi-class token Transformer for weakly supervised semantic segmentation. In:
                     Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022. 4300–4309. [doi: 10.1109/
                     CVPR52688.2022.00427]
                 [15]   Wei YC, Feng JS, Liang XD, Cheng MM, Zhao Y, Yan SC. Object region mining with adversarial erasing: A simple classification to
                     semantic segmentation approach. In: Proc. of the 2017 IEEE Conf. on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017.
                     6488–6496. [doi: 10.1109/CVPR.2017.687]
                 [16]   Jiang PT, Hou QB, Cao Y, Cheng MM, Wei YC, Xiong HK. Integral object mining via online attention accumulation. In: Proc. of the
                     2019 IEEE/CVF Int’l Conf. on Computer Vision. Seoul: IEEE, 2019. 2070–2079. [doi: 10.1109/ICCV.2019.00216]
                  [17]   Sun GL, Wang WG, Dai JF, Van Gool L. Mining cross-image semantics for weakly supervised semantic segmentation. In: Proc. of the
                     16th European Conf. on Computer Vision. Glasgow: Springer, 2020. 347–365. [doi: 10.1007/978-3-030-58536-5_21]
                  [18]   Zhang F, Gu CC, Zhang CY, Dai YC. Complementary patch for weakly supervised semantic segmentation. In: Proc. of the 2021
                     IEEE/CVF Int’l Conf. on Computer Vision. Montreal: IEEE, 2021. 7222–7231. [doi: 10.1109/ICCV48922.2021.00715]
                 [19]   Jiang PT, Yang YQ, Hou QB, Wei YC. L2G: A simple local-to-global knowledge transfer framework for weakly supervised semantic
                      segmentation. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022.
                     16865–16875. [doi: 10.1109/CVPR52688.2022.01638]
                 [20]   Qin J, Wu J, Xiao XF, Li LJ, Wang XG. Activation modulation and recalibration scheme for weakly supervised semantic segmentation.
                     In: Proc. of the 36th AAAI Conf. on Artificial Intelligence. AAAI Press, 2022. 2117–2125. [doi: 10.1609/aaai.v36i2.20108]
                 [21]   Ahn J, Cho S, Kwak S. Weakly supervised learning of instance segmentation with inter-pixel relations. In: Proc. of the 2019 IEEE/CVF
                     Conf. on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019. 2209–2218. [doi: 10.1109/CVPR.2019.00231]
                  [22]   Ru LX, Zhan YB, Yu BS, Du B. Learning affinity from attention: End-to-end weakly-supervised semantic segmentation with
                      Transformers. In: Proc. of the 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022.
                     16825–16834. [doi: 10.1109/CVPR52688.2022.01634]
                 [23]   Ru LX, Zheng HL, Zhan YB, Du B. Token contrast for weakly-supervised semantic segmentation. In: Proc. of the 2023 IEEE/CVF Conf.
                     on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023. 3093–3102. [doi: 10.1109/CVPR52729.2023.00302]
                 [24]   Li RW, Mai ZD, Zhang ZB, Jang J, Sanner S. TransCAM: Transformer attention-based CAM refinement for weakly supervised semantic
                     segmentation. Journal of Visual Communication and Image Representation, 2023, 92: 103800. [doi: 10.1016/j.jvcir.2023.103800]
                 [25]   Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H. Training data-efficient image Transformers & distillation through
                     attention. In: Proc. of the 38th Int’l Conf. on Machine Learning. PMLR, 2021. 10347–10357.
                 [26]   Wu ZF, Shen CH, van den Hengel A. Wider or deeper: Revisiting the ResNet model for visual recognition. Pattern Recognition, 2019, 90:
                     119–133. [doi: 10.1016/j.patcog.2019.01.006]
                 [27]   Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The PASCAL visual object classes (VOC) challenge. Int’l Journal of
                     Computer Vision, 2010, 88(2): 303–338. [doi: 10.1007/s11263-009-0275-4]
                 [28]   Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. Microsoft COCO: Common objects in context. In:
                     Proc. of the 13th European Conf. on Computer Vision. Zurich: Springer, 2014. 740–755. [doi: 10.1007/978-3-319-10602-1_48]
                 [29]   Lin YQ, Chen MH, Wang WX, Wu BX, Li K, Lin BB, Liu HF, He XF. CLIP is also an efficient segmenter: A text-driven approach for
                      weakly supervised semantic segmentation. In: Proc. of the 2023 IEEE/CVF Conf. on Computer Vision and Pattern Recognition.