
Journal of Software (软件学报), 2025, Vol. 36, No. 4, p. 1484


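the security threats that code models face from backdoor attacks and adversarial attacks. This paper divides backdoor attacks into data poisoning attacks and model poisoning attacks, and adversarial attacks into white-box and black-box adversarial attacks, and it summarizes and analyzes in detail the research on attacks against deep code models in each category. It then identifies the gaps in defenses against backdoor attacks and adversarial attacks. In addition, it analyzes in depth the challenges that research in this field currently faces, offering guidance for further work. Finally, it compiles and summarizes the datasets and evaluation metrics commonly used in deep code model security, for readers' convenience.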
