Journal of Software (软件学报), 2025, Vol. 36, No. 4, p. 1484
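security threats that backdoor attacks and adversarial attacks pose to code models. This paper divides backdoor attacks
into data poisoning attacks and model poisoning attacks, and adversarial attacks into white-box and black-box adversarial
attacks, and it summarizes and analyzes in detail the research on attacks against deep code models in each category.
It then reveals the gaps in defenses against backdoor attacks and adversarial attacks. In addition, it analyzes in depth
the challenges currently facing research in this field, providing useful guidance for further work. Finally, it collects
and summarizes the datasets and evaluation metrics commonly used in deep code model security, for readers' convenience.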