《软件学报》 (Journal of Software), 2025, No. 4
Sun WS, et al.: A survey on the security of deep code models  1487
[68] Qi FC, Chen YY, Li MK, Yao Y, Liu ZY, Sun MS. ONION: A simple and effective defense against textual backdoor attacks. In: Proc. of
the 2021 Conf. on Empirical Methods in Natural Language Processing. Punta Cana: Association for Computational Linguistics, 2021.
9558–9566. [doi: 10.18653/v1/2021.emnlp-main.752]
[69] Du XH, Wen M, Wei ZH, Wang SW, Jin H. An extensive study on adversarial attack against pre-trained models of code. In: Proc. of the
31st ACM Joint European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. San Francisco: ACM,
2023. 489–501. [doi: 10.1145/3611643.3616356]
[70] Li YJ, Tarlow D, Brockschmidt M, Zemel R. Gated graph sequence neural networks. arXiv:1511.05493, 2017.
[71] Alon U, Zilberstein M, Levy O, Yahav E. Code2Vec: Learning distributed representations of code. Proc. of the ACM on Programming
Languages, 2019, 3(POPL): 40. [doi: 10.1145/3290353]
[72] Brockschmidt M, Allamanis M, Gaunt AL, Polozov O. Generative code modeling with graphs. arXiv:1805.08490, 2019.
[73] Huo X, Li M. Enhancing the unified features to locate buggy files by exploiting the sequential nature of source code. In: Proc. of the 26th
Int’l Joint Conf. on Artificial Intelligence. Melbourne: IJCAI.org, 2017. 1909–1915. [doi: 10.24963/ijcai.2017/265]
[74] Mou LL, Li G, Zhang L, Wang T, Jin Z. Convolutional neural networks over tree structures for programming language processing. In:
Proc. of the 30th AAAI Conf. on Artificial Intelligence. Phoenix: AAAI Press, 2016. 1287–1293.
[75] Wei HH, Li M. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in
source code. In: Proc. of the 26th Int’l Joint Conf. on Artificial Intelligence. Melbourne: IJCAI.org, 2017. 3034–3040.
[76] Alzantot M, Sharma Y, Elgohary A, Ho BJ, Srivastava M, Chang KW. Generating natural language adversarial examples. In: Proc. of the
2018 Conf. on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics, 2018.
2890–2896. [doi: 10.18653/v1/D18-1316]
[77] Jin D, Jin ZJ, Zhou JT, Szolovits P. Is BERT really robust? A strong baseline for natural language attack on text classification and
entailment. In: Proc. of the 34th AAAI Conf. on Artificial Intelligence. New York: AAAI Press, 2020. 8018–8025. [doi: 10.1609/aaai.
v34i05.6311]
[78] Li LY, Ma RT, Guo QP, Xue XY, Qiu XP. BERT-Attack: Adversarial attack against BERT using BERT. In: Proc. of the 2020 Conf. on
Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2020. 6193–6202. [doi: 10.18653/v1/
2020.emnlp-main.500]
[79] Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M,
Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D. Mastering
the game of go with deep neural networks and tree search. Nature, 2016, 529(7587): 484–489. [doi: 10.1038/nature16961]
[80] Caliskan-Islam A, Harang R, Liu A, Narayanan A, Voss C, Yamaguchi F, Greenstadt R. De-anonymizing programmers via code
stylometry. In: Proc. of the 24th USENIX Conf. on Security Symp. Washington: USENIX Association, 2015. 255–270.
[81] Abuhamad M, AbuHmed T, Mohaisen A, Nyang D. Large-scale and language-oblivious code authorship identification. In: Proc. of the
2018 ACM SIGSAC Conf. on Computer and Communications Security. Toronto: ACM, 2018. 101–114. [doi: 10.1145/3243734.
3243738]
[82] Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083,
2019.
[83] Lu S, Guo DY, Ren S, Huang JJ, Svyatkovskiy A, Blanco A, Clement C, Drain D, Jiang DX, Tang DY, Li G, Zhou LD, Shou LJ, Zhou L,
Tufano M, Gong M, Zhou M, Duan N, Sundaresan N, Deng SK, Fu SY, Liu SJ. CodeXGLUE: A machine learning benchmark dataset for
code understanding and generation. arXiv:2102.04664, 2021.
[84] Liu CX, Wan XJ. CodeQA: A question answering dataset for source code comprehension. In: Findings of the Association for
Computational Linguistics: EMNLP 2021. Punta Cana: Association for Computational Linguistics, 2021. 2618–2632. [doi: 10.18653/v1/
2021.findings-emnlp.223]
[85] Hendrycks D, Basart S, Kadavath S, Mazeika M, Arora A, Guo E, Burns C, Puranik S, He H, Song D, Steinhardt J. Measuring coding
challenge competence with APPS. arXiv:2105.09938, 2021.
[86] Liguori P, Al-Hossami E, Cotroneo D, Natella R, Cukic B, Shaikh S. Shellcode_IA32: A dataset for automatic shellcode generation.
arXiv:2104.13100, 2022.
[87] Siddiq ML, Santos JCS. SecurityEval dataset: Mining vulnerability examples to evaluate machine learning-based code generation
techniques. In: Proc. of the 1st Int’l Workshop on Mining Software Repositories Applications for Privacy and Security. Singapore: ACM,
2022. 29–33. [doi: 10.1145/3549035.3561184]
[88] Tony C, Mutas M, Ferreyra NED, Scandariato R. LLMSecEval: A dataset of natural language prompts for security evaluations. In: Proc.
of the 20th IEEE/ACM Int’l Conf. on Mining Software Repositories. Melbourne: IEEE, 2023. 588–592. [doi: 10.1109/MSR59073.
2023.00084]