《软件学报》 (Journal of Software), 2025, Issue 4

Sun Weisong et al.: A Survey on the Security of Deep Code Models


                 [68]  Qi FC, Chen YY, Li MK, Yao Y, Liu ZY, Sun MS. ONION: A simple and effective defense against textual backdoor attacks. In: Proc. of
                     the 2021 Conf. on Empirical Methods in Natural Language Processing. Punta Cana: Association for Computational Linguistics, 2021.
                     9558–9566. [doi: 10.18653/v1/2021.emnlp-main.752]
                 [69]  Du XH, Wen M, Wei ZH, Wang SW, Jin H. An extensive study on adversarial attack against pre-trained models of code. In: Proc. of the
                     31st ACM Joint European Software Engineering Conf. and Symp. on the Foundations of Software Engineering. San Francisco: ACM,
                     2023. 489–501. [doi: 10.1145/3611643.3616356]
                 [70]  Li YJ, Tarlow D, Brockschmidt M, Zemel R. Gated graph sequence neural networks. arXiv:1511.05493, 2017.
                 [71]  Alon U, Zilberstein M, Levy O, Yahav E. Code2Vec: Learning distributed representations of code. Proc. of the ACM on Programming
                     Languages, 2019, 3(POPL): 40. [doi: 10.1145/3290353]
                 [72]  Brockschmidt M, Allamanis M, Gaunt AL, Polozov O. Generative code modeling with graphs. arXiv:1805.08490, 2019.
                 [73]  Huo X, Li M. Enhancing the unified features to locate buggy files by exploiting the sequential nature of source code. In: Proc. of the 26th
                     Int’l Joint Conf. on Artificial Intelligence. Melbourne: IJCAI.org, 2017. 1909–1915. [doi: 10.24963/ijcai.2017/265]
                 [74]  Mou LL, Li G, Zhang L, Wang T, Jin Z. Convolutional neural networks over tree structures for programming language processing. In:
                     Proc. of the 30th AAAI Conf. on Artificial Intelligence. Phoenix: AAAI Press, 2016. 1287–1293.
                 [75]  Wei HH, Li M. Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in
                     source code. In: Proc. of the 26th Int’l Joint Conf. on Artificial Intelligence. Melbourne: AAAI Press, 2017. 3034–3040.
                 [76]  Alzantot M, Sharma Y, Elgohary A, Ho BJ, Srivastava M, Chang KW. Generating natural language adversarial examples. In: Proc. of the
                     2018 Conf. on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics, 2018.
                     2890–2896. [doi: 10.18653/v1/D18-1316]
                 [77]  Jin D, Jin ZJ, Zhou JT, Szolovits P. Is BERT really robust? A strong baseline for natural language attack on text classification and
                     entailment. In: Proc. of the 34th AAAI Conf. on Artificial Intelligence. New York: AAAI Press, 2020. 8018–8025.
                     [doi: 10.1609/aaai.v34i05.6311]
                 [78]  Li LY, Ma RT, Guo QP, Xue XY, Qiu XP. BERT-Attack: Adversarial attack against BERT using BERT. In: Proc. of the 2020 Conf. on
                     Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2020. 6193–6202.
                     [doi: 10.18653/v1/2020.emnlp-main.500]
                 [79]  Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M,
                     Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D. Mastering
                     the game of Go with deep neural networks and tree search. Nature, 2016, 529(7587): 484–489. [doi: 10.1038/nature16961]
                 [80]  Caliskan-Islam A, Harang R, Liu A, Narayanan A, Voss C, Yamaguchi F, Greenstadt R. De-anonymizing programmers via code
                     stylometry. In: Proc. of the 24th USENIX Conf. on Security Symp. Washington: USENIX Association, 2015. 255–270.
                 [81]  Abuhamad M, AbuHmed T, Mohaisen A, Nyang D. Large-scale and language-oblivious code authorship identification. In: Proc. of the
                     2018 ACM SIGSAC Conf. on Computer and Communications Security. Toronto: ACM, 2018. 101–114.
                     [doi: 10.1145/3243734.3243738]
                 [82]  Madry A, Makelov A, Schmidt L, Tsipras D, Vladu A. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083,
                     2019.
                 [83]  Lu S, Guo DY, Ren S, Huang JJ, Svyatkovskiy A, Blanco A, Clement C, Drain D, Jiang DX, Tang DY, Li G, Zhou LD, Shou LJ, Zhou L,
                     Tufano M, Gong M, Zhou M, Duan N, Sundaresan N, Deng SK, Fu SY, Liu SJ. CodeXGLUE: A machine learning benchmark dataset for
                     code understanding and generation. arXiv:2102.04664, 2021.
                 [84]  Liu CX, Wan XJ. CodeQA: A question answering dataset for source code comprehension. In: Proc. of the 2021 Findings of the
                     Association for Computational Linguistics. Punta Cana: Association for Computational Linguistics, 2021. 2618–2632.
                     [doi: 10.18653/v1/2021.findings-emnlp.223]
                 [85]  Hendrycks D, Basart S, Kadavath S, Mazeika M, Arora A, Guo E, Burns C, Puranik S, He H, Song D, Steinhardt J. Measuring coding
                     challenge competence with APPS. arXiv:2105.09938, 2021.
                 [86]  Liguori P, Al-Hossami E, Cotroneo D, Natella R, Cukic B, Shaikh S. Shellcode_IA32: A dataset for automatic shellcode generation.
                     arXiv:2104.13100, 2022.
                 [87]  Siddiq ML, Santos JCS. SecurityEval dataset: Mining vulnerability examples to evaluate machine learning-based code generation
                     techniques. In: Proc. of the 1st Int’l Workshop on Mining Software Repositories Applications for Privacy and Security. Singapore: ACM,
                     2022. 29–33. [doi: 10.1145/3549035.3561184]
                 [88]  Tony C, Mutas M, Ferreyra NED, Scandariato R. LLMSecEval: A dataset of natural language prompts for security evaluations. In: Proc.
                     of the 20th IEEE/ACM Int’l Conf. on Mining Software Repositories. Melbourne: IEEE, 2023. 588–592.
                     [doi: 10.1109/MSR59073.2023.00084]