1486 软件学报 (Journal of Software) 2025, Vol. 36, No. 4
[44] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. In: Proc. of the
31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6000–6010.
[45] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional Transformers for language understanding. In: Proc.
of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1
(Long and Short Papers). Minneapolis: Association for Computational Linguistics, 2019. 4171–4186. [doi: 10.18653/v1/N19-1423]
[46] Husain H, Wu HH, Gazit T, Allamanis M, Brockschmidt M. CodeSearchNet challenge: Evaluating the state of semantic code search.
arXiv:1909.09436, 2020.
[47] Svajlenko J, Islam JF, Keivanloo I, Roy CK, Mia MM. Towards a big data curated benchmark of inter-project code clones. In: Proc. of
the 2014 IEEE Int’l Conf. on Software Maintenance and Evolution. Victoria: IEEE, 2014. 476–480. [doi: 10.1109/ICSME.2014.77]
[48] Zhou YQ, Liu SQ, Siow JK, Du XN, Liu Y. Devign: Effective vulnerability identification by learning comprehensive program semantics
via graph neural networks. In: Proc. of the 33rd Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc.,
2019. 10197–10207.
[49] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735–1780. [doi: 10.1162/neco.1997.9.8.1735]
[50] Alon U, Brody S, Levy O, Yahav E. Code2Seq: Generating sequences from structured representations of code. In: Proc. of the 7th Int’l
Conf. on Learning Representations. New Orleans: OpenReview.net, 2019.
[51] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473, 2016.
[52] Svyatkovskiy A, Zhao Y, Fu SY, Sundaresan N. Pythia: AI-assisted code completion system. In: Proc. of the 25th ACM SIGKDD Int’l
Conf. on Knowledge Discovery & Data Mining. Anchorage: ACM, 2019. 2727–2735. [doi: 10.1145/3292500.3330699]
[53] Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I. Language models are unsupervised multitask learners. OpenAI Blog, 2019,
1(8): 9.
[54] Tran B, Li J, Madry A. Spectral signatures in backdoor attacks. In: Proc. of the 32nd Int’l Conf. on Neural Information Processing
Systems. Montréal: Curran Associates Inc., 2018. 8011–8021.
[55] Chen B, Carvalho W, Baracaldo N, Ludwig H, Edwards B, Lee T, Molloy IM, Srivastava B. Detecting backdoor attacks on deep neural
networks by activation clustering. In: Proc. of the 2019 Workshop on Artificial Intelligence Safety Co-located with the 33rd AAAI Conf.
on Artificial Intelligence. Honolulu: CEUR-WS.org, 2019. CEUR Workshop Proc., Vol. 2301.
[56] Cho K, van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: Encoder-decoder approaches. In:
Proc. of the 8th Workshop on Syntax, Semantics and Structure in Statistical Translation. Doha: Association for Computational
Linguistics, 2014. 103–111. [doi: 10.3115/v1/W14-4012]
[57] Gu TY, Dolan-Gavitt B, Garg S. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv:1708.06733,
2019.
[58] Kim Y. Convolutional neural networks for sentence classification. In: Proc. of the 2014 Conf. on Empirical Methods in Natural Language
Processing. Doha: Association for Computational Linguistics, 2014. 1746–1751. [doi: 10.3115/v1/D14-1181]
[59] Wang Y, Le H, Gotmare AD, Bui NDQ, Li JN, Hoi SCH. CodeT5+: Open code large language models for code understanding and
generation. arXiv:2305.07922, 2023.
[60] Anderson HS, Roth P. EMBER: An open dataset for training static PE malware machine learning models. arXiv:1804.04637, 2018.
[61] Smutz C, Stavrou A. Malicious PDF detection using metadata and structural features. In: Proc. of the 28th Annual Computer Security
Applications Conf. Orlando: ACM, 2012. 239–248. [doi: 10.1145/2420950.2420987]
[62] Arp D, Spreitzenbarth M, Hubner M, Gascon H, Rieck K. Drebin: Effective and explainable detection of Android malware in your
pocket. In: Proc. of the 21st Annual Network and Distributed System Security Symp. San Diego: The Internet Society, 2014.
[63] Wang Y, Wang WS, Joty S, Hoi SCH. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and
generation. In: Proc. of the 2021 Conf. on Empirical Methods in Natural Language Processing. Punta Cana: Association for
Computational Linguistics, 2021. 8696–8708. [doi: 10.18653/v1/2021.emnlp-main.685]
[64] Junod P, Rinaldini J, Wehrli J, Michielin J. Obfuscator-LLVM: Software protection for the masses. In: Proc. of the 1st IEEE/ACM Int’l
Workshop on Software Protection. Florence: IEEE, 2015. 3–9. [doi: 10.1109/SPRO.2015.10]
[65] Chen CS, Dai JZ. Mitigating backdoor attacks in LSTM-based text classification systems by backdoor keyword identification.
Neurocomputing, 2021, 452: 253–262. [doi: 10.1016/j.neucom.2021.04.105]
[66] Guo DY, Ren S, Lu S, Feng ZY, Tang DY, Liu SJ, Zhou L, Duan N, Svyatkovskiy A, Fu SY, Tufano M, Deng SK, Clement C, Drain D,
Sundaresan N, Yin J, Jiang DX, Zhou M. GraphCodeBERT: Pre-training code representations with data flow. In: Proc. of the 9th Int’l
Conf. on Learning Representations. OpenReview.net, 2021.
[67] Sundararajan M, Taly A, Yan QQ. Axiomatic attribution for deep networks. In: Proc. of the 34th Int’l Conf. on Machine Learning.
Sydney: JMLR.org, 2017. 3319–3328.