Page 336 - Ruan Jian Xue Bao/Journal of Software, 2025, Issue 10

Zhang YT, et al.: Robustness evaluation of ChatGPT under Chinese adversarial attacks 4733

[8] Alzantot M, Sharma Y, Elgohary A, Ho BJ, Srivastava M, Chang KW. Generating natural language adversarial examples. In: Proc. of the
2018 Conf. on Empirical Methods in Natural Language Processing. Brussels: Association for Computational Linguistics, 2018.
2890–2896. [doi: 10.18653/v1/D18-1316]
[9] Zang Y, Qi FC, Yang CH, Liu ZY, Zhang M, Liu Q, Sun MS. Word-level textual adversarial attacking as combinatorial optimization. In:
Proc. of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020.
6066–6080. [doi: 10.18653/v1/2020.acl-main.540]
[10] Jin D, Jin ZJ, Zhou JT, Szolovits P. Is BERT really robust? A strong baseline for natural language attack on text classification and
entailment. In: Proc. of the 34th AAAI Conf. on Artificial Intelligence. New York: AAAI Press, 2020. 8018–8025.
[doi: 10.1609/aaai.v34i05.6311]
[11] Li LY, Ma RT, Guo QP, Xue XY, Qiu XP. BERT-ATTACK: Adversarial attack against BERT using BERT. In: Proc. of the 2020 Conf.
on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2020. 6193–6202.
[doi: 10.18653/v1/2020.emnlp-main.500]
[12] Ren SH, Deng YH, He K, Che WX. Generating natural language adversarial examples through probability weighted word saliency. In:
Proc. of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: Association for Computational Linguistics,
2019. 1085–1097. [doi: 10.18653/v1/P19-1103]
[13] Xu JC, Du QF. Adversarial attacks on text classification models using layer-wise relevance propagation. Int’l Journal of Intelligent
Systems, 2020, 35(9): 1397–1415. [doi: 10.1002/int.22260]
[14] Li JF, Ji SL, Du TY, Li B, Wang T. TextBugger: Generating adversarial text against real-world applications. In: Proc. of the 26th Annual
Network and Distributed System Security Symp. San Diego: The Internet Society, 2019. [doi: 10.14722/ndss.2019.23138]
[15] Garg S, Ramakrishnan G. BAE: BERT-based adversarial examples for text classification. In: Proc. of the 2020 Conf. on Empirical
Methods in Natural Language Processing. Association for Computational Linguistics, 2020. 6174–6181.
[doi: 10.18653/v1/2020.emnlp-main.498]
[16] Li DQ, Zhang YZ, Peng H, Chen LQ, Brockett C, Sun MT, Dolan B. Contextualized perturbation for textual adversarial attack. In: Proc.
of the 2021 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Association for Computational Linguistics, 2021. 5053–5069. [doi: 10.18653/v1/2021.naacl-main.400]
[17] Zhang ZH, Liu MX, Zhang C, Zhang YM, Li Z, Li Q, Duan HX, Sun DH. Argot: Generating adversarial readable Chinese texts. In: Proc.
of the 29th Int’l Joint Conf. on Artificial Intelligence. Yokohama, 2021. 2533–2539. [doi: 10.24963/ijcai.2020/351]
[18] Cheng N, Chang GQ, Gao HC, Pei G, Zhang Y. WordChange: Adversarial examples generation approach for Chinese text classification.
IEEE Access, 2020, 8: 79561–79572. [doi: 10.1109/ACCESS.2020.2988786]
[19] Tong X, Wang LN, Wang RZ, Wang JY. A generation method of word-level adversarial samples for Chinese text classification. Netinfo
Security, 2020, 20(9): 12–16 (in Chinese with English abstract). [doi: 10.3969/j.issn.1671-1122.2020.09.003]
[20] Ou HX, Yu L, Tian SW, Chen X. Chinese adversarial examples generation approach with multi-strategy based on semantic. Knowledge
and Information Systems, 2022, 64(4): 1101–1119. [doi: 10.1007/s10115-022-01652-1]
[21] Zhang YT, Ye L, Tang HL, Zhang HL, Li S. Chinese BERT attack method based on masked language model. Ruan Jian Xue Bao/Journal
of Software, 2024, 35(7): 3392–3409 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6932.htm
[doi: 10.13328/j.cnki.jos.006932]
[22] He XL, Lyu LJ, Sun LC, Xu QK. Model extraction and adversarial transferability, your BERT is vulnerable! In: Proc. of the 2021 Conf.
of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for
Computational Linguistics, 2021. 2006–2012. [doi: 10.18653/v1/2021.naacl-main.161]
[23] Ebrahimi J, Rao AY, Lowd D, Dou DJ. HotFlip: White-box adversarial examples for text classification. In: Proc. of the 56th Annual
Meeting of the Association for Computational Linguistics, Vol. 2: Short Papers. Melbourne: Association for Computational Linguistics,
2018. 31–36. [doi: 10.18653/v1/P18-2006]
[24] Shi YC, Han YH. Metric system and its completeness of adversarial robustness evaluation. Ruan Jian Xue Bao/Journal of Software, 2025,
36(3): 1304–1326 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/7172.htm [doi: 10.13328/j.cnki.jos.007172]
[25] Yoo JY, Morris JX, Lifland E, Qi YJ. Searching for a search method: Benchmarking search algorithms for generating NLP adversarial
examples. In: Proc. of the 3rd BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP. Association for
Computational Linguistics, 2020. 323–332. [doi: 10.18653/v1/2020.blackboxnlp-1.30]
[26] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional Transformers for language understanding. In: Proc.
of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1:
Long and Short Papers. Minneapolis: Association for Computational Linguistics, 2019. 4171–4186. [doi: 10.18653/v1/N19-1423]