Journal of Software (Ruan Jian Xue Bao) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
2025,36(7):3271−3305 [doi: 10.13328/j.cnki.jos.007364] [CSTR: 32375.14.jos.007364] http://www.jos.org.cn
© Copyright by Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
Survey on Backdoor Attacks and Defenses for Deep Learning Research*
GAO Meng-Nan 1, CHEN Wei 1,2, WU Li-Fa 1,2, ZHANG Bo-Lei 1
1 (School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
2 (Jiangsu Key Laboratory of Big Data Security & Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing 210023, China)
Corresponding author: CHEN Wei, E-mail: chenwei@njupt.edu.cn
Abstract: Deep learning models are a key component of artificial intelligence systems and are widely deployed in many critical real-world scenarios. Existing research shows that the low transparency and weak interpretability of deep learning make these models sensitive to perturbations, and artificial intelligence systems consequently face a variety of security threats, among which backdoor attacks on deep learning are a major one. To improve the security of deep learning models, this survey comprehensively reviews the research progress on backdoor attacks and defenses in mainstream deep learning systems such as computer vision and natural language processing. Backdoor attacks are first categorized, according to the attacker's capabilities in practice, into full-process controllable backdoors, model modification backdoors, and data-poisoning-only backdoors, and each category is further subdivided by how the backdoor is constructed. Existing backdoor defenses are then divided, according to the object of the defense strategy, into input-based defenses and model-based defenses. Finally, the datasets and evaluation metrics commonly used in backdoor attack research are summarized, open problems in the field of backdoor attacks and defenses are identified, and suggestions and prospects are given regarding security-oriented application scenarios of backdoor attacks and the effectiveness of backdoor defenses.
Key words: deep learning; backdoor attack; backdoor defense; AI security
CLC number: TP306
Chinese citation format: Gao MN, Chen W, Wu LF, Zhang BL. Survey on backdoor attacks and defenses for deep learning research. Ruan Jian Xue Bao/Journal of Software, 2025, 36(7): 3271–3305 (in Chinese). http://www.jos.org.cn/1000-9825/7364.htm
English citation format: Gao MN, Chen W, Wu LF, Zhang BL. Survey on Backdoor Attacks and Defenses for Deep Learning Research. Ruan
Jian Xue Bao/Journal of Software, 2025, 36(7): 3271–3305 (in Chinese). http://www.jos.org.cn/1000-9825/7364.htm
Survey on Backdoor Attacks and Defenses for Deep Learning Research
GAO Meng-Nan 1, CHEN Wei 1,2, WU Li-Fa 1,2, ZHANG Bo-Lei 1
1 (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
2 (Jiangsu Key Laboratory of Big Data Security & Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing 210023, China)
Abstract: Deep learning models are integral components of artificial intelligence systems, widely deployed in various critical real-world
scenarios. Research has shown that the low transparency and weak interpretability of deep learning models render them highly sensitive to
perturbations. Consequently, artificial intelligence systems are exposed to multiple security threats, with backdoor attacks on deep learning
models representing a significant concern. This study provides a comprehensive overview of the research progress on backdoor attacks and
defenses in mainstream deep learning systems, including computer vision and natural language processing. Backdoor attacks are
categorized based on the attacker’s capabilities into full-process controllable backdoors, model modification backdoors, and data poisoning
backdoors, which are further classified according to the backdoor construction methods. Defense strategies are divided into input-based
defenses and model-based defenses, depending on the target of the defensive measures. This study also summarizes commonly used
datasets and evaluation metrics in this domain. Lastly, existing challenges in backdoor attack and defense research are discussed, alongside
recommendations and future directions focusing on security application scenarios of backdoor attacks and the efficacy of defense
mechanisms.
Key words: deep learning; backdoor attack; backdoor defense; AI security
* Fund projects: National Natural Science Foundation of China (62202238); Key Research and Development Program of Jiangsu Province (BE2022065-5)
Received 2024-04-27; Revised 2024-07-15, 2024-09-05; Accepted 2024-11-26; JOS published online 2025-04-25
CNKI first published online 2025-04-27

