
Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                              E-mail: jos@iscas.ac.cn
2025,36(7):3271−3305 [doi: 10.13328/j.cnki.jos.007364] [CSTR: 32375.14.jos.007364]       http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved.               Tel: +86-10-62562563



Survey on Backdoor Attacks and Defenses for Deep Learning Research*

GAO Meng-Nan¹, CHEN Wei¹,², WU Li-Fa¹,², ZHANG Bo-Lei¹


¹(School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
²(Jiangsu Key Laboratory of Big Data Security & Intelligent Processing (Nanjing University of Posts and Telecommunications), Nanjing 210023, China)
Corresponding author: CHEN Wei, E-mail: chenwei@njupt.edu.cn

Abstract: Deep learning models are integral components of artificial intelligence systems and are widely deployed in many critical real-world scenarios. Existing research has shown that the low transparency and weak interpretability of deep learning render these models sensitive to perturbations, and artificial intelligence systems consequently face multiple security threats, among which backdoor attacks on deep learning models are a significant one. To improve the security of deep learning models, this survey comprehensively reviews the progress of research on backdoor attacks and defenses in mainstream deep learning systems such as computer vision and natural language processing. Backdoor attacks are first categorized by the attacker's real-world capabilities into full-process controllable backdoors, model modification backdoors, and data-poisoning-only backdoors, and are then subdivided according to how the backdoor is constructed. Existing defenses are classified by the object of the defense strategy into input-based and model-based backdoor defenses. The survey then compiles the datasets and evaluation metrics commonly used in backdoor attack research, summarizes open problems in the field, and offers recommendations and outlooks on, among other aspects, security-oriented application scenarios for backdoor attacks and the effectiveness of backdoor defenses.
Key words: deep learning; backdoor attack; backdoor defense; AI security
CLC number: TP306

Citation format (Chinese): 高梦楠, 陈伟, 吴礼发, 张伯雷. 面向深度学习的后门攻击及防御研究综述. 软件学报, 2025, 36(7): 3271–3305 (in Chinese). http://www.jos.org.cn/1000-9825/7364.htm
Citation format (English): Gao MN, Chen W, Wu LF, Zhang BL. Survey on Backdoor Attacks and Defenses for Deep Learning Research. Ruan Jian Xue Bao/Journal of Software, 2025, 36(7): 3271–3305 (in Chinese). http://www.jos.org.cn/1000-9825/7364.htm
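To make the attack taxonomy in the abstract concrete: the simplest class it names, the data-poisoning-only backdoor, can be illustrated by a BadNets-style patch trigger, in which the attacker stamps a small constant patch onto a fraction of the training images and relabels them to an attacker-chosen target class. The sketch below is a hypothetical toy illustration, not the survey's own notation; the function name, patch size, poison rate, and target label are all illustrative assumptions.

import numpy as np

def poison_dataset(images, labels, target_label=0, poison_rate=0.05,
                   patch_size=3, patch_value=1.0, seed=42):
    """Illustrative BadNets-style poisoning: stamp a small constant patch in
    the bottom-right corner of a random subset of images and relabel those
    samples to the attacker's target class.
    `images` has shape (N, H, W, C) with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # The trigger: a patch_size x patch_size block of constant pixels.
    images[idx, -patch_size:, -patch_size:, :] = patch_value
    # Label flipping: every triggered sample now "belongs" to the target class.
    labels[idx] = target_label
    return images, labels, idx

# Toy usage: 100 fake 32x32 RGB images with 10 classes.
X = np.random.rand(100, 32, 32, 3).astype(np.float32)
y = np.random.randint(0, 10, size=100)
Xp, yp, poisoned_idx = poison_dataset(X, y)
print(f"poisoned {len(poisoned_idx)} of {len(X)} samples, "
      f"all relabeled to class {yp[poisoned_idx][0]}")

A model trained normally on (Xp, yp) tends to behave correctly on clean inputs but to predict the target class whenever the patch is present, which is exactly the stealth property that makes these attacks hard to detect.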
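As background on the evaluation metrics the abstract refers to, two measures are near-standard across the backdoor literature (the formalization below uses our own notation, not the survey's): benign accuracy (BA), the backdoored model's accuracy on clean inputs, and attack success rate (ASR), the fraction of trigger-carrying inputs assigned to the attacker's target label. Assuming a backdoored classifier $f_\theta$, a trigger $t$ applied by an operator $\oplus$, a target label $y_t$, a clean test set $\mathcal{D}_c$, and its triggered counterpart $\mathcal{D}_p$ built from samples whose true label differs from $y_t$:

\mathrm{BA} = \frac{1}{|\mathcal{D}_c|} \sum_{(x,y) \in \mathcal{D}_c} \mathbb{1}\left[ f_\theta(x) = y \right], \qquad
\mathrm{ASR} = \frac{1}{|\mathcal{D}_p|} \sum_{(x,y) \in \mathcal{D}_p} \mathbb{1}\left[ f_\theta(x \oplus t) = y_t \right]

A successful backdoor keeps BA close to the clean model's accuracy while driving ASR toward 1, which is why both metrics are typically reported together.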



* Funding: National Natural Science Foundation of China (62202238); Key Research and Development Program of Jiangsu Province (BE2022065-5)
 Received 2024-04-27; revised 2024-07-15 and 2024-09-05; accepted 2024-11-26; published online in JOS 2025-04-25;
 first published online on CNKI 2025-04-27