Journal of Software ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
2025,36(10):4864−4879 [doi: 10.13328/j.cnki.jos.007300] [CSTR: 32375.14.jos.007300] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

Dual-threshold Adversarial Example Detection Based on Image Transformation *

LIU Hui 1,2, WEN Fu-Ju 1,2, DU Hong-Qin 1,2, WANG Jing-Hua 1,2, ZHAO Bo 3

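1 (School of Computer Science, Central China Normal University, Wuhan 430079, Hubei, China)
2 (Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning (Central China Normal University), Wuhan 430079, Hubei, China)
3 (School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, Hubei, China)
Corresponding author: ZHAO Bo, E-mail: zhaobo@whu.edu.cn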

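Abstract: Existing adversarial example detection methods based on image transformation exploit the observation that image
transformations strongly affect the feature distribution of adversarial examples while only slightly affecting that of benign
examples, and they detect adversarial examples by computing the feature distance of a sample before and after transformation.
However, as research on adversarial attacks has deepened, researchers have focused on strengthening the robustness of
adversarial attacks, so that some attacks are "immune" to the effects of image transformation. Existing methods struggle to
detect such highly robust adversarial examples. This paper finds that current adversarial examples are overly robust: under
image transformation, the feature distribution distance of a highly robust adversarial example is far smaller than that of
benign examples, which violates the feature distribution pattern of benign examples. Based on this key finding, a
dual-threshold adversarial example detection method based on image transformation is proposed. It adds a lower threshold to
the conventional single-threshold detection method, forming a dual-threshold detection interval; samples whose feature
distribution distance falls outside this interval are judged to be adversarial. Extensive validation is carried out on the
VGG19, DenseNet, and ConvNeXt image classification models. The experiments show that the method retains the detection
capability of existing single-threshold schemes while also detecting highly robust adversarial examples well.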
Key words: image transformation; adversarial example; feature distribution; dual-threshold detection; image classification
CLC number: TP309
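
The dual-threshold decision rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the L1 distance, the quantile-based calibration on benign samples, and the names `feature_distance`, `calibrate_thresholds`, and `is_adversarial` are all assumptions introduced here, since the abstract does not specify the distance metric or how the thresholds are chosen.

```python
import numpy as np


def feature_distance(features_before, features_after):
    """L1 distance between a sample's feature vectors before and after
    image transformation (the choice of L1 is an assumption)."""
    a = np.asarray(features_before, dtype=float)
    b = np.asarray(features_after, dtype=float)
    return float(np.abs(a - b).sum())


def calibrate_thresholds(benign_distances, tail=0.05):
    """Pick a (lower, upper) interval covering the central 1 - tail mass of
    distances measured on benign samples (hypothetical calibration scheme)."""
    d = np.asarray(benign_distances, dtype=float)
    lower = float(np.quantile(d, tail / 2))
    upper = float(np.quantile(d, 1 - tail / 2))
    return lower, upper


def is_adversarial(distance, lower, upper):
    """Dual-threshold rule: flag the sample when its distance lies outside
    the interval [lower, upper]. A single-threshold detector would only
    check distance > upper; the lower bound catches overly robust
    adversarial examples whose distance is abnormally small."""
    return not (lower <= distance <= upper)
```

With an interval of, say, `lower=0.1` and `upper=1.0`, a sample whose features barely move under transformation (distance 0.01) is flagged just like one whose features move a lot (distance 2.0), while a distance of 0.5 is treated as benign.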

Citation: Liu H, Wen FJ, Du HQ, Wang JH, Zhao B. Dual-threshold Adversarial Example Detection Based on Image Transformation.
Ruan Jian Xue Bao/Journal of Software, 2025, 36(10): 4864–4879 (in Chinese). http://www.jos.org.cn/1000-9825/7300.htm

Dual-threshold Adversarial Example Detection Based on Image Transformation
LIU Hui 1,2, WEN Fu-Ju 1,2, DU Hong-Qin 1,2, WANG Jing-Hua 1,2, ZHAO Bo 3
1 (School of Computer Science, Central China Normal University, Wuhan 430079, China)
2 (Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning (Central China Normal University), Wuhan 430079, China)
3 (School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China)
Abstract: Existing adversarial example detection methods based on image transformation exploit the fact that image
transformation significantly changes the feature distribution of adversarial examples but only slightly changes that of
benign examples. Adversarial examples can therefore be detected by calculating the feature distance before and after image
transformation. However, as research on adversarial attacks deepens, researchers pay more attention to enhancing the
robustness of adversarial examples, so that some attacks can be “immune” to the effect of image transformation. Existing
methods struggle to detect such robust adversarial examples effectively. This paper observes that existing adversarial
examples are too robust: the feature distribution distance of robust adversarial examples under image transformation is much
smaller than that of benign examples, which is not consistent

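* Funding: Postdoctoral Fellowship Program of the China Postdoctoral Science Foundation (GZC20230922); China Postdoctoral Science
Foundation (2024M751050); Fundamental Research Funds for the Central Universities, Central China Normal University
(CCNU24XJ001, CCNU24ai010)
Received: 2024-05-06; Revised: 2024-07-14, 2024-08-29, 2024-09-19; Accepted: 2024-10-02; JOS online publication: 2025-01-24
CNKI online-first publication: 2025-01-26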
