《软件学报》 (Journal of Software), 2025, No. 10
张锦弘 et al.: Data-free model stealing attack method based on visual feature decoupling  4813
3 (School of Software, Yunnan University, Kunming 650500, China)
4 (Engineering Research Center of Cyberspace, Yunnan University, Kunming 650500, China)
Abstract: With the continuous deepening of research on the security and privacy of deep learning models, researchers have found that model stealing attacks pose a serious threat to neural networks. A typical data-dependent model stealing attack queries the target model with a certain proportion of real data and trains a substitute model locally to steal the target model. Since 2020, a novel class of data-free model stealing attacks has been proposed, which can steal deep neural networks using only fake query examples produced by generative models. Because they do not rely on real data, data-free model stealing attacks can cause more serious damage. However, the query examples constructed by current data-free model stealing methods lack diversity and effectiveness, leading to a large number of queries and a relatively low attack success rate during model stealing.
Therefore, this study proposes a visual feature decoupling-based model stealing attack (VFDA), which decouples the visual features of the query examples generated during data-free model stealing by using a multi-decoder structure, thereby improving the diversity of query examples and the effectiveness of model stealing. Specifically, VFDA first uses three decoders to respectively generate the texture information, region encoding, and smoothing information of query examples, completing the decoupling of their visual features. Second, to make the generated query examples more consistent with the visual features of real examples, the sparsity of the texture information is constrained and the generated smoothing information is filtered. VFDA exploits the property that the representational tendency of neural networks depends on image texture features, and can generate query examples with inter-class diversity, thus effectively improving both the similarity of the stolen model and the attack success rate. In addition, VFDA applies an intra-class diversity loss to the decoupled smoothing information so that the query examples better match the distribution of real examples. Compared with multiple model stealing attack methods, the proposed VFDA achieves better model stealing similarity and attack success rates. In particular, on the higher-resolution GTSRB and Tiny-ImageNet datasets, its attack success rate is on average 3.86% and 4.15% higher, respectively, than that of the currently stronger EBFA method.
Key words: model stealing; adversarial example; transfer attack; generative model; model privacy
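The multi-decoder decoupling summarized above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the linear "decoders", the soft-threshold sparsity constraint on the texture branch, and the box filter on the smoothing branch are all stand-in assumptions for the learned components VFDA describes.

```python
import numpy as np

def decoder(z, out_dim, seed):
    # Stand-in for a learned decoder: a fixed random linear map + tanh.
    w = np.random.default_rng(seed).normal(size=(z.size, out_dim))
    return np.tanh(z @ w)

def soft_threshold(x, lam=0.5):
    # Illustrative sparsity constraint on the texture information.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def box_filter(x, k=3):
    # Illustrative low-pass filtering of the smoothing information.
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([xp[i:i + k].mean() for i in range(x.size)])

def generate_query_example(z, side=8):
    d = side * side
    texture = soft_threshold(decoder(z, d, seed=1))  # texture decoder
    region = decoder(z, d, seed=2)                   # region-encoding decoder
    smooth = box_filter(decoder(z, d, seed=3))       # smoothing decoder
    # Recombine the decoupled visual-feature components into one query example.
    return (texture + region + smooth).reshape(side, side)

z = np.random.default_rng(0).normal(size=16)
q = generate_query_example(z)
print(q.shape)  # (8, 8)
```

In the actual method these three decoders share a generator and are trained jointly; the point of the sketch is only the structure: each branch produces one visual-feature component, the texture branch is sparsified, the smoothing branch is filtered, and the components are recombined into a query example.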
With the leapfrog development of deep learning in fields such as computer vision and natural language processing, the security and privacy of artificial intelligence have received increasing attention. Existing studies show that AI models (machine learning or deep learning models) are highly vulnerable to adversarial attacks [1–3]: by adding carefully crafted tiny perturbations to clean examples, an attacker can cause an AI model to produce wrong predictions. Moreover, training large models is extremely expensive, including the cost of collecting training data, manual annotation, and the hardware for model training. Model developers and users therefore urgently need to strengthen the protection of such high-cost models to prevent model leakage and avoid unnecessary losses. However, current research shows that deep learning models deployed in the cloud are at risk of being stolen and attacked, and such security and privacy issues seriously hinder the development of deep learning. Stealing and then attacking a deep learning model is called a model stealing attack, whose process is shown in Fig. 1.
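The adversarial perturbations mentioned above can be illustrated with a toy FGSM-style example on a linear classifier; the classifier weights, input, and step size below are made-up values chosen only to show that a small, bounded perturbation can flip a prediction.

```python
import numpy as np

# Toy linear classifier: predicts class 1 when w.x + b > 0.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def predict(x):
    return int(w @ x + b > 0)

# A clean example the model classifies as class 1.
x_clean = np.array([0.5, 0.1, 0.2])

# FGSM-style perturbation: step against the class-1 score along sign(w),
# bounded by eps in the L-infinity norm.
eps = 0.3
x_adv = x_clean - eps * np.sign(w)

print(predict(x_clean), predict(x_adv))  # 1 0
```

Each input coordinate changes by at most 0.3, yet the prediction flips; deep models exhibit the same vulnerability with perturbations small enough to be imperceptible to humans.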
[Figure 1: ① model stealing phase — the attacker generates query examples (from a generator in the data-free setting, or from real data in the data-dependent setting), queries the target black-box model, and uses the query results to train a substitute model; ② attack phase — adversarial examples generated on the substitute model are transferred to attack the target model, causing wrong outputs.]
Fig. 1  Schematic of the model stealing phase and the attack phase in a model stealing attack
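The stealing phase of Fig. 1 can be sketched as a minimal query-and-fit loop. Everything here is a toy stand-in: the black-box target is a hidden linear rule, the "generator" is random noise, and the substitute is a least-squares linear classifier, whereas real attacks use neural networks on both sides.

```python
import numpy as np

rng = np.random.default_rng(0)

def target_model(x):
    # Black-box target: returns hard labels only (hidden toy linear rule).
    return (x.sum(axis=1) > 0).astype(int)

# Phase 1: model stealing. Produce fake queries, label them by querying
# the target, and fit a substitute model locally on the query results.
queries = rng.normal(size=(200, 4))          # stand-in for generator output
labels = target_model(queries)               # query results from the target

X = np.hstack([queries, np.ones((200, 1))])  # add a bias column
wsub, *_ = np.linalg.lstsq(X, 2.0 * labels - 1.0, rcond=None)

def substitute(x):
    return (np.hstack([x, np.ones((len(x), 1))]) @ wsub > 0).astype(int)

# The substitute should agree with the target on fresh inputs; in phase 2,
# adversarial examples crafted on it would transfer to the target.
test_x = rng.normal(size=(100, 4))
agreement = (substitute(test_x) == target_model(test_x)).mean()
print(agreement)
```

The fraction of matching predictions (`agreement`) is the "similarity of model stealing" the abstract refers to; the higher it is, the better adversarial examples built on the substitute transfer to the target in the attack phase.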

