
Zhang Jinhong et al.: Data-free model stealing attack method based on visual feature decoupling                                                   4813


3 (School of Software, Yunnan University, Kunming 650500, China)
4 (Engineering Research Center of Cyberspace, Yunnan University, Kunming 650500, China)
Abstract: With the continuous deepening of research on the security and privacy of deep learning models, researchers have found that model stealing attacks pose a tremendous threat to neural networks. A typical data-dependent model stealing attack uses a certain proportion of real data to query the target model and trains a substitute model locally, thereby stealing the target model. Since 2020, novel data-free model stealing attack methods have been proposed, which can steal and attack deep neural networks simply by using fake query examples produced by generative models. Since it does not rely on real data, a data-free model stealing attack can cause more serious damage. However, the query examples constructed by current data-free model stealing attack methods lack diversity and effectiveness, so the stealing process requires a large number of queries yet achieves a relatively low attack success rate. Therefore, this study proposes a vision feature decoupling-based model stealing attack (VFDA), which decouples and generates the visual features of the query examples produced during the data-free model stealing process by using a multi-decoder structure, thus improving the diversity of query examples and the effectiveness of model stealing. Specifically, VFDA uses three decoders to respectively generate the texture information, region encoding, and smoothing information of query examples, completing the decoupling of their visual features. Secondly, to make the generated query examples more consistent with the visual features of real examples, the sparsity of the texture information is limited and the generated smoothing information is filtered. VFDA exploits the property that the representational tendency of neural networks depends on image texture features, and can generate query examples with inter-class diversity, thus effectively improving both the similarity of model stealing and the attack success rate. In addition, VFDA adds an intra-class diversity loss to the smoothing information of the query examples generated through decoupling, making the query examples more consistent with the real sample distribution. Compared with multiple model stealing attack methods, the proposed VFDA achieves better performance in both the similarity of model stealing and the attack success rate. In particular, on the higher-resolution GTSRB and Tiny-ImageNet datasets, the attack success rate is improved by 3.86% and 4.15% on average, respectively, compared with the currently better-performing EBFA method.
                 Key words:  model stealing; adversarial example; transfer attack; generative model; model privacy
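The abstract describes a multi-decoder generator that decouples a query example into texture, region, and smoothing components, with a sparsity constraint on the texture and a filter on the smoothing branch. The following is a minimal NumPy sketch of that decomposition only; the random linear maps, the toy 8×8 image size, the soft-threshold sparsity operator, and the box filter are all illustrative stand-ins (the paper's actual decoders are learned networks), so none of the names or shapes here should be read as the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 8      # toy image size (assumed; real query examples are dataset-sized)
LATENT = 16    # toy latent-code dimension (assumed)

# Stand-in "decoders": random linear maps from the latent code to image-sized
# outputs. In VFDA these would be three learned decoders; only the roles
# (texture / region encoding / smoothing) follow the abstract.
W_tex = rng.normal(size=(LATENT, H * W))
W_reg = rng.normal(size=(LATENT, H * W))
W_smo = rng.normal(size=(LATENT, H * W))

def soft_threshold(x, lam):
    """Shrinkage operator: zeroes small entries, encouraging texture sparsity."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def box_filter(img, k=3):
    """Simple mean filter standing in for the smoothing-branch filter."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def generate_query_example(z, sparsity=0.8):
    # Three decoder branches produce the three decoupled visual components.
    texture = soft_threshold((z @ W_tex).reshape(H, W), sparsity)
    region = 1.0 / (1.0 + np.exp(-(z @ W_reg).reshape(H, W)))  # soft mask in (0, 1)
    smooth = box_filter((z @ W_smo).reshape(H, W))
    # Recombine the components into one query example: textured foreground
    # regions blended with a smooth background.
    return region * texture + (1.0 - region) * smooth

z = rng.normal(size=LATENT)
x = generate_query_example(z)
print(x.shape)  # (8, 8)
```

The recombination rule (mask-weighted blend) is one plausible reading of "decoupling and generating" the components; the key point the sketch conveys is that diversity can be injected per component, e.g. by varying only the texture branch across classes.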
With the leapfrog development of deep learning in fields such as computer vision and natural language processing, the security and privacy of artificial intelligence have drawn increasing attention. Existing research shows that AI models (machine learning or deep learning models) are highly vulnerable to adversarial attacks [1–3]: by adding carefully crafted, tiny perturbations to clean samples, an attacker can make the model produce erroneous predictions. Moreover, training large models is extremely costly, including the cost of collecting training data, the labor cost of annotation, and the hardware cost of training. Model developers and operators therefore urgently need stronger protection for such high-cost models, lest the models leak and cause unnecessary losses. However, current research shows that deep learning models deployed in the cloud are at risk of being stolen and attacked, and such security and privacy problems seriously hinder the development of deep learning. Stealing and attacking a deep learning model is called a model stealing attack, whose process is shown in Fig. 1.

[Figure 1: ① Model stealing stage: the attacker queries the target black-box model, either with real data (data-dependent) or with samples from a generator (data-free), and uses the query results to train a substitute model. ② Attack stage: adversarial examples generated on the substitute model are transferred to attack the target model, causing erroneous outputs.]

Fig. 1  Schematic diagram of the model stealing stage and the attack stage in a model stealing attack
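The two-stage process in Fig. 1 can be sketched end-to-end in a toy setting. Below, a hidden linear classifier plays the black-box target, random Gaussian queries stand in for generator-produced samples, least-squares regression to one-hot labels fits the substitute, and a single signed-gradient step on the substitute crafts a transfer attack. Every modeling choice here (linear models, the query count, the step size 1.5) is a hypothetical simplification for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
D, C = 10, 3  # toy feature dimension and number of classes (illustrative)

# The victim: a black box we can only query for labels (weights are hidden).
W_target = rng.normal(size=(D, C))
def target_black_box(x):
    return np.argmax(x @ W_target, axis=-1)   # hard-label query interface

# ① Model stealing stage: query with synthetic samples (random Gaussians here,
# standing in for generator-produced query examples) and fit a substitute.
X_query = rng.normal(size=(500, D))
y_query = target_black_box(X_query)
Y_onehot = np.eye(C)[y_query]
W_sub, *_ = np.linalg.lstsq(X_query, Y_onehot, rcond=None)  # substitute model

def substitute_logits(x):
    return x @ W_sub

# How closely the substitute mimics the target on the query set.
agreement = np.mean(np.argmax(substitute_logits(X_query), axis=-1) == y_query)

# ② Attack stage: craft an adversarial example on the substitute (one
# FGSM-style signed-gradient step on the linear surrogate) and transfer it.
x = rng.normal(size=(1, D))
y = target_black_box(x)[0]
grad = W_sub[:, y]                 # gradient of the predicted-class logit w.r.t. x
x_adv = x - 1.5 * np.sign(grad)    # step away from the predicted class
print(round(float(agreement), 2))
```

The sketch makes the threat model concrete: the attacker never sees `W_target`, yet white-box gradients on the locally trained substitute suffice to construct attack samples against the target, which is why query-example quality (the focus of VFDA) directly determines attack success.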