Page 8 - 《软件学报》2025年第7期
P. 8

软件学报 ISSN 1000-9825, CODEN RUXUEW                                        E-mail: jos@iscas.ac.cn
                 2025,36(7):2929−2946 [doi: 10.13328/j.cnki.jos.007332] [CSTR: 32375.14.jos.007332]  http://www.jos.org.cn
                 ©中国科学院软件研究所版权所有.                                                          Tel: +86-10-62562563



                                                                     *
                 eDPRF: 高效的差分隐私随机森林训练算法

                 王树兰  1 ,    邱    瑶  1 ,    赵陈斌  2 ,    邹家须  1 ,    王彩芬  1


                 1
                  (深圳技术大学 大数据与互联网学院, 广东 深圳 518118)
                 2
                  (空天信息安全与可信计算教育部重点实验室 (武汉大学 国家网络安全学院), 湖北 武汉 430072)
                 通信作者: 赵陈斌, E-mail: chenbinzhao96@whu.edu.cn

                 摘 要: 差分隐私凭借其强大的隐私保护能力被应用在随机森林算法解决其中的隐私泄露问题, 然而, 直接将差分
                 隐私应用在随机森林算法会使模型的分类准确率严重下降. 为了平衡隐私保护和模型准确性之间的矛盾, 提出了
                 一种高效的差分隐私随机森林训练算法              eDPRF (efficient differential privacy random forest). 具体而言, 该算法设计
                 了决策树构建方法, 通过引入重排翻转机制高效地查询输出优势, 进一步设计相应的效用函数实现分裂特征以及
                 标签的精准输出, 有效改善树模型在扰动情况下对于数据信息的学习能力. 同时基于组合定理设计了隐私预算分
                 配的策略, 通过不放回抽样获得训练子集以及差异化调整内部预算的方式提高树节点的查询预算. 最后, 通过理论
                 分析以及实验评估, 表明算法在给定相同隐私预算的情况下, 模型的分类准确度优于同类算法.
                 关键词: 随机森林; 差分隐私; 隐私预算; 重排翻转; 扰动方式
                 中图法分类号: TP18

                 中文引用格式: 王树兰, 邱瑶, 赵陈斌, 邹家须, 王彩芬. eDPRF: 高效的差分隐私随机森林训练算法. 软件学报, 2025, 36(7): 2929–
                 2946. http://www.jos.org.cn/1000-9825/7332.htm
                 英文引用格式: Wang  SL,  Qiu  Y,  Zhao  CB,  Zou  JX,  Wang  CF.  eDPRF:  Efficient  Differential  Privacy  Random  Forest  Training
                 Algorithm. Ruan Jian Xue Bao/Journal of Software, 2025, 36(7): 2929–2946 (in Chinese). http://www.jos.org.cn/1000-9825/7332.htm

                 eDPRF: Efficient Differential Privacy Random Forest Training Algorithm
                                                   2
                                     1
                             1
                                                             1
                 WANG Shu-Lan , QIU Yao , ZHAO Chen-Bin , ZOU Jia-Xu , WANG Cai-Fen 1
                 1
                 (College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China)
                 2
                 (Key  Laboratory  of  Aerospace  Information  Security  and  Trusted  Computing  of  Ministry  of  Education  (School  of  Cyber  Science  and
                  Engineering, Wuhan University), Wuhan 430072, China)
                 Abstract:  Differential  privacy,  owing  to  its  strong  privacy  protection  capacity,  is  applied  to  the  random  forest  algorithm  to  address  the
                 privacy  leakage  problem.  However,  the  direct  application  of  differential  privacy  to  the  random  forest  algorithm  leads  to  a  significant
                 decline  in  the  model’s  classification  accuracy.  To  balance  the  contradiction  between  privacy  protection  and  model  accuracy,  this  study
                 proposes  an  efficient  differential  privacy  random  forest  training  algorithm,  efficient  differential  privacy  random  forest  (eDPRF).
                 Specifically,  the  study  designs  a  decision  tree  construction  method  based  on  the  permute-and-flip  mechanism.  By  introducing  the  efficient
                 query  output  advantage  of  the  permute  and  flip  mechanism,  the  corresponding  utility  functions  are  further  designed  to  achieve  the  precise
                 output  of  split  features  and  labels,  effectively  enhancing  the  learning  ability  of  the  tree  model  for  data  information  under  perturbation
                 circumstances. At the same time, the study designs a privacy budget allocation strategy based on the composition theorem, which improves
                 the  privacy  budget  utilization  rate  of  nodes  by  obtaining  training  subsets  without  replacement  sampling  and  adjusting  internal  budgets
                 through  differentiation.  Finally,  through  theoretical  analysis  and  experimental  evaluation,  it  is  demonstrated  that  the  proposed  algorithm


                 *    基金项目: 国家自然科学基金  (61702341); 深圳技术大学深圳市高等院校稳定支持项目      (SZWD2021012); 深圳技术大学研究生校企合
                  作研究基金   (20223108010009)
                  本文由“新兴软件与系统的可信赖性与安全”专题特约编辑向剑文教授、陈厅教授、杨珉教授、周俊伟教授推荐.
                  收稿时间: 2024-07-10; 修改时间: 2024-10-15; 采用时间: 2024-11-25; jos 在线出版时间: 2024-12-10
                  CNKI 网络首发时间: 2025-04-21
   3   4   5   6   7   8   9   10   11   12   13