Page 8 - 《软件学报》2025年第7期
P. 8
软件学报 ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
2025,36(7):2929−2946 [doi: 10.13328/j.cnki.jos.007332] [CSTR: 32375.14.jos.007332] http://www.jos.org.cn
©中国科学院软件研究所版权所有. Tel: +86-10-62562563
*
eDPRF: 高效的差分隐私随机森林训练算法
王树兰 1 , 邱 瑶 1 , 赵陈斌 2 , 邹家须 1 , 王彩芬 1
1
(深圳技术大学 大数据与互联网学院, 广东 深圳 518118)
2
(空天信息安全与可信计算教育部重点实验室 (武汉大学 国家网络安全学院), 湖北 武汉 430072)
通信作者: 赵陈斌, E-mail: chenbinzhao96@whu.edu.cn
摘 要: 差分隐私凭借其强大的隐私保护能力被应用在随机森林算法解决其中的隐私泄露问题, 然而, 直接将差分
隐私应用在随机森林算法会使模型的分类准确率严重下降. 为了平衡隐私保护和模型准确性之间的矛盾, 提出了
一种高效的差分隐私随机森林训练算法 eDPRF (efficient differential privacy random forest). 具体而言, 该算法设计
了决策树构建方法, 通过引入重排翻转机制高效地查询输出优势, 进一步设计相应的效用函数实现分裂特征以及
标签的精准输出, 有效改善树模型在扰动情况下对于数据信息的学习能力. 同时基于组合定理设计了隐私预算分
配的策略, 通过不放回抽样获得训练子集以及差异化调整内部预算的方式提高树节点的查询预算. 最后, 通过理论
分析以及实验评估, 表明算法在给定相同隐私预算的情况下, 模型的分类准确度优于同类算法.
关键词: 随机森林; 差分隐私; 隐私预算; 重排翻转; 扰动方式
中图法分类号: TP18
中文引用格式: 王树兰, 邱瑶, 赵陈斌, 邹家须, 王彩芬. eDPRF: 高效的差分隐私随机森林训练算法. 软件学报, 2025, 36(7): 2929–
2946. http://www.jos.org.cn/1000-9825/7332.htm
英文引用格式: Wang SL, Qiu Y, Zhao CB, Zou JX, Wang CF. eDPRF: Efficient Differential Privacy Random Forest Training
Algorithm. Ruan Jian Xue Bao/Journal of Software, 2025, 36(7): 2929–2946 (in Chinese). http://www.jos.org.cn/1000-9825/7332.htm
eDPRF: Efficient Differential Privacy Random Forest Training Algorithm
2
1
1
1
WANG Shu-Lan , QIU Yao , ZHAO Chen-Bin , ZOU Jia-Xu , WANG Cai-Fen 1
1
(College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518118, China)
2
(Key Laboratory of Aerospace Information Security and Trusted Computing of Ministry of Education (School of Cyber Science and
Engineering, Wuhan University), Wuhan 430072, China)
Abstract: Differential privacy, owing to its strong privacy protection capacity, is applied to the random forest algorithm to address the
privacy leakage problem. However, the direct application of differential privacy to the random forest algorithm leads to a significant
decline in the model’s classification accuracy. To balance the contradiction between privacy protection and model accuracy, this study
proposes an efficient differential privacy random forest training algorithm, efficient differential privacy random forest (eDPRF).
Specifically, the study designs a decision tree construction method based on the permute-and-flip mechanism. By introducing the efficient
query output advantage of the permute and flip mechanism, the corresponding utility functions are further designed to achieve the precise
output of split features and labels, effectively enhancing the learning ability of the tree model for data information under perturbation
circumstances. At the same time, the study designs a privacy budget allocation strategy based on the composition theorem, which improves
the privacy budget utilization rate of nodes by obtaining training subsets without replacement sampling and adjusting internal budgets
through differentiation. Finally, through theoretical analysis and experimental evaluation, it is demonstrated that the proposed algorithm
* 基金项目: 国家自然科学基金 (61702341); 深圳技术大学深圳市高等院校稳定支持项目 (SZWD2021012); 深圳技术大学研究生校企合
作研究基金 (20223108010009)
本文由“新兴软件与系统的可信赖性与安全”专题特约编辑向剑文教授、陈厅教授、杨珉教授、周俊伟教授推荐.
收稿时间: 2024-07-10; 修改时间: 2024-10-15; 采用时间: 2024-11-25; jos 在线出版时间: 2024-12-10
CNKI 网络首发时间: 2025-04-21

