Journal of Software ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software,2021,32(12):3852−3868 [doi: 10.13328/j.cnki.jos.006128] http://www.jos.org.cn
© Copyright Institute of Software, Chinese Academy of Sciences. Tel: +86-10-62562563
Recommendation Approach Based on Attentive Federated Distillation∗
CHEN Ming, ZHANG Lei, MA Tian-Yi
(Zhejiang HiThink RoyalFlush AI Research Institute, Hangzhou 310012, China)
Corresponding author: CHEN Ming, E-mail: chm@zju.edu.cn
Abstract: Data privacy protection has become one of the major challenges faced by recommendation systems. With the promulgation of the Cybersecurity Law of the People's Republic of China and the implementation of the General Data Protection Regulation of the European Union, data privacy and security have become a worldwide concern. Federated learning can train a global model without exchanging data and therefore does not leak user privacy. However, federated learning suffers from problems such as the small amount of data on each device, the tendency of local models to over-fit, and data sparsity, which make it difficult for the trained model to reach high prediction accuracy. Meanwhile, with the arrival of the 5G (the 5th generation mobile communication technology) era, the data volume and transmission rate of personal devices are expected to be 10 to 100 times higher than at present, which demands more efficient models. To address this, knowledge distillation can transfer the knowledge of a teacher model into a more compact student model, so that the student model approaches or even surpasses the teacher network, thereby effectively solving the problems of numerous model parameters and high communication cost. However, the distilled student model is usually less accurate than the teacher model. This paper proposes a federated distillation method for recommendation systems. The method first adds a Kullback-Leibler divergence term and a regularization term to the objective function of federated distillation to reduce the impact of the differences between the teacher network and the student network; it then introduces a multi-head attention mechanism to enrich the encoded information and improve model accuracy; finally, it proposes an improved adaptive-learning-rate training strategy that automatically switches optimization algorithms and selects a suitable learning rate, accelerating model convergence. Experiments verify the effectiveness of the method: compared with the baseline algorithms, training time is shortened by 52%, accuracy is improved by 13%, the mean error is reduced by 17%, and the NDCG is improved by 10%.
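As a rough, hypothetical illustration of the objective sketched above (the paper's exact formulation is not given on this page; the weighting coefficients alpha and lam, the temperature, and the function name below are assumed for illustration only), a PyTorch-style distillation loss combining a task loss, a Kullback-Leibler divergence between the teacher's and student's softened outputs, and a regularization term might look like:

    import torch.nn.functional as F

    def federated_distillation_loss(student_logits, teacher_logits, labels,
                                    student_params, temperature=2.0,
                                    alpha=0.5, lam=1e-4):
        # Supervised task loss on the client's local labels.
        task_loss = F.cross_entropy(student_logits, labels)
        # Soften both output distributions with a temperature; kl_div expects
        # log-probabilities as input and probabilities as target.
        student_log_prob = F.log_softmax(student_logits / temperature, dim=-1)
        teacher_prob = F.softmax(teacher_logits / temperature, dim=-1)
        kl_term = F.kl_div(student_log_prob, teacher_prob,
                           reduction='batchmean') * (temperature ** 2)
        # L2 regularization to limit over-fitting on the small local data set.
        reg_term = sum(p.pow(2).sum() for p in student_params)
        return task_loss + alpha * kl_term + lam * reg_term

How these terms are actually weighted, and how they interact with the multi-head attention encoder and the optimizer-switching schedule, is described in the body of the paper.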
Key words: federated learning; distributed learning; federated distillation; recommendation system; attention mechanism
CLC number: TP18
Chinese citation format: Chen M, Zhang L, Ma TY. Recommendation approach based on attentive federated distillation. Journal of Software, 2021,32(12):3852−3868 (in Chinese). http://www.jos.org.cn/1000-9825/6128.htm
English citation format: Chen M, Zhang L, Ma TY. Recommendation approach based on attentive federated distillation. Ruan Jian Xue Bao/Journal of Software, 2021,32(12):3852−3868 (in Chinese). http://www.jos.org.cn/1000-9825/6128.htm
Recommendation Approach Based on Attentive Federated Distillation
CHEN Ming, ZHANG Lei, MA Tian-Yi
(Zhejiang HiThink RoyalFlush AI Research Institute, Hangzhou 310012, China)
Abstract: Data privacy protection has become one of the major challenges of recommendation systems. With the release of the Cybersecurity Law of the People's Republic of China and the General Data Protection Regulation in the European Union, data privacy and security have become a worldwide concern. Federated learning can train the global model without exchanging user data, thus protecting users' privacy. Nevertheless, federated learning still faces many issues, such as the small size of local data on each device, over-fitting of the local model, and data sparsity, which make it difficult for the trained model to reach high accuracy. Meanwhile, with the advent of the 5G (the 5th generation mobile communication technology) era, the data volume and transmission rate of personal devices are expected to be 10 to 100 times higher than the current ones, which requires higher model efficiency. Knowledge distillation can transfer the knowledge from the teacher model to a more compact student model so that the student model can approach or surpass the performance of the teacher model, thus effectively solving the problems of large numbers of model parameters and high communication cost. However, the accuracy of the student model is usually lower than that of the teacher model after knowledge distillation. Therefore, a federated distillation approach with attention mechanisms is proposed for recommendation systems. First, the method introduces a Kullback-Leibler divergence term and a regularization term into the objective function of federated distillation to reduce the impact of the heterogeneity between the teacher network and the student network; then it introduces multi-head
∗ Received 2020-01-18; revised 2020-04-18; accepted 2020-08-07