
Journal of Software (Ruan Jian Xue Bao) ISSN 1000-9825, CODEN RUXUEW                E-mail: jos@iscas.ac.cn
2025, 36(9): 4110−4133 [doi: 10.13328/j.cnki.jos.007259] [CSTR: 32375.14.jos.007259]    http://www.jos.org.cn
© Copyright by Institute of Software, Chinese Academy of Sciences.                  Tel: +86-10-62562563



Efficient Framework for BERT Model Training Based on Federated Learning*

WANG Xin-Ao, CHEN Ke, SHOU Li-Dan, LUO Xin-Yuan, CHEN Gang


(State Key Laboratory of Blockchain and Data Security (Zhejiang University), Hangzhou 310027, China)
Corresponding author: CHEN Ke, E-mail: chenk@zju.edu.cn

Abstract: High-quality training data is essential for pre-trained language models (PLMs), but data in many professional domains cannot be collected centrally for model training because of privacy concerns. Federated learning makes it possible to train models while preserving data privacy. However, federated learning clients typically have limited resources and cannot carry out the training of a pre-trained language model. This study investigates that problem in depth. First, it formally defines the problem of completing model training under limited resources and optimizes training effectiveness by trading off computational cost against communication cost. Second, it introduces FedBT, an efficient training framework for the BERT model in federated learning environments. FedBT is designed to train the BERT model on federated learning clients and covers two scenarios: further pre-training and downstream task fine-tuning. Adapting to the application scenario, FedBT trains only the key parameters of the BERT model on the clients and uploads only the updated parameters to the server for aggregation, which significantly reduces the computational and communication costs of model training. Finally, extensive comparative experiments on datasets from multiple professional domains show that FedBT reduces client-side training and communication costs to 34.31% and 7.04% of the original in the further pre-training scenario, and to 48.26% and 20.19% of the original in the downstream task fine-tuning scenario, while achieving accuracy close to that of training the complete model with conventional federated learning.
Key words: federated learning; pre-trained language model; further pre-training; downstream task fine-tuning
CLC number: TP18

Chinese citation format: 王鑫澳, 陈珂, 寿黎但, 骆歆远, 陈刚. 基于联邦学习的BERT模型高效训练框架. 软件学报, 2025, 36(9): 4110–4133. http://www.jos.org.cn/1000-9825/7259.htm
English citation format: Wang XA, Chen K, Shou LD, Luo XY, Chen G. Efficient Framework for BERT Model Training Based on Federated Learning. Ruan Jian Xue Bao/Journal of Software, 2025, 36(9): 4110–4133 (in Chinese). http://www.jos.org.cn/1000-9825/7259.htm

                 Efficient Framework for BERT Model Training Based on Federated Learning

                 WANG Xin-Ao, CHEN Ke, SHOU Li-Dan, LUO Xin-Yuan, CHEN Gang
                 (State Key Laboratory of Blockchain and Data Security (Zhejiang University), Hangzhou 310027, China)

Abstract: High-quality training data is essential for pre-trained language models (PLMs), yet privacy concerns often preclude the centralized collection of data from many professional domains. Federated learning offers a solution by enabling model training while safeguarding data privacy. However, the limited resources of federated learning clients pose a challenge to the training of pre-trained language models. This study addresses this issue through several steps. Firstly, it defines the problem of completing model training with limited resources and explores strategies to balance computational and communication costs for optimizing training efficiency. Secondly, it introduces an efficient federated learning framework for BERT further pre-training and fine-tuning (FedBT). FedBT facilitates the training of the BERT model on federated learning clients, encompassing both further pre-training and downstream task fine-tuning. Depending on the application context, FedBT selectively trains key parameters of the BERT model at the clients, uploading only the updated parameters to the server for aggregation. This approach significantly reduces both computational and communication overhead during training. Finally, extensive experiments are conducted on datasets from multiple professional domains. Results demonstrate that FedBT reduces client-side computational costs to 34.31% and communication costs to 7.04% during further pre-training. In downstream task fine-tuning, it reduces client-side computational costs to 48.26% and communication costs to 20.19%. The accuracy achieved in both further pre-training and downstream task fine-tuning is close to that of training the complete model with conventional federated learning.
Key words: federated learning; pre-trained language model; further pre-training; downstream task fine-tuning
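
As a rough illustration of the selective-parameter scheme summarized in the abstracts, the Python sketch below shows clients uploading only a chosen subset of BERT parameters and a server averaging just that subset, FedAvg-style. The helper names (extract_key_params, aggregate, load_key_params) and the pattern-based choice of "key parameters" are illustrative assumptions made here; the criterion FedBT actually uses to select key parameters is defined in the body of the paper, not in this sketch.

# Minimal sketch (assumptions noted above): clients train and upload only a
# subset of BERT parameters; the server averages that subset and writes it
# back into the full model. Requires PyTorch.
from collections import OrderedDict
import torch

def extract_key_params(model: torch.nn.Module, patterns=("encoder.layer.11", "pooler")):
    # Keep only tensors whose names match the (assumed) key-parameter patterns.
    return OrderedDict(
        (name, p.detach().clone())
        for name, p in model.named_parameters()
        if any(pat in name for pat in patterns)
    )

def aggregate(client_updates, client_weights):
    # Weighted average (FedAvg) over the uploaded key parameters only.
    total = float(sum(client_weights))
    agg = OrderedDict()
    for name in client_updates[0]:
        agg[name] = sum((w / total) * update[name]
                        for update, w in zip(client_updates, client_weights))
    return agg

def load_key_params(model: torch.nn.Module, key_params):
    # Write the aggregated subset back into the full model; all other
    # parameters are left untouched (strict=False allows a partial state dict).
    model.load_state_dict(key_params, strict=False)

In each round, the server would broadcast only the aggregated key parameters back to the clients, which is what would keep both upload and download traffic at a small fraction of the full-model communication cost reported in the abstract.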


* Funding: "Pioneer" R&D Program of Zhejiang Province (2024C01021)
Received 2024-03-20; Revised 2024-05-05; Accepted 2024-07-25; JOS online publication 2025-01-24
CNKI online-first publication 2025-01-26