     2127–2156 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6052.htm [doi: 10.13328/j.cnki.jos.006052]
 [5] McMahan B, Moore E, Ramage D, Hampson S, Arcas BA. Communication-efficient learning of deep networks from decentralized data. In: Proc. of the 20th Int’l Conf. on Artificial Intelligence and Statistics. Fort Lauderdale: PMLR, 2017. 1273–1282.
 [6] Peters M, Neumann M, Iyyer M, Gardner M, Zettlemoyer L. Deep contextualized word representations. In: Proc. of the 2018 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long Papers). New Orleans: Association for Computational Linguistics, 2018. 2227–2237. [doi: 10.18653/v1/N18-1202]
 [7] Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. 2018. https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
 [8] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional Transformers for language understanding. In: Proc. of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long and Short Papers). Minneapolis: Association for Computational Linguistics, 2019. 4171–4186. [doi: 10.18653/v1/N19-1423]
 [9] Liu YH, Ott M, Goyal N, Du JF, Joshi M, Chen DQ, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692, 2019.
[10] Sun C, Qiu XP, Xu YG, Huang XJ. How to fine-tune BERT for text classification? In: Proc. of the 18th China National Conf. on Chinese Computational Linguistics. Kunming: Springer, 2019. 194–206. [doi: 10.1007/978-3-030-32381-3_16]
[11] Krizhevsky A. Learning multiple layers of features from tiny images. Technical Report, Toronto: University of Toronto, 2009.
[12] He KM, Zhang XY, Ren SQ, Sun J. Deep residual learning for image recognition. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 770–778. [doi: 10.1109/CVPR.2016.90]
[13] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6000–6010.
[14] Wang HP, Stich SU, He Y, Fritz M. ProgFed: Effective, communication, and computation efficient federated learning by progressive training. In: Proc. of the 39th Int’l Conf. on Machine Learning. Baltimore: PMLR, 2022. 23034–23054.
[15] Alistarh D, Grubic D, Li JZ, Tomioka R, Vojnovic M. QSGD: Communication-efficient SGD via gradient quantization and encoding. In: Proc. of the 31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 1707–1718.
[16] Lin YJ, Han S, Mao HZ, Wang Y, Dally B. Deep gradient compression: Reducing the communication bandwidth for distributed training. In: Proc. of the 6th Int’l Conf. on Learning Representations. Vancouver: OpenReview.net, 2018.
[17] Fu FC, Hu YZ, He YH, Jiang JW, Shao YX, Zhang C, Cui B. Don’t waste your bits! Squeeze activations and gradients for deep neural networks via TinyScript. In: Proc. of the 37th Int’l Conf. on Machine Learning. PMLR, 2020. 3304–3314.
[18] Stich SU, Cordonnier JB, Jaggi M. Sparsified SGD with memory. In: Proc. of the 32nd Int’l Conf. on Neural Information Processing Systems. Montreal: Curran Associates Inc., 2018. 4452–4463.
[19] Konečný J, McMahan HB, Yu FX, Richtárik P, Suresh AT, Bacon D. Federated learning: Strategies for improving communication efficiency. arXiv:1610.05492, 2016.
[20] Li DL, Wang JP. FedMD: Heterogenous federated learning via model distillation. arXiv:1910.03581, 2019.
[21] Lin T, Kong LJ, Stich SU, Jaggi M. Ensemble distillation for robust model fusion in federated learning. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2020. 198.
[22] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv:1503.02531, 2015.
[23] Wang XA, Li H, Chen K, Shou LD. FedBFPT: An efficient federated learning framework for BERT further pre-training. In: Proc. of the 32nd Int’l Joint Conf. on Artificial Intelligence. Macao: ijcai.org, 2023. 4344–4352. [doi: 10.24963/IJCAI.2023/483]
[24] Wang Y, Li GL, Li KY. Survey on contribution evaluation for federated learning. Ruan Jian Xue Bao/Journal of Software, 2023, 34(3): 1168–1192 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6786.htm [doi: 10.13328/j.cnki.jos.006786]
[25] Rong X. Word2Vec parameter learning explained. arXiv:1411.2738, 2014.
[26] Pennington J, Socher R, Manning CD. GloVe: Global vectors for word representation. In: Proc. of the 2014 Conf. on Empirical Methods in Natural Language Processing (EMNLP). Doha: Association for Computational Linguistics, 2014. 1532–1543. [doi: 10.3115/v1/D14-1162]
[27] Beltagy I, Lo K, Cohan A. SciBERT: A pretrained language model for scientific text. In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing and the 9th Int’l Joint Conf. on Natural Language Processing (EMNLP-IJCNLP). Hong Kong: Association for Computational Linguistics, 2019. 3615–3620. [doi: 10.18653/v1/D19-1371]
[28] Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 2020, 36(4): 1234–1240. [doi: 10.1093/bioinformatics/btz682]
[29] Yang Y, Uy MCS, Huang A. FinBERT: A pretrained language model for financial communications. arXiv:2006.08097, 2020.