
Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                    E-mail: jos@iscas.ac.cn
                 2025,36(12):5480−5494 [doi: 10.13328/j.cnki.jos.007432] [CSTR: 32375.14.jos.007432]  http://www.jos.org.cn
                 © Institute of Software, Chinese Academy of Sciences. All rights reserved.             Tel: +86-10-62562563



                 Cloud-edge Coordinated Scheduling Method for Deep Learning Jobs*

                 GU Dian-Dian (谷典典), JIN Xin (金鑫), LIU Xuan-Zhe (刘譞哲)

                 (School of Computer Science, Peking University, Beijing 100871, China)
                 Corresponding author: JIN Xin, E-mail: xinjinpku@pku.edu.cn

                 Abstract: Edge servers provide low-latency, high-performance services for mobile intelligent applications. However, because
                 the load on edge servers fluctuates significantly over time, many edge servers sit idle during periods of low load, and their
                 computing resources are underutilized. By contrast, as artificial intelligence is applied ever more widely in daily life,
                 computing resources in cloud computing clusters remain scarce for deep learning training jobs. Existing cluster scheduling
                 strategies cannot effectively use idle computing resources outside cloud computing clusters, even though doing so would
                 relieve the resource shortage in the cloud and allow more deadline-sensitive deep learning training jobs to finish before
                 their deadlines. To address this problem, this study proposes a cluster scheduling strategy for deadline-sensitive deep
                 learning training jobs that co-schedules cloud computing resources and idle edge computing resources. The strategy exploits
                 the performance characteristics of different deep learning training jobs and the availability of idle edge servers so that
                 more deadline-sensitive jobs finish before their deadlines. Simulation results show that the cloud-edge coordinated
                 scheduling method outperforms the baseline methods in deadline satisfaction ratio, makes effective use of idle edge servers,
                 and improves the utilization of computing resources.
                 Key words: cloud-edge coordination; deep learning training; cluster scheduling; cluster management; deadline
                 Chinese Library Classification (CLC) number: TP311

                 Citation: Gu DD, Jin X, Liu XZ. Cloud-edge Coordinated Scheduling Method for Deep Learning Jobs. Ruan Jian Xue
                 Bao/Journal of Software, 2025, 36(12): 5480–5494 (in Chinese). http://www.jos.org.cn/1000-9825/7432.htm


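                 *    Funding: National Key Research and Development Program of China (2022YFB4500700); National Science Fund for Distinguished Young Scholars (62325201); National Natural Science Foundation of China (62172008)
                  Received: 2024-01-08; Revised: 2024-12-22, 2025-02-19; Accepted: 2025-03-23; Published online by jos: 2025-07-23
                  First published online on CNKI: 2025-07-23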