Journal of Software (软件学报), 2025, No. 12
Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
2025,36(12):5480−5494 [doi: 10.13328/j.cnki.jos.007432] [CSTR: 32375.14.jos.007432] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
Cloud-edge Coordinated Scheduling Method for Deep Learning Jobs*
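GU Dian-Dian (谷典典), JIN Xin (金鑫), LIU Xuan-Zhe (刘譞哲)
(School of Computer Science, Peking University, Beijing 100871, China)
Corresponding author: JIN Xin, E-mail: xinjinpku@pku.edu.cn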
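Abstract: Edge servers provide low-latency, high-performance services for mobile intelligent applications. However, because the load on edge servers fluctuates widely over time, many edge servers sit idle when load is low, and their computing resources are not fully utilized. In contrast to this underutilization, as artificial intelligence is applied ever more widely in daily life, computing resources in cloud computing clusters remain scarce for deep learning training jobs. Existing cluster scheduling strategies cannot effectively use idle computing resources outside the cloud computing cluster, yet using them would relieve resource pressure in the cluster and allow more deadline-sensitive deep learning training jobs to finish before their deadlines. To address this problem, this paper designs a cluster scheduling strategy for deadline-sensitive deep learning training jobs that co-schedules cloud computing resources and idle edge computing resources, exploiting both the performance characteristics of different deep learning training jobs and idle edge server devices so that more deadline-sensitive deep learning training jobs finish before their deadlines. Experimental results show that the cloud-edge coordinated scheduling method outperforms the other baseline methods in improving the deadline satisfaction ratio of jobs, and that it effectively uses idle edge server devices and improves the utilization of computing resources.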
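Key words: cloud-edge coordination; deep learning training; cluster scheduling; cluster management; deadline
Chinese Library Classification (CLC) number: TP311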
Citation format (Chinese): 谷典典, 金鑫, 刘譞哲. 云边协同的深度学习作业调度方法. 软件学报, 2025, 36(12): 5480–5494. http://www.jos.org.cn/1000-9825/7432.htm
Citation format (English): Gu DD, Jin X, Liu XZ. Cloud-edge Coordinated Scheduling Method for Deep Learning Jobs. Ruan Jian Xue Bao/Journal of Software, 2025, 36(12): 5480–5494 (in Chinese). http://www.jos.org.cn/1000-9825/7432.htm
Cloud-edge Coordinated Scheduling Method for Deep Learning Jobs
GU Dian-Dian, JIN Xin, LIU Xuan-Zhe
(School of Computer Science, Peking University, Beijing 100871, China)
Abstract: Edge servers provide low-latency, high-performance services for mobile intelligent applications. However, due to significant
fluctuations in the load on edge servers over time, many edge servers remain idle during periods of low load, and their computational
resources are not fully utilized. In contrast to the underutilization of edge servers, computing resources in cloud computing clusters remain
relatively scarce for deep learning training tasks as artificial intelligence becomes more widely applied in daily life. Existing cluster
scheduling strategies fail to efficiently utilize idle computing resources outside of cloud computing clusters. Effectively utilizing these idle
resources can alleviate the resource constraints in cloud computing clusters, thus enabling more deadline-sensitive deep learning training
tasks to be completed before their deadlines. To address this issue, this study proposes a cluster scheduling strategy for deadline-sensitive
deep learning training tasks, which coordinates the scheduling of cloud computing resources and idle edge computing resources. This
strategy fully leverages the performance characteristics of different deep learning tasks and the availability of idle edge server devices,
allowing more deadline-sensitive tasks to be completed on time. Simulation results demonstrate that the cloud-edge coordinated
scheduling method outperforms other baseline methods in improving the deadline satisfaction ratio and effectively utilizes idle edge
server devices.
Key words: cloud-edge coordination; deep learning training; cluster scheduling; cluster management; deadline
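* Funding: National Key Research and Development Program of China (2022YFB4500700); National Science Fund for Distinguished Young Scholars (62325201); National Natural Science Foundation of China (62172008)
Received: 2024-01-08; Revised: 2024-12-22, 2025-02-19; Accepted: 2025-03-23; Published online by JOS: 2025-07-23
CNKI online-first publication: 2025-07-23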

