learning. In: Proc. of the 16th USENIX Symp. on Networked Systems Design and Implementation. Boston: USENIX, 2019. 485–500.
[28] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proc. of the 2019
Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis:
ACL, 2019. 4171–4186. [doi: 10.18653/v1/N19-1423]
[29] Yang ZC, Wu H, Xu YJ, Wu YW, Zhong H, Zhang WB. Hydra: Deadline-aware and efficiency-oriented scheduling for deep learning
jobs on heterogeneous GPUs. IEEE Trans. on Computers, 2023, 72(8): 2224–2236. [doi: 10.1109/TC.2023.3242200]
[30] Le TN, Sun X, Chowdhury M, Liu ZH. AlloX: Compute allocation in hybrid clusters. In: Proc. of the 15th European Conf. on Computer
Systems. Heraklion: ACM, 2020. 31. [doi: 10.1145/3342195.3387547]
[31] Zheng HY, Xu F, Chen L, Zhou Z, Liu FM. Cynthia: Cost-efficient cloud resource provisioning for predictable distributed deep neural
network training. In: Proc. of the 48th Int’l Conf. on Parallel Processing. Kyoto: ACM, 2019. 86. [doi: 10.1145/3337821.3337873]
[32] Mohan J, Phanishayee A, Kulkarni J, Chidambaram V. Looking beyond GPUs for DNN scheduling on multi-tenant clusters. In: Proc. of
the 16th USENIX Symp. on Operating Systems Design and Implementation. Carlsbad: USENIX, 2022. 579–596.
[33] Peng YH, Bao YX, Chen YR, Wu C, Guo CX. Optimus: An efficient dynamic resource scheduler for deep learning clusters. In: Proc. of
the 13th EuroSys Conf. Porto: ACM, 2018. 3. [doi: 10.1145/3190508.3190517]
[34] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the inception architecture for computer vision. In: Proc. of the 2016
IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 2818–2826. [doi: 10.1109/CVPR.2016.308]
[35] Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. In: Proc. of the 27th Int’l Conf. on Neural
Information Processing Systems. Montreal: MIT Press, 2014. 3104–3112.
[36] Zheng PF, Pan R, Khan T, Venkataraman S, Akella A. Shockwave: Fair and efficient cluster scheduling for dynamic adaptation in
machine learning. In: Proc. of the 20th USENIX Symp. on Networked Systems Design and Implementation. Boston: USENIX, 2023.
703–723.
[37] Agarwal S, Wang HY, Lee K, Venkataraman S, Papailiopoulos D. Adaptive gradient communication via critical learning regime
identification. In: Proc. of the 4th Conf. on Machine Learning and Systems. MLSys, 2021. 55–80.
[38] Qin HY, Rajbhandari S, Ruwase O, Yan F, Yang L, He YX. SimiGrad: Fine-grained adaptive batching for large scale training using
gradient similarity measurement. In: Proc. of the 34th Int’l Conf. on Neural Information Processing Systems. NeurIPS, 2021.
20531–20544.
[39] Zhu HY, Phanishayee A, Pekhimenko G. Daydream: Accurately estimating the efficacy of optimizations for DNN training. In: Proc. of
the 2020 USENIX Annual Technical Conf. USENIX, 2020. 337–352.
[40] Lam MO, Hollingsworth JK, De Supinski BR, Legendre MP. Automatically adapting programs for mixed-precision floating-point
computation. In: Proc. of the 27th ACM Int’l Conf. on Supercomputing. Eugene: ACM, 2013. 369–378. [doi: 10.1145/
2464996.2465018]
[41] Niu W, Guan JX, Wang YZ, Agrawal G, Ren B. DNNFusion: Accelerating deep neural networks execution with advanced operator
fusion. In: Proc. of the 42nd ACM SIGPLAN Int’l Conf. on Programming Language Design and Implementation. ACM, 2021. 883–898.
[doi: 10.1145/3453483.3454083]
[42] Duan JF, Li XH, Xu P, Zhang XC, Yan SG, Liang Y, Lin DH. Proteus: Simulating the performance of distributed DNN training.
arXiv:2306.02267, 2023.
[43] Hu QH, Sun P, Yan SG, Wen YG, Zhang TW. Characterization and prediction of deep learning workloads in large-scale GPU
datacenters. In: Proc. of the 2021 Int’l Conf. for High Performance Computing, Networking, Storage and Analysis. St. Louis: ACM,
2021. 104. [doi: 10.1145/3458817.3476223]
[44] Bao YX, Peng YH, Wu C. Deep learning-based job placement in distributed machine learning clusters. In: Proc. of the 2019 IEEE Conf.
on Computer Communications. Paris: IEEE, 2019. 505–513. [doi: 10.1109/INFOCOM.2019.8737460]
[45] Graves A, Fernández S, Gomez F, Schmidhuber J. Connectionist temporal classification: Labelling unsegmented sequence data with
recurrent neural networks. In: Proc. of the 23rd Int’l Conf. on Machine Learning. Pittsburgh: ACM, 2006. 369–376. [doi: 10.1145/
1143844.1143891]
[46] Chen ZY, Quan W, Wen M, Fang JB, Yu J, Zhang CY, Luo L. Deep learning research and development platform: Characterizing and
scheduling with QoS guarantees on GPU clusters. IEEE Trans. on Parallel and Distributed Systems, 2020, 31(1): 34–50. [doi: 10.1109/
TPDS.2019.2931558]
[47] Steinberg D, Colla P. CART: Classification and regression trees. In: The Top Ten Algorithms in Data Mining. Chapman and Hall/CRC, 2009. 179.
[48] Yeung G, Borowiec D, Yang RY, Friday A, Harper R, Garraghan P. Horus: Interference-aware and prediction-based scheduling in deep
learning systems. IEEE Trans. on Parallel and Distributed Systems, 2022, 33(1): 88–100. [doi: 10.1109/TPDS.2021.3079202]