and Applications, 2023, 82(3): 3713–3744. [doi: 10.1007/s11042-022-13428-4]
[6] Weng QZ, Xiao WC, Yu YH, Wang W, Wang C, He J, Li Y, Zhang LP, Lin W, Ding Y. MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters. In: Proc. of the 19th USENIX Symp. on Networked Systems Design and Implementation. Renton: USENIX, 2022. 945–960.
[7] Jeon M, Venkataraman S, Phanishayee A, Qian JJ, Xiao WC, Yang F. Analysis of large-scale multi-tenant GPU clusters for DNN training workloads. In: Proc. of the 2019 USENIX Annual Technical Conf. Renton: USENIX, 2019. 947–960.
[8] Rico-Gallego JA, Díaz-Martín JC, Manumachu RR, Lastovetsky AL. A survey of communication performance models for high-performance computing. ACM Computing Surveys, 2019, 51(6): 126. [doi: 10.1145/3284358]
[9] Reuther A, Byun C, Arcand W, Bestor D, Bergeron B, Hubbell M, Jones M, Michaleas P, Prout A, Rosa A, Kepner J. Scalable system scheduling for HPC and big data. Journal of Parallel and Distributed Computing, 2018, 111: 76–92. [doi: 10.1016/j.jpdc.2017.06.009]
[10] Netto MAS, Calheiros RN, Rodrigues ER, Cunha RLF, Buyya R. HPC cloud for scientific and business applications: Taxonomy, vision, and research challenges. ACM Computing Surveys, 2019, 51(1): 8. [doi: 10.1145/3150224]
[11] Song J, Sun ZZ, Mao KM, Bao YB, Yu G. Research advance on MapReduce based big data processing platforms and algorithms. Ruan Jian Xue Bao/Journal of Software, 2017, 28(3): 514–543 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5169.htm [doi: 10.13328/j.cnki.jos.005169]
[12] Yu FX, Wang D, Shangguan LF, Zhang MJ, Tang XL, Liu CC, Chen X. A survey of large-scale deep learning serving system optimization: Challenges and opportunities. arXiv:2111.14247, 2021.
[13] Yu FX, Wang D, Shangguan LF, Zhang MJ, Liu CC, Chen X. A survey of multi-tenant deep learning inference on GPU. arXiv:2203.09040, 2022.
[14] Ren J, Gao L, Yu JL, Yuan L. Energy-efficient deep learning task scheduling strategy for edge device. Chinese Journal of Computers, 2020, 43(3): 440–452 (in Chinese with English abstract). [doi: 10.11897/SP.J.1016.2020.00440]
[15] Mittal S, Vaishay S. A survey of techniques for optimizing deep learning on GPUs. Journal of Systems Architecture, 2019, 99: 101635. [doi: 10.1016/j.sysarc.2019.101635]
[16] Rasley J, Rajbhandari S, Ruwase O, He YX. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In: Proc. of the 26th ACM SIGKDD Int’l Conf. on Knowledge Discovery & Data Mining. ACM, 2020. 3505–3506. [doi: 10.1145/3394486.3406703]
[17] Gao HR, Wu H, Xu YJ, Li XH, Wang T, Zhang WB. Survey on memory swapping mechanism for deep learning training. Ruan Jian Xue Bao/Journal of Software, 2023, 34(12): 5862–5886 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6800.htm [doi: 10.13328/j.cnki.jos.006800]
[18] Gao W, Hu QH, Ye ZS, Sun P, Wang XL, Luo YW, Zhang TW, Wen YG. Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision. arXiv:2205.11913, 2022.
[19] Mayer R, Jacobsen HA. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Computing Surveys, 2021, 53(1): 3. [doi: 10.1145/3363554]
[20] Gao W, Ye ZS, Sun P, Wen YG, Zhang TW. Chronus: A novel deadline-aware scheduler for deep learning training jobs. In: Proc. of the 2021 ACM Symp. on Cloud Computing. Seattle: ACM, 2021. 609–623. [doi: 10.1145/3472883.3486978]
[21] Narayanan D, Santhanam K, Kazhamiaka F, Phanishayee A, Zaharia M. Heterogeneity-aware cluster scheduling policies for deep learning workloads. In: Proc. of the 14th USENIX Symp. on Operating Systems Design and Implementation. USENIX, 2020. 481–498.
[22] Qiao A, Choe SK, Subramanya SJ, Neiswanger W, Ho Q, Zhang H, Ganger GR, Xing EP. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In: Proc. of the 15th USENIX Symp. on Operating Systems Design and Implementation. USENIX, 2021.
[23] He KM, Zhang XY, Ren SQ, Sun J. Deep residual learning for image recognition. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 770–778. [doi: 10.1109/CVPR.2016.90]
[24] Xiao WC, Bhardwaj R, Ramjee R, Sivathanu M, Kwatra N, Han ZH, Patel P, Peng X, Zhao HY, Zhang QL, Yang F, Zhou LD. Gandiva: Introspective cluster scheduling for deep learning. In: Proc. of the 13th USENIX Symp. on Operating Systems Design and Implementation. Carlsbad: USENIX, 2018. 595–610.
[25] Han ZH, Tan HS, Jiang SHC, Fu XM, Cao WL, Lau FCM. Scheduling placement-sensitive BSP jobs with inaccurate execution time estimation. In: Proc. of the 2020 IEEE Conf. on Computer Communications. Toronto: IEEE, 2020. 1053–1062. [doi: 10.1109/INFOCOM41043.2020.9155445]
[26] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proc. of the 3rd Int’l Conf. on Learning Representations. San Diego: ICLR, 2015. [doi: 10.48550/arXiv.1409.1556]
[27] Gu JC, Chowdhury M, Shin KG, Zhu YB, Jeon M, Qian JJ, Liu HH, Guo CX. Tiresias: A GPU cluster manager for distributed deep