and Applications, 2023, 82(3): 3713–3744. [doi: 10.1007/s11042-022-13428-4]
[6] Weng QZ, Xiao WC, Yu YH, Wang W, Wang C, He J, Li Y, Zhang LP, Lin W, Ding Y. MLaaS in the wild: Workload analysis and scheduling in large-scale heterogeneous GPU clusters. In: Proc. of the 19th USENIX Symp. on Networked Systems Design and Implementation. Renton: USENIX, 2022. 945–960.
[7] Jeon M, Venkataraman S, Phanishayee A, Qian JJ, Xiao WC, Yang F. Analysis of large-scale multi-tenant GPU clusters for DNN training workloads. In: Proc. of the 2019 USENIX Annual Technical Conf. Renton: USENIX, 2019. 947–960.
[8] Rico-Gallego JA, Díaz-Martín JC, Manumachu RR, Lastovetsky AL. A survey of communication performance models for high-performance computing. ACM Computing Surveys, 2019, 51(6): 126. [doi: 10.1145/3284358]
[9] Reuther A, Byun C, Arcand W, Bestor D, Bergeron B, Hubbell M, Jones M, Michaleas P, Prout A, Rosa A, Kepner J. Scalable system scheduling for HPC and big data. Journal of Parallel and Distributed Computing, 2018, 111: 76–92. [doi: 10.1016/j.jpdc.2017.06.009]
[10] Netto MAS, Calheiros RN, Rodrigues ER, Cunha RLF, Buyya R. HPC cloud for scientific and business applications: Taxonomy, vision, and research challenges. ACM Computing Surveys, 2019, 51(1): 8. [doi: 10.1145/3150224]
[11] Song J, Sun ZZ, Mao KM, Bao YB, Yu G. Research advance on MapReduce based big data processing platforms and algorithms. Ruan Jian Xue Bao/Journal of Software, 2017, 28(3): 514–543 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5169.htm [doi: 10.13328/j.cnki.jos.005169]
[12] Yu FX, Wang D, Shangguan LF, Zhang MJ, Tang XL, Liu CC, Chen X. A survey of large-scale deep learning serving system optimization: Challenges and opportunities. arXiv:2111.14247, 2021.
[13] Yu FX, Wang D, Shangguan LF, Zhang MJ, Liu CC, Chen X. A survey of multi-tenant deep learning inference on GPU. arXiv:2203.09040, 2022.
[14] Ren J, Gao L, Yu JL, Yuan L. Energy-efficient deep learning task scheduling strategy for edge device. Chinese Journal of Computers, 2020, 43(3): 440–452 (in Chinese with English abstract). [doi: 10.11897/SP.J.1016.2020.00440]
[15] Mittal S, Vaishay S. A survey of techniques for optimizing deep learning on GPUs. Journal of Systems Architecture, 2019, 99: 101635. [doi: 10.1016/j.sysarc.2019.101635]
[16] Rasley J, Rajbhandari S, Ruwase O, He YX. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In: Proc. of the 26th ACM SIGKDD Int’l Conf. on Knowledge Discovery & Data Mining. ACM, 2020. 3505–3506. [doi: 10.1145/3394486.3406703]
[17] Gao HR, Wu H, Xu YJ, Li XH, Wang T, Zhang WB. Survey on memory swapping mechanism for deep learning training. Ruan Jian Xue Bao/Journal of Software, 2023, 34(12): 5862–5886 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/6800.htm [doi: 10.13328/j.cnki.jos.006800]
[18] Gao W, Hu QH, Ye ZS, Sun P, Wang XL, Luo YW, Zhang TW, Wen YG. Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision. arXiv:2205.11913, 2022.
[19] Mayer R, Jacobsen HA. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Computing Surveys, 2021, 53(1): 3. [doi: 10.1145/3363554]
[20] Gao W, Ye ZS, Sun P, Wen YG, Zhang TW. Chronus: A novel deadline-aware scheduler for deep learning training jobs. In: Proc. of the 2021 ACM Symp. on Cloud Computing. Seattle: ACM, 2021. 609–623. [doi: 10.1145/3472883.3486978]
[21] Narayanan D, Santhanam K, Kazhamiaka F, Phanishayee A, Zaharia M. Heterogeneity-aware cluster scheduling policies for deep learning workloads. In: Proc. of the 14th USENIX Symp. on Operating Systems Design and Implementation. USENIX, 2020. 481–498.
[22] Qiao A, Choe SK, Subramanya SJ, Neiswanger W, Ho Q, Zhang H, Ganger GR, Xing EP. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In: Proc. of the 15th USENIX Symp. on Operating Systems Design and Implementation. USENIX, 2021.
[23] He KM, Zhang XY, Ren SQ, Sun J. Deep residual learning for image recognition. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016. 770–778. [doi: 10.1109/CVPR.2016.90]
[24] Xiao WC, Bhardwaj R, Ramjee R, Sivathanu M, Kwatra N, Han ZH, Patel P, Peng X, Zhao HY, Zhang QL, Yang F, Zhou LD. Gandiva: Introspective cluster scheduling for deep learning. In: Proc. of the 13th USENIX Symp. on Operating Systems Design and Implementation. Carlsbad: USENIX, 2018. 595–610.
[25] Han ZH, Tan HS, Jiang SHC, Fu XM, Cao WL, Lau FCM. Scheduling placement-sensitive BSP jobs with inaccurate execution time estimation. In: Proc. of the 2020 IEEE Conf. on Computer Communications. Toronto: IEEE, 2020. 1053–1062. [doi: 10.1109/INFOCOM41043.2020.9155445]
[26] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proc. of the 3rd Int’l Conf. on Learning Representations. San Diego: ICLR, 2015. [doi: 10.48550/arXiv.1409.1556]
[27] Gu JC, Chowdhury M, Shin KG, Zhu YB, Jeon M, Qian JJ, Liu HH, Guo CX. Tiresias: A GPU cluster manager for distributed deep learning. In: Proc. of the 16th USENIX Symp. on Networked Systems Design and Implementation. Boston: USENIX, 2019. 485–500.