Page 183 - Journal of Software (《软件学报》), 2025, Issue 4

Yang Zichao, et al.: A survey of deep learning training job scheduling based on performance modeling                1589


                 [71]  NVIDIA. NVIDIA multi-instance GPU user guide. 2023. https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
                 [72]  Gu DD, Xie XT, Huang G, Jin X, Liu XZ. Energy-efficient GPU clusters scheduling for deep learning. arXiv:2304.06381, 2023.
                 [73]  Zhao HY, Han ZH, Yang Z, Zhang QL, Yang F, Zhou LD, Yang M, Lau FCM, Wang YQ, Xiong YF, Wang B. HiveD: Sharing a GPU
                     cluster for deep learning with guarantees. In: Proc. of the 14th USENIX Symp. on Operating Systems Design and Implementation.
                     USENIX, 2020. 515–532.
                 [74]  Shukla D, Sivathanu M, Viswanatha S, et al. Singularity: Planet-scale, preemptive and elastic scheduling of AI workloads.
                     arXiv:2202.07848, 2022.
                 [75]  Wang SQ, Gonzalez OJ, Zhou XB, Williams T, Friedman BD, Havemann M, Woo T. An efficient and non-intrusive GPU scheduling
                     framework for deep learning training systems. In: Proc. of the 2020 Int’l Conf. for High Performance Computing, Networking, Storage
                     and Analysis. Atlanta: IEEE, 2020. 1–3. [doi: 10.1109/SC41405.2020.00094]
                 [76]  Yeh TA, Chen HH, Chou J. KubeShare: A framework to manage GPUs as first-class and shared resources in container cloud. In: Proc. of
                     the 29th Int’l Symp. on High-performance Parallel and Distributed Computing. Stockholm: ACM, 2020. 173–184.
                     [doi: 10.1145/3369583.3392679]
                 [77]  Gu J, Song SB, Li Y, Luo HM. GaiaGPU: Sharing GPUs in container clouds. In: Proc. of the 2018 IEEE Int’l Conf. on Parallel &
                     Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing
                     & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom). Melbourne: IEEE, 2018.
                     469–476. [doi: 10.1109/BDCloud.2018.00077]
                 [78]  Wu BY, Zhang ZL, Bai ZH, Liu XZ, Jin X. Transparent GPU sharing in container clouds for deep learning workloads. In: Proc. of the
                     20th USENIX Symp. on Networked Systems Design and Implementation. Boston: USENIX, 2023. 69–85.
                 [79]  ALIBABA. Alibaba cloud elastic GPU service best practice. 2023.
                     https://static-aliyun-doc.oss-cn-hangzhou.aliyuncs.com/download%2Fpdf%2F163835%2FBest_Practices_reseller_en-US.pdf
                 [80]  Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proc. of the
                     31st Int’l Conf. on Neural Information Processing Systems. Long Beach: Curran Associates Inc., 2017. 6000–6010.
                 [81]  OpenAI. GPT-4 technical report. arXiv:2303.08774, 2023.
                 [82]  Baidu. ERNIE bot. 2023. https://yiyan.baidu.com/


                 Appendix: References in Chinese:
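                  [4]  Liu YC, Zong CQ. End-to-end speech translation by integrating cross-modal information. Ruan Jian Xue Bao/Journal of Software,
                     2023, 34(4): 1837–1849 (in Chinese). http://www.jos.org.cn/1000-9825/6413.htm [doi: 10.13328/j.cnki.jos.006413]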
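                 [11]  Song J, Sun ZZ, Mao KM, Bao YB, Yu G. Research advances on MapReduce-based big data processing platforms and algorithms.
                     Ruan Jian Xue Bao/Journal of Software, 2017, 28(3): 514–543 (in Chinese). http://www.jos.org.cn/1000-9825/5169.htm
                     [doi: 10.13328/j.cnki.jos.005169]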
                 [14]  Ren J, Gao L, Yu JL, Yuan L. Energy-efficient deep learning task scheduling strategy for edge devices. Chinese Journal of
                     Computers, 2020, 43(3): 440–452 (in Chinese). [doi: 10.11897/SP.J.1016.2020.00440]
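                 [17]  Gao HR, Wu H, Xu YJ, Li XH, Wang T, Zhang WB. Survey on memory swapping mechanisms for deep learning training.
                     Ruan Jian Xue Bao/Journal of Software, 2023, 34(12): 5862–5886 (in Chinese). http://www.jos.org.cn/1000-9825/6800.htm
                     [doi: 10.13328/j.cnki.jos.006800]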
                 [82]  Baidu. ERNIE Bot (文心一言). 2023. https://yiyan.baidu.com/


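                 Yang Zichao (1999-), male, Ph.D. candidate. His main research interests are resource scheduling and distributed systems.

                 Wu Yuewen (1990-), male, Ph.D., CCF professional member. His main research interests are cloud computing and capacity planning.

                 Wu Heng (1983-), male, Ph.D., associate research professor. His main research interests are container virtualization and edge computing.

                 Zhang Wenbo (1976-), male, Ph.D., research professor, Ph.D. supervisor. His main research interests are cloud computing and service computing.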