     2328                                   Journal of Software  软件学报 Vol.32, No.8,  August 2021
                [11]    Mittal S, Vetter JS. A survey of CPU-GPU heterogeneous computing techniques. ACM Computing Surveys, 2015,47(4):Article 69.
                [12]    Tomov S, Dongarra J, Baboulin M. Towards dense linear algebra for hybrid GPU accelerated manycore systems. Parallel
                     Computing, 2010,36(5-6):232−240.
                [13]    Tan GM, Li LC, Triechle S, Phillips E, Bao YG, Sun NH. Fast implementation of DGEMM on Fermi GPU. In: Proc. of the 2011
                     ACM/IEEE Int’l Conf. for High Performance Computing, Networking, Storage and Analysis (SC). IEEE, 2011. 1−11.
                [14]    Dongarra J, van de Geijn R, Walker D. Scalability issues affecting the design of a dense linear algebra library. Journal of Parallel
                     and Distributed Computing, 1994,22(3):523−537.
                [15]    Li JJ, Li XJ, Tan GM. Research of DGEMM performance on CPU/ATI GPU heterogeneous architecture. Information Technology
                     Letter, 2011,9(6):12−27 (in Chinese).
                [16]    Chou CY, Chang HY, Wang ST, Huang KC, Shen CY. An improved model for predicting HPL performance. In: Proc. of the 2nd
                     Int’l Conf. on Advances in Grid and Pervasive Computing. Springer-Verlag, 2007. 158−168.
                [17]    Yu XZ. An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs [MS. Thesis]. Beijing: Institute of Computing
                     Technology, Chinese Academy of Sciences, 2019 (in Chinese with English abstract).
                [18]    Wang S, Qi FB, Gu HF, Pan Z. Linpack parallel performance model and its prediction. Computer Engineering, 2012,38(16):81−84
                     (in Chinese with English abstract).
                [19]    Zhang WL, Chen MY, Fan JP. Emulation and forecast of HPL test performance. Journal of Computer Research and Development,
                     2006,43(3):557−562 (in Chinese with English abstract).
                [20]    Summit. 2019. https://www.olcf.ornl.gov/summit/
                [21]    Sierra. 2019. https://hpc.llnl.gov/hardware/platforms/sierra
                [22]    ABCI. 2019. http://abci.ai/
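                 References in Chinese:
                 [15]  Li JJ, Li XJ, Tan GM. Research of DGEMM performance on CPU/ATI GPU heterogeneous architecture (CPU/ATI GPU 混合体系结构上 DGEMM 的性能研究). Information Technology Letter (信息技术快报), 2011,9(6):12−27.
                 [17]  Yu XZ. Design and implementation of a large-scale HPL algorithm for exascale computing (面向 E 级计算的大规模 HPL 算法设计与实现) [MS. Thesis]. Beijing: Institute of Computing Technology, Chinese Academy of Sciences, 2019.
                 [18]  Wang S, Qi FB, Gu HF, Pan Z. Linpack parallel performance model and its prediction (Linpack 并行性能模型及其预测). Computer Engineering (计算机工程), 2012,38(16):81−84.
                 [19]  Zhang WL, Chen MY, Fan JP. Emulation and forecast of HPL test performance (HPL 测试性能仿真与预测). Journal of Computer Research and Development (计算机研究与发展), 2006,43(3):557−562.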
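                 Shui Chaoyang (水超洋) (1994-), male, Ph.D. candidate. His research interests include dense matrix multiplication optimization and sparse tensor optimization.
                 Wang Yinshan (王银山) (1988-), male, Ph.D., associate research fellow, CCF professional member. His research interests include numerical simulation, large-scale parallel computing, and sparse matrix computation optimization.
                 Yu Xianzhi (于献智) (1994-), male, Master's degree. His research interest is heterogeneous high-performance computing.
                 Tan Guangming (谭光明) (1980-), male, Ph.D., research fellow, doctoral supervisor, CCF senior member. His research interests include parallel algorithm design and analysis, parallel programming and optimization, computer architecture, bioinformatics, and big data.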





