Page 24 - Journal of Software (《软件学报》), 2021, Issue 8

2306    Journal of Software 软件学报 Vol.32, No.8, August 2021



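蔡雨 (1988-), male, senior supervising engineer. Main research interests: CPU architecture and performance optimization.

刘子行 (1977-), male, senior supervising engineer. Main research interest: security software.

孙成国 (1985-), male, senior supervising engineer. Main research interests: high-performance computing and performance optimization.

康梦博 (1989-), female, senior engineer. Main research interest: performance optimization.

杜朝晖 (1975-), male, chief engineer. Main research interest: security software.

李双双 (1984-), male, senior engineer. Main research interest: mathematical libraries.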