Page 69 - Journal of Software (《软件学报》), 2021, Issue 8

Liu Fangfang, et al.: HPCG parallel algorithm and efficient implementation on a domestic heterogeneous system    2351


[17] Kumahata K, Minami K, Maruyama N. High-performance conjugate gradient performance improvement on the K computer. The Int’l Journal of High Performance Computing Applications, 2016,30(1):55−70.
[18] Phillips E, Fatica M. A CUDA implementation of the high performance conjugate gradient benchmark. In: Proc. of the Int’l Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems. Cham: Springer-Verlag, 2014. 68−84.
[19] Liu YQ, Zhang XY, Yang C, et al. Accelerating HPCG on Tianhe-2: A hybrid CPU-MIC algorithm. In: Proc. of the 20th IEEE Int’l Conf. on Parallel and Distributed Systems (ICPADS). IEEE, 2014. 542−551.
[20] Liu YQ. Research on key technologies of communication intensive kernels for Intel MIC architecture [Ph.D. Thesis]. Beijing: Institute of Software, Chinese Academy of Sciences, 2015 (in Chinese with English abstract).
[21] Ao YL. Research on key optimizations of sparse matrix and stencil computation for the domestic large many-core system [Ph.D. Thesis]. Beijing: Institute of Software, Chinese Academy of Sciences, 2017 (in Chinese with English abstract).
[22] Ruiz D, Mantovani F, Casas M, et al. The HPCG benchmark: Analysis, shared memory preliminary improvements and evaluation on an Arm-based platform. 2018. https://upcommons.upc.edu/bitstream/handle/2117/116642/1HPCG_shared_mem_implementation_tech_report.pdf?sequence=8&isAllowed=y
[23] Greathouse JL, Daga M. Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In: Proc. of the Int’l Conf. for High Performance Computing, Networking, Storage and Analysis. IEEE Press, 2014. 769−780.
[24] Jones MT, Plassmann PE. A parallel graph coloring heuristic. SIAM Journal on Scientific Computing, 1993,14(3):654−669.
[25] Cohen J, Castonguay P. Efficient graph matching and coloring on the GPU. In: Proc. of the GPU Technology Conf. 2012. 1−10.
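Chinese-language references (original titles):
[20] Liu YQ (刘益群). MIC 众核架构通信密集型函数的算法设计与性能优化研究 [Ph.D. Thesis]. Beijing: Institute of Software, Chinese Academy of Sciences, 2015.
[21] Ao YL (敖玉龙). 国产大型众核系统上稀疏矩阵和 Stencil 运算的性能优化关键技术研究 [Ph.D. Thesis]. Beijing: Institute of Software, Chinese Academy of Sciences, 2017.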


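Liu Fangfang (刘芳芳) (1982-), female, Ph.D., professor-level senior engineer, CCF professional member. Her main research interests are high-performance extended math libraries, sparse iterative solvers, and heterogeneous many-core parallelism.

Wang Zhijun (王志军) (1995-), male, master's degree. His main research interests are high-performance computing and parallel computing.

Wang Quan (汪荃) (1996-), female, master's degree. Her main research interest is parallel computing.

Wu Lixin (吴丽鑫) (1994-), female, master's degree. Her main research interest is parallel computing.

Ma Wenjing (马文静) (1981-), female, Ph.D., associate researcher, CCF professional member. Her main research interest is high-performance computing.

Yang Chao (杨超) (1979-), male, Ph.D., researcher, doctoral supervisor, CCF senior member. His main research interests are high-performance computing and scientific and engineering computing.

Sun Jiachang (孙家昶) (1942-), male, researcher, doctoral supervisor. His main research interests are the methods, theory, and applications of scientific and engineering computing, and parallel computing.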