Page 69 - Journal of Software (《软件学报》), 2021, Issue 8

Liu Fang-Fang (刘芳芳) et al.: HPCG parallel algorithm and efficient implementation on domestic heterogeneous systems (p. 2351)

[17] Kumahata K, Minami K, Maruyama N. High-performance conjugate gradient performance improvement on the K computer. The
Int’l Journal of High Performance Computing Applications, 2016,30(1):55−70.
[18] Phillips E, Fatica M. A CUDA implementation of the high performance conjugate gradient benchmark. In: Proc. of the Int’l
Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems. Cham: Springer-Verlag,
2014. 68−84.
[19] Liu YQ, Zhang XY, Yang C, et al. Accelerating HPCG on Tianhe-2: A hybrid CPU-MIC algorithm. In: Proc. of the 20th IEEE Int’l
Conf. on Parallel and Distributed Systems (ICPADS). IEEE, 2014. 542−551.
[20] Liu YQ. Research on key technologies of communication intensive kernels for Intel MIC architecture [Ph.D. Thesis]. Beijing:
Institute of Software, Chinese Academy of Sciences, 2015 (in Chinese with English abstract).
[21] Ao YL. Research on key optimizations of sparse matrix and stencil computation for the domestic large many-core system [Ph.D.
Thesis]. Beijing: Institute of Software, Chinese Academy of Sciences, 2017 (in Chinese with English abstract).
[22] Ruiz D, Mantovani F, Casas M, et al. The HPCG benchmark: Analysis, shared memory preliminary improvements and evaluation
on an Arm-based platform. 2018.
https://upcommons.upc.edu/bitstream/handle/2117/116642/1HPCG_shared_mem_implementation_tech_report.pdf?sequence=8&isAllowed=y
[23] Greathouse JL, Daga M. Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In: Proc. of the Int’l
Conf. for High Performance Computing, Networking, Storage and Analysis. IEEE Press, 2014. 769−780.
[24] Jones MT, Plassmann PE. A parallel graph coloring heuristic. SIAM Journal on Scientific Computing, 1993,14(3):654−669.
[25] Cohen J, Castonguay P. Efficient graph matching and coloring on the GPU. In: Proc. of the GPU Technology Conf. 2012. 1−10.
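Appendix: references in Chinese:
[20] Liu YQ (刘益群). MIC 众核架构通信密集型函数的算法设计与性能优化研究 [Ph.D. Thesis]. Beijing: Institute of Software, Chinese Academy of Sciences, 2015.
[21] Ao YL (敖玉龙). 国产大型众核系统上稀疏矩阵和 Stencil 运算的性能优化关键技术研究 [Ph.D. Thesis]. Beijing: Institute of Software, Chinese Academy of Sciences, 2017.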

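Liu Fang-Fang (刘芳芳, 1982-), female, Ph.D., professor-level senior engineer, CCF professional member. Her main research interests are high-performance extended math libraries, sparse iterative solvers, and heterogeneous many-core parallelism.

Wang Zhi-Jun (王志军, 1995-), male, master's degree. His main research interests are high-performance computing and parallel computing.

Wang Quan (汪荃, 1996-), female, master's degree. Her main research interest is parallel computing.

Wu Li-Xin (吴丽鑫, 1994-), female, master's degree. Her main research interest is parallel computing.

Ma Wen-Jing (马文静, 1981-), female, Ph.D., associate research professor, CCF professional member. Her main research interest is high-performance computing.

Yang Chao (杨超, 1979-), male, Ph.D., research professor, doctoral supervisor, CCF senior member. His main research interests are high-performance computing and scientific and engineering computing.

Sun Jia-Chang (孙家昶, 1942-), male, research professor, doctoral supervisor. His main research interests are the methods, theory, and applications of scientific and engineering computing, and parallel computing.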
