Journal of Software (《软件学报》), 2021, Issue 8

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software, 2021, 32(8): 2307−2318 [doi: 10.13328/j.cnki.jos.006003] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

Optimization of HPL on Complex Heterogeneous Computing System (复杂异构计算系统 HPL 的优化)∗

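黎雷生¹,², 杨文浩¹,², 马文静¹,², 张娅¹,², 赵慧¹,², 赵海涛¹,², 李会元¹, 孙家昶¹
¹(Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
²(State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China)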
Corresponding author: 黎雷生 (LI Lei-Sheng), E-mail: leisheng@iscas.ac.cn

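Abstract: Mainstream supercomputers today increasingly use heterogeneous systems with accelerators. As the floating-point performance of accelerators keeps rising, the CPU, memory, bus, network, and system architecture of each compute node must keep pace with it. HPL (high performance Linpack) is the traditional benchmark for evaluating high-performance computers, and complex heterogeneous systems bring both opportunities and challenges to HPL benchmarking. For heterogeneous supercomputer systems with GPUs, this paper proposes a new scheme for dividing computation between the CPU and the accelerators, together with a balance point theory to guide HPL performance optimization. To optimize the HPL program, it proposes a look-ahead algorithm in which the CPU and the accelerators work cooperatively and a continuous-pipeline row-swap algorithm, achieving a high degree of parallelism among the accelerators, the CPU, the network, and other components. In addition, new implementations of panel factorization and row swapping are designed for accelerator-equipped systems, which improves accelerator utilization. On a system with 4 GPUs per node, the single-node HPL efficiency reaches 79.51%.
Key words: complex heterogeneous system; balance point theory; panel factorization acceleration; continuous pipeline algorithm
Chinese Library Classification number: TP303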
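
As a reading aid for the efficiency figure quoted in the abstract, the following minimal sketch shows how an HPL efficiency is conventionally computed: the operation count credited by the benchmark, divided by the run time and by the theoretical peak of the machine. The function names and every number in the example (problem size, run time, peak rate) are hypothetical placeholders, not values from this paper.

```python
def hpl_flops(n: int) -> float:
    """Floating-point operations HPL credits for an n-by-n dense solve.

    Uses the conventional count 2/3*n^3 + 3/2*n^2 that the HPL benchmark
    reports against.
    """
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def hpl_efficiency(n: int, seconds: float, peak_flops: float) -> float:
    """Achieved rate as a fraction of the theoretical peak rate."""
    achieved = hpl_flops(n) / seconds  # flop/s actually sustained
    return achieved / peak_flops


# Hypothetical single-node run (assumed values, for illustration only).
n = 100_000        # matrix order
seconds = 28.0     # measured wall-clock time
peak = 30e12       # assumed theoretical peak in flop/s (CPU plus GPUs)

eff = hpl_efficiency(n, seconds, peak)
print(f"HPL efficiency: {eff:.2%}")
```

The peak in the denominator is the sum over all compute devices in the node, so any time an accelerator spends idle (for example, waiting for panel factorization or row swapping) lowers the ratio directly.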

Citation format (Chinese): 黎雷生,杨文浩,马文静,张娅,赵慧,赵海涛,李会元,孙家昶.复杂异构计算系统 HPL 的优化.软件学报,2021,32(8):2307–2318. http://www.jos.org.cn/1000-9825/6003.htm
Citation format (English): Li LS, Yang WH, Ma WJ, Zhang Y, Zhao H, Zhao HT, Li HY, Sun JC. Optimization of HPL on complex heterogeneous computing system. Ruan Jian Xue Bao/Journal of Software, 2021,32(8):2307−2318 (in Chinese). http://www.jos.org.cn/1000-9825/6003.htm

Optimization of HPL on Complex Heterogeneous Computing System

LI Lei-Sheng¹,², YANG Wen-Hao¹, MA Wen-Jing¹,², ZHANG Ya¹, ZHAO Hui¹,², ZHAO Hai-Tao¹,², LI Hui-Yuan¹,², SUN Jia-Chang¹,²
¹(Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
²(State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China)
Abstract: Mainstream supercomputers increasingly adopt heterogeneous systems with accelerators. As the floating-point performance of the accelerators grows, the other components, including the CPU, memory, bus, and network, must match their speed. High performance Linpack (HPL) is the traditional benchmark for high performance computers. Complex heterogeneous systems have brought both opportunities and challenges to benchmarking with HPL. Therefore, for heterogeneous supercomputers, a new task partitioning scheme between the CPU and the accelerators is proposed, and the balance point theory is used to guide the optimization of HPL. To optimize HPL, a look-ahead algorithm is proposed to coordinate the collaboration of the CPU and the

∗ Foundation item: Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200); National Key Research and Development Program of China (2018YFB0204404, 2016YFB0200601); National Natural Science Foundation of China (11871455, 11971016)
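This paper was recommended by Researchers SUN Jia-Chang and LI Hui-Yuan, guest editors of the special topic "Development and Testing of Domestic Complex Heterogeneous High-Performance Numerical Software".
Received 2019-08-20; Revised 2019-12-05; Accepted 2020-01-22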
