Journal of Software (软件学报), 2021, Issue 8

Journal of Software (软件学报), ISSN 1000-9825, CODEN RUXUEW                 E-mail: jos@iscas.ac.cn
                 Journal of Software, 2021, 32(8): 2289−2306 [doi: 10.13328/j.cnki.jos.006002]   http://www.jos.org.cn
                 © Institute of Software, Chinese Academy of Sciences. All rights reserved.      Tel: +86-10-62562563



                 CPU-side High Performance BLAS Library Optimization in Heterogeneous HPL Algorithm∗

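                 CAI Yu,  SUN Cheng-Guo,  DU Zhao-Hui,  LIU Zi-Xing,  KANG Meng-Bo,  LI Shuang-Shuang

                 (Information Technology Co., Ltd., Suzhou, Jiangsu 215000, China)
                 Corresponding author: SUN Cheng-Guo, E-mail: sunchengguo1@163.com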


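                 Abstract:  Raising the efficiency of heterogeneous HPL (high-performance Linpack) requires fully exploiting the computing
                 power of both the accelerators and the general-purpose CPU. The accelerators integrate more computing cores and carry out
                 the main computation, while the general-purpose CPU handles task scheduling and also takes part in the computation. Provided
                 that tasks are divided reasonably and the load is balanced, optimizing CPU-side computing performance is particularly
                 important for improving overall efficiency. Optimizing the BLAS (basic linear algebra subprograms) functions for the
                 architectural characteristics of a specific platform can often make fuller use of the general-purpose CPU's computing power
                 and raise overall system efficiency. The BLIS (BLAS-like library instantiation software) library is an open-source BLAS
                 framework that is easy to develop with, easy to port, and modular. Based on the architecture of the heterogeneous platform
                 and the characteristics of the HPL algorithm, this work makes full use of the three-level cache, vectorized instructions,
                 and multi-threaded parallelism to optimize the BLAS functions of all levels called on the CPU side, and applies auto-tuning
                 to optimize the matrix blocking parameters, yielding HBLIS, a BLIS library optimized for heterogeneous environments.
                 Compared with MKL, overall HPL performance improves by 11.8%.
                 Key words:  BLAS; genetic algorithm auto-tuning; vectorized instructions; data prefetching; multi-threaded parallelism
                 Chinese Library Classification (CLC) number: TP303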

                 Citation format (Chinese):  蔡雨,孙成国,杜朝晖,刘子行,康梦博,李双双.异构 HPL 算法中 CPU 端高性能 BLAS 库优化.软件学报,2021,
                 32(8):2289–2306. http://www.jos.org.cn/1000-9825/6002.htm
                 Citation format (English):  Cai Y, Sun CG, Du ZH, Liu ZX, Kang MB, Li SS. CPU-side high performance BLAS library optimization in
                 heterogeneous HPL algorithm. Ruan Jian Xue Bao/Journal of Software, 2021,32(8):2289–2306 (in Chinese).
                 http://www.jos.org.cn/1000-9825/6002.htm

                 CPU-side High Performance BLAS Library Optimization in Heterogeneous HPL Algorithm
                 CAI Yu,  SUN Cheng-Guo,   DU Zhao-Hui,   LIU Zi-Xing,   KANG Meng-Bo,  LI Shuang-Shuang
                 (Information Technology Co., Ltd., Suzhou 215000, China)

                 Abstract:  Improving the efficiency of heterogeneous HPL requires fully exploiting the computing power of both the acceleration
                 components and the CPU. The acceleration components integrate more computing cores and are responsible for the main calculation,
                 while the general-purpose CPU is responsible for task scheduling and also participates in the calculation. Provided that tasks
                 are divided reasonably and the load is balanced, optimizing CPU-side computing performance is particularly important for
                 improving overall efficiency. Optimizing the basic linear algebra subprograms (BLAS) functions for the architectural
                 characteristics of a specific platform can often make fuller use of general-purpose CPU computing capabilities and improve
                 overall system efficiency. The BLIS (BLAS-like library instantiation software) algorithm library is an open-source BLAS
                 function framework that offers easy development, portability, and modularity. Based on the heterogeneous system platform
                 architecture and the characteristics of the HPL algorithm, this study uses the three-level cache, vectorized instructions, and
                 multi-threaded parallelism to optimize the BLAS functions called on the CPU side, and applies auto-tuning to optimize the
                 matrix block parameters, forming HBLIS, a BLIS algorithm library optimized for heterogeneous environments. Compared with MKL,
                 the overall performance of HPL using the optimized HBLIS improves by 11.8%.
                 Key words:  BLAS; genetic algorithm auto-tuning; vectorization instruction; data prefetching; multi-threading parallelization

                    BLAS, short for basic linear algebra subprograms, is currently a widely used core linear algebra


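                   ∗  This paper was recommended by Prof. SUN Jia-Chang and Prof. LI Hui-Yuan, guest editors of the special topic
                      "Development and Testing of Domestic Complex Heterogeneous High-Performance Numerical Software".
                      Received: 2019-07-25;  Revised: 2019-12-05, 2020-01-22, 2020-03-19;  Accepted: 2020-03-27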