Page 70 - 《软件学报》2021年第8期
P. 70
软件学报 ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software,2021,32(8):2352−2364 [doi: 10.13328/j.cnki.jos.006007] http://www.jos.org.cn
©中国科学院软件研究所版权所有. Tel: +86-10-62562563
∗
SW26010 众核任务并行调度系统及其嵌套并行算法应用
孙 乔, 黎雷生, 赵海涛, 赵 慧, 吴长茂
(中国科学院 软件研究所 并行软件与计算科学实验室,北京 100190)
通讯作者: 吴长茂, E-mail: changmaowu@foxmail.com
摘 要: 任务并行是并行程序设计的基础设计模式.但由于算法本身的复杂性及目标平台的特殊性,设计实现高
效率的任务并行程序对程序员来说往往充满挑战.基于新兴的SW26010众核CPU,提出了支持任务嵌套并行模式的
通用运行时框架 SWAN.SWAN 对任务并行程序的实现提供了高层次的抽象,使程序员能够专注于算法逻辑本身而
提高开发效率.在性能方面,SWAN 框架对诸多共享资源进行了细粒度的划分,从而有效地避免了众多线程间对共享
资源的高强度争用.充分利用平台的高速访存机制、高速可控缓存和原子操作等特性,对 SWAN 框架的核心数据结
构进行优化设计以降低其本身的性能开销.SWAN 还具备动态负载均衡能力,使各个处理器核心的资源得以充分利
用.基于 SWAN 框架,在目标平台上实现了若干典型的具有递归特性的嵌套并行算法,包括 N-皇后问题、二叉树遍
历、快速排序和凸包求解.实验结果表明,这些通过使用 SWAN 框架得以并行化的算法相对于其串行版本取得了
4.5~32 倍的加速,充分说明了 SWAN 框架具有较高的实用性及性能.
关键词: 任务并行框架;并行计算;嵌套并行算法;SWAN;SW26010 众核 CPU
中图法分类号: TP303
中文引用格式: 孙乔,黎雷生,赵海涛,赵慧,吴长茂.SW26010 众核任务并行调度系统及其嵌套并行算法应用.软件学报,2021,
32(8):2352–2364. http://www.jos.org.cn/1000-9825/6007.htm
英文引用格式: Sun Q, Li LS, Zhao HT, Zhao H, Wu CM. Task parallel framework and its application in nested parallel
algorithms on the SW26010 many-core platform. Ruan Jian Xue Bao/Journal of Software, 2021,32(8):2352−2364 (in Chinese).
http://www.jos.org.cn/1000-9825/6007.htm
Task Parallel Framework and Its Application in Nested Parallel Algorithms on the SW26010
Many-core Platform
SUN Qiao, LI Lei-Sheng, ZHAO Hai-Tao, ZHAO Hui, WU Chang-Mao
(Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China)
Abstract: Task parallelism is one of the fundamental patterns for designing parallel algorithms. Due to algorithm complexity and
distinctive hardware features, however, implementation of algorithms in task parallelism often remains to be challenging. On the newly
SW26010 many-core CPU platform, a general runtime framework, SWAN, which supports nested task parallelism is proposed in this
study. SWAN provides high-level abstractions for programmers to implement task parallelism so that they can focus mainly on the
algorithm itself, enjoying an enhanced productivity. In the aspect of performance, the shared resources and information manipulated by
SWAN are partitioned in a fine-grained manner to avoid fierce contention among working threads. The core data structures within SWAN
take advantage of the high-bandwidth memory access mechanism, fast on-chip scratchpad cache as well as atomic operations of the
platform to reduce the overhead of SWAN itself. Besides, SWAN provides dynamic load-balancing strategies in runtime to ensure a full
occupation of the threads. In the experiment, a set of recursive algorithms in nested parallelism, including the N-queens problem,
binary-tree traversal, quick sort, and convex hull, are implemented using SWAN on the target platform. The experimental results reveal
∗ 基金项目: 中国科学院战略性先导科技专项(C 类)(XDC01030200)
Foundation item: Strategic Priority Research Program of the Chinese Academy of Sciences (Category C) (XDC01030200)
本文由“国产复杂异构高性能数值软件的研制与测试”专题特约编辑孙家昶研究员、李会元研究员推荐.
收稿时间: 2019-08-22; 修改时间: 2019-12-05; 定稿时间: 2020-01-22