Journal of Software (《软件学报》), 2021, No. 9

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                     E-mail: jos@iscas.ac.cn
         Journal of Software, 2021, 32(9): 2945−2962 [doi: 10.13328/j.cnki.jos.005978]   http://www.jos.org.cn
         © Institute of Software, Chinese Academy of Sciences. All rights reserved.      Tel: +86-10-62562563


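         Performance Optimization of Molecular Dynamics Simulation on Sunway TaihuLight∗

         TIAN Zhuo (田卓),   CHEN Yi-Feng (陈一峯)


         (School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China)
         Corresponding author: TIAN Zhuo, E-mail: t.z@pku.edu.cn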

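         Abstract: The domestically developed Sunway TaihuLight supercomputer is a high-throughput computing system, and such
         systems tend to have long memory-access and network latency. In practice, a large class of problems consists of
         time-evolution simulations, which require high-frequency state iteration with communication in every iteration.
         Molecular dynamics simulation is a typical example: the properties of the molecules depend on their evolution over
         time, so the computation is hard to parallelize along the state-dependent time dimension. In practice, an all-atom
         model has to be simulated over more than a microsecond of physical time with a step of 1 fs to 2.5 fs, which means
         that more than 10^12 time steps are needed. On a many-core processor, memory requests from different cores wait in
         long queues, which causes memory-access latency. In addition, network-interface communication delay and long data
         paths cause network latency, so that completing one useful simulation on a long-latency many-core processor is
         nearly impossible. The main challenge is to raise the iteration frequency, that is, to execute as many iteration
         steps per second as possible. Targeting the architecture of the Sunway high-performance processor and taking
         molecular dynamics simulation as an example, this study investigates a series of optimization strategies to raise
         the iteration frequency: (1) combining single-core communication with on-chip inter-core synchronization to reduce
         communication cost; (2) combining shared-memory waiting with synchronization among the computing processing elements
         to optimize inter-core synchronization in the heterogeneous architecture; (3) changing the computation pattern to
         reduce inter-core data association and dependencies; (4) overlapping data transfer with computation to hide
         memory-access latency; (5) regularizing the problem to improve memory-access coalescing.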
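         Key words: Sunway TaihuLight; molecular dynamics; iteration; heterogeneous; synchronization
         Chinese Library Classification (CLC) number: TP302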
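The step-count argument in the abstract can be made concrete with a short calculation: the number of time steps is the simulated physical time divided by the step length, and the wall-clock time is the step count divided by the iteration frequency. The following is a minimal sketch only. The function names and the sample rate of 10^4 steps per second are assumptions chosen for illustration, not figures from the paper. With 1 fs steps, one microsecond of simulated time is 10^9 steps, and the 10^12 order of magnitude is reached at about one millisecond.

```python
import math

SECONDS_PER_DAY = 86400.0

def steps_needed(simulated_fs, step_fs):
    """Number of time steps to cover `simulated_fs` femtoseconds of
    physical time with a step of `step_fs` femtoseconds."""
    return math.ceil(simulated_fs / step_fs)

def wall_clock_days(n_steps, steps_per_second):
    """Wall-clock days needed at a given iteration frequency
    (hypothetical rate, supplied by the caller)."""
    return n_steps / steps_per_second / SECONDS_PER_DAY

if __name__ == "__main__":
    one_microsecond_fs = 10**9    # 1 microsecond expressed in femtoseconds
    one_millisecond_fs = 10**12   # 1 millisecond expressed in femtoseconds
    for label, span in (("1 us", one_microsecond_fs), ("1 ms", one_millisecond_fs)):
        n = steps_needed(span, 1.0)
        # 1e4 steps per second is an assumed rate, used only for illustration
        print(label, n, "steps,", round(wall_clock_days(n, 1e4), 1), "days")
```

Because the steps must run one after another, adding more cores does not shorten this time unless each individual step gets faster, which is why the iteration frequency is the quantity to optimize.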


         Chinese citation format: 田卓,陈一峯.神威太湖之光上分子动力学模拟的性能优化.软件学报,2021,32(9):2945−2962.
         http://www.jos.org.cn/1000-9825/5978.htm
         English citation format: Tian Z, Chen YF. Performance optimization of molecular dynamics simulation on Sunway TaihuLight system.
         Ruan Jian Xue Bao/Journal of Software, 2021,32(9):2945−2962 (in Chinese). http://www.jos.org.cn/1000-9825/5978.htm
         Performance Optimization of Molecular Dynamics Simulation on Sunway TaihuLight System

         TIAN Zhuo,  CHEN Yi-Feng
         (School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China)

         Abstract:    The Sunway TaihuLight supercomputer is a high-throughput computing system, and such systems tend to have long memory-access
         and network latency. A large class of problems is bound by time-to-solution and requires high-frequency iteration. Molecular
         dynamics simulation is a typical example: each step depends on the state produced by the previous step, so the iterative
         computation is difficult to parallelize along the time dimension. The simulated time scale usually exceeds a microsecond,
         which means that the number of steps is more than 10^12. On a long-latency system it is practically impossible to complete a
         useful simulation in a limited time. The main challenge on the long-latency Sunway system is therefore to increase the
         iteration frequency. This study proposes a series of optimization strategies to improve the iteration frequency:
         (1) reducing communication overhead and network contention through single-core communication combined with on-chip
         synchronization; (2) speeding up synchronization between cores by waiting on a shared-memory variable combined with
         synchronizing the computing processing elements; (3) reducing data dependencies by changing the computation pattern;
         (4) hiding memory-access latency by overlapping computation with data transfer; (5) regularizing the data structures to
         improve memory-access coalescing.

            ∗  Foundation item: National Key Research and Development Program of China (2017YFB0202001); National Natural Science
         Foundation of China (61432018, 61672208)
              Received: 2018-11-08;  Revised: 2019-10-25;  Accepted: 2019-11-06