Page 93 - Journal of Software, 2021, Issue 8

Xu Shun, et al.: High-performance computing algorithms and software for heterogeneous computing                                                         2375


                [18]    Bagla JS. TreePM: A code for cosmological N-body simulations. Journal of Astrophysics and Astronomy, 2002,23(3):185−196.
                [19]    Ishiyama T, Nitadori K, Makino J. 4.45 Pflops astrophysical N-body simulation on K computer: The gravitational trillion-body
                     problem. In: Proc. of the Int’l Conf. on High Performance Computing, Networking, Storage and Analysis (SC), 2012. 1−10.
                [20]    Wang YZ, Jiang JR, Zhang H, Dong X, Wang LZ, Ranjan R, Zomaya AY. A scalable parallel algorithm for atmospheric general
                     circulation models on a multi-core cluster. Future Generation Computer Systems, 2017,72:1−10. [doi: 10.1016/j.future.2017.02.008]
                [21]    Strumpen V, Frigo M. IBM research report: Software engineering aspects of cache oblivious stencil computations. Research Report,
                     RC24035 (W0608-077), IBM, 2006.
                [22]    Fu HH, Liao JF, Ding N, Duan XH, Gan L, Liang YS, Wang XL, Yang JZ, Zheng Y, Liu WG, Wang LN, Yang GW. Redesigning
                     CAM-SE for peta-scale climate modeling performance and ultra-high resolution on Sunway TaihuLight. In: Proc. of the Int’l Conf.
                     on High Performance Computing, Networking, Storage and Analysis. 2017. Article No.12 [doi: 10.1145/3126908.3126909]
                [23]    Shimokawabe T, Aoki T, Takaki T, Yamanaka A, Nukada A, Endo T, Maruyama N, Matsuoka S. Peta-scale phase-field simulation
                     for dendritic solidification on the TSUBAME 2.0 supercomputer. In: Proc. of the Int’l Conf. on High Performance Computing,
                     Networking, Storage and Analysis (SC 2011). 2011. Article No.3.
                [24]    Zhang J, Zhou CB, Wang YG, Ju LL, Du Q, Chi XB, Xu DS, Chen DX, Liu Y, Liu Z. Extreme-scale phase field simulations of
                     coarsening dynamics on the Sunway TaihuLight supercomputer. In: Proc. of the Int’l Conf. on High Performance Computing,
                     Networking, Storage and Analysis (SC 2016). 2016. Article No.4.
                [25]    Allen MP, Tildesley DJ. Computer Simulation of Liquids. 2nd ed., Oxford University Press, 2017.
                [26]    Brown WM, Wang P, Plimpton SJ, Tharrington AN. Implementing molecular dynamics on hybrid high performance computers:
                     Short range forces. Computer Physics Communications, 2011,182(4):898−911.
                [27]    Zhang S, Xu S, Liu Q, Jin Z. Cell Verlet algorithm of molecular dynamics simulation based on GPU and its parallel performance
                     analysis. Computer Science, 2018,45(10):298−301,306 (in Chinese with English abstract).
                [28]    Yasuda K. Two-electron integral evaluation on the graphics processor unit. Journal of Computational Chemistry, 2008,29(3):334−342.
                [29]    Ufimtsev IS, Martinez TJ. Quantum chemistry on graphical processing units. 1. Strategies for two-electron integral evaluation.
                     Journal of Chemical Theory and Computation, 2008,4(2):222−231.
                [30]    Ufimtsev IS, Martinez TJ. Quantum chemistry on graphical processing units. 2. Direct self-consistent-field implementation. Journal
                     of Chemical Theory and Computation, 2009,5(4):1004−1015.
                [31]    Wilson KG. Confinement of quarks. Physical Review D, 1974,10(8):2445−2459.
                [32]    Brower R, Christ N, DeTar C, Edwards R, Mackenzie P. Lattice QCD application development within the US DOE exascale
                     computing project. In: Proc. of the EPJ Web Conf. 2018. Article No.09010.
                [33]    Hines J. Five Gordon Bell finalists credit Summit for vanguard computational science. 2019.
                     https://www.olcf.ornl.gov/2018/09/17/uncharted-territory/
                [34]    Denis C, de Oliveira Castro P, Petit E. Verificarlo: Checking floating point accuracy through Monte Carlo Arithmetic. In: Proc. of
                     the IEEE 23rd Symp. on Computer Arithmetic. 2016. 55−62.
                [35]    Aberger CR, De Sa C, Leszczynski M, et al. High-accuracy low-precision training. arXiv:1803.03383, 2018.
                [36]    Wong M. C++ single-source heterogeneous programming for OpenCL. 2020. https://www.khronos.org/sycl/
                [37]    Smith J. Intel® oneAPI toolkits (Beta). 2020. https://spec.oneapi.io/versions/latest/introduction.html
                [38]    Alpay A. HIPSYCL: SYCL 1.2.1 over AMD HIP/NVIDIA CUDA. 2020. https://github.com/illuhad/hipsycl

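                 Chinese references:
                  [3]  Jin Z, Lu ZH, Li HY, Chi XB, Sun JC. The origin of high-performance computing: Current application status and development
                     considerations for scientific computing. Bulletin of the Chinese Academy of Sciences, 2019,34(6):625−639 (in Chinese).
                 [13]  Mo ZY, Zhang AQ, Liu QK, Cao XL. Parallel algorithms and parallel programming: From individuality and commonality to software
                     reuse. Scientia Sinica Informationis, 2016,46(10):1392−1410 (in Chinese).
                 [16]  Chi XB, et al. Development Report on the National High-Performance Computing Environment. Beijing: Science Press, 2018 (in Chinese).
                 [27]  Zhang S, Xu S, Liu Q, Jin Z. Cell Verlet algorithm of molecular dynamics simulation based on GPU and its parallel performance
                     analysis. Computer Science, 2018,45(10):298−301,306 (in Chinese).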