Journal of Software (《软件学报》), 2021, No. 8
Xu Shun et al.: High-Performance Computing Algorithms and Software for Heterogeneous Computing
[18] Bagla JS. TreePM: A code for cosmological N-body simulations. Journal of Astrophysics and Astronomy, 2002,23(3):185−196.
[19] Ishiyama T, Nitadori K, Makino J. 4.45 Pflops astrophysical N-body simulation on K computer: The gravitational trillion-body problem. In: Proc. of the Int’l Conf. on High Performance Computing, Networking, Storage and Analysis (SC 2012). 2012. 1−10.
[20] Wang YZ, Jiang JR, Zhang H, Dong X, Wang LZ, Ranjan R, Zomaya AY. A scalable parallel algorithm for atmospheric general circulation models on a multi-core cluster. Future Generation Computer Systems, 2017,72:1−10. [doi: 10.1016/j.future.2017.02.008]
[21] Strumpen V, Frigo M. Software engineering aspects of cache oblivious stencil computations. Research Report, RC24035 (W0608-077), IBM, 2006.
[22] Fu HH, Liao JF, Ding N, Duan XH, Gan L, Liang YS, Wang XL, Yang JZ, Zheng Y, Liu WG, Wang LN, Yang GW. Redesigning CAM-SE for peta-scale climate modeling performance and ultra-high resolution on Sunway TaihuLight. In: Proc. of the Int’l Conf. on High Performance Computing, Networking, Storage and Analysis (SC 2017). 2017. Article No.12. [doi: 10.1145/3126908.3126909]
[23] Shimokawabe T, Aoki T, Takaki T, Yamanaka A, Nukada A, Endo T, Maruyama N, Matsuoka S. Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer. In: Proc. of the Int’l Conf. on High Performance Computing, Networking, Storage and Analysis (SC 2011). 2011. Article No.3.
[24] Zhang J, Zhou CB, Wang YG, Ju LL, Du Q, Chi XB, Xu DS, Chen DX, Liu Y, Liu Z. Extreme-scale phase field simulations of coarsening dynamics on the Sunway TaihuLight supercomputer. In: Proc. of the Int’l Conf. on High Performance Computing, Networking, Storage and Analysis (SC 2016). 2016. Article No.4.
[25] Allen MP, Tildesley DJ. Computer Simulation of Liquids. 2nd ed., Oxford University Press, 2017.
[26] Brown WM, Wang P, Plimpton SJ, Tharrington AN. Implementing molecular dynamics on hybrid high performance computers: Short range forces. Computer Physics Communications, 2011,182(4):898−911.
[27] Zhang S, Xu S, Liu Q, Jin Z. Cell Verlet algorithm of molecular dynamics simulation based on GPU and its parallel performance analysis. Computer Science, 2018,45(10):298−301,306 (in Chinese with English abstract).
[28] Yasuda K. Two-electron integral evaluation on the graphics processor unit. Journal of Computational Chemistry, 2008,29(3):334−342.
[29] Ufimtsev IS, Martinez TJ. Quantum chemistry on graphical processing units. 1. Strategies for two-electron integral evaluation. Journal of Chemical Theory and Computation, 2008,4(2):222−231.
[30] Ufimtsev IS, Martinez TJ. Quantum chemistry on graphical processing units. 2. Direct self-consistent-field implementation. Journal of Chemical Theory and Computation, 2009,5(4):1004−1015.
[31] Wilson KG. Confinement of quarks. Physical Review D, 1974,10(8):2445−2459.
[32] Brower R, Christ N, DeTar C, Edwards R, Mackenzie P. Lattice QCD application development within the US DOE exascale computing project. EPJ Web of Conferences, 2018. Article No.09010.
[33] Hines J. Five Gordon Bell finalists credit Summit for vanguard computational science. 2019. https://www.olcf.ornl.gov/2018/09/17/uncharted-territory/
[34] Denis C, de Oliveira Castro P, Petit E. Verificarlo: Checking floating point accuracy through Monte Carlo Arithmetic. In: Proc. of the IEEE 23rd Symp. on Computer Arithmetic. 2016. 55−62.
[35] Aberger CR, De Sa C, Leszczynski M, et al. High-accuracy low-precision training. arXiv:1803.03383, 2018.
[36] Wong M. C++ single-source heterogeneous programming for OpenCL. 2020. https://www.khronos.org/sycl/
[37] Smith J. Intel oneAPI toolkits (Beta). 2020. https://spec.oneapi.io/versions/latest/introduction.html
[38] Alpay A. hipSYCL: SYCL 1.2.1 over AMD HIP/NVIDIA CUDA. 2020. https://github.com/illuhad/hipsycl
Chinese-language references:
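[3] Jin Z, Lu ZH, Li HY, Chi XB, Sun JC. The origin of high performance computing: Current applications of scientific computing and reflections on its development. Bulletin of Chinese Academy of Sciences, 2019,34(6):625−639 (in Chinese).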
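[13] Mo ZY, Zhang AQ, Liu QK, Cao XL. Parallel algorithm and parallel programming: From specialty to generality as well as software reuse. Scientia Sinica Informationis, 2016,46(10):1392−1410 (in Chinese).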
[16] Chi XB, et al. Development Report of the National High Performance Computing Environment. Beijing: Science Press, 2018 (in Chinese).
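[27] Zhang S, Xu S, Liu Q, Jin Z. Cell Verlet algorithm of molecular dynamics simulation based on GPU and its parallel performance analysis. Computer Science, 2018,45(10):298−301,306 (in Chinese).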