
Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                   E-mail: jos@iscas.ac.cn
                 Journal of Software, 2024, 35(6): 2648−2667 [doi: 10.13328/j.cnki.jos.007098]  http://www.jos.org.cn
                 © Institute of Software, Chinese Academy of Sciences. All rights reserved.      Tel: +86-10-62562563



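                 Memory Access and Communication Fusion Compiler Optimization for Sunway Many-core Processors*

                 FANG Yan-Fei, LI Yan-Bing, DONG En-Ming, WANG Yun-Fei, LIU Qi

                 (National Research Center of Parallel Computer Engineering and Technology, Beijing 100190, China)
                 Corresponding author: FANG Yan-Fei, E-mail: flyyaj@163.com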

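                 Abstract: The multi-level on-chip memory hierarchy of Sunway many-core processors is an important structure for
                 alleviating the many-core “memory wall”. The fully software-managed SPM structure and the on-chip RMA
                 communication mechanism offer many opportunities to improve application performance, but they also make
                 developing, optimizing, and porting applications much harder. To exploit the characteristics of the on-chip
                 memory hierarchy and improve application performance while reducing the user's programming and optimization
                 burden, this paper proposes a compiler optimization method that fuses memory access and communication across
                 the multi-level memory hierarchy. The method first designs fusion compiler directives that pass high-level
                 program information to the compiler. It then builds a compiler optimization benefit model and designs an
                 iterative framework for solving heuristic loop optimization schemes; the compiler both solves for the loop
                 optimization scheme and performs the optimizing code transformation. Through compiler-generated DMA and RMA
                 bulk data transfer operations, core data that resides in lower levels of the memory hierarchy with high access
                 latency is buffered in bulk into higher levels with low access latency. Optimization experiments and analysis
                 on 3 typical test cases show that the proposed optimization matches manual optimization in performance and
                 significantly improves performance over the unoptimized programs.
                 Keywords: Sunway many-core processor; multi-level memory hierarchy; RMA communication; parallel language; compiler optimization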
                 Chinese Library Classification (CLC) number: TP314

                 Citation format (Chinese): 方燕飞, 李雁冰, 董恩铭, 王云飞, 刘齐. 申威众核处理器访存与通信融合编译优化. 软件学报, 2024, 35(6):
                 2648–2667. http://www.jos.org.cn/1000-9825/7098.htm
                 Citation format (English): Fang YF, Li YB, Dong EM, Wang YF, Liu Q. Memory Access and Communication Fusion Compiler Optimization for
                 Sunway Many-core Processors. Ruan Jian Xue Bao/Journal of Software, 2024, 35(6): 2648–2667 (in Chinese).
                 http://www.jos.org.cn/1000-9825/7098.htm

                 Memory Access and Communication Fusion Compiler Optimization for Sunway Many-core
                 Processors
                 FANG Yan-Fei, LI Yan-Bing, DONG En-Ming, WANG Yun-Fei, LIU Qi
                 (National Research Center of Parallel Computer Engineering and Technology, Beijing 100190, China)
                 Abstract: The on-chip memory hierarchy of Sunway many-core processors is an important structure for alleviating the many-core “memory
                 wall”. The software-managed SPM structure and the on-chip RMA communication mechanism bring many opportunities
                 for improving application performance but also pose great challenges for developing, optimizing, and porting applications. To fully
                 exploit the features of the on-chip memory hierarchy, improve application performance, and reduce the user's programming and
                 optimization burden, this study proposes a compiler optimization method that fuses memory access and communication across the
                 multi-level memory hierarchy. The method first designs fusion compiler directives to pass high-level program information to the
                 compiler. Second, it builds a compiler optimization benefit model and designs an iterative framework for solving heuristic loop
                 optimization schemes; the compiler then solves for the loop optimization scheme and performs the code transformation. The DMA and
                 RMA batch data transfer operations generated by the compiler buffer core data in batches from lower levels of the memory hierarchy,
                 which have high access latency, into higher levels with low access latency. Optimization experiments and analysis are conducted on
                 three typical test cases, and the results show that the performance of programs optimized by this method is comparable to manual
                 optimization and is significantly improved compared to the unoptimized version.
                 Key words:  Sunway many-core processor; multi-level memory hierarchy; RMA communication; parallel language; compiler optimization
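
                 The buffering idea described in the abstract can be illustrated with a small sketch. This is not the
                 compiler's generated code or the Sunway DMA/RMA interface: plain Python lists stand in for main memory and
                 the scratchpad (SPM), slice copies stand in for bulk transfers, and the names `SPM_CAPACITY`, `scale_direct`,
                 and `scale_buffered` are hypothetical. It only shows how tiling a loop replaces many individual
                 high-latency accesses with a few bulk transfers.

```python
# Illustrative sketch only. Lists stand in for main memory and the
# software-managed scratchpad; slice copies stand in for bulk (DMA-like) transfers.

SPM_CAPACITY = 8  # hypothetical number of elements the scratchpad can hold


def scale_direct(main_mem, factor):
    """Baseline: every element is loaded from and stored to main memory."""
    accesses = 0
    for i in range(len(main_mem)):
        main_mem[i] = main_mem[i] * factor
        accesses += 2  # one high-latency load plus one high-latency store
    return accesses


def scale_buffered(main_mem, factor, tile=SPM_CAPACITY):
    """Tiled version: one bulk copy in and one bulk copy out per tile."""
    transfers = 0
    for start in range(0, len(main_mem), tile):
        end = min(start + tile, len(main_mem))
        spm = main_mem[start:end]      # bulk copy into the fast buffer
        transfers += 1
        for j in range(len(spm)):      # the loop body touches only the buffer
            spm[j] = spm[j] * factor
        main_mem[start:end] = spm      # bulk copy back to main memory
        transfers += 1
    return transfers
```

                 With a 20-element array and a tile of 8 elements, `scale_buffered` issues 6 bulk transfers where
                 `scale_direct` performs 40 individual accesses. Choosing the tile size is the kind of decision the abstract
                 assigns to the compiler's benefit model.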


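                 *    Funding: Fund of the Laboratory for Advanced Computing and Intelligence Engineering (national level); National Key Research and Development Program of China, key special project (2021YFB0301100)
                  This paper was recommended by the guest editors of the special issue “Compilation Technology and Compiler Design”: Research Professor FENG Xiao-Bing, Professor HAO Dan, Dr. GAO Yao-Qing, and Associate Professor ZUO Zhi-Qiang.
                  Received: 2023-09-11; Revised: 2023-10-30; Accepted: 2023-12-14; Published online by jos: 2024-01-05
                  CNKI online-first publication: 2024-03-29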