
Ruan Jian Xue Bao (Journal of Software) ISSN 1000-9825, CODEN RUXUEW        E-mail: jos@iscas.ac.cn
         Journal of Software,2020,31(10):3074−3086 [doi: 10.13328/j.cnki.jos.006071]   http://www.jos.org.cn
         © Institute of Software, Chinese Academy of Sciences. All rights reserved.       Tel: +86-10-62562563


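         Neural Network Instruction Set Extension and Code Mapping Mechanism∗

         LOU Wen-Qi,  WANG Chao,  GONG Lei,  ZHOU Xue-Hai

         (School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China)
         Corresponding author: WANG Chao, E-mail: cswang@ustc.edu.cn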

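         Abstract:  In recent years, the high accuracy of convolutional neural networks (CNNs) in image recognition and
         classification has drawn wide attention in the machine learning field. However, the compute-intensive and
         memory-intensive nature of CNNs puts great pressure on general-purpose processors, which must support a variety
         of workloads. As a result, a large number of CNN-specific hardware accelerators have emerged. They improve
         efficiency but lack flexibility. Based on the emerging RISC-V architecture, this work designs RV-CNN, a dedicated
         instruction set comprising 10 matrix instructions. By abstracting the computations in typical CNNs into
         instructions, the instruction set flexibly supports CNN inference and has a higher code density than a
         general-purpose ISA. On this basis, a code-to-instruction mapping mechanism is proposed. Building different
         network models with this instruction set on a Xilinx ZC702 shows that, compared with an x86 processor, RV-CNN
         achieves on average 141 times the energy efficiency and 8.91 times the code density; compared with a GPU, it
         achieves on average 1.25 times the energy efficiency and 1.95 times the code density. In addition, compared with
         previous CNN accelerators, the design supports typical CNN models while still achieving good energy efficiency.
         Key words:  convolutional neural network; domain-specific instruction; RISC-V; code mapping; field-programmable gate array
         Chinese Library Classification number: TP306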

         Citation format: Lou WQ, Wang C, Gong L, Zhou XH. Neural network instruction set extension and code mapping mechanism.
         Ruan Jian Xue Bao/Journal of Software, 2020,31(10):3074−3086 (in Chinese). http://www.jos.org.cn/1000-9825/6071.htm
         Neural Network Instruction Set Extension and Code Mapping Mechanism

         LOU Wen-Qi,  WANG Chao,   GONG Lei,   ZHOU Xue-Hai
         (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China)

         Abstract:    In recent years, Convolutional Neural Networks (CNNs) have received widespread attention in the field of machine
         learning due to their high accuracy in character recognition and image classification. Nevertheless, the compute-intensive and
         memory-intensive characteristics of CNNs pose huge challenges to general-purpose processors, which need to support various
         workloads. Therefore, a large number of CNN-specific hardware accelerators have emerged to improve efficiency. Although
         these accelerators are highly efficient, they usually lack flexibility. In this study, classical CNN models are analyzed and a
         domain-specific instruction set of 10 matrix instructions, called RV-CNN, is designed based on the promising RISC-V architecture. By
         abstracting CNN computation into instructions, the proposed design provides sufficient flexibility for CNNs and has a higher
         code density than a general-purpose ISA. Based on this, a code-to-instruction mapping mechanism is proposed. Using RV-CNN to build
         different CNN models on the Xilinx ZC702 shows that, compared to x86 processors, RV-CNN achieves on average 141 times the energy
         efficiency and 8.91 times the code density; compared to a GPU, it achieves on average 1.25 times the energy efficiency and 1.95 times
         the code density. Besides, compared to previous CNN accelerators, the design supports typical CNN models while maintaining good
         energy efficiency.

            ∗ Foundation item: National Key Research and Development Program of China (2017YFA0700900, 2017YFA0700903); National
         Natural Science Foundation of China (61379040); Natural Science Foundation of Jiangsu Province, China (BK20181193); Youth
         Innovation Promotion Association CAS (2017497)
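              This paper was recommended by the guest editors of the special topic "Frontier Advances in System Software": Researcher WU Yan-Jun, Professor CHEN Hai-Bo, Researcher BAO Yun-Gang, and Researcher LI Ling.
              Received: 2020-02-16;  Revised: 2020-04-04;  Accepted: 2020-05-09;  Published online (jos): 2020-06-10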