Journal of Software (《软件学报》), 2020, No. 10

Journal of Software (软件学报), ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@iscas.ac.cn
Journal of Software, 2020, 31(10): 3074−3086 [doi: 10.13328/j.cnki.jos.006071] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

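Neural Network Instruction Set Extension and Code Mapping Mechanism∗

LOU Wen-Qi, WANG Chao, GONG Lei, ZHOU Xue-Hai

(School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China)
Corresponding author: WANG Chao, E-mail: cswang@ustc.edu.cn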

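Abstract (translated from the Chinese): In recent years, the high accuracy of convolutional neural networks (CNNs) in image
recognition and classification has drawn wide attention in the field of machine learning. However, the compute-intensive and
memory-intensive nature of CNNs places heavy pressure on general-purpose processors, which must support a variety of workloads.
As a result, a large number of CNN-specific hardware accelerators have emerged. Although they improve efficiency, they lack
flexibility. Based on the emerging RISC-V architecture, this paper designs RV-CNN, a dedicated instruction set comprising 10
matrix instructions. By abstracting the computations in typical CNNs into instructions, the instruction set flexibly supports
CNN inference and has a higher code density than a general-purpose ISA. On this basis, a code-to-instruction mapping mechanism
is proposed. Building different network models with this instruction set on a Xilinx ZC702 shows that, compared with an x86
processor, RV-CNN achieves on average 141 times the energy efficiency and 8.91 times the code density; compared with a GPU, it
achieves on average 1.25 times the energy efficiency and 1.95 times the code density. In addition, compared with previous CNN
accelerators, the design supports typical CNN models while still delivering good energy efficiency.
Key words: convolutional neural network; domain-specific instruction; RISC-V; code mapping; field-programmable gate array
Chinese Library Classification number: TP306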

Citation format (Chinese): 娄文启,王超,宫磊,周学海.一种神经网络指令集扩展与代码映射机制.软件学报,2020,31(10):3074−3086.
http://www.jos.org.cn/1000-9825/6071.htm
Citation format (English): Lou WQ, Wang C, Gong L, Zhou XH. Neural network instruction set extension and code mapping mechanism.
Ruan Jian Xue Bao/Journal of Software, 2020,31(10):3074−3086 (in Chinese). http://www.jos.org.cn/1000-9825/6071.htm
Neural Network Instruction Set Extension and Code Mapping Mechanism

LOU Wen-Qi, WANG Chao, GONG Lei, ZHOU Xue-Hai
(School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China)

Abstract: In recent years, the high accuracy of convolutional neural networks (CNNs) in character recognition and image
classification has attracted widespread attention in the field of machine learning. Nevertheless, the compute-intensive and
memory-intensive characteristics of CNNs pose huge challenges to general-purpose processors, which need to support various
workloads. Therefore, a large number of CNN-specific hardware accelerators have emerged to improve efficiency. Although these
accelerators are significantly more efficient, they usually lack flexibility. In this study, classical CNN models are analyzed and a
domain-specific instruction set of 10 matrix instructions, called RV-CNN, is designed based on the promising RISC-V architecture. By
abstracting CNN computation into instructions, the proposed design provides sufficient flexibility for CNN inference and achieves a
higher code density than a general-purpose ISA. Based on this, a code-to-instruction mapping mechanism is proposed. Using RV-CNN to
build different CNN models on the Xilinx ZC702 shows that, compared with x86 processors, RV-CNN achieves on average 141 times the
energy efficiency and 8.91 times the code density; compared with a GPU, it achieves on average 1.25 times the energy efficiency and
1.95 times the code density. Besides, compared with previous CNN accelerators, the design supports typical CNN models while retaining
good energy efficiency.

∗ Foundation item: National Key Research and Development Program of China (2017YFA0700900, 2017YFA0700903); National
Natural Science Foundation of China (61379040); Natural Science Foundation of Jiangsu Province, China (BK20181193); Youth
Innovation Promotion Association CAS (2017497)
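This paper was recommended by the guest editors of the special issue "Frontier Advances in System Software": Researcher WU Yan-Jun, Professor CHEN Hai-Bo, Researcher BAO Yun-Gang, and Researcher LI Ling.
Received: 2020-02-16; Revised: 2020-04-04; Accepted: 2020-05-09; Published online by JOS: 2020-06-10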
