Journal of Software (《软件学报》), 2020, No. 10

Lou Wenqi et al.: An Instruction Set Extension and Code Mapping Mechanism for Neural Networks                        3085


6    Conclusion

    The widespread use of CNNs in image recognition and object detection makes their performance critical. By analyzing the computational patterns of typical CNNs, this paper proposed RV-CNN, an efficient and easy-to-implement domain-specific instruction set consisting of 10 coarse-grained matrix instructions, which flexibly supports the inference of typical CNN models. On this basis, we described the process of mapping a CNN model description file to RV-CNN instructions, and then qualitatively compared RV-CNN with typical domain-specific instruction sets from several perspectives. For the implementation, we extended the instruction set into a processor core based on the open-source RISC-V architecture, embedding the corresponding matrix unit into the classic 5-stage pipeline in a relatively tightly coupled manner. Finally, the design was synthesized and implemented on a Xilinx ZC702 platform and evaluated with typical neural networks. The results show that, compared with an Intel i7-4790K CPU and a Tesla K40c GPU, the prototype system achieves the highest energy efficiency and code density; compared with prior accelerators, it also delivers competitive energy efficiency while retaining flexibility.
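
    The layer-to-instruction mapping summarized above can be illustrated with a minimal sketch. The mnemonics (MLOAD, MMUL, MPOOL, MSTORE) and the layer-descriptor format below are invented placeholders for illustration, not the actual RV-CNN encoding from the paper:

```python
# Hypothetical sketch of mapping CNN layer descriptors to a coarse-grained
# matrix-instruction sequence. Mnemonics and descriptor fields are assumed
# for illustration only; they are NOT the real RV-CNN instruction forms.

def map_layer(layer):
    """Translate one layer descriptor into a list of instruction strings."""
    if layer["type"] in ("conv", "fc"):
        # Both convolution (after im2col expansion) and fully connected
        # layers reduce to a single matrix-matrix multiplication.
        return [
            f"MLOAD  m1, {layer['input']}",    # fetch input tile
            f"MLOAD  m2, {layer['weights']}",  # fetch weight matrix
            "MMUL   m3, m1, m2",               # multiply on the matrix unit
            f"MSTORE m3, {layer['output']}",   # write result back
        ]
    if layer["type"] == "pool":
        return [
            f"MLOAD  m1, {layer['input']}",
            "MPOOL  m2, m1",                   # window-wise reduction
            f"MSTORE m2, {layer['output']}",
        ]
    raise ValueError(f"unsupported layer type: {layer['type']}")

def compile_model(layers):
    """Concatenate per-layer sequences into one instruction stream."""
    return [op for layer in layers for op in map_layer(layer)]
```

    In practice the model description file would first be parsed into such layer records, and the real mapping must additionally account for tiling and on-chip buffer capacities, which this sketch omits.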
    New CNN architectures continue to emerge, so instruction-level optimization for operations such as depthwise separable convolution is still needed to improve execution efficiency. In addition, the design and implementation of the extended instructions should be co-optimized with the characteristics of RISC-V. Finally, the code mapping process, which is driven by model and hardware information, has not yet been automated. We plan to address these aspects in future work.
