
2964                                 Journal of Software  软件学报 Vol.31, No.9,  September 2020

         [24]    Chen X. Escoin: Efficient sparse convolutional neural network inference on GPUs. CoRR, abs/1802.10280, 2018.
         [25]    Mao H, Han S, Pool J, Li W, Liu X, Wang Y, Dally WJ. Exploring the granularity of sparsity in convolutional neural networks. In:
             Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition Workshops (CVPR Workshops). Honolulu: IEEE Computer
             Society, 2017. 1927−1934.
         [26]    Park J, Li SR, Wen W, Tang PTP, Li H, Chen Y, Dubey P. Faster CNNs with direct sparse convolutions and guided pruning. In:
             Bengio Y, LeCun Y, eds. Proc. of the Int’l Conf. on Learning Representations (ICLR). Toulon, 2017.
         [27]    Lei J, Gao X, Song J, Wang XL, Song ML. Survey of deep neural network model compression. Ruan Jian Xue Bao/Journal of
             Software, 2018,29(2):251−266 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/29/251.htm [doi: 10.13328/
             j.cnki.jos.005428]
         [28]    Zhang X, Tan G, Xue S, Li J, Zhou K, Chen M. Understanding the GPU microarchitecture to achieve bare-metal performance
             tuning. In: Sarkar V, Rauchwerger L, eds. Proc. of the 22nd ACM SIGPLAN Symp. on Principles and Practice of Parallel
             Programming (PPoPP). Austin: ACM, 2017. 31−43.
         [29]    Williams S, Waterman A, Patterson DA. Roofline: An insightful visual performance model for multicore architectures.
             Communications of the ACM, 2009,52(4):65−76.
         [30]    Liu B, Wang M, Foroosh H, Tappen MF, Pensky M. Sparse convolutional neural networks. In: Proc. of the IEEE Conf. on
             Computer Vision and Pattern Recognition (CVPR). Boston: IEEE Computer Society, 2015. 806−814.
         [31]    Wang Q, Zhang X, Zhang Y, Yi Q. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs.
             In: Gropp W, Matsuoka S, eds. Proc. of the Int’l Conf. for High Performance Computing, Networking, Storage and Analysis (SC).
             Denver: ACM, 2013. 25:1−25:12.
         [32]    Daultani V, Ohno Y, Ishizaka K. Sparse direct convolutional neural network. In: Cong F, Leung ACS, Wei Q, eds. Proc. of the Int’l
             Symp. on Neural Networks. Sapporo, Hakodate, and Muroran, Hokkaido: Springer-Verlag, 2017. 10261: 293−303.
         [33]    Gray S, Radford A, Kingma DP. GPU kernels for block-sparse weights. CoRR, abs/1711.09224, 2017.
         [34]    Yao Z, Cao S, Xiao W, Zhang C, Nie L. Balanced sparsity for efficient DNN inference on GPU. In: Proc. of the AAAI Conf. on
             Artificial Intelligence (AAAI). Honolulu: AAAI Press, 2019. 5676−5683.
         [35]    Han S, Liu X, Mao H, Pu J, Pedram A, Horowitz MA, Dally WJ. EIE: Efficient inference engine on compressed deep neural
             network. In: Proc. of the ACM/IEEE Annual Int’l Symp. on Computer Architecture (ISCA). Seoul: IEEE Computer Society, 2016.
             243−254.
         [36]    Parashar A, Rhu M, Mukkara A, Puglielli A, Venkatesan R, Khailany B, Emer JS, Keckler SW, Dally WJ. SCNN: An accelerator
             for compressed-sparse convolutional neural networks. In: Proc. of the Annual Int’l Symp. on Computer Architecture (ISCA).
             Toronto: ACM, 2017. 27−40.
         [37]    Park H, Kim D, Ahn J, Yoo S. Zero and data reuse-aware fast convolution for deep neural networks on GPU. In: Proc. of the 11th
             IEEE/ACM/IFIP Int’l Conf. on Hardware/Software Codesign and System Synthesis (CODES). Pittsburgh: ACM, 2016. 33:1−33:10.
         Appendix: Chinese references:
         [27]  Lei J, Gao X, Song J, Wang XL, Song ML. Survey of deep neural network model compression. Journal of Software, 2018,29(2):
             251−266 (in Chinese). http://www.jos.org.cn/1000-9825/5428.htm [doi: 10.13328/j.cnki.jos.005428]


         DONG Xiao (1992-), male, Ph.D., CCF student member. His research interests include compiler optimization techniques for deep
         learning, sparse computation, and GPU performance optimization.

         LI Jing (1991-), female, Ph.D. Her research interests include heterogeneous computing and GPU program optimization.

         LIU Lei (1980-), male, Ph.D., engineer, CCF professional member. His research interests include compiler optimization techniques
         such as automatic parallelization, programming methods for intelligent computers, and IoT and AI programming.

         FENG Xiao-Bing (1969-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. His research interests include
         compilation and programming techniques.