[24] Chen X. Escoin: Efficient sparse convolutional neural network inference on GPUs. CoRR, abs/1802.10280, 2018.
[25] Mao H, Han S, Pool J, Li W, Liu X, Wang Y, Dally WJ. Exploring the granularity of sparsity in convolutional neural networks. In:
Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition Workshops (CVPR Workshops). Honolulu: IEEE Computer
Society, 2017. 1927−1934.
[26] Park J, Li SR, Wen W, Tang PTP, Li H, Chen Y, Dubey P. Faster CNNs with direct sparse convolutions and guided pruning. In:
Bengio Y, LeCun Y, eds. Proc. of the Int’l Conf. on Learning Representations (ICLR). Toulon, 2017.
[27] Lei J, Gao X, Song J, Wang XL, Song ML. Survey of deep neural network model compression. Ruan Jian Xue Bao/Journal of
Software, 2018,29(2):251−266 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/29/251.htm [doi: 10.13328/j.cnki.jos.005428]
[28] Zhang X, Tan G, Xue S, Li J, Zhou K, Chen M. Understanding the GPU microarchitecture to achieve bare-metal performance
tuning. In: Sarkar V, Rauchwerger L, eds. Proc. of the 22nd ACM SIGPLAN Symp. on Principles and Practice of Parallel
Programming (PPoPP). Austin: ACM, 2017. 31−43.
[29] Williams S, Waterman A, Patterson DA. Roofline: An insightful visual performance model for multicore architectures.
Communications of the ACM, 2009,52(4):65−76.
[30] Liu B, Wang M, Foroosh H, Tappen MF, Pensky M. Sparse convolutional neural networks. In: Proc. of the IEEE Conf. on
Computer Vision and Pattern Recognition (CVPR). Boston: IEEE Computer Society, 2015. 806−814.
[31] Wang Q, Zhang X, Zhang Y, Yi Q. AUGEM: Automatically generate high performance dense linear algebra kernels on x86 CPUs.
In: Gropp W, Matsuoka S, eds. Proc. of the Int’l Conf. for High Performance Computing, Networking, Storage and Analysis (SC).
Denver: ACM, 2013. 25:1−25:12.
[32] Daultani V, Ohno Y, Ishizaka K. Sparse direct convolutional neural network. In: Cong F, Leung ACS, Wei Q, eds. Proc. of the Int’l
Symp. on Neural Networks (ISNN). Sapporo, Hakodate, and Muroran, Hokkaido: Springer-Verlag, 2017. LNCS 10261: 293−303.
[33] Gray S, Radford A, Kingma DP. GPU kernels for block-sparse weights. CoRR, abs/1711.09224, 2017.
[34] Yao Z, Cao S, Xiao W, Zhang C, Nie L. Balanced sparsity for efficient DNN inference on GPU. In: Proc. of the AAAI Conf. on
Artificial Intelligence (AAAI). Honolulu: AAAI Press, 2019. 5676−5683.
[35] Han S, Liu X, Mao H, Pu J, Pedram A, Horowitz MA, Dally WJ. EIE: Efficient inference engine on compressed deep neural
network. In: Proc. of the ACM/IEEE Annual Int’l Symp. on Computer Architecture (ISCA). Seoul: IEEE Computer Society, 2016.
243−254.
[36] Parashar A, Rhu M, Mukkara A, Puglielli A, Venkatesan R, Khailany B, Emer JS, Keckler SW, Dally WJ. SCNN: An accelerator
for compressed-sparse convolutional neural networks. In: Proc. of the Annual Int’l Symp. on Computer Architecture (ISCA).
Toronto: ACM, 2017. 27−40.
[37] Park H, Kim D, Ahn J, Yoo S. Zero and data reuse-aware fast convolution for deep neural networks on GPU. In: Proc. of the 11th
IEEE/ACM/IFIP Int’l Conf. on Hardware/Software Codesign and System Synthesis (CODES). Pittsburgh: ACM, 2016. 33:1−33:10.
DONG Xiao (1992-), male, Ph.D., CCF student member. His research interests include compiler optimization for deep learning, sparse computation, and GPU performance optimization.

LI Jing (1991-), female, Ph.D. Her research interests include heterogeneous computing and GPU program optimization.

LIU Lei (1980-), male, Ph.D., engineer, CCF professional member. His research interests include compiler optimization techniques such as automatic parallelization, programming methods for intelligent computers, IoT programming, and AI programming.

FENG Xiao-Bing (1969-), male, Ph.D., research professor, Ph.D. supervisor, CCF distinguished member. His research interests include compilation and programming techniques.