[65] Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360, 2016. 1−13.
[66] Wang Y, Li H, Li X. Real-Time meets approximate computing: An elastic CNN inference accelerator with adaptive trade-off between QoS and QoR. In: Proc. of the 54th Annual Design Automation Conf. (DAC). 2017. 33:1−33:6.
[67] Sainath TN, Kingsbury B, Sindhwani V, Arisoy E, Ramabhadran B. Low-Rank matrix factorization for deep neural network
training with high-dimensional output targets. In: Proc. of the IEEE Int’l Conf. on Acoustics, Speech and Signal Processing
(ICASSP). IEEE, 2013. 6655−6659.
[68] Lebedev V, Ganin Y, Rakhuba M, Oseledets I, Lempitsky V. Speeding-Up convolutional neural networks using fine-tuned CP-decomposition. arXiv preprint arXiv:1412.6553, 2014. 1−11.
[69] Jaderberg M, Vedaldi A, Zisserman A. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014.
[70] Zhang X, Zou J, Ming X, He K, Sun J. Efficient and accurate approximations of nonlinear convolutional networks. In: Proc. of the
IEEE Computer Society Conf. on Computer Vision and Pattern Recognition. 2015. 1984−1992.
[71] Zhang X, Zou J, He K, Sun J. Accelerating very deep convolutional networks for classification and detection. IEEE Trans. on
Pattern Analysis and Machine Intelligence, 2016,38(10):1943−1955.
[72] Denil M, Shakibi B, Dinh L, Ranzato M, De Freitas N. Predicting parameters in deep learning. arXiv preprint arXiv:1306.0543, 2013. 1−9.
[73] Han S, Mao H, Dally WJ. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015. 1−14.
[74] Han S, Pool J, Tran J, Dally WJ. Learning both weights and connections for efficient neural networks. arXiv preprint arXiv:1506.02626, 2015. 1−9.
[75] Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1. arXiv preprint arXiv:1602.02830, 2016.
[76] Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
[77] Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proc. of the 30th IEEE Conf. on Computer Vision
and Pattern Recognition (CVPR). 2017. 1800−1807.
[78] Zhang X, Zhou X, Lin M, Sun J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proc. of
the IEEE Computer Society Conf. on Computer Vision and Pattern Recognition. 2018. 6848−6856.
[79] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC. MobileNetV2: Inverted residuals and linear bottlenecks. In: Proc. of the
IEEE Computer Society Conf. on Computer Vision and Pattern Recognition. 2018. 4510−4520.
[80] Ma N, Zhang X, Zheng HT, Sun J. ShuffleNet v2: Practical guidelines for efficient CNN architecture design. In: Proc. of the 15th European Conf. on Computer Vision (ECCV). LNCS 11218. Springer-Verlag, 2018. 122−138.
[81] Gholami A, Kwon K, Wu B, Tai Z, Yue X, Jin P, Zhao S, Keutzer K. SqueezeNext: Hardware-aware neural network design. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE Computer Society, 2018.
[82] Yao Q, Wang M, Chen Y, Dai W, Hu YQ, Li YF, Tu WW, Yang Q, Yu Y. Taking human out of learning applications: A survey on automated machine learning. arXiv preprint arXiv:1810.13306, 2018. 1−26.
[83] Zoph B, Le QV. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578, 2016. 1−16.
[84] Tan M, Chen B, Pang R, Vasudevan V, Sandler M, Howard A, Le QV. MnasNet: Platform-aware neural architecture search for mobile. arXiv preprint arXiv:1807.11626, 2018.
[85] Xu H, Mueller F. Hardware for machine learning: Challenges and opportunities. In: Proc. of the Real-Time Systems Symp. (RTSS). IEEE, 2019. 157−160.
[86] Berezovskyi K, Bletsas K, Andersson B. Makespan computation for GPU threads running on a single streaming multiprocessor. In:
Proc. of the 24th Euromicro Conf. on Real-Time Systems (ECRTS). IEEE, 2012. 277−286.
[87] Berezovskyi K, Bletsas K, Petters SM. Faster makespan estimation for GPU threads on a single streaming multiprocessor. In: Proc. of the 18th IEEE Conf. on Emerging Technologies & Factory Automation (ETFA). IEEE, 2013. 1−8.
[88] Berezovskyi K, Santinelli L, Bletsas K, Tovar E. WCET measurement-based and extreme value theory characterisation of CUDA
kernels. In: Proc. of the 22nd Int’l Conf. on Real-Time Networks and Systems (RTNS). ACM, 2014. 279.
[89] Betts A, Donaldson A. Estimating the WCET of GPU-accelerated applications using hybrid analysis. In: Proc. of the 25th Euromicro Conf. on Real-Time Systems (ECRTS). IEEE, 2013. 193−202.