         [65]     Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. 2016. 1−13.
         [66]     Wang Y, Li H, Li X. Real-Time meets approximate computing: An elastic CNN inference accelerator with adaptive trade-off between QoS and QoR. In: Proc. of the 54th Annual Design Automation Conf. 2017. 33:1−33:6.
         [67]     Sainath TN, Kingsbury B, Sindhwani V, Arisoy E, Ramabhadran B. Low-Rank matrix factorization for deep neural network training with high-dimensional output targets. In: Proc. of the IEEE Int’l Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. 6655−6659.
         [68]     Lebedev V, Ganin Y, Rakhuba M, Oseledets I, Lempitsky V. Speeding-Up convolutional neural networks using fine-tuned CP-decomposition. 2014. 1−11.
         [69]     Jaderberg M, Vedaldi A, Zisserman A. Speeding up convolutional neural networks with low rank expansions. 2014.
         [70]     Zhang X, Zou J, Ming X, He K, Sun J. Efficient and accurate approximations of nonlinear convolutional networks. In: Proc. of the
             IEEE Computer Society Conf. on Computer Vision and Pattern Recognition. 2015. 1984−1992.
         [71]     Zhang X, Zou J, He K, Sun J. Accelerating very deep convolutional networks for classification and detection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2016,38(10):1943−1955.
         [72]     Denil M, Shakibi B, Dinh L, Ranzato M, De Freitas N. Predicting parameters in deep learning. 2013. 1−9.
         [73]     Han S, Mao H, Dally WJ. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. 2015. 1−14.
         [74]     Han S, Pool J, Tran J, Dally WJ. Learning both weights and connections for efficient neural networks. 2015. 1−9.
         [75]     Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1. 2016.
         [76]     Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H. MobileNets: Efficient convolutional
             neural networks for mobile vision applications. 2017.
         [77]     Chollet F. Xception: Deep learning with depthwise separable convolutions. In: Proc. of the 30th IEEE Conf. on Computer Vision
             and Pattern Recognition (CVPR). 2017. 1800−1807.
         [78]     Zhang X, Zhou X, Lin M, Sun J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proc. of
             the IEEE Computer Society Conf. on Computer Vision and Pattern Recognition. 2018. 6848−6856.
         [79]     Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC. MobileNetV2: Inverted residuals and linear bottlenecks. In: Proc. of the
             IEEE Computer Society Conf. on Computer Vision and Pattern Recognition. 2018. 4510−4520.
         [80]     Ma N, Zhang X, Zheng HT, Sun J. ShuffleNet v2: Practical guidelines for efficient CNN architecture design. In: Proc. of the 15th European Conf. on Computer Vision (ECCV). LNCS 11218. Springer-Verlag, 2018. 122−138.
         [81]     Gholami A, Kwon K, Wu B, Tai Z, Yue X, Jin P, Zhao S, Keutzer K. SqueezeNext: Hardware-aware neural network design. In: Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE Computer Society, 2018.
         [82]     Yao Q, Wang M, Chen Y, Dai W, Li YF, Tu WW, Yang Q, Yu Y. Taking human out of learning applications: A survey on automated machine learning. 2018. 1−26.
         [83]     Zoph B, Le QV. Neural architecture search with reinforcement learning. 2016. 1−16.
         [84]     Tan M, Chen B, Pang R, Vasudevan V, Sandler M, Howard A, Le QV. MnasNet: Platform-aware neural architecture search for
             mobile. 2018.
         [85]     Xu H, Mueller F. Hardware for machine learning: Challenges and opportunities. In: Proc. of the Real-Time Systems Symp. IEEE,
             2019. 157−160.
         [86]     Berezovskyi K, Bletsas K, Andersson B. Makespan computation for GPU threads running on a single streaming multiprocessor. In:
             Proc. of the 24th Euromicro Conf. on Real-Time Systems (ECRTS). IEEE, 2012. 277−286.
         [87]     Berezovskyi K, Bletsas K, Petters SM. Faster makespan estimation for GPU threads on a single streaming multiprocessor. In: Proc.
             of the 2013 IEEE 18th Conf. on Emerging Technologies & Factory Automation (ETFA). IEEE, 2013. 1−8.
         [88]     Berezovskyi K, Santinelli L, Bletsas K, Tovar E. WCET measurement-based and extreme value theory characterisation of CUDA
             kernels. In: Proc. of the 22nd Int’l Conf. on Real-Time Networks and Systems (RTNS). ACM, 2014. 279.
         [89]     Betts A, Donaldson A. Estimating the WCET of GPU-accelerated applications using hybrid analysis. In: Proc. of the 25th Euromicro Conf. on Real-Time Systems (ECRTS). IEEE, 2013. 193−202.