
2674                                 Journal of Software  软件学报 Vol.31, No.9,  September 2020

         [42]     Kim D, Kung J, Chai SM, Yalamanchili S, Mukhopadhyay S. Neurocube: A programmable digital neuromorphic architecture with
             high-density 3D memory. In: Proc. of the 43rd ACM/IEEE Annual Int’l Symp. on Computer Architecture (ISCA). IEEE Computer
             Society, 2016. 380−392.
         [43]     Shafiee A, Nag A, Muralimanohar N, Balasubramonian R, Strachan JP, Hu M, Williams RS, Srikumar V. ISAAC: A convolutional
             neural network accelerator with in-situ analog arithmetic in crossbars. In: Proc. of the 43rd ACM/IEEE Annual Int’l Symp. on
             Computer Architecture (ISCA). IEEE Computer Society, 2016. 14−26.
         [44]     Xu H, Mueller F. Work-in-Progress: Making machine learning real-time predictable. In: Proc. of the 2018 IEEE Real-
             Time Systems Symp. (RTSS). IEEE, 2018. 157−160.
         [45]     Kim H, Nam H, Jung W, Lee J. Performance analysis of CNN frameworks for GPUs. In: Proc. of the IEEE Int’l Symp. on
             Performance Analysis of Systems and Software (ISPASS). IEEE, 2017. 55−64.
         [46]     Wang Y. Towards customizable CPS: Composability, efficiency and predictability. In: Duan Z, Ong L, eds. Proc. of the 19th Int’l
             Conf. on Formal Engineering Methods (ICFEM). Vol.10610. Xi’an: Springer-Verlag, 2017. 3−15.
         [47]     Abdullah J, Dai G, Yi W. Worst-Case cause-effect reaction latency in systems with non-blocking communication. In: Proc. of the
             2019 Design, Automation & Test in Europe Conf. & Exhibition (DATE). 2019. 1625−1630.
         [48]     Wilhelm R, Engblom J, Ermedahl A, Holsti N, Thesing S, Whalley DB, Bernat G, Ferdinand C, Heckmann R, Mitra T, Mueller F,
             Puaut I, Puschner PP, Staschulat J, Stenström P. The worst-case execution-time problem—Overview of methods and survey of
             tools. ACM Transactions on Embedded Computing Systems, 2008,7(3):36:1−36:53.
         [49]     Davis RI, Burns A. A survey of hard real-time scheduling for multiprocessor systems. ACM Computing Surveys, 2011,43(4):
             35:1−35:44.
         [50]     Hestness J, Keckler SW, Wood DA. A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior.
             In: Proc. of the 2014 IEEE Int’l Symp. on Workload Characterization (IISWC). IEEE, 2014. 150−160.
         [51]     Posluszny D. Avoiding pitfalls when using NVIDIA GPUs for real-time tasks in autonomous systems. In: Proc. of the 30th Euromicro
             Conf. on Real-Time Systems (ECRTS). IEEE, 2018. 1−21.
         [52]     Reineke J, Wilhelm R. Impact of resource sharing on performance and performance prediction. In: Proc. of the Design, Automation
             & Test in Europe Conf. (DATE). European Design and Automation Association, 2014. 1−2.
         [53]     Capodieci N, Cavicchioli R, Bertogna M, Paramakuru A. Deadline-Based scheduling for GPU with preemption support. In: Proc. of
             the 2018 IEEE Real-Time Systems Symp. (RTSS). IEEE, 2018. 119−130.
         [54]     Forsberg B, Marongiu A, Benini L. GPUguard: Towards supporting a predictable execution model for heterogeneous SoC. In: Proc.
             of the 2017 Design, Automation and Test in Europe (DATE). 2017. 318−321.
         [55]     Bavoil L. SetStablePowerState.exe: Disabling GPU boost on Windows 10 for more deterministic timestamp queries on NVIDIA GPUs.
             2016. https://developer.nvidia.com
         [56]     Shams S, Platania R, Lee K, Park SJ. Evaluation of deep learning frameworks over different HPC architectures. In: Proc. of the
             Int’l Conf. on Distributed Computing Systems. IEEE, 2017. 1389−1396.
         [57]     Mojumder SA, Louis MS, Sun Y, Ziabari AK, Abellán JL, Kim J, Kaeli D, Joshi A. Profiling DNN workloads on a Volta-based
             DGX-1 system. In: Proc. of the 2018 IEEE Int’l Symp. on Workload Characterization (IISWC). 2018. 122−133.
         [58]     Stephenson M, Sastry Hari SK, Lee Y, Ebrahimi E, Johnson DR, Nellans D, O’Connor M, Keckler SW. Flexible software profiling
             of GPU architectures. ACM SIGARCH Computer Architecture News, 2015,43(3):185−197.
         [59]     Shen D, Song SL, Li A, Liu X. CUDAAdvisor: LLVM-based runtime profiling for modern GPUs. 2018. 214−227.
         [60]     Farooqui N, Kerr A, Eisenhauer G, Schwan K, Yalamanchili S. Lynx: A dynamic instrumentation system for data-parallel
             applications on GPGPU architectures. In: Proc. of the IEEE Int’l Symp. on Performance Analysis of Systems and Software
             (ISPASS). IEEE, 2012. 58−67.
         [61]     Qi H, Sparks ER, Talwalkar A. Paleo: A performance model for deep neural networks. In: Proc. of the ICLR. 2017. 1−10.
         [62]     Dong S, Gong X, Sun Y, Baruah T, Kaeli D. Characterizing the microarchitectural implications of a convolutional neural network
             (CNN) execution on GPUs. 2018. 96−106.
         [63]     Madougou S, Varbanescu AL, De Laat C, Van Nieuwpoort R. A tool for bottleneck analysis and performance prediction for GPU-
             accelerated applications. In: Proc. of the 2016 IEEE 30th Int’l Parallel and Distributed Processing Symp. (IPDPS). IEEE, 2016.
             641−652.
         [64]     Ali W, Yun H. Protecting real-time GPU kernels on integrated CPU-GPU SoC platforms. In: Proc. of the 30th Euromicro Conf. on
             Real-Time Systems (ECRTS). Vol.106. Schloss Dagstuhl—Leibniz-Zentrum fuer Informatik, 2018. 19:1−19:22.