2674 Journal of Software 软件学报 Vol.31, No.9, September 2020
[42] Kim D, Kung J, Chai SM, Yalamanchili S, Mukhopadhyay S. Neurocube: A programmable digital neuromorphic architecture with
high-density 3D memory. In: Proc. of the 43rd ACM/IEEE Annual Int’l Symp. on Computer Architecture (ISCA). IEEE Computer
Society, 2016. 380−392.
[43] Shafiee A, Nag A, Muralimanohar N, Balasubramonian R, Strachan JP, Hu M, Williams RS, Srikumar V. ISAAC: A convolutional
neural network accelerator with in-situ analog arithmetic in crossbars. In: Proc. of the 43rd ACM/IEEE Annual Int’l Symp. on
Computer Architecture (ISCA). IEEE Computer Society, 2016. 14−26.
[44] Xu H, Mueller F. Work-in-Progress: Making machine learning real-time predictable. In: Proc. of the 2018 IEEE Real-
Time Systems Symp. (RTSS). IEEE, 2018. 157−160.
[45] Kim H, Nam H, Jung W, Lee J. Performance analysis of CNN frameworks for GPUs. In: Proc. of the IEEE Int’l Symp. on
Performance Analysis of Systems and Software (ISPASS). IEEE, 2017. 55−64.
[46] Wang Y. Towards customizable CPS: Composability, efficiency and predictability. In: Duan Z, Ong L, eds. Proc. of the 19th Int’l
Conf. on Formal Engineering Methods (ICFEM). Vol.10610. Xi’an: Springer-Verlag, 2017. 3−15.
[47] Abdullah J, Dai G, Yi W. Worst-Case cause-effect reaction latency in systems with non-blocking communication. In: Proc. of the
2019 Design, Automation & Test in Europe Conf. & Exhibition (DATE). 2019. 1625−1630.
[48] Wilhelm R, Engblom J, Ermedahl A, Holsti N, Thesing S, Whalley DB, Bernat G, Ferdinand C, Heckmann R, Mitra T, Mueller F,
Puaut I, Puschner PP, Staschulat J, Stenström P. The worst-case execution-time problem—Overview of methods and survey of
tools. ACM Transactions on Embedded Computing Systems, 2008,7(3):36:1−36:53.
[49] Davis RI, Burns A. A survey of hard real-time scheduling for multiprocessor systems. ACM Computing Surveys, 2011,43(4):
35:1−35:44.
[50] Hestness J, Keckler SW, Wood DA. A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior.
In: Proc. of the 2014 IEEE Int’l Symp. on Workload Characterization (IISWC). IEEE, 2014. 150−160.
[51] Posluszny D. Avoiding pitfalls when using NVIDIA GPUs for real-time tasks in autonomous systems. In: Proc. of the 30th Euromicro
Conf. on Real-Time Systems (ECRTS). IEEE, 2018. 1−21.
[52] Reineke J, Wilhelm R. Impact of resource sharing on performance and performance prediction. In: Proc. of the Design, Automation
& Test in Europe Conf. (DATE). European Design and Automation Association, 2014. 1−2.
[53] Capodieci N, Cavicchioli R, Bertogna M, Paramakuru A. Deadline-Based scheduling for GPU with preemption support. In: Proc. of
the 2018 IEEE Real-Time Systems Symp. (RTSS). IEEE, 2018. 119−130.
[54] Forsberg B, Marongiu A, Benini L. GPUguard: Towards supporting a predictable execution model for heterogeneous SoC. In: Proc.
of the 2017 Design, Automation and Test in Europe (DATE). 2017. 318−321.
[55] Bavoil L. SetStablePowerState.exe: Disabling GPU boost on Windows 10 for more deterministic timestamp queries on NVIDIA GPUs.
2016. https://developer.nvidia.com
[56] Shams S, Platania R, Lee K, Park SJ. Evaluation of deep learning frameworks over different HPC architectures. In: Proc. of the
Int’l Conf. on Distributed Computing Systems. IEEE, 2017. 1389−1396.
[57] Mojumder SA, Louis MS, Sun Y, Ziabari AK, Abellán JL, Kim J, Kaeli D, Joshi A. Profiling DNN workloads on a Volta-based
DGX-1 system. In: Proc. of the 2018 IEEE Int’l Symp. on Workload Characterization (IISWC). 2018. 122−133.
[58] Stephenson M, Sastry Hari SK, Lee Y, Ebrahimi E, Johnson DR, Nellans D, O’Connor M, Keckler SW. Flexible software profiling
of GPU architectures. ACM SIGARCH Computer Architecture News, 2015,43(3):185−197.
[59] Shen D, Song SL, Li A, Liu X. CUDAAdvisor: LLVM-based runtime profiling for modern GPUs. 2018. 214−227.
[60] Farooqui N, Kerr A, Eisenhauer G, Schwan K, Yalamanchili S. Lynx: A dynamic instrumentation system for data-parallel
applications on GPGPU architectures. In: Proc. of the IEEE Int’l Symp. on Performance Analysis of Systems and Software
(ISPASS). IEEE, 2012. 58−67.
[61] Qi H, Sparks ER, Talwalkar A. Paleo: A performance model for deep neural networks. In: Proc. of the ICLR. 2017. 1−10.
[62] Dong S, Gong X, Sun Y, Baruah T, Kaeli D. Characterizing the microarchitectural implications of a convolutional neural network
(CNN) execution on GPUs. 2018. 96−106.
[63] Madougou S, Varbanescu AL, De Laat C, Van Nieuwpoort R. A tool for bottleneck analysis and performance prediction for GPU-
accelerated applications. In: Proc. of the 2016 IEEE 30th Int’l Parallel and Distributed Processing Symp. (IPDPS). IEEE, 2016.
641−652.
[64] Ali W, Yun H. Protecting real-time GPU kernels on integrated CPU-GPU SoC platforms. In: Proc. of the 30th Euromicro Conf. on
Real-Time Systems (ECRTS). Vol.106. Schloss Dagstuhl—Leibniz-Zentrum fuer Informatik, 2018. 19:1−19:22.