2676 Journal of Software 软件学报 Vol.31, No.9, September 2020

[90] Punniyamurthy K, Boroujerdian B, Gerstlauer A. GATSim: Abstract timing simulation of GPUs. In: Proc. of the Design,
Automation & Test in Europe Conf. (DATE). IEEE, 2017. 43−48.
[91] GPGPU-Sim. http://www.gpgpu-sim.org/
[92] Bakhoda A, Yuan GL, Fung WWL, Wong H, Aamodt TM. Analyzing CUDA workloads using a detailed GPU simulator. In: Proc. of
the IEEE Int’l Symp. on Performance Analysis of Systems and Software (ISPASS). IEEE, 2009. 163−174.
[93] Wang X, Zhang W. Cache locking vs. partitioning for real-time computing on integrated CPU-GPU processors. In: Proc. of the
35th IEEE Int’l Performance Computing and Communications Conf. (IPCCC). IEEE, 2016. 1−8.
[94] Picchi J, Zhang W. Impact of L2 cache locking on GPU performance. In: Proc. of the SoutheastCon 2015. IEEE, 2015. 1−4.
[95] Huangfu Y, Zhang W. Warp-Based load/store reordering to improve GPU data cache time predictability and performance. In: Proc.
of the 19th IEEE Int’l Symp. on Real-Time Distributed Computing (ISORC). IEEE, 2016. 166−173.
[96] Huangfu Y, Zhang W. Warp-Based load/store reordering to improve GPU time predictability. JCSE, 2017,11(2).
[97] Chen G, Guan N, Lü MS, Wang Y. State-of-the-Art survey of real-time multicore system. Ruan Jian Xue Bao/Journal of Software,
2018,29(7):2152−2176 (in Chinese with English abstract). http://www.jos.org.cn/1000-9825/5580.htm
[doi: 10.13328/j.cnki.jos.005580]
[98] Kato S, Lakshmanan K, Rajkumar R, Ishikawa Y. TimeGraph: GPU scheduling for real-time multi-tasking environments. In: Proc.
of the 2011 USENIX Conf. on USENIX Annual Technical Conf. ACM, 2011. 2.
[99] Kato S, Lakshmanan K, Kumar A, Kelkar M, Ishikawa Y, Rajkumar R. RGEM: A responsive GPGPU execution model for runtime
engines. In: Proc. of the 32nd Real-Time Systems Symp. (RTSS). IEEE, 2011. 57−66.
[100] Basaran C, Kang KD. Supporting preemptive task executions and memory copies in GPGPUs. In: Proc. of the 24th Euromicro
Conf. on Real-Time Systems (ECRTS). IEEE, 2012. 287−296.
[101] Zhong J, He B. Kernelet: High-throughput GPU kernel executions with dynamic slicing and scheduling. IEEE Trans. on Parallel
Distrib. Syst., 2014,25(6):1522−1532.
[102] Verner U, Schuster A, Silberstein M, Mendelson A. Scheduling processing of real-time data streams on heterogeneous multi-GPU
systems. In: Proc. of the 5th Annual Int’l Systems and Storage Conf. (SYSTOR). ACM, 2012.
[103] Verner U, Mendelson A, Schuster A. Batch method for efficient resource sharing in real-time multi-GPU systems. In: Proc. of the
15th Int’l Conf. on Distributed Computing and Networking (ICDCN). Springer-Verlag, 2014. 347−362.
[104] Verner U, Mendelson A, Schuster A. Scheduling periodic real-time communication in multi-GPU systems. In: Proc. of the 23rd
Int’l Conf. on Computer Communication and Networks (ICCCN). IEEE, 2014. 1−8.
[105] Kim J, Andersson B, De Niz D, Rajkumar R. Segment-Fixed priority scheduling for self-suspending real-time tasks. In: Proc. of the
34th Real-Time Systems Symp. (RTSS). IEEE, 2013. 246−257.
[106] Chen G, Zhao Y, Shen X, Zhou H. EffiSha: A software framework for enabling efficient preemptive scheduling of GPU. In: Proc.
of the PPoPP. 2017. 3−16.
[107] Wang J, Rubin N, Sidelnik A, Yalamanchili S. Dynamic thread block launch: A lightweight execution mechanism to support
irregular applications on GPUs. ACM SIGARCH Computer Architecture News, 2015,43(3):528−540.
[108] Hosseinimotlagh S, Kim H. Thermal-Aware servers for real-time tasks on multi-core GPU-integrated embedded systems. In: Proc.
of the 25th IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS). IEEE, 2019. 254−266.
[109] Nugteren C, Van den Braak GJ, Corporaal H, Bal HE. A detailed GPU cache model based on reuse distance theory. In: Proc. of the
20th Int’l Symp. on High Performance Computer Architecture (HPCA). IEEE, 2014. 37−48.
[110] Liang Y, Li X. Efficient kernel management on GPUs. ACM Trans. on Embedded Computing Systems, 2017,16(4):115:1−115:24.
[111] Park JJK, Park Y, Mahlke S. Dynamic resource management for efficient utilization of multitasking GPUs. ACM SIGARCH
Computer Architecture News, 2017,45(1):527−540.
[112] Elliott GA, Ward BC, Anderson JH. GPUSync: A framework for real-time GPU management. In: Proc. of the Real-Time Systems
Symp. 2013. 33−44.
[113] Pellizzoni R, Betti E, Bak S, Yao G, Criswell J, Caccamo M, Kegley R. A predictable execution model for COTS-based embedded
systems. In: Proc. of the 17th Real-Time and Embedded Technology and Applications Symp. (RTAS). IEEE, 2011. 269−279.
[114] Alhammad A, Pellizzoni R. Time-Predictable execution of multithreaded applications on multicore systems. In: Proc. of the Design,
Automation & Test in Europe Conf. (DATE). European Design and Automation Association, 2014. 1−6.
[115] Abdelouahab K, Pelcat M, Serot J, Berry F. Accelerating CNN inference on FPGAs: A survey. 2018.
