Journal of Software (软件学报), 2025, Vol. 36, No. 9, p. 3952
Programming Languages and Systems (TOPLAS), 1989, 11(1): 57–66. [doi: 10.1145/59287.59291]
[15] Auyeung A, Gondra I, Dai HK. Multi-heuristic list scheduling genetic algorithm for task scheduling. In: Proc. of the 2003 ACM Symp. on
Applied Computing. Melbourne: ACM, 2003. 721–724. [doi: 10.1145/952532.952673]
[16] Huang L, Feng XB. Survey on techniques of integrated instruction scheduling and register allocation. Application Research of
Computers, 2008, 25(4): 979–982 (in Chinese with English abstract). [doi: 10.3969/j.issn.1001-3695.2008.04.005]
[17] Deng C, Chen ZY, Shi Y, Ma YM, Wen M, Luo L. Optimizing VLIW instruction scheduling via a two-dimensional constrained dynamic
programming. ACM Trans. on Design Automation of Electronic Systems, 2024, 29(5): 83. [doi: 10.1145/3643135]
[18] Fisher JA. Trace scheduling: A technique for global microcode compaction. IEEE Trans. on Computers, 1981, C-30(7): 478–490.
[doi: 10.1109/TC.1981.1675827]
[19] Colwell RP, Nix RP, O'Donnell JJ, Papworth DB, Rodman PK. A VLIW architecture for a trace scheduling compiler. ACM SIGARCH
Computer Architecture News, 1987, 15(5): 180–192. [doi: 10.1145/36177.36201]
[20] Hwu WMW, Mahlke SA, Chen WY, Chang PP, Warter NJ, Bringmann RA, Ouellette RG, Hank RE, Kiyohara T, Haab GE, Holm JG,
Lavery DM. The superblock: An effective technique for VLIW and superscalar compilation. The Journal of Supercomputing, 1993, 7(1):
229–248.
[21] Mahlke SA, Lin DC, Chen WY, Hank RE, Bringmann RA. Effective compiler support for predicated execution using the hyperblock.
ACM SIGMICRO Newsletter, 1992, 23(1–2): 45–54. [doi: 10.1145/144965.144998]
[22] Giesemann F, Payá-Vayá G, Gerlach L, Blume H, Pflug F, von Voigt G. Using a genetic algorithm approach to reduce register file
pressure during instruction scheduling. In: Proc. of the 2017 Int’l Conf. on Embedded Computer Systems: Architectures, Modeling, and
Simulation (SAMOS). Pythagorion: IEEE, 2017. 179–187. [doi: 10.1109/SAMOS.2017.8344626]
[23] Giesemann F, Gerlach L, Payá-Vayá G. Evolutionary algorithms for instruction scheduling, operation merging, and register allocation in
VLIW compilers. Journal of Signal Processing Systems, 2020, 92(7): 655–678. [doi: 10.1007/s11265-019-01493-2]
[24] Stuckmann F, Payá-Vayá G. A graph neural network approach to improve list scheduling heuristics under register-pressure. In: Proc. of
the 13th Int’l Conf. on Modern Circuits and Systems Technologies (MOCAST). Sofia: IEEE, 2024. 1–6.
[doi: 10.1109/MOCAST61810.2024.10615463]
[25] Six C, Boulmé S, Monniaux D. Certified and efficient instruction scheduling: Application to interlocked VLIW processors. Proc. of the
ACM on Programming Languages, 2020, 4: 129. [doi: 10.1145/3428197]
[26] Six C, Gourdin L, Boulmé S, Monniaux D, Fasse J, Nardino N. Formally verified superblock scheduling. In: Proc. of the 11th ACM
SIGPLAN Int’l Conf. on Certified Programs and Proofs. Philadelphia: ACM, 2022. 40–54. [doi: 10.1145/3497775.3503679]
[27] Yang ZT, Shirako J, Sarkar V. Fully verified instruction scheduling. Proc. of the ACM on Programming Languages, 2024,
8(OOPSLA2): 791–816. [doi: 10.1145/3689739]
[28] Herklotz Y, Wickerson J. Hyperblock scheduling for verified high-level synthesis. Proc. of the ACM on Programming Languages, 2024,
8(PLDI): 1929–1953. [doi: 10.1145/3656455]
[29] Zhou ZX, He H, Zhang YJ, Yang X, Sun YH. Two-dimensional force-directed cluster scheduling algorithm for the clustered VLIW
architecture. Journal of Tsinghua University (Science and Technology), 2008, 48(10): 1643–1646 (in Chinese with English abstract).
[30] Desoli G. Instruction assignment for clustered VLIW DSP compilers: A new approach. Palo Alto: Hewlett Packard Laboratories, 1998.
[31] Porpodas V, Cintra M. CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors. In:
Proc. of the 2013 Int’l Conf. on Compilers, Architecture and Synthesis for Embedded Systems (CASES). Montreal: IEEE, 2013. 1–10.
[doi: 10.1109/CASES.2013.6662513]
[32] Park JCH, Schlansker M. On predicated execution. Palo Alto: Hewlett-Packard Laboratories, 1991.
[33] Traber A, Zaruba F, Stucki S, Pullini A, Haugou G, Flamand E, Gürkaynak FK, Benini L. PULPino: A small single-core RISC-V SoC.
In: Proc. of the 3rd RISC-V Workshop. 2016.
[34] RISCV-Collab/RISCV-GNU-toolchain. 2024. https://github.com/riscv-collab/riscv-gnu-toolchain
[35] The LLVM Compiler Infrastructure. 2024. https://github.com/llvm/llvm-project
[36] Spike RISC-V ISA Simulator. 2024. https://github.com/riscv-software-src/riscv-isa-sim
[37] Bellard F. QEMU, a fast and portable dynamic translator. In: Proc. of the 2005 USENIX Annual Technical Conf. Anaheim: USENIX
Association, 2005. 41–46.
[38] Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K,
Shoaib M, Vaish N, Hill MD, Wood DA. The gem5 simulator. ACM SIGARCH Computer Architecture News, 2011, 39(2): 1–7.
[doi: 10.1145/2024716.2024718]
[39] CoreMark is an industry-standard benchmark that measures the performance of central processing units (CPU) and embedded

