                     Programming Languages and Systems (TOPLAS), 1989, 11(1): 57–66. [doi: 10.1145/59287.59291]
                 [15]   Auyeung A, Gondra I, Dai HK. Multi-heuristic list scheduling genetic algorithm for task scheduling. In: Proc. of the 2003 ACM Symp. on
                     Applied Computing. Melbourne: ACM, 2003. 721–724. [doi: 10.1145/952532.952673]
                 [16]   Huang L, Feng XB. Survey on techniques of integrated instruction scheduling and register allocation. Application Research of
                     Computers, 2008, 25(4): 979–982 (in Chinese with English abstract). [doi: 10.3969/j.issn.1001-3695.2008.04.005]
                 [17]   Deng C, Chen ZY, Shi Y, Ma YM, Wen M, Luo L. Optimizing VLIW instruction scheduling via a two-dimensional constrained dynamic
                     programming. ACM Trans. on Design Automation of Electronic Systems, 2024, 29(5): 83. [doi: 10.1145/3643135]
                 [18]   Fisher JA. Trace scheduling: A technique for global microcode compaction. IEEE Trans. on Computers, 1981, C-30(7): 478–490. [doi: 10.
                     1109/TC.1981.1675827]
                 [19]   Colwell RP, Nix RP, O'Donnell JJ, Papworth DB, Rodman PK. A VLIW architecture for a trace scheduling compiler. ACM SIGARCH
                     Computer Architecture News, 1987, 15(5): 180–192. [doi: 10.1145/36177.36201]
                 [20]   Hwu WMW, Mahlke SA, Chen WY, Chang PP, Warter NJ, Bringmann RA, Ouellette RG, Hank RE, Kiyohara T, Haab GE, Holm JG,
                     Lavery DM. The superblock: An effective technique for VLIW and superscalar compilation. The Journal of Supercomputing, 1993, 7(1):
                     229–248.
                 [21]   Mahlke SA, Lin DC, Chen WY, Hank RE, Bringmann RA. Effective compiler support for predicated execution using the hyperblock.
                     ACM SIGMICRO Newsletter, 1992, 23(1–2): 45–54. [doi: 10.1145/144965.144998]
                 [22]   Giesemann F, Payá-Vayá G, Gerlach L, Blume H, Pflug F, von Voigt G. Using a genetic algorithm approach to reduce register file
                     pressure during instruction scheduling. In: Proc. of the 2017 Int’l Conf. on Embedded Computer Systems: Architectures, Modeling, and
                     Simulation (SAMOS). Pythagorion: IEEE, 2017. 179–187. [doi: 10.1109/SAMOS.2017.8344626]
                 [23]   Giesemann F, Gerlach L, Payá-Vayá G. Evolutionary algorithms for instruction scheduling, operation merging, and register allocation in
                     VLIW compilers. Journal of Signal Processing Systems, 2020, 92(7): 655–678. [doi: 10.1007/s11265-019-01493-2]
                 [24]   Stuckmann F, Payá-Vayá G. A graph neural network approach to improve list scheduling heuristics under register-pressure. In: Proc. of
                     the 13th Int’l Conf. on Modern Circuits and Systems Technologies (MOCAST). Sofia: IEEE, 2024. 1–6. [doi: 10.1109/MOCAST61810.
                     2024.10615463]
                 [25]   Six C, Boulmé S, Monniaux D. Certified and efficient instruction scheduling: Application to interlocked VLIW processors. Proc. of the
                     ACM on Programming Languages, 2020, 4: 129. [doi: 10.1145/3428197]
                 [26]   Six C, Gourdin L, Boulmé S, Monniaux D, Fasse J, Nardino N. Formally verified superblock scheduling. In: Proc. of the 11th ACM
                     SIGPLAN Int’l Conf. on Certified Programs and Proofs. Philadelphia: ACM, 2022. 40–54. [doi: 10.1145/3497775.3503679]
                 [27]   Yang ZT, Shirako J, Sarkar V. Fully verified instruction scheduling. Proc. of the ACM on Programming Languages, 2024,
                     8(OOPSLA2): 791–816. [doi: 10.1145/3689739]
                 [28]   Herklotz Y, Wickerson J. Hyperblock scheduling for verified high-level synthesis. Proc. of the ACM on Programming Languages, 2024,
                     8(PLDI): 1929–1953. [doi: 10.1145/3656455]
                 [29]   Zhou ZX, He H, Zhang YJ, Yang X, Sun YH. Two-dimensional force-directed cluster scheduling algorithm for the clustered VLIW
                     architecture. Journal of Tsinghua University (Science and Technology), 2008, 48(10): 1643–1646 (in Chinese with English abstract).
                 [30]   Desoli G. Instruction assignment for clustered VLIW DSP compilers: A new approach. Palo Alto: Hewlett-Packard Laboratories, 1998.
                 [31]   Porpodas V, Cintra M. CAeSaR: Unified cluster-assignment scheduling and communication reuse for clustered VLIW processors. In:
                     Proc. of the 2013 Int’l Conf. on Compilers, Architecture and Synthesis for Embedded Systems (CASES). Montreal: IEEE, 2013. 1–10.
                     [doi: 10.1109/CASES.2013.6662513]
                 [32]   Park JCH, Schlansker M. On predicated execution. Palo Alto: Hewlett-Packard Laboratories, 1991.
                 [33]   Traber A, Zaruba F, Stucki S, Pullini A, Haugou G, Flamand E, Gürkaynak FK, Benini L. PULPino: A small single-core RISC-V SoC.
                     In: Proc. of the 3rd RISC-V Workshop. 2016.
                 [34]   riscv-collab/riscv-gnu-toolchain. 2024. https://github.com/riscv-collab/riscv-gnu-toolchain
                 [35]   The LLVM Compiler Infrastructure. 2024. https://github.com/llvm/llvm-project
                 [36]   Spike RISC-V ISA Simulator. 2024. https://github.com/riscv-software-src/riscv-isa-sim
                 [37]   Bellard F. QEMU, a fast and portable dynamic translator. In: Proc. of the 2005 USENIX Annual Technical Conf. Anaheim: USENIX
                     Association, 2005. 41–46.
                 [38]   Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K,
                     Shoaib M, Vaish N, Hill MD, Wood DA. The gem5 simulator. ACM SIGARCH Computer Architecture News, 2011, 39(2): 1–7. [doi: 10.
                     1145/2024716.2024718]
                 [39]   CoreMark is an industry-standard benchmark that measures the performance of central processing units (CPU) and embedded