Ruan Jian Xue Bao/Journal of Software, 2025, Vol. 36, No. 9, p. 4004

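optimized implementations for the extension platforms. As the number of hardware platforms and vectorized operators it supports grows, introducing a variable-length hardware abstraction layer into GGML would help avoid reimplementing each operator for every hardware platform, reduce development complexity, and improve the maintainability of the library. The deep learning library PyTorch and the linear algebra library Eigen also plan to bring the RISC-V Vector extension into their existing hardware abstraction layers. Applying the method described in this paper can help these libraries design and implement, with more flexibility, hardware abstraction layers that are compatible with both existing fixed-length platforms and variable-length platforms, and thereby build better RISC-V Vector extension backends and improve library performance on RISC-V platforms. Integrating more SIMD or vector extensions into the hardware abstraction layer, in particular supporting emerging device platforms such as the RISC-V P extension, can enhance the functionality and applicability of the hardware abstraction layer, which is of significant value for the development of the RISC-V software ecosystem.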
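To illustrate what a hardware abstraction layer shared by fixed-length and variable-length platforms could look like, the following C++ sketch writes one operator against a backend whose lane count is queried at run time. It is only a sketch under stated assumptions: the names `EmulatedBackend`, `lanes`, `add`, and `add_arrays` are hypothetical and do not come from OpenCV, GGML, PyTorch, or Eigen, and the backend emulates vector lanes with a scalar loop instead of real intrinsics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical backend (an assumption for illustration, not a real library API).
// The lane count is a run-time value, as on a variable-length platform.
// A fixed-length platform would fit the same interface by returning a constant.
struct EmulatedBackend {
    std::size_t lanes_;

    explicit EmulatedBackend(std::size_t lanes) : lanes_(lanes) {}

    // Number of float elements one "vector register" holds.
    std::size_t lanes() const { return lanes_; }

    // Element-wise add of n elements, with n <= lanes().
    // Scalar emulation stands in for the platform's vector instructions.
    void add(const float* a, const float* b, float* dst, std::size_t n) const {
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = a[i] + b[i];
        }
    }
};

// Operator written once against the abstraction layer. It never assumes a
// compile-time vector width, so the same source serves every backend.
template <typename Backend>
void add_arrays(const Backend& hal, const float* a, const float* b,
                float* dst, std::size_t len) {
    const std::size_t step = hal.lanes();  // queried at run time
    assert(step > 0);
    for (std::size_t i = 0; i < len; i += step) {
        // The final iteration may cover fewer than `step` elements.
        const std::size_t n = std::min(step, len - i);
        hal.add(a + i, b + i, dst + i, n);
    }
}
```

Calling `add_arrays` with `EmulatedBackend(4)` and with `EmulatedBackend(16)` on the same input should produce identical output, which is the property a length-agnostic operator needs when the hardware vector length is not known at compile time.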

Acknowledgments: We thank the OpenCV community maintainers Vadim Pisarevsky, Alexander Smorkalov, and Maksim Shabunin for their suggestions and help with this work.

