of the 40th Int’l Conf. on Machine Learning. Honolulu: ACM, 2023. 1182. [doi: 10.5555/3618408.3619590]
[3] Ouyang L, Wu J, Jiang X, Almeida D, Wainwright CL, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A, Schulman J, Hilton J, Kelton
F, Miller L, Simens M, Askell A, Welinder P, Christiano PF, Leike J, Lowe R. Training language models to follow instructions with
human feedback. In: Proc. of the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans: NeurIPS, 2022.
27730–27744.
[4] Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. In: Proc. of the 34th Int’l Conf. on Neural Information
Processing Systems. Vancouver: ACM, 2020. 159. [doi: 10.5555/3495724.3495883]
[5] Chetlur S, Woolley C, Vandermersch P, Cohen J, Tran J, Catanzaro B, Shelhamer E. cuDNN: Efficient primitives for deep learning.
arXiv:1410.0759, 2014.
[6] Li JH, Qin ZN, Mei YJ, Cui JZ, Song YF, Chen CY, Zhang YF, Du LS, Cheng XH, Jin BH, Ye J, Lin E, Lavery D. oneDNN graph
compiler: A hybrid approach for high-performance deep learning compilation. arXiv:2301.01333, 2023.
[7] Khan J, Fultz P, Tamazov A, Lowell D, Liu C, Melesse M, Nandhimandalam M, Nasyrov K, Perminov I, Shah T, Filippov V, Zhang J,
Zhou J, Natarajan B, Daga M. MIOpen: An open source library for deep learning primitives. arXiv:1910.00078, 2019.
[8] Lattner C, Adve V. LLVM: A compilation framework for lifelong program analysis & transformation. In: Proc. of the 2004 Int’l Symp.
on Code Generation and Optimization. San Jose: IEEE, 2004. 75–86. [doi: 10.1109/CGO.2004.1281665]
[9] Zhang HB, Xing MJ, Wu YJ, Zhao C. Compiler technologies in deep learning co-design: A survey. Intelligent Computing, 2023, 2: 0040.
[doi: 10.34133/icomputing.0040]
[10] Chen TQ, Moreau T, Jiang ZH, Zheng LM, Yan E, Cowan M, Shen HC, Wang LY, Hu YW, Ceze L, Guestrin C, Krishnamurthy A.
TVM: An automated end-to-end optimizing compiler for deep learning. In: Proc. of the 13th USENIX Conf. on Operating Systems
Design and Implementation. Carlsbad: ACM, 2018. 579–594. [doi: 10.5555/3291168.3291211]
[11] Jouppi NP, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit. In: Proc. of the 44th ACM/IEEE
Annual Int’l Symp. on Computer Architecture. Toronto: ACM, 2017. 1–12. [doi: 10.1145/3079856.3080246]
[12] Chen TQ, Zheng LM, Yan E, Jiang ZH, Moreau T, Ceze L, Guestrin C, Krishnamurthy A. Learning to optimize tensor programs. In:
Proc. of the 32nd Int’l Conf. on Neural Information Processing Systems. Montreal: ACM, 2018. 3393–3404. [doi: 10.5555/3327144.3327258]
[13] Zheng LM, Jia CF, Sun MM, Wu Z, Yu CH, Haj-Ali A, Wang YD, Yang J, Zhuo DY, Sen K, Gonzalez JE, Stoica I. Ansor: Generating
high-performance tensor programs for deep learning. In: Proc. of the 14th USENIX Conf. on Operating Systems Design and
Implementation. ACM, 2020. 49. [doi: 10.5555/3488766.3488815]
[14] Zhu H, Wu R, Diao Y, Ke S, Li H, Zhang C, Xue J, Ma L, Xia Y, Cui W, Yang F, Yang M, Zhou L, Cidon A, Pekhimenko G. ROLLER:
Fast and efficient tensor compilation for deep learning. In: Proc. of the 16th USENIX Symp. on Operating Systems Design and
Implementation. Carlsbad: USENIX, 2022. 233–248.
[15] Zheng NX, Lin B, Zhang QL, Ma LX, Yang YQ, Yang F, Wang Y, Yang M, Zhou LD. SparTA: Deep-learning model sparsity via tensor-
with-sparsity-attribute. In: Proc. of the 16th USENIX Symp. on Operating Systems Design and Implementation. Carlsbad: USENIX,
2022. 213–232.
[16] Lattner C, Amini M, Bondhugula U, Cohen A, Davis A, Pienaar J, Riddle R, Shpeisman T, Vasilache N, Zinenko O. MLIR: Scaling
compiler infrastructure for domain specific computation. In: Proc. of the 2021 IEEE/ACM Int’l Symp. on Code Generation and
Optimization (CGO). Seoul: IEEE, 2021. 2–14. [doi: 10.1109/CGO51591.2021.9370308]
[17] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proc. of the
31st Int’l Conf. on Neural Information Processing Systems. Long Beach: ACM, 2017. 6000–6010. [doi: 10.5555/3295222.3295349]
[18] Kim S, Hooper C, Wattanawong T, Kang M, Yan RH, Genc H, Dinh G, Huang QJ, Keutzer K, Mahoney MW, Shao YS, Gholami A. Full
stack optimization of transformer inference: A survey. arXiv:2302.14017, 2023.
[19] Stothers AJ. On the complexity of matrix multiplication [Ph.D. Thesis]. Edinburgh: The University of Edinburgh, 2010.
[20] Cong J, Xiao BJ. Minimizing computation in convolutional neural networks. In: Proc. of the 24th Int’l Conf. on Artificial Neural
Networks and Machine Learning. Hamburg: Springer, 2014. 281–290. [doi: 10.1007/978-3-319-11179-7_36]
[21] Lavin A, Gray S. Fast algorithms for convolutional neural networks. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern
Recognition (CVPR). Las Vegas: IEEE, 2016. 4013–4021. [doi: 10.1109/CVPR.2016.435]
[22] Chellapilla K, Puri S, Simard P. High performance convolutional neural networks for document processing. In: Proc. of the 10th Int’l
Workshop on Frontiers in Handwriting Recognition. La Baule: University of Rennes, 2006.
[23] Li MZ, Liu Y, Liu XY, Sun QX, You X, Yang HL, Luan ZZ, Gan L, Yang GW, Qian DP. The deep learning compiler: A comprehensive