Page 109 - Journal of Software (《软件学报》), 2024, Issue 6

Zhang HB, et al.: AutoConfig: An automatic configuration mechanism for deep learning compilation optimization                                            2685


     of the 40th Int’l Conf. on Machine Learning. Honolulu: ACM, 2023. 1182. [doi: 10.5555/3618408.3619590]
 [3] Ouyang L, Wu J, Jiang X, Almeida D, Wainwright CL, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A, Schulman J, Hilton J, Kelton
     F, Miller L, Simens M, Askell A, Welinder P, Christiano PF, Leike J, Lowe R. Training language models to follow instructions with
     human feedback. In: Proc. of the 36th Int’l Conf. on Neural Information Processing Systems. New Orleans: NeurIPS, 2022.
     27730–27744.
 [4] Brown TB, Mann B, Ryder N, et al. Language models are few-shot learners. In: Proc. of the 34th Int’l Conf. on Neural Information
     Processing Systems. Vancouver: ACM, 2020. 159. [doi: 10.5555/3495724.3495883]
 [5] Chetlur S, Woolley C, Vandermersch P, Cohen J, Tran J, Catanzaro B, Shelhamer E. cuDNN: Efficient primitives for deep learning.
     arXiv:1410.0759, 2014.
 [6] Li JH, Qin ZN, Mei YJ, Cui JZ, Song YF, Chen CY, Zhang YF, Du LS, Cheng XH, Jin BH, Ye J, Lin E, Lavery D. oneDNN graph
     compiler: A hybrid approach for high-performance deep learning compilation. arXiv:2301.01333, 2023.
 [7] Khan J, Fultz P, Tamazov A, Lowell D, Liu C, Melesse M, Nandhimandalam M, Nasyrov K, Perminov I, Shah T, Filippov V, Zhang J,
     Zhou J, Natarajan B, Daga M. MIOpen: An open source library for deep learning primitives. arXiv:1910.00078, 2019.
 [8] Lattner C, Adve V. LLVM: A compilation framework for lifelong program analysis & transformation. In: Proc. of the 2004 Int’l Symp.
     on Code Generation and Optimization. San Jose: IEEE, 2004. 75–86. [doi: 10.1109/CGO.2004.1281665]
 [9] Zhang HB, Xing MJ, Wu YJ, Zhao C. Compiler technologies in deep learning co-design: A survey. Intelligent Computing, 2023, 2: 0040.
     [doi: 10.34133/icomputing.0040]
[10] Chen TQ, Moreau T, Jiang ZH, Zheng LM, Yan E, Cowan M, Shen HC, Wang LY, Hu YW, Ceze L, Guestrin C, Krishnamurthy A.
     TVM: An automated end-to-end optimizing compiler for deep learning. In: Proc. of the 13th USENIX Conf. on Operating Systems
     Design and Implementation. Carlsbad: ACM, 2018. 579–594. [doi: 10.5555/3291168.3291211]
[11] Jouppi NP, Young C, Patil N, et al. In-datacenter performance analysis of a tensor processing unit. In: Proc. of the 44th ACM/IEEE
     Annual Int’l Symp. on Computer Architecture. Toronto: ACM, 2017. 1–12. [doi: 10.1145/3079856.3080246]
[12] Chen TQ, Zheng LM, Yan E, Jiang ZH, Moreau T, Ceze L, Guestrin C, Krishnamurthy A. Learning to optimize tensor programs. In:
     Proc. of the 32nd Int’l Conf. on Neural Information Processing Systems. Montreal: ACM, 2018. 3393–3404. [doi: 10.5555/3327144.3327258]
[13] Zheng LM, Jia CF, Sun MM, Wu Z, Yu CH, Haj-Ali A, Wang YD, Yang J, Zhuo DY, Sen K, Gonzalez JE, Stoica I. Ansor: Generating
     high-performance tensor programs for deep learning. In: Proc. of the 14th USENIX Conf. on Operating Systems Design and
     Implementation. ACM, 2020. 49. [doi: 10.5555/3488766.3488815]
[14] Zhu H, Wu R, Diao Y, Ke S, Li H, Zhang C, Xue J, Ma L, Xia Y, Cui W, Yang F, Yang M, Zhou L, Cidon A, Pekhimenko G. ROLLER:
     Fast and efficient tensor compilation for deep learning. In: Proc. of the 16th USENIX Symp. on Operating Systems Design and
     Implementation. Carlsbad: USENIX, 2022. 233–248.
[15] Zheng NX, Lin B, Zhang QL, Ma LX, Yang YQ, Yang F, Wang Y, Yang M, Zhou LD. SparTA: Deep-learning model sparsity via
     tensor-with-sparsity-attribute. In: Proc. of the 16th USENIX Symp. on Operating Systems Design and Implementation. Carlsbad:
     USENIX, 2022. 213–232.
[16] Lattner C, Amini M, Bondhugula U, Cohen A, Davis A, Pienaar J, Riddle R, Shpeisman T, Vasilache N, Zinenko O. MLIR: Scaling
     compiler infrastructure for domain specific computation. In: Proc. of the 2021 IEEE/ACM Int’l Symp. on Code Generation and
     Optimization (CGO). Seoul: IEEE, 2021. 2–14. [doi: 10.1109/CGO51591.2021.9370308]
[17] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. In: Proc. of the
     31st Int’l Conf. on Neural Information Processing Systems. Long Beach: ACM, 2017. 6000–6010. [doi: 10.5555/3295222.3295349]
[18] Kim S, Hooper C, Wattanawong T, Kang M, Yan RH, Genc H, Dinh G, Huang QJ, Keutzer K, Mahoney MW, Shao YS, Gholami A. Full
     stack optimization of transformer inference: A survey. arXiv:2302.14017, 2023.
[19] Stothers AJ. On the complexity of matrix multiplication [Ph.D. Thesis]. Edinburgh: The University of Edinburgh, 2010.
[20] Cong J, Xiao BJ. Minimizing computation in convolutional neural networks. In: Proc. of the 24th Int’l Conf. on Artificial Neural
     Networks and Machine Learning. Hamburg: Springer, 2014. 281–290. [doi: 10.1007/978-3-319-11179-7_36]
[21] Lavin A, Gray S. Fast algorithms for convolutional neural networks. In: Proc. of the 2016 IEEE Conf. on Computer Vision and Pattern
     Recognition (CVPR). Las Vegas: IEEE, 2016. 4013–4021. [doi: 10.1109/CVPR.2016.435]
[22] Chellapilla K, Puri S, Simard P. High performance convolutional neural networks for document processing. In: Proc. of the 10th Int’l
     Workshop on Frontiers in Handwriting Recognition. La Baule: University of Rennes, 2006.
[23] Li MZ, Liu Y, Liu XY, Sun QX, You X, Yang HL, Luan ZZ, Gan L, Yang GW, Qian DP. The deep learning compiler: A comprehensive