Page 131 - Journal of Software (软件学报), 2020, Issue 12

Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW                     E-mail: jos@iscas.ac.cn
         Journal of Software, 2020,31(12):3797−3807 [doi: 10.13328/j.cnki.jos.005889]   http://www.jos.org.cn
         © Institute of Software, Chinese Academy of Sciences. All rights reserved.       Tel: +86-10-62562563


         Chinese-Vietnamese Convolutional Neural Machine Translation Incorporating Syntactic Parse Trees ∗

         王振晗¹,², 何建雅琳¹,², 余正涛¹,², 文永华², 郭军军¹,², 高盛祥¹,²
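         ¹ (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500)
         ² (Yunnan Key Laboratory of Artificial Intelligence (Kunming University of Science and Technology), Kunming, Yunnan 650500)
         Corresponding author: 余正涛 (YU Zheng-Tao), E-mail: ztyu@hotmail.com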

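         Abstract:  Neural machine translation is currently the most widely used machine translation method and achieves good
         results on language pairs with abundant corpus resources. However, it performs poorly on language pairs that lack
         bilingual data, such as Chinese-Vietnamese. Considering the differences in grammatical structure between Chinese and
         Vietnamese, this paper proposes a Chinese-Vietnamese neural machine translation method that incorporates the
         source-language syntactic parse tree. A depth-first traversal is used to obtain a vectorized representation of the
         source-language syntactic parse tree, and the syntactic vectors are added to the source-language word embeddings to
         form the input on which the translation model is trained. Experiments on the Chinese-Vietnamese language pair show an
         improvement of 0.6 BLEU points over the baseline system. The results indicate that incorporating syntactic parse
         trees can effectively improve the performance of machine translation models when resources are scarce.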
         Keywords:  neural machine translation; resource scarcity; syntactic parse tree
         Chinese Library Classification number: TP18


         Citation format (Chinese):  王振晗,何建雅琳,余正涛,文永华,郭军军,高盛祥.融合句法解析树的汉-越卷积神经机器翻译.软件学报,2020,
         31(12):3797−3807. http://www.jos.org.cn/1000-9825/5889.htm
         Citation format (English): Wang ZH, He JYL, Yu ZT, Wen YH, Guo JJ, Gao SX. Chinese-Vietnamese convolutional neural machine
         translation with incorporating syntactic parsing tree. Ruan Jian Xue Bao/Journal of Software, 2020,31(12):3797−3807 (in
         Chinese). http://www.jos.org.cn/1000-9825/5889.htm

         Chinese-Vietnamese Convolutional Neural Machine Translation with Incorporating Syntactic
         Parsing Tree

         WANG Zhen-Han¹,², HE Jian-Ya-Lin¹,², YU Zheng-Tao¹,², WEN Yong-Hua², GUO Jun-Jun¹,², GAO Sheng-Xiang¹,²
         ¹ (School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
         ² (Yunnan Key Laboratory of Artificial Intelligence (Kunming University of Science and Technology), Kunming 650500, China)
         Abstract:  Neural machine translation is currently the most widely used machine translation method and performs well on
         language pairs with rich corpus resources. However, it performs poorly on language pairs that lack bilingual data, such as
         Chinese-Vietnamese. Taking the differences in grammatical structure between Chinese and Vietnamese into consideration, this
         study proposes a neural machine translation method that incorporates the syntactic parse tree of the source language. In
         this method, a depth-first traversal is used to obtain a vectorized representation of the source-language syntactic parse
         tree, and the translation model is trained on the sum of the resulting syntactic vectors and the source-language word
         embeddings. Applied to the Chinese-Vietnamese language pair, the method achieves an improvement of 0.6 BLEU points over
         the baseline system. The experiments show that incorporating syntactic parse trees can effectively improve the performance
         of machine translation models in low-resource settings.
         Key words:    neural machine translation; low-resource; syntactic parse tree
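
The input construction described in the abstract (depth-first traversal of the source parse tree, then adding syntactic vectors to word embeddings) can be illustrated with a minimal sketch. Everything below is hypothetical and is not the authors' implementation: the toy tree, the choice of pairing each word with the label of its immediate parent node, the random embedding tables, and all function names are assumptions made only for illustration.

```python
import random

# Hypothetical toy parse tree: (label, child, child, ...); a leaf is a plain string (a word).
# The words mean "I", "like", "translation".
TREE = ("S",
        ("NP", ("PN", "我")),
        ("VP", ("VV", "喜欢"), ("NP", ("NN", "翻译"))))

def dfs_word_labels(node, parent_label=None):
    """Depth-first, left-to-right traversal returning (word, label) pairs.
    Assumption: each word is paired with the label of its immediate parent."""
    if isinstance(node, str):
        return [(node, parent_label)]
    label, *children = node
    pairs = []
    for child in children:
        pairs.extend(dfs_word_labels(child, label))
    return pairs

def make_table(keys, dim, seed):
    """Random lookup table standing in for a learned embedding matrix."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}

def encode(tree, word_emb, label_emb):
    """Element-wise sum of each word embedding and its syntactic-label embedding."""
    return [[w + s for w, s in zip(word_emb[word], label_emb[label])]
            for word, label in dfs_word_labels(tree)]

DIM = 4
pairs = dfs_word_labels(TREE)
word_emb = make_table(sorted({w for w, _ in pairs}), DIM, seed=0)
label_emb = make_table(sorted({l for _, l in pairs}), DIM, seed=1)
inputs = encode(TREE, word_emb, label_emb)  # one DIM-sized vector per source word
```

In a real system both tables would be trained jointly with the translation model; summation keeps the input dimension unchanged, so the encoder itself needs no modification.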



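            ∗  Funding:  National Natural Science Foundation of China (61732005, 61672271, 61761026, 61866020); Natural Science
         Foundation of Yunnan Province (2018FB04); Yunnan Provincial Talent Training Program (KKSY201703005, KKSY201703015)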
              Foundation item:  National Natural Science Foundation of China (61732005, 61672271, 61761026, 61866020); Natural
         Science Foundation of Yunnan Province (2018FB04); Personnel Training Project of the Yunnan Science and Technology Department
         (KKSY201703005, KKSY201703015)
              Received: 2019-04-24;  Revised: 2019-06-05, 2019-07-20;  Accepted: 2019-09-09