Journal of Software (软件学报), 2020, Issue 12
Journal of Software (软件学报), ISSN 1000-9825, CODEN RUXUEW. E-mail: jos@iscas.ac.cn
Journal of Software, 2020, 31(12):3772−3786 [doi: 10.13328/j.cnki.jos.005885] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563
Research on Recognition Methods for Chinese Textual Entailment Types and Chunks ∗
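YU Dong (于东), JIN Tian-Hua (金天华), XIE Wan-Ying (谢婉莹), ZHANG Yi (张艺), XUN En-Dong (荀恩东)
(College of Information Science, Beijing Language and Culture University, Beijing 100083, China)
Corresponding author: XUN En-Dong, E-mail: edxun@126.com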
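Abstract: Recognizing textual entailment (RTE) is the task of judging whether the semantics of two sentences stand in an entailment relationship. In recent years, research on English entailment recognition has made considerable progress, but it has focused mainly on type judgment. Few studies precisely locate entailment chunks in the data, and the interpretability of entailment type recognition is low. This study selects 12 000 Chinese entailment sentence pairs from the Chinese Natural Language Inference (CNLI) data, manually annotates the chunks that give rise to the entailment, and, by analyzing the linguistic features of these chunks, summarizes 7 specific entailment types. On this basis, the Chinese entailment recognition task is recast as a 7-class entailment type recognition task and an entailment chunk boundary-type recognition task, on which deep learning models reach accuracies of 69.19% and 62.09%, respectively. The experimental results show that the proposed method can effectively find the boundaries of Chinese entailment chunks and their corresponding entailment types, providing a reliable baseline method for further research.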
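Key words: recognizing textual entailment; chunk recognition; entailment type; deep learning
CLC number (Chinese Library Classification): TP18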
Citation format (Chinese): 于东,金天华,谢婉莹,张艺,荀恩东.中文文本蕴含类型及语块识别方法研究.软件学报,2020,31(12):3772−3786. http://www.jos.org.cn/1000-9825/5885.htm
Citation format (English): Yu D, Jin TH, Xie WY, Zhang Y, Xun ED. Recognition method based on deep learning for Chinese textual entailment chunks and labels. Ruan Jian Xue Bao/Journal of Software, 2020,31(12):3772−3786 (in Chinese). http://www.jos.org.cn/1000-9825/5885.htm
Recognition Method Based on Deep Learning for Chinese Textual Entailment Chunks and Labels
YU Dong, JIN Tian-Hua, XIE Wan-Ying, ZHANG Yi, XUN En-Dong
(College of Information Science, Beijing Language and Culture University, Beijing 100083, China)
Abstract: Recognizing textual entailment (RTE) is the task of deciding whether two sentences stand in an entailment relationship. In recent
years, RTE in English has made great progress. Current research, however, is mainly based on type judgment and pays less attention to
locating the language chunks that lead to the entailment relationship, which results in low interpretability of RTE models. This
study selects 12 000 Chinese entailment sentence pairs from the Chinese Natural Language Inference (CNLI) data and labels the chunks
that lead to their entailment relationship. Then, 7 entailment types are summarized considering Chinese linguistic features. On this basis,
two tasks are proposed. One is to recognize which of the seven entailment types each entailment sentence pair belongs to; the other is to
recognize the boundaries of the entailment chunks in it. The proposed deep-learning-based method reaches accuracies of 69.19% and
62.09% on the two tasks, respectively. The experimental results show that the proposed approach can effectively identify different types of entailment
in Chinese and find the boundaries of the entailment chunks, which demonstrates that the proposed model provides a reliable benchmark
for further research.
Key words: recognizing textual entailment; chunk labeling; deep learning
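As an illustration of the second task described in the abstract, the sketch below shows one common way to encode chunk boundaries together with their types as per-token tags (the BIO scheme). It is an assumption made for illustration only: the abstract does not state which encoding the authors use, and the labels `T1` to `T7` are hypothetical placeholders, since the seven entailment types are not named in this section.

```python
from typing import List, Optional, Tuple

# Hypothetical placeholder labels: the abstract says there are 7 entailment
# types but does not name them here, so T1..T7 are stand-ins.
ENTAILMENT_TYPES = [f"T{i}" for i in range(1, 8)]

# A chunk is (start index, end index exclusive, entailment type).
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode typed chunks as one BIO tag per token (assumed scheme)."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if label not in ENTAILMENT_TYPES:
            raise ValueError(f"unknown entailment type: {label}")
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span out of range: {(start, end)}")
        tags[start] = f"B-{label}"          # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the chunk
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into typed chunks."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    tags = spans_to_bio(5, [(1, 3, "T3")])
    print(tags)
    print(bio_to_spans(tags))
```

Under this encoding, a sequence labeling model predicts one tag per token, and decoding the tags recovers both where each chunk starts and ends and which entailment type it carries.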
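The development of artificial intelligence is inseparable from natural language processing, and advances in deep learning models have made it easier for machines to understand natural language. A key goal of natural language processing is to achieve deep understanding of text, so that semantic inference can be carried out across large amounts of text, which in turn promotes the development of vertical tasks such as reading comprehension, question answering, and text summarization.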
∗ Foundation item: National Key Research and Development Program of China (2018YFB1005105)
Received 2019-04-02; Revised 2019-06-05; Accepted 2019-09-07