Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW E-mail: jos@iscas.ac.cn
Journal of Software, 2024, 35(4): 1885−1898 [doi: 10.13328/j.cnki.jos.006831] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

A Heterogeneous Graph Network Based on a Window Mechanism for Spoken Language Understanding*

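ZHANG Qi-Chen, WANG Shuai, LI Jing-Mei

(College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China)
Corresponding author: ZHANG Qi-Chen, E-mail: zhangqichen@hrbeu.edu.cn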

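Abstract: Spoken language understanding (SLU) is a core component of task-oriented dialogue systems and aims to extract the semantic frame of user queries. In a dialogue system, the SLU component identifies the user's request and builds a semantic frame that summarizes the user's needs. SLU usually comprises two subtasks: intent detection (ID) and slot filling (SF). Intent detection is a semantic utterance classification problem that analyzes the semantics of an utterance at the sentence level; slot filling is a sequence labeling task that analyzes the semantics of an utterance at the word level. Because intents and slots are closely correlated, mainstream work adopts joint models to exploit knowledge shared across the two tasks. However, ID and SF are two distinct, strongly correlated tasks that represent the sentence-level semantic information and the word-level information of an utterance, respectively, which means that the information of the two tasks is heterogeneous and differs in granularity. This study proposes a heterogeneous interactive structure for joint intent detection and slot filling, which combines self-attention with a graph attention network to fully capture the relationship between the sentence-level semantic information and the word-level information, that is, the heterogeneous information of the two related tasks. Unlike ordinary homogeneous structures, the proposed model is a heterogeneous graph architecture containing different types of nodes and links, because a heterogeneous graph involves more comprehensive information and richer semantics and can better represent the interaction among nodes of different granularities. In addition, to better fit the local continuity of slot labels, a window mechanism is used to represent word-level embeddings accurately. The model is also combined with a pre-trained model (BERT) to analyze the effect of applying a pre-trained model to it. Experimental results on two public datasets show that the proposed model reaches accuracies of 97.98% and 99.11% on intent detection and F1 scores of 96.10% and 96.11% on slot filling, outperforming current mainstream methods.
Key words: dialogue system; spoken language understanding; heterogeneous graph; window mechanism; intent detection; slot filling
CLC number (Chinese Library Classification): TP18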
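
To make the two subtasks concrete, the following minimal sketch shows how a sentence-level intent label and word-level slot tags combine into a semantic frame. It is an illustration only and not part of the proposed model: the BIO tagging scheme, the example utterance, the label names (`find_flight`, `from_city`, `to_city`), and the helper functions `decode_slots` and `build_frame` are all assumptions introduced here.

```python
from typing import Dict, List, Optional, Tuple


def decode_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (slot_name, text) spans."""
    spans: List[Tuple[str, List[str]]] = []
    current: Optional[Tuple[str, List[str]]] = None  # span being built
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            if current is not None:
                spans.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            # An I- tag of the same type extends the open span.
            current[1].append(token)
        else:
            # "O" or a stray I- tag closes any open span; the token is dropped.
            if current is not None:
                spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return [(name, " ".join(words)) for name, words in spans]


def build_frame(intent: str, tokens: List[str], tags: List[str]) -> Dict[str, object]:
    """Combine the sentence-level intent with the word-level slots."""
    return {"intent": intent, "slots": decode_slots(tokens, tags)}


# Hypothetical utterance with hand-written labels (not taken from the paper).
tokens = ["show", "flights", "from", "boston", "to", "new", "york"]
tags = ["O", "O", "O", "B-from_city", "O", "B-to_city", "I-to_city"]
frame = build_frame("find_flight", tokens, tags)
print(frame)
```

Here the intent is a single label for the whole utterance, whereas each slot is assembled from consecutive word-level tags. The two-word span "new york" illustrates the local continuity of slot labels mentioned in the abstract.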

Citation format (Chinese): 张启辰, 王帅, 李静梅. 一种基于窗口机制的口语理解异构图网络. 软件学报, 2024, 35(4): 1885–1898. http://www.jos.org.cn/1000-9825/6831.htm
Citation format (English): Zhang QC, Wang S, Li JM. Heterogeneous Graph Network with Window Mechanism for Spoken Language Understanding. Ruan Jian Xue Bao/Journal of Software, 2024, 35(4): 1885–1898 (in Chinese). http://www.jos.org.cn/1000-9825/6831.htm

Heterogeneous Graph Network with Window Mechanism for Spoken Language Understanding

ZHANG Qi-Chen, WANG Shuai, LI Jing-Mei
(College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China)
Abstract: Spoken language understanding (SLU), as a core component of task-oriented dialogue systems, aims to extract the semantic
frame of user queries. In dialogue systems, the SLU component is responsible for identifying user requests and creating a semantic
frame that summarizes user needs. SLU usually includes two subtasks: intent detection (ID) and slot filling (SF). ID is regarded as
a semantic utterance classification problem that analyzes the semantics of an utterance at the sentence level, while SF is viewed as a sequence
labeling task that analyzes the semantics of an utterance at the word level. Due to the close correlation between intents and slots,
mainstream works employ joint models to exploit shared knowledge across tasks. However, ID and SF are two different tasks with strong
correlation, and they represent sentence-level semantic information and word-level information of utterances respectively, which means that
the information of the two tasks is heterogeneous and has different granularities. This study proposes a heterogeneous interactive structure

* Received: 2022-05-09; Revised: 2022-08-08, 2022-09-20; Accepted: 2022-11-03; Published online by JOS: 2023-06-14
CNKI online-first publication: 2023-06-15
