Page 308 - Journal of Software (软件学报), 2024, Issue 4


1886 Journal of Software, 2024, Vol. 35, No. 4

for joint ID and SF, which captures the relationship between sentence-level semantic information and word-level information across
the heterogeneous information of the two correlated tasks by adopting self-attention and graph attention networks. Unlike ordinary
homogeneous structures, the proposed model is a heterogeneous graph architecture containing different types of nodes and links, because a
heterogeneous graph carries more comprehensive information and richer semantics and can better represent the interaction
between nodes of different granularities. In addition, this study uses a window mechanism to represent word-level
embeddings accurately and thus better accommodate the local continuity of slot labels. The study also combines the proposed model with a
pre-trained model (BERT) and analyzes the effect. Experimental results on two public datasets show that the model
achieves accuracies of 97.98% and 99.11% on the ID task and F1 scores of 96.10% and 96.11% on the SF task, which are superior to
current mainstream methods.
Key words: dialogue system; spoken language understanding (SLU); heterogeneous graph; window mechanism; intent detection; slot filling

1 Introduction

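A task-oriented dialogue (TOD) system handles specific problems in a specific domain, such as intelligent chatbots and movie
ticket booking, and spoken language understanding (SLU) is an important component of such a system [1].
A task-oriented dialogue system requires stricter response constraints because its goal is to give accurate feedback based on
user information. Within the dialogue system, the SLU component identifies the user's request and creates a semantic frame that
concisely summarizes the user's needs. This module converts the raw user message into semantic slots and classifies the user's
intent. It usually involves two tasks: intent detection (ID) and slot filling (SF), which identify the user's intent and extract
semantic constituents from the natural language expression, respectively [2,3]. Intent detection is treated as a semantic utterance
classification problem that analyzes the semantics of an utterance at the sentence level, whereas slot filling is usually treated
as a sequence labeling task that works at the word (token) level [4]; their performance directly affects the decisions of
downstream tasks.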
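For example, Figure 1 shows an SLU utterance annotated with its intent and slots in BIO format. The prefix "B-" on a slot label
marks the beginning of a slot, the prefix "I-" marks a token inside a slot, and the tag "O" marks any other token [5]. If the
intent of the utterance is detected as "Flight", the slots for the words "Kansas City" and "Newark" are likely to be recognized as
"B-fromloc", "I-fromloc", and "B-toloc". If the intent is instead recognized as "Ground service", those slots are more likely to
be recognized as "B-City name". Conversely, once the slot labels "B-fromloc", "I-fromloc", and "B-toloc" have been filled, the
intent can be identified more accurately as "Flight" rather than "Ground service". The two tasks are therefore strongly linked:
intent information guides slot filling, and vice versa.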

Utterance   Flight   From   Kansas      City        To   Newark
Slots       O        O      B-fromloc   I-fromloc   O    B-toloc
Intent      Flight
Figure 1 An example SLU utterance with intent and slot annotations (BIO format)
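
The BIO scheme described above can be made concrete with a small decoder. The following is a minimal sketch, not part of the proposed model: the function name `bio_to_spans` is hypothetical, and the tags are the ones the text assigns to "Kansas City" and "Newark" under the "Flight" intent.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (slot_type, text) pairs (illustrative helper)."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: List[Tuple[str, str]] = []
    current_type = None   # slot type of the span being built, if any
    current_words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A "B-" tag closes any open span and starts a new one.
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An "I-" tag of the same type extends the open span.
            current_words.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open span.
            if current_type is not None:
                spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


tokens = ["Flight", "from", "Kansas", "City", "to", "Newark"]
tags = ["O", "O", "B-fromloc", "I-fromloc", "O", "B-toloc"]
spans = bio_to_spans(tokens, tags)
print(spans)  # [('fromloc', 'Kansas City'), ('toloc', 'Newark')]
```

In this sketch an "I-" tag that does not follow a matching "B-" or "I-" tag is simply dropped; other conventions treat it as the start of a new slot.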

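Given the significant correlation between the two tasks, some studies combine intent detection and slot filling in a multi-task
learning framework that jointly optimizes semantic features and shares a latent space. Some joint models let the two tasks interact
with each other to improve the final predictions of both intent detection and slot filling [4,6−9]. These models have the advantage
of explicitly controlling knowledge transfer between the two tasks, which helps improve word-level interpretability while allowing
the mutual influence between ID and SF to be analyzed effectively [10]. Although these models achieve good results, they use
homogeneous structures and do not account for the feature differences between the tasks. Intent detection is a sentence-level
semantic analysis task over the whole utterance, whereas slot filling is a word-level task over each token, so the features they
represent are heterogeneous. Heterogeneity is an intrinsic property of a heterogeneous graph: it contains various types of nodes
and edges, nodes of different types have different features, and those features may fall into different feature spaces [11].
Different edges in a heterogeneous graph can extract different semantic information. Each word-level token of an utterance is a
specific representation for the slot filling task, whereas intent detection is a classification task over each utterance, so its
representation is holistic. In addition, some models overlook the locality of word-level meaning. In SLU, a slot is determined not
only by its associated items; the slot labels "O", "B-", and "I-" also exhibit local continuity: "O" labels mostly occur in local
runs, and "I-" labels appear together with "B-" labels. Slot information therefore exhibits local characteristics.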
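
The local continuity of slot labels can be stated as a simple check: an "I-" label is only well-formed when the preceding label is a "B-" or "I-" label of the same slot type. The sketch below illustrates that constraint only; it is not the window mechanism of the proposed model, and the function name `bio_violations` is hypothetical.

```python
from typing import List


def bio_violations(tags: List[str]) -> List[int]:
    """Return positions where an "I-" tag does not continue a slot of the same type."""
    bad: List[int] = []
    prev = "O"  # treat the position before the utterance as outside any slot
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            slot = tag[2:]
            # The previous tag must open or continue the same slot type.
            if prev not in ("B-" + slot, "I-" + slot):
                bad.append(i)
        prev = tag
    return bad


valid = ["O", "O", "B-fromloc", "I-fromloc", "O", "B-toloc"]
broken = ["O", "I-fromloc", "O", "B-toloc", "I-fromloc"]
print(bio_violations(valid))   # []
print(bio_violations(broken))  # [1, 4]
```

Because the label at each position depends on its immediate neighbours in this way, a slot tagger benefits from seeing a local context around each token rather than the token alone.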
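In this paper, we propose a heterogeneous framework to address the above problems, called the heterogeneous co-interactive
self-attention and graph attention network (HcoSG). The model is non-autoregressive and co-interactive, and the core of the heterogeneous model adopts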
