Page 243 - Journal of Software (《软件学报》), 2025, Issue 9


4154 Journal of Software (软件学报), 2025, Vol. 36, No. 9

commonsense event triples encompassing causal, temporal, conditional, and other common event relationships. Although the constructed
ECKG holds considerable value, its limited scale curtails practical applications. Besides, large-scale event commonsense knowledge graphs
are rare in current studies. To overcome these challenges, this paper uses large language models from the GPT series to expand the above-
mentioned three event relationships and sub-events of the proposed ECKG. The expansion method involves three primary steps. Firstly,
specific prompts for event knowledge (ek-prompts) are designed by combining the events in the ECKG with four relationships, and GPT-4-
Turbo is used to generate corresponding event triples. Secondly, the triples of the ECKG are integrated with accurate triples obtained by
ek-prompts to create a specialized dataset. Additionally, GPT-3.5-Turbo is fine-tuned on the dataset to generate more specific event triples
and validate the accuracy of new triples. Lastly, by analyzing the similarities among events in the ECKG and implementing an event-
sharing mechanism, similar events within the same relationship are interconnected, ensuring consistency across similar event triples.
Experimental results show that the newly acquired triples are of high quality, particularly those of the temporal relationships, with an
accuracy rate of 98.2%. Ultimately, the proposed expansion method appends 2 433 012 commonsense event triples to the original ECKG,
significantly expanding its scale and providing more commonsense knowledge for many applications in artificial intelligence.
Key words: event commonsense knowledge graph; large language model (LLM); fine-tuning technique; event triple; event-sharing mechanism
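Commonsense knowledge consists of true descriptions of the world that most people agree on, such as "an apple is a kind of fruit", "arms swing when walking", and "people usually feel happy when praised". Such knowledge is widely used in artificial intelligence fields such as automatic question answering [1], computer vision [2], and sentiment analysis [3]. Because commonsense knowledge is shared (information that people commonly recognize), implicit (usually not stated explicitly in text), and broad (spanning every discipline), acquiring it is a challenging task.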
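Currently, many well-known large knowledge graphs, such as YAGO [4], DBpedia [5], and Wikidata [6], focus mainly on entity-related knowledge. However, event-centric knowledge graphs, which link events to related entities, times, locations, and other factors, play a crucial role in natural language processing [7], question answering systems [8], and event prediction [9]. Large-scale event-related knowledge graphs nevertheless remain scarce. For example, ATOMIC [10] focuses on causal relationships between events but does not cover other types of event relationships, while ConceptNet [11] provides rich entity-related commonsense knowledge but only limited event-related commonsense knowledge.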
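In earlier work, our research group invested substantial time and human effort in selecting and organizing a body of event knowledge from various resources. After multiple rounds of rigorous revision and proofreading, we built a high-quality Chinese seed event commonsense knowledge graph (ECKG) containing 26 606 event triples [12−15]. This seed graph covers causal, temporal, conditional, and other common event relationships, of which the most basic and important types are the causal, temporal, conditional, and sub-event relationships. Although the ECKG offers clear advantages in precision and practicality, its limited scale and knowledge coverage restrict how widely it can be applied in practice.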
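In addition, we note that existing methods for acquiring commonsense knowledge include manual [10,13,16−18], automated [1,19−22], and semi-automated [11,23−25] processes, which typically require complex data preprocessing and considerable human effort. Meanwhile, in commonsense knowledge graph completion, although existing studies have used rules, embeddings, and neural networks to improve completion accuracy and extend the depth of knowledge [26−33], they still face problems of data scarcity and low data quality. Furthermore, the emergence of large language models (LLMs) brings new opportunities for knowledge graph completion. Existing studies show that LLMs perform well at completing entity-related knowledge [34−40], but research on event commonsense knowledge graphs remains relatively sparse.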
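To address the above problems, this paper proposes expanding the seed ECKG. The main goal is to enrich the knowledge of four event relationships in the graph: causal, temporal, conditional, and sub-event. For each event E and relationship R, we add head triples (<?, R, E>) and tail triples (<E, R, ?>), thereby expanding the scale of the whole ECKG while maintaining high precision.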
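As an illustration of the head-triple and tail-triple notation, the following minimal Python sketch enumerates the open slots for one event across the four relationships. The names `TripleQuery`, `expansion_queries`, and `render`, the English relation labels, and the example event are assumptions made for this sketch and do not come from the ECKG implementation.

```python
from typing import List, NamedTuple, Optional

# The four event relationships named in the text (labels are illustrative).
RELATIONS = ("causal", "temporal", "conditional", "sub-event")


class TripleQuery(NamedTuple):
    """A triple with one open slot; None marks the slot to be filled."""
    head: Optional[str]
    relation: str
    tail: Optional[str]


def expansion_queries(event: str) -> List[TripleQuery]:
    """Return the head and tail queries for `event` under every relationship."""
    queries = []
    for relation in RELATIONS:
        queries.append(TripleQuery(None, relation, event))  # head triple <?, R, E>
        queries.append(TripleQuery(event, relation, None))  # tail triple <E, R, ?>
    return queries


def render(query: TripleQuery) -> str:
    """Format a query in the <head, relation, tail> notation, using ? for the open slot."""
    head = "?" if query.head is None else query.head
    tail = "?" if query.tail is None else query.tail
    return f"<{head}, {query.relation}, {tail}>"


if __name__ == "__main__":
    for q in expansion_queries("it rains"):
        print(render(q))
```

Each event thus yields eight open slots (two per relationship). How those slots are filled is the subject of the expansion method itself and is not modeled here.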
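In recent years, OpenAI has released a series of advanced large language models, including GPT-3.5 and GPT-4 [41]. Built on massive pre-training datasets, these models hold rich internal knowledge and can be applied to a wide range of tasks such as text generation, question answering, and knowledge graph completion. Studies show that GPT-3.5 and GPT-4 have strong learning ability and fast response speed and can generate knowledge effectively through simple prompting or fine-tuning. Given these advantages, this paper uses GPT-series LLMs to expand the seed ECKG. Specifically, we draw on the extensive knowledge of the GPT-3.5-Turbo and GPT-4-Turbo models to comprehensively enrich the triples of the four event relationships in the seed ECKG. The main contributions of this paper are as follows.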
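(1) We design a specific event knowledge prompt (ek-prompt) for each event relationship and use the GPT-4-Turbo model to generate new event triples.
(2) We combine the seed ECKG with the event triples generated by ek-prompts to build a dedicated dataset for each event relationship, and use it to fine-tune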
