Journal of Software (软件学报), 2025, Issue 12

Journal of Software, ISSN 1000-9825, CODEN RUXUEW, E-mail: jos@iscas.ac.cn
2025,36(12):5674−5694 [doi: 10.13328/j.cnki.jos.007403] [CSTR: 32375.14.jos.007403] http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved. Tel: +86-10-62562563

Knowledge Graph Accuracy Evaluation Using Embedding Model *

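ZHANG Ming-Tao 1,2, YANG Guo-Li 1, BAI Xiao-Ying 1

1 (Advanced Institute of Big Data, Beijing, Beijing 100195, China)
2 (School of Computer Science, Peking University, Beijing 100871, China)
Corresponding authors: YANG Guo-Li, E-mail: yanggl@aibd.ac.cn; BAI Xiao-Ying, E-mail: baixy@aibd.ac.cn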

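Abstract: Knowledge graph construction often suffers from quality problems such as erroneous or missing triplets. Accuracy evaluation is the basis for selecting and optimizing knowledge graphs, and it matters for the trustworthiness of downstream applications. Embedding models are introduced to reduce the dependence on manually labeled data and to improve the efficiency of large-scale data processing. Judging whether a triplet is correct is recast as a label-free, automated threshold selection problem, and three threshold selection strategies are proposed to make the evaluation more robust. An evaluation method that incorporates triplet importance is also proposed: importance indicators are defined from both network structure and relation semantics, so that triplets in key structures and frequently accessed triplets receive more attention. The effect on the performance of the evaluation method is analyzed and compared from several perspectives, including the representation capability of the embedding model, the density of the knowledge graph, and the way triplet importance is computed. Experiments show that, compared with existing automated methods for evaluating knowledge graph accuracy, the proposed method effectively reduces the evaluation error under zero-shot conditions, by nearly 30% on average, with especially pronounced gains on datasets with high error rates and dense graphs.
Key words: knowledge graph; accuracy evaluation; embedding model
Chinese Library Classification number: TP182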

Chinese citation format: 张明韬, 杨国利, 白晓颖. 基于嵌入模型的知识图谱准确性评估. 软件学报, 2025, 36(12): 5674–5694. http://www.jos.org.cn/1000-9825/7403.htm
English citation format: Zhang MT, Yang GL, Bai XY. Knowledge Graph Accuracy Evaluation Using Embedding Model. Ruan Jian Xue Bao/Journal of Software, 2025, 36(12): 5674–5694 (in Chinese). http://www.jos.org.cn/1000-9825/7403.htm

Knowledge Graph Accuracy Evaluation Using Embedding Model
ZHANG Ming-Tao 1,2, YANG Guo-Li 1, BAI Xiao-Ying 1
1 (Advanced Institute of Big Data, Beijing, Beijing 100195, China)
2 (School of Computer Science, Peking University, Beijing 100871, China)
Abstract: Quality issues, such as errors or deficiencies in triplets, become increasingly prominent in knowledge graphs, severely affecting
the credibility of downstream applications. Accuracy evaluation is crucial for building confidence in the use and optimization of knowledge
graphs. An embedding-model-based method is proposed to reduce reliance on manually labeled data and to achieve scalable automatic
evaluation. Triplet verification is formulated as an automated threshold selection problem, with three threshold selection strategies proposed
to enhance the robustness of the evaluation. In addition, triplet importance indicators are incorporated to place greater emphasis on critical
triplets, with importance scores defined based on network structure and relationship semantics. Experiments are conducted to analyze and
compare the impact on performance from various perspectives, such as embedding model capacity, knowledge graph sparsity, and triplet
importance definition. The results demonstrate that, compared to existing automated evaluation methods, the proposed method
reduces evaluation errors by nearly 30% on average in zero-shot conditions, particularly on datasets of dense graphs with high error rates.
Key words: knowledge graph; accuracy evaluation; embedding model
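
The abstract frames triplet verification as label-free threshold selection over embedding-model scores, optionally weighted by triplet importance. The sketch below illustrates that general idea only. It assumes a TransE-style distance as the score and uses Otsu's method as a stand-in threshold rule; neither is confirmed by this section to be one of the paper's three strategies. The function names and the `weights` parameter are hypothetical, and the importance weights are simply passed in rather than derived from network structure or relation semantics.

```python
import math

def transe_distance(h, r, t):
    """TransE-style score (assumed here): norm of h + r - t; lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def otsu_threshold(scores):
    """Label-free cut that maximizes between-class variance (Otsu's method).

    Stand-in for an unsupervised threshold strategy; expects a non-empty list.
    """
    xs = sorted(scores)
    n = len(xs)
    total = sum(xs)
    best_cut, best_var = xs[0], -1.0
    left_sum = 0.0
    for i in range(1, n):                 # candidate split: xs[:i] vs xs[i:]
        left_sum += xs[i - 1]
        if xs[i] == xs[i - 1]:
            continue                      # no cut can separate equal scores
        w0, w1 = i / n, (n - i) / n
        mu0 = left_sum / i
        mu1 = (total - left_sum) / (n - i)
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var = var
            best_cut = (xs[i - 1] + xs[i]) / 2
    return best_cut

def estimate_accuracy(scores, weights=None):
    """Weighted share of triplets whose distance is at or below the chosen cut."""
    if weights is None:
        weights = [1.0] * len(scores)     # unweighted: every triplet counts equally
    cut = otsu_threshold(scores)
    accepted = sum(w for s, w in zip(scores, weights) if s <= cut)
    return accepted / sum(weights)

# Hypothetical distances for five triplets: three look plausible, two do not.
scores = [0.1, 0.2, 0.15, 0.9, 1.0]
print(estimate_accuracy(scores))
# Giving the two implausible triplets higher importance lowers the estimate.
print(estimate_accuracy(scores, weights=[1, 1, 1, 3, 3]))
```

With uniform weights the estimate is the plain fraction of accepted triplets. Passing importance weights shifts the estimate toward the triplets that matter most, which is the role the abstract assigns to the importance indicators.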

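Knowledge graphs are widely used in intelligent search, machine learning, automatic question answering, knowledge reasoning, and other fields. For example, IBM Watson draws on the YAGO knowledge graph [1] and the DBpedia knowledge base [2] for intelligent question answering, CMU designed the NELL system [3] to collect information from the Web for knowledge integration and reasoning, Google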

* Funding: National Natural Science Foundation of China (72201275)
Received: 2024-07-13; Revised: 2024-09-26; Accepted: 2025-02-10; JOS online publication: 2025-06-18
CNKI online-first publication: 2025-06-19
