
Journal of Software (软件学报) ISSN 1000-9825, CODEN RUXUEW    E-mail: jos@iscas.ac.cn
2025,36(12):5674−5694 [doi: 10.13328/j.cnki.jos.007403] [CSTR: 32375.14.jos.007403]    http://www.jos.org.cn
© Institute of Software, Chinese Academy of Sciences. All rights reserved.    Tel: +86-10-62562563



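Knowledge Graph Accuracy Evaluation Using Embedding Model*

ZHANG Ming-Tao (张明韬) 1,2, YANG Guo-Li (杨国利) 1, BAI Xiao-Ying (白晓颖) 1

1 (Advanced Institute of Big Data, Beijing, Beijing 100195)
2 (School of Computer Science, Peking University, Beijing 100871)
Corresponding authors: YANG Guo-Li, E-mail: yanggl@aibd.ac.cn; BAI Xiao-Ying, E-mail: baixy@aibd.ac.cn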

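Abstract: Knowledge graph construction often suffers from quality problems such as erroneous or missing triplets. Accuracy evaluation is the basis for selecting and optimizing knowledge graphs, and it matters for the trustworthiness of downstream applications. This work introduces embedding models to reduce the dependence on manually labeled data and to process large-scale data more efficiently. Judging whether a triplet is correct is recast as a label-free, automated threshold selection problem, and three threshold selection strategies are proposed to make the evaluation more robust. An evaluation method that incorporates triplet importance is also proposed: importance indicators are defined from both network structure and relation semantics, so that triplets in key structures and frequently accessed triplets receive more attention. The effects on evaluation performance are analyzed and compared from several perspectives, including the representation capability of the embedding model, the density of the knowledge graph, and the way triplet importance is computed. Experiments show that, compared with existing automated methods for evaluating knowledge graph accuracy, the proposed method effectively reduces the evaluation error under zero-shot conditions, by nearly 30% on average, with especially pronounced gains on datasets with high error rates and dense graphs.
Keywords: knowledge graph; accuracy evaluation; embedding model
CLC number: TP182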

Citation format (Chinese): 张明韬, 杨国利, 白晓颖. 基于嵌入模型的知识图谱准确性评估. 软件学报, 2025, 36(12): 5674–5694. http://www.jos.org.cn/1000-9825/7403.htm
Citation format (English): Zhang MT, Yang GL, Bai XY. Knowledge Graph Accuracy Evaluation Using Embedding Model. Ruan Jian Xue Bao/Journal of Software, 2025, 36(12): 5674–5694 (in Chinese). http://www.jos.org.cn/1000-9825/7403.htm

Knowledge Graph Accuracy Evaluation Using Embedding Model
ZHANG Ming-Tao 1,2, YANG Guo-Li 1, BAI Xiao-Ying 1
1 (Advanced Institute of Big Data, Beijing, Beijing 100195, China)
2 (School of Computer Science, Peking University, Beijing 100871, China)
Abstract: Quality issues, such as errors or deficiencies in triplets, become increasingly prominent in knowledge graphs, severely affecting the credibility of downstream applications. Accuracy evaluation is crucial for building confidence in the use and optimization of knowledge graphs. An embedding-model-based method is proposed to reduce reliance on manually labeled data and to achieve scalable automatic evaluation. Triplet verification is formulated as an automated threshold selection problem, with three threshold selection strategies proposed to enhance the robustness of the evaluation. In addition, triplet importance indicators are incorporated to place greater emphasis on critical triplets, with importance scores defined based on network structure and relationship semantics. Experiments are conducted to analyze and compare the impact on performance from various perspectives, such as embedding model capacity, knowledge graph sparsity, and triplet importance definition. The results demonstrate that, compared to existing automated evaluation methods, the proposed method can significantly reduce evaluation errors by nearly 30% in zero-shot conditions, particularly on datasets of dense graphs with high error rates.
Key words: knowledge graph; accuracy evaluation; embedding model
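To make the pipeline outlined in the abstract concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name the embedding model, the three threshold selection strategies, or the importance formulas, so everything here is illustrative: `transe_score` assumes a TransE-style distance score, `otsu_threshold` assumes Otsu's between-class-variance criterion as one possible label-free threshold, and `weighted_accuracy` assumes importance weights are supplied from elsewhere (for example, from node degree).

```python
from typing import List, Sequence

Vector = Sequence[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """Assumed scoring function: negative L1 distance of h + r - t.

    Higher (closer to zero) means the triplet looks more plausible.
    """
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def otsu_threshold(scores: List[float]) -> float:
    """Assumed label-free strategy: split the scores into two groups so that
    the between-class variance is maximal, and return the midpoint cut."""
    ordered = sorted(scores)
    n = len(ordered)
    total = sum(ordered)
    best_cut, best_var = ordered[0], -1.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += ordered[i - 1]
        if ordered[i] == ordered[i - 1]:
            continue  # no cut can separate two equal scores
        w0, w1 = i / n, (n - i) / n
        mu0 = left_sum / i
        mu1 = (total - left_sum) / (n - i)
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var = var
            best_cut = (ordered[i - 1] + ordered[i]) / 2
    return best_cut


def weighted_accuracy(scores: List[float], weights: List[float],
                      threshold: float) -> float:
    """Importance-weighted share of triplets judged correct (score >= threshold)."""
    total = sum(weights)
    judged_correct = sum(w for s, w in zip(scores, weights) if s >= threshold)
    return judged_correct / total


if __name__ == "__main__":
    # Hypothetical plausibility scores for five triplets.
    scores = [-0.2, -0.3, -0.25, -2.0, -2.2]
    cut = otsu_threshold(scores)
    print(cut, weighted_accuracy(scores, [1.0] * len(scores), cut))
```

With uniform weights, `weighted_accuracy` reduces to the plain fraction of triplets whose score clears the threshold; giving larger weights to structurally central or frequently accessed triplets shifts the estimate toward the triplets that matter most to downstream use.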

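Knowledge graphs are widely used in intelligent search, machine learning, automated question answering, knowledge reasoning, and other fields. For example, IBM Watson draws on the YAGO [1] knowledge graph and the DBpedia [2] knowledge base for intelligent question answering, CMU designed the NELL [3] system to collect information from the web for knowledge integration and reasoning, Google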


* Funding: National Natural Science Foundation of China (72201275)
Received: 2024-07-13; Revised: 2024-09-26; Accepted: 2025-02-10; Published online by jos: 2025-06-18
CNKI online-first publication: 2025-06-19