Second, we will build a question-answering system in which the LLM and the knowledge graph enhance each other bidirectionally. At present, 华谱通 still falls within the scope of knowledge-graph-enhanced LLM question answering. In terms of knowledge generation, it cannot yet extract new genealogy knowledge from information supplied by users in real time. In later work, we plan to study an LLM-based knowledge graph management framework that exploits the LLM's own pretrained parametric knowledge to open up data interaction between the knowledge graph and the LLM across knowledge graph construction, knowledge editing, and knowledge verification, thereby building a more flexible human-machine interactive question-answering system. A sketch of such a loop is given below.
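To make the intended data interaction concrete, the following is a minimal sketch of such a framework's core loop in Java with Apache Jena. It is not 华谱通's actual implementation: the LlmClient interface (any chat-completion API could stand in for it), the prompt format, and the huapu# namespace are all illustrative assumptions. The LLM is prompted to extract candidate triples from user-supplied text; each candidate is verified against the existing graph before it is committed.

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;

public class LlmKgManager {
    /** Hypothetical LLM client; any chat-completion API could stand in here. */
    interface LlmClient {
        String complete(String prompt);
    }

    static final String NS = "http://example.org/huapu#"; // hypothetical namespace

    /** Extract candidate triples from user text, verify them, then insert. */
    static void ingest(LlmClient llm, Model kg, String userText) {
        // 1. Knowledge construction: ask the LLM for "subject|relation|object" lines.
        String prompt = "Extract genealogy facts as 'subject|relation|object' lines:\n" + userText;
        for (String line : llm.complete(prompt).split("\n")) {
            String[] parts = line.split("\\|");
            if (parts.length != 3) continue; // skip malformed LLM output

            Resource s = kg.createResource(NS + parts[0].trim());
            Property p = kg.createProperty(NS + parts[1].trim());
            Resource o = kg.createResource(NS + parts[2].trim());
            Statement candidate = kg.createStatement(s, p, o);

            // 2. Knowledge verification: here only a duplicate check; a fuller system
            // would also test consistency against neighboring facts in the graph.
            if (!kg.contains(candidate)) {
                kg.add(candidate); // 3. Knowledge editing: commit the new fact.
            }
        }
    }

    public static void main(String[] args) {
        Model kg = ModelFactory.createDefaultModel();
        // A canned response stands in for a real model call.
        LlmClient stub = prompt -> "ZhangSan|fatherOf|ZhangXiaoming";
        ingest(stub, kg, "张三是张小明的父亲");
        kg.write(System.out, "TURTLE"); // prints the newly added triple
    }
}

Keeping verification as a separate step is what makes the loop extensible to knowledge editing: a rejected candidate can be routed to a human reviewer instead of silently entering the graph.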
In addition, for cross-genealogy question answering, 华谱通 has so far carried out only preliminary, forward-looking technical surveys and experimental analyses to verify the feasibility and effectiveness of this technical route. In follow-up work, we plan to draw on the Jena rule definition method used for kinship relations and design a cross-genealogy relation reasoning mechanism focused on basic social relations (e.g., colleague, teacher-student, and friend). Combined with coarse-grained matching results over person attributes and kinship information, these rules will guide the large model to dynamically generate a more diverse set of social relations, keeping 华谱通 flexible when solving complex person-association questions; a rule sketch follows this paragraph.
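As a concrete illustration, the sketch below defines one such cross-genealogy rule ("two distinct persons working at the same organization are colleagues") in Jena's rule syntax and runs it with Jena's GenericRuleReasoner. The property URIs and person names are illustrative assumptions, not 华谱通's actual schema.

import java.util.List;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class CrossGenealogyReasoning {
    static final String NS = "http://example.org/huapu#"; // hypothetical namespace

    public static void main(String[] args) {
        // Two persons from different genealogies who share a workplace.
        Model base = ModelFactory.createDefaultModel();
        Property worksAt = base.createProperty(NS + "worksAt");
        Resource zhang = base.createResource(NS + "ZhangSan");
        Resource li = base.createResource(NS + "LiSi");
        Resource univ = base.createResource(NS + "HefeiUniversity");
        base.add(zhang, worksAt, univ);
        base.add(li, worksAt, univ);

        // Cross-genealogy rule in Jena's rule syntax: distinct persons
        // working at the same organization are colleagues.
        String rules =
            "[colleague: (?a <" + NS + "worksAt> ?org) " +
            "            (?b <" + NS + "worksAt> ?org) " +
            "            notEqual(?a, ?b) " +
            "         -> (?a <" + NS + "colleagueOf> ?b)]";

        List<Rule> ruleList = Rule.parseRules(rules);
        InfModel inf = ModelFactory.createInfModel(new GenericRuleReasoner(ruleList), base);

        Property colleagueOf = base.createProperty(NS + "colleagueOf");
        System.out.println(inf.contains(zhang, colleagueOf, li)); // true
    }
}

Teacher-student and friend relations could be expressed the same way; relations that such symbolic rules cannot enumerate would then be left to the coarse-grained matching plus LLM generation described above.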
Finally, building on the above three research directions, we will explore the generalization ability of 华谱通. So far, 华谱通 has verified the effectiveness of completeness-oriented reasoning logic for LLM question answering over vertical-domain knowledge bases only in the genealogy setting. In the future, we plan to apply its knowledge graph reasoning framework to question-answering scenarios in other domains, such as healthcare, finance, and biopharmaceuticals.
