Petroleum Science and Technology Forum (石油科技论坛) ›› 2024, Vol. 43 ›› Issue (6): 114-125. DOI: 10.3969/j.issn.1002-302X.2024.06.014

• Technology Frontier •

Knowledge Graphs: Key Technology for Enhancing RAG Performance in the Oil and Gas Industry

Song Ziyu (宋子瑜)

  1. Faculty of Engineering, Hiroshima University, Chome-3-2 Kagamiyama, Higashihiroshima, Hiroshima 739-0046, Japan
  • Online: 2024-12-31 Published: 2025-01-24
  • About the author: Song Ziyu, born in 2000, is a master's student in the Faculty of Engineering, Hiroshima University, Japan, working mainly on natural language processing.

Abstract: Traditional retrieval-augmented generation (RAG) is limited in associative analysis, information integration, and logical reasoning. To address these limitations, this study takes knowledge graphs and RAG as its research objects and analyzes research progress and application cases in China and abroad. ChatLaw brings in domain experts to precisely define legal entities, relationships, and cases, and uses the resulting high-quality knowledge graph to improve the accuracy of legal consultation. GraphRAG uses knowledge graphs to represent the entities and relationships in unstructured text, and applies techniques such as hierarchical clustering and summary generation to improve RAG's global search capability on large-scale datasets. HippoRAG uses knowledge graphs for concept expansion and retrieval at query time, improving RAG's knowledge integration and multi-hop reasoning. The study then summarizes methods for fusing RAG with knowledge graphs: introducing knowledge graphs at the data chunking, data storage, query optimization, retrieval and recall, reranking, prompt construction, and answer generation stages can improve RAG's accuracy, associative analysis, reasoning ability, and interpretability. Based on open-source frameworks such as Lucene and LangChain, three retrieval schemes (full-text retrieval, vector retrieval, and graph retrieval) are designed and applied to an oil and gas knowledge question-answering scenario, verifying the effectiveness of knowledge graphs in enhancing RAG.

Key words: large language models, retrieval-augmented generation, knowledge graph, vector database, graph database
