Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation

Contributions

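In essence, the method builds a graph whose basic unit is the Report Chunk Meta Graph. Hierarchical links are used to generate a general-language description for each Meta Graph and to attach tags to it.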

Both the hierarchical links and the links between top-level entities are computed by the LLM.

At retrieval time, the question's query is first used to retrieve the relevant Meta Graphs, and then the relevant entities within them.
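The two-stage lookup described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `MetaGraph` class, the `retrieve` function, and the sample data are all hypothetical, and a simple token-overlap (Jaccard) score stands in for the LLM-based matching that the method relies on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetaGraph:
    """Hypothetical container for one report chunk's graph."""
    chunk_id: str
    description: str          # general-language summary of the chunk
    tags: set[str]            # tags attached to the meta graph
    entities: dict[str, str] = field(default_factory=dict)  # name -> definition


def tokens(text: str) -> set[str]:
    """Lowercase word set with trailing punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap score; an assumed stand-in for LLM-based relevance."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, graphs: list[MetaGraph],
             top_graphs: int = 1, top_entities: int = 1) -> list[tuple[str, str]]:
    q = tokens(query)
    # Stage 1: rank meta graphs by how well their tags match the query.
    ranked = sorted(graphs, key=lambda g: jaccard(q, g.tags), reverse=True)
    hits: list[tuple[str, str]] = []
    for g in ranked[:top_graphs]:
        # Stage 2: inside each selected meta graph, rank its entities.
        scored = sorted(
            g.entities.items(),
            key=lambda kv: jaccard(q, tokens(kv[0] + " " + kv[1])),
            reverse=True,
        )
        hits.extend((g.chunk_id, name) for name, _ in scored[:top_entities])
    return hits


# Made-up sample data, for illustration only.
graphs = [
    MetaGraph(
        "chunk-1",
        "Report on type 2 diabetes management",
        {"diabetes", "metformin", "insulin"},
        {
            "metformin": "first-line oral drug for type 2 diabetes",
            "HbA1c": "blood marker of long-term glucose control",
        },
    ),
    MetaGraph(
        "chunk-2",
        "Report on asthma treatment",
        {"asthma", "inhaler"},
        {"salbutamol": "short-acting bronchodilator for asthma"},
    ),
]

result = retrieve("Is metformin the first-line drug for diabetes?", graphs)
print(result)  # [('chunk-1', 'metformin')]
```

Filtering by meta graph first keeps the entity-level search small: only the entities of the best-matching chunks are scored, rather than every entity in the whole graph.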

QA Dataset	Source	Answer format	Distribution
PubMedQA	PubMed abstracts	yes / no / maybe	PQA-L: 1,000 manually labeled pairs, used for testing; PQA-U: 61.2k unlabeled pairs, not used; PQA-A: 211.3k artificially generated pairs.
MedMCQA	Indian medical school entrance tests	4-choice	Training set of 182,822 questions; test set of 4,183 questions.
USMLE	United States Medical Licensing Exams	4-choice	Multilingual; only the English portion is considered, which includes 10,178 + 1,273 + 1,273 items.