What is Microsoft’s GraphRAG, and why does it deserve your attention?

Divi Pothukuchi
4 min read · Jul 8, 2024

Microsoft’s GraphRAG

Introduction

What is RAG?

Retrieval-augmented generation (RAG) is a technique for grounding the output of a large language model (LLM) in knowledge outside its training data. At query time, relevant material is retrieved from an external source and supplied to the model alongside the question. This extends the already powerful capabilities of LLMs to specific domains or an organization’s internal knowledge base without retraining, which makes it a cost-effective way to keep LLM output relevant and current.

The external knowledge can be supplied in a variety of formats. In naive RAG, the input documents are split into chunks, and each chunk is embedded and stored in a vector database; at query time, the chunks most similar to the query are retrieved. Knowledge graph-based methods for RAG have also been explored.
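The chunk, embed, and retrieve flow of naive RAG can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, the in-memory `index` list stands in for a vector database, and all function names and the sample text are assumptions made for this example.

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, index, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

document = (
    "GraphRAG builds a knowledge graph from source documents. "
    "Naive RAG splits documents into chunks and embeds each chunk. "
    "At query time the most similar chunks are passed to the language model."
)

# The "vector database": a list of (chunk text, chunk vector) pairs.
index = [(c, embed(c)) for c in chunk(document)]
top_chunks = retrieve("How does naive RAG embed chunks?", index, k=1)
```

In a real system, the retrieved chunks would be placed in the prompt sent to the LLM. Note that each chunk is scored in isolation, which is why this style of retrieval struggles with questions about the corpus as a whole.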

A knowledge graph is a collection of interlinked entities, relations, and events, which provides a more structured and comprehensive data source than isolated text chunks.
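As a rough illustration of the data structure, a knowledge graph can be represented as a set of (subject, relation, object) triples indexed by entity. The entities and relations below are made up for this example, and the `neighbors` helper is a hypothetical name, not part of any GraphRAG API.

```python
from collections import defaultdict

# Each triple is (subject entity, relation, object entity).
# All names here are invented for illustration.
triples = [
    ("Ada", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Ada", "authored", "Report X"),
]

# Adjacency list: entity -> list of (relation, neighboring entity).
graph = defaultdict(list)
for subject, relation, obj in triples:
    graph[subject].append((relation, obj))

def neighbors(entity):
    """Return the outgoing (relation, entity) pairs for an entity."""
    return graph.get(entity, [])
```

Calling `neighbors("Ada")` returns both facts recorded about Ada, so related information stays linked instead of being scattered across unrelated chunks.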

Microsoft’s GraphRAG combines the two approaches, naive RAG and knowledge graph-based RAG, and uses the natural modularity of graphs to create a more sophisticated RAG system.

Why do we need a more sophisticated system?

Existing RAG approaches have a few significant problems:

  1. Poor performance on global queries. Naive RAG performs well on explicit retrieval tasks, but it struggles with query-focused summarization (QFS) tasks, i.e., answering queries that require reasoning over the entire corpus of text. For example, a global query like “What are the main themes covered in the text?” requires sensemaking abilities that naive RAG approaches lack. Existing QFS methods can handle such queries, but they cannot process as much text as naive RAG can. GraphRAG aims to combine the strengths of both approaches and achieve state-of-the-art results in query-focused abstractive summarization, generating natural language summaries rather than just concatenated excerpts.
  2. Poor performance on multi-hop question answering. Naive RAG fails to connect the dots when a question can only be answered by reasoning across several pieces of information. GraphRAG tries to solve this problem by using graph machine learning techniques to cluster similar nodes hierarchically and establish links between seemingly unrelated data, which helps it come up with more sophisticated answers.

Examples of clusters in a knowledge graph
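To see why explicit links help with multi-hop questions, consider a toy graph in which no single fact says where Ada is based, but a chain of two facts does. The sketch below uses a plain breadth-first search to find that chain. It is only an illustration of the idea of following links, not the mechanism GraphRAG itself uses, and the entities, relations, and the `find_path` helper are all invented for this example.

```python
from collections import deque

# Toy knowledge graph: entity -> list of (relation, neighboring entity).
edges = {
    "Ada": [("works_at", "Acme")],
    "Acme": [("located_in", "Berlin")],
    "Berlin": [("capital_of", "Germany")],
}

def find_path(start, goal):
    """Breadth-first search for the shortest chain of hops from start to goal.

    Returns a list of (subject, relation, object) hops, or None if the
    goal cannot be reached by following edges.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, relation, nxt)]))
    return None

path = find_path("Ada", "Berlin")
```

Here `path` holds two hops, Ada to Acme and Acme to Berlin. A chunk-based retriever would need both facts to land in the retrieved set by similarity alone, whereas the graph makes the connection explicit.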

While GraphRAG incorporates many features of existing RAG systems, it additionally uses the natural modularity of graphs to partition the knowledge graph into communities and to generate a summary for each community at different levels of the hierarchy.

Given a question, each community summary is used to generate a partial response, and all partial responses are then summarized again into a final response to the user. This approach leads to substantial improvements over naive RAG in the comprehensiveness and diversity of generated answers.
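The two-stage answering process described above has a map-reduce shape, which can be sketched as follows. This is a simplified sketch under stated assumptions: `answer_globally` and `stub_llm` are hypothetical names, the prompts are placeholders, the community summaries are invented, and `stub_llm` only records its inputs instead of calling a real model.

```python
def answer_globally(question, community_summaries, llm):
    """Answer a global question from community summaries in two stages."""
    # Map: produce one partial answer per community summary.
    partials = [
        llm(f"Question: {question}\nContext: {summary}\nPartial answer:")
        for summary in community_summaries
    ]
    # Reduce: summarize all partial answers into one final answer.
    joined = "\n".join(f"- {p}" for p in partials)
    return llm(f"Question: {question}\nPartial answers:\n{joined}\nFinal answer:")

# Placeholder for a real LLM call: records each prompt and returns a
# numbered dummy response so the call flow can be inspected.
calls = []

def stub_llm(prompt):
    calls.append(prompt)
    return f"response {len(calls)}"

# Invented community summaries, for illustration only.
summaries = [
    "Community 1: documents about retrieval and vector search.",
    "Community 2: documents about knowledge graph construction.",
    "Community 3: documents about summarization and evaluation.",
]

final = answer_globally("What are the main themes?", summaries, stub_llm)
```

With three summaries, the model is called four times: three map calls and one reduce call. Because the map calls are independent of one another, they could run in parallel in a real system.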

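The paper compares the outputs generated by GraphRAG against those of naive RAG on the metrics below. GraphRAG consistently beat naive RAG on comprehensiveness and diversity, while naive RAG produced the most direct answers and the results on empowerment were mixed: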

  • Comprehensiveness. How much detail does the answer provide to cover all aspects of the question?
  • Diversity. How varied and rich is the answer in providing different perspectives and insights?
  • Empowerment. How well does the answer help the reader understand and make informed judgments about the topic?
  • Directness. How specifically does the answer address the question?
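One common way to score metrics like these is to have a judge compare two answers head to head and then tally a per-metric win rate. The sketch below shows only the tallying step. The judgments are placeholders invented to exercise the code, not results from the paper, and `win_rates` is a hypothetical helper name.

```python
from collections import defaultdict

# Placeholder judgments as (metric, winner) pairs, where the winner is
# system "A", system "B", or "tie". Invented for illustration only.
judgments = [
    ("comprehensiveness", "A"),
    ("comprehensiveness", "A"),
    ("comprehensiveness", "B"),
    ("directness", "B"),
    ("directness", "tie"),
]

def win_rates(judgments, system="A"):
    """Return the fraction of comparisons won by `system`, per metric."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for metric, winner in judgments:
        totals[metric] += 1
        if winner == system:
            wins[metric] += 1
    return {metric: wins[metric] / totals[metric] for metric in totals}

rates = win_rates(judgments)
```

A win rate above 0.5 on a metric means the chosen system was preferred more often than not on that metric. Ties count toward the total but not toward the wins in this sketch, which is one of several reasonable conventions.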

Conclusion

As simple retrieval tasks become increasingly manageable due to the widespread adoption of RAG techniques, the focus is shifting towards more sophisticated systems capable of providing comprehensive answers to global queries and reasoning over large bodies of text. Microsoft’s GraphRAG emerges as a timely solution to this growing need.

GraphRAG’s ability to generate global summaries and efficiently handle repeated queries positions it as a significant advancement over traditional RAG systems. As the demand for AI systems that can engage in deeper reasoning and provide more insightful answers continues to grow, GraphRAG represents a promising step forward in augmenting human understanding and decision-making processes.

Refer to this paper for a detailed comparison between GraphRAG and other RAG approaches: https://www.microsoft.com/en-us/research/publication/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization/

For the code implementation, refer to this GitHub repo: https://github.com/microsoft/graphrag

To learn more about the behind-the-scenes working of GraphRAG, refer to this article: https://medium.com/@divi.pothukuchi/what-exactly-happens-in-microsofts-graphrag-that-makes-it-so-unique-13ca7d93acf0
