GraphRAG for LangChain

Transform documents into a knowledge graph that powers both entity-specific and thematic Retrieval-Augmented Generation (RAG). This project reimplements the ideas from "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" using LangChain, with support for multiple LLM and embedding providers, plus a minimal app showing how to wire the pieces together.

What this project does

  • Build a knowledge graph from raw text: split documents into text units, extract entities/relationships with an LLM, merge and summarize them, detect communities with hierarchical Leiden clustering, and generate community reports.
  • Query with two complementary strategies:
    • Global search answers holistic questions by traversing community reports and aggregated key points.
    • Local search answers entity-focused questions by fanning out across nearby entities, relationships, and supporting text.
  • Optional ontology grounding: align extracted entities to an existing ontology via vector search and LLM reranking.
  • Baseline RAG workflow for comparison alongside the graph-based approach.
  • Streamlit UI and Typer CLI in examples/simple-app that demonstrate indexing, querying, visualization, and batch QA.
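To make the graph-building step concrete, here is a minimal sketch of the community-detection stage on a toy entity graph. This is illustrative only: the pipeline described above uses hierarchical Leiden clustering, while this sketch substitutes networkx's built-in Louvain algorithm (a closely related modularity-based method) so it runs with no extra dependencies. The entity names are invented for the example.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy entity graph: nodes are extracted entities, edges are relationships.
g = nx.Graph()
g.add_edges_from([
    ("Scrooge", "Marley"), ("Scrooge", "Bob Cratchit"),
    ("Bob Cratchit", "Tiny Tim"), ("Scrooge", "Fred"),
    ("Ghost of Christmas Past", "Ghost of Christmas Present"),
    ("Ghost of Christmas Present", "Ghost of Christmas Yet to Come"),
])

# Louvain stands in for the hierarchical Leiden step; both partition
# the graph into densely connected communities.
communities = louvain_communities(g, seed=42)
for i, members in enumerate(communities):
    print(f"community {i}: {sorted(members)}")
```

In the real pipeline, each detected community is then summarized by an LLM into a community report, which global search later traverses.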

Why not just use the official repo?

Key differences from the reference implementation at microsoft/graphrag:

  • Built on LangChain primitives for extensibility.
  • Swappable providers: Azure OpenAI, OpenAI, Gemini, and local Ollama for LLMs/embeddings.
  • Modular, readable pipeline components that can be reused outside any specific workflow engine.

Repository layout

  • src/langchain_graphrag/ – library code for indexing, querying, graph/embedding generation, and ontology grounding.
  • examples/simple-app/ – Typer CLI + Streamlit UI showcasing end-to-end usage on sample data.
  • docs/ – conceptual and implementation guides (architecture, indexing pipeline, query system, data flows).
  • data/ and tmp/ – default locations for generated artifacts, caches, and vector stores.

Prerequisites

  • Python 3.10+
  • uv or pip for dependency installation.
  • Access credentials for at least one provider:
    • Azure OpenAI, OpenAI, Gemini (Google GenAI), or a running Ollama instance (OLLAMA_HOST, default http://localhost:11434).
  • (Optional) LangSmith token if you want tracing.

Installation

Clone the repo and install dependencies (dev + runtime):

uv sync

If you prefer pip:

pip install -e .

The package is also published to PyPI:

pip install langchain-graphrag

Configure credentials

Copy the example env file and fill in the keys you need:

cp examples/simple-app/.env.example examples/simple-app/app/.env

Relevant variables:

  • LANGCHAIN_GRAPHRAG_AZURE_OPENAI_CHAT_API_KEY, LANGCHAIN_GRAPHRAG_AZURE_OPENAI_CHAT_ENDPOINT, LANGCHAIN_GRAPHRAG_AZURE_OPENAI_CHAT_DEPLOYMENT
  • LANGCHAIN_GRAPHRAG_AZURE_OPENAI_EMBED_API_KEY, LANGCHAIN_GRAPHRAG_AZURE_OPENAI_EMBED_ENDPOINT, LANGCHAIN_GRAPHRAG_AZURE_OPENAI_EMBED_DEPLOYMENT
  • LANGCHAIN_GRAPHRAG_OPENAI_CHAT_API_KEY, LANGCHAIN_GRAPHRAG_OPENAI_EMBED_API_KEY
  • LANGCHAIN_API_KEY (Gemini via LangChain), OLLAMA_HOST
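For illustration, a helper like the following could read the Azure OpenAI chat settings listed above. This is not part of the library's API; it is only a sketch of how the prefixed variable names fit together.

```python
import os

def azure_chat_config() -> dict:
    """Read the Azure OpenAI chat settings from the environment.

    Illustrative helper only, not part of langchain_graphrag itself.
    """
    prefix = "LANGCHAIN_GRAPHRAG_AZURE_OPENAI_CHAT_"
    cfg = {
        "api_key": os.environ.get(prefix + "API_KEY"),
        "endpoint": os.environ.get(prefix + "ENDPOINT"),
        "deployment": os.environ.get(prefix + "DEPLOYMENT"),
    }
    missing = [name for name, value in cfg.items() if not value]
    if missing:
        raise RuntimeError(f"missing Azure OpenAI settings: {missing}")
    return cfg
```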

Quickstart (sample data)

All commands run from the repo root and assume uv sync is complete.

  1. Index sample text into a graph
# OpenAI (swap to azure/ollama/gemini variants if desired)
uv run poe simple-app-indexer-openai

Artifacts land under tmp/artifacts-<llm-model> with vector stores in tmp/vector_stores.

  2. Ask a global question (thematic)
uv run poe simple-app-global-search-openai --query "What are the main themes in this story?"
  3. Ask a local question (entity-focused)
uv run poe simple-app-local-search-openai --query "Who is Scrooge, and what are his main relationships?"

Provider variants are available:

  • Azure: simple-app-indexer-azure, simple-app-global-search-azure, simple-app-local-search-azure
  • Ollama: simple-app-indexer-ollama, simple-app-global-search-ollama, simple-app-local-search-ollama
  • Gemini: simple-app-indexer-gemini, simple-app-global-search-gemini, simple-app-local-search-gemini
  4. Visualize the graph
python examples/simple-app/app/main.py indexer visualize \
  --artifacts-dir tmp/artifacts-<your-llm-model> \
  --output tmp/graph.gexf

You can also emit .graphml, .gml, or .png (set --graph merged to view the raw merged graph).
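An exported .gexf file can also be reloaded programmatically for inspection. The sketch below round-trips a tiny graph through GEXF with networkx (the node names are invented; substitute the path to your own exported artifact, e.g. tmp/graph.gexf):

```python
import os
import tempfile

import networkx as nx

# Build a tiny graph and round-trip it through GEXF, the same format
# the visualize command can write.
g = nx.Graph()
g.add_edge("Scrooge", "Marley", weight=1.0)

path = os.path.join(tempfile.mkdtemp(), "graph.gexf")
nx.write_gexf(g, path)

loaded = nx.read_gexf(path)
print(loaded.number_of_nodes(), loaded.number_of_edges())
```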

  5. Compare to a baseline RAG run
uv run python examples/simple-app/app/main.py baseline index \
  --input-file examples/input-data/book.txt \
  --output-dir tmp/out --cache-dir tmp/cache \
  --embedding-type gemini --embedding-model gemini-embedding-001

uv run python examples/simple-app/app/main.py baseline query \
  --output-dir tmp/out --cache-dir tmp/cache \
  --llm-type gemini --llm-model gemini-2.5-flash \
  --embedding-type gemini --embedding-model gemini-embedding-001 \
  --query "What are the main themes in this story?" --top-k 4
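Conceptually, the baseline retrieval step reduces to ranking text chunks by embedding similarity to the query. The following numpy sketch shows that top-k step with fabricated embeddings (it does not reflect the app's internal implementation, which delegates embedding and retrieval to the configured provider and vector store):

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k=4):
    """Rank text chunks by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

# Tiny fabricated embeddings for demonstration.
rng = np.random.default_rng(0)
chunks = rng.normal(size=(10, 8))
query = chunks[3] + 0.01 * rng.normal(size=8)  # nearly identical to chunk 3
idx, scores = top_k_chunks(query, chunks, k=4)
print(idx[0])  # chunk 3 should rank first
```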
  6. Try the Streamlit UI
uv run streamlit run examples/simple-app/app/ui.py

Ontology grounding (optional)

Build or reuse an ontology vector store, then link entities during indexing:

python examples/simple-app/app/ontologyGrounding/build_ontology_index.py \
  --url http://purl.obolibrary.org/obo/go/go-basic.obo \
  --output-dir data --cache-dir tmp/cache \
  --embedding-type gemini --embedding-model gemini-embedding-001 \
  --batch-size 200 --collection-name ontology-gemini-embedding-001

python examples/simple-app/app/main.py indexer index \
  --input-file examples/input-data/book.txt \
  --output-dir data --cache-dir tmp/cache \
  --llm-type gemini --llm-model gemini-2.5-flash \
  --embedding-type gemini --embedding-model gemini-embedding-001 \
  --ontology-enable --ontology-top-k 10 --ontology-threshold 0.60
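Conceptually, the grounding step selects the nearest ontology terms above a similarity threshold (the analog of --ontology-top-k and --ontology-threshold above), then hands those candidates to an LLM for reranking. The numpy sketch below illustrates only the vector-search half with fabricated embeddings; the GO term IDs are real identifiers from go-basic.obo, but their vectors here are random:

```python
import numpy as np

def ground_entity(entity_vec, term_vecs, term_ids, top_k=10, threshold=0.60):
    """Return ontology terms whose cosine similarity to the entity
    embedding clears the threshold (candidates for LLM reranking)."""
    e = entity_vec / np.linalg.norm(entity_vec)
    t = term_vecs / np.linalg.norm(term_vecs, axis=1, keepdims=True)
    sims = t @ e
    order = np.argsort(sims)[::-1][:top_k]
    return [(term_ids[i], float(sims[i])) for i in order if sims[i] >= threshold]

# Fabricated embeddings: the first term's vector is made nearly
# collinear with the entity vector, so it should clear the threshold.
rng = np.random.default_rng(1)
terms = rng.normal(size=(5, 16))
ids = ["GO:0006915", "GO:0008150", "GO:0003674", "GO:0005575", "GO:0008152"]
entity = terms[0] * 2.0 + 0.05 * rng.normal(size=16)
print(ground_entity(entity, terms, ids))
```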

Development

  • Run tests: uv run poe test
  • Lint/format: uv run poe check or uv run poe fix
  • Serve docs locally: uv run poe docs-serve

Learn more

  • Conceptual overview: docs/architecture/overview.md
  • Indexing pipeline details: docs/guides/indexing_pipeline.md
  • Query strategies: docs/guides/query_system.md
  • Advanced extraction notebooks: docs/guides/graph_extraction/
