Transform documents into a knowledge graph that powers both entity-specific and thematic Retrieval Augmented Generation (RAG). This project reimplements the ideas from *From Local to Global: A Graph RAG Approach to Query-Focused Summarization* using LangChain, with support for multiple LLM and embedding providers plus a minimal app showing how to wire the pieces together.
- Build a knowledge graph from raw text: split documents into text units, extract entities/relationships with an LLM, merge and summarize them, detect communities with hierarchical Leiden clustering, and generate community reports (a minimal clustering sketch follows this list).
- Query with two complementary strategies:
- Global search answers holistic questions by traversing community reports and aggregated key points.
- Local search answers entity-focused questions by fanning out across nearby entities, relationships, and supporting text.
- Optional ontology grounding: align extracted entities to an existing ontology via vector search and LLM reranking.
- Baseline RAG workflow for comparison alongside the graph-based approach.
- Streamlit UI and Typer CLI in `examples/simple-app` that demonstrate indexing, querying, visualization, and batch QA.
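To make the community-detection step concrete, here is a minimal sketch of hierarchical Leiden clustering over a toy entity graph. It calls `networkx` and `graspologic` directly rather than this library's own pipeline components, and the entities and weights are invented for illustration.

```python
# Minimal sketch of the community-detection step, assuming graspologic is
# installed. Not this library's pipeline code; the toy entities and weights
# below are invented purely for illustration.
import networkx as nx
from graspologic.partition import hierarchical_leiden

# Stand-in for the merged entity/relationship graph produced during indexing.
graph = nx.Graph()
graph.add_weighted_edges_from([
    ("SCROOGE", "MARLEY", 2.0),
    ("SCROOGE", "BOB CRATCHIT", 3.0),
    ("BOB CRATCHIT", "TINY TIM", 2.0),
    ("SCROOGE", "FRED", 1.0),
])

# Hierarchical Leiden yields one (node, cluster, level) assignment per row;
# each level of the hierarchy can then be summarized into community reports.
for part in hierarchical_leiden(graph, max_cluster_size=10):
    print(part.level, part.cluster, part.node)
```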
Key differences from the reference implementation at microsoft/graphrag:
- Built on LangChain primitives for extensibility.
- Swappable providers: Azure OpenAI, OpenAI, Gemini, and local Ollama for LLMs/embeddings.
- Modular, readable pipeline components that can be reused outside any specific workflow engine.
- `src/langchain_graphrag/` – library code for indexing, querying, graph/embedding generation, and ontology grounding.
- `examples/simple-app/` – Typer CLI + Streamlit UI showcasing end-to-end usage on sample data.
- `docs/` – conceptual and implementation guides (architecture, indexing pipeline, query system, data flows).
- `data/` and `tmp/` – default locations for generated artifacts, caches, and vector stores.
- Python 3.10+
- `uv` or `pip` for dependency installation.
- Access credentials for at least one provider:
  - Azure OpenAI, OpenAI, Gemini (Google GenAI), or a running Ollama instance (`OLLAMA_HOST`, default `http://localhost:11434`).
- (Optional) LangSmith token if you want tracing.
Clone the repo and install dependencies (dev + runtime):
```sh
uv sync
```
If you prefer pip:
```sh
pip install -e .
```
The package is also published to PyPI:
```sh
pip install langchain-graphrag
```
Copy the example env file and fill in the keys you need:
```sh
cp examples/simple-app/.env.example examples/simple-app/app/.env
```
Relevant variables:
- `LANGCHAIN_GRAPHRAG_AZURE_OPENAI_CHAT_API_KEY`, `LANGCHAIN_GRAPHRAG_AZURE_OPENAI_CHAT_ENDPOINT`, `LANGCHAIN_GRAPHRAG_AZURE_OPENAI_CHAT_DEPLOYMENT`
- `LANGCHAIN_GRAPHRAG_AZURE_OPENAI_EMBED_API_KEY`, `LANGCHAIN_GRAPHRAG_AZURE_OPENAI_EMBED_ENDPOINT`, `LANGCHAIN_GRAPHRAG_AZURE_OPENAI_EMBED_DEPLOYMENT`
- `LANGCHAIN_GRAPHRAG_OPENAI_CHAT_API_KEY`, `LANGCHAIN_GRAPHRAG_OPENAI_EMBED_API_KEY`
- `LANGCHAIN_API_KEY` (Gemini via LangChain), `OLLAMA_HOST`
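If you want to sanity-check your configuration from Python, a small snippet like the one below can confirm which provider keys are visible; it uses the `python-dotenv` package (not necessarily a dependency of this repo), and the variable names are the ones listed above.

```python
# Quick sanity check (not part of this repo): confirm which provider
# credentials are visible after copying the example .env file.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv("examples/simple-app/app/.env")

providers = {
    "Azure OpenAI": "LANGCHAIN_GRAPHRAG_AZURE_OPENAI_CHAT_API_KEY",
    "OpenAI": "LANGCHAIN_GRAPHRAG_OPENAI_CHAT_API_KEY",
    "Gemini (via LangChain)": "LANGCHAIN_API_KEY",
    "Ollama": "OLLAMA_HOST",
}
for name, var in providers.items():
    print(f"{name:>22}: {'set' if os.getenv(var) else 'missing'}")
```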
All commands run from the repo root and assume `uv sync` has completed.
- Index sample text into a graph
```sh
# OpenAI (swap to azure/ollama/gemini variants if desired)
uv run poe simple-app-indexer-openai
```
Artifacts land under `tmp/artifacts-<llm-model>`, with vector stores in `tmp/vector_stores`.
- Ask a global question (thematic)
```sh
uv run poe simple-app-global-search-openai --query "What are the main themes in this story?"
```
- Ask a local question (entity-focused)
```sh
uv run poe simple-app-local-search-openai --query "Who is Scrooge, and what are his main relationships?"
```
Provider variants are available:
- Azure: `simple-app-indexer-azure`, `simple-app-global-search-azure`, `simple-app-local-search-azure`
- Ollama: `simple-app-indexer-ollama`, `simple-app-global-search-ollama`, `simple-app-local-search-ollama`
- Gemini: `simple-app-indexer-gemini`, `simple-app-global-search-gemini`, `simple-app-local-search-gemini`
- Visualize the graph
```sh
python examples/simple-app/app/main.py indexer visualize \
  --artifacts-dir tmp/artifacts-<your-llm-model> \
  --output tmp/graph.gexf
```
You can also emit `.graphml`, `.gml`, or `.png` (set `--graph merged` to view the raw merged graph).
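If you prefer to inspect the export programmatically instead of opening it in a graph viewer, a small sketch like the one below reads the generated GEXF with `networkx`; the file path assumes the command above was run as-is.

```python
# Sketch: load the exported graph and print a few basic statistics.
# Assumes the visualize command above wrote tmp/graph.gexf.
import networkx as nx

graph = nx.read_gexf("tmp/graph.gexf")
print(f"{graph.number_of_nodes()} nodes, {graph.number_of_edges()} edges")

# Ten highest-degree entities as a quick sanity check.
top = sorted(graph.degree, key=lambda kv: kv[1], reverse=True)[:10]
for node, degree in top:
    print(degree, node)
```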
- Compare to a baseline RAG run
```sh
uv run python examples/simple-app/app/main.py baseline index \
  --input-file examples/input-data/book.txt \
  --output-dir tmp/out --cache-dir tmp/cache \
  --embedding-type gemini --embedding-model gemini-embedding-001

uv run python examples/simple-app/app/main.py baseline query \
  --output-dir tmp/out --cache-dir tmp/cache \
  --llm-type gemini --llm-model gemini-2.5-flash \
  --embedding-type gemini --embedding-model gemini-embedding-001 \
  --query "What are the main themes in this story?" --top-k 4
```
- Try the Streamlit UI
```sh
uv run streamlit run examples/simple-app/app/ui.py
```
Build or reuse an ontology vector store, then link entities during indexing:
```sh
python examples/simple-app/app/ontologyGrounding/build_ontology_index.py \
  --url http://purl.obolibrary.org/obo/go/go-basic.obo \
  --output-dir data --cache-dir tmp/cache \
  --embedding-type gemini --embedding-model gemini-embedding-001 \
  --batch-size 200 --collection-name ontology-gemini-embedding-001

python examples/simple-app/app/main.py indexer index \
  --input-file examples/input-data/book.txt \
  --output-dir data --cache-dir tmp/cache \
  --llm-type gemini --llm-model gemini-2.5-flash \
  --embedding-type gemini --embedding-model gemini-embedding-001 \
  --ontology-enable --ontology-top-k 10 --ontology-threshold 0.60
```
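Conceptually, grounding pairs a vector lookup against the ontology index with an LLM pass that accepts or rejects the candidates. The sketch below shows that idea with generic LangChain pieces; the collection name and store directory mirror the command above, but the prompt, model names, and store layout are assumptions rather than this repo's internals.

```python
# Sketch of the grounding idea: vector search over ontology terms,
# then an LLM decides whether any candidate matches the extracted entity.
# Collection/paths mirror the command above; everything else is assumed.
from langchain_chroma import Chroma
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

store = Chroma(
    collection_name="ontology-gemini-embedding-001",
    persist_directory="data",
    embedding_function=GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001"),
)

entity = "mitochondrion"
candidates = store.similarity_search(entity, k=10)
listing = "\n".join(f"- {d.page_content}" for d in candidates)

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")
verdict = llm.invoke(
    f"Entity: {entity}\nCandidate ontology terms:\n{listing}\n"
    "Reply with the single best-matching term, or NONE."
)
print(verdict.content)
```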
- Run tests: `uv run poe test`
- Lint/format: `uv run poe check` or `uv run poe fix`
- Serve docs locally: `uv run poe docs-serve`
- Conceptual overview: `docs/architecture/overview.md`
- Indexing pipeline details: `docs/guides/indexing_pipeline.md`
- Query strategies: `docs/guides/query_system.md`
- Advanced extraction notebooks: `docs/guides/graph_extraction/`