From "find similar text" to "reason about relationships." The single biggest intelligence upgrade you can make.
See also: LightRAG is the knowledge layer. Combine with Part 17: MCP Servers (memory-MCP + mem0 for cross-device memory), Part 18: Coding Agents (let Gemini's 1M context ingest the whole LightRAG dump for synthesis), and Part 20: Observability (trace embedding calls).
Hermes ships with vector-based memory search. It finds documents that are textually similar to your query. That works for simple lookups, but it has a fundamental ceiling: it finds what's similar, not what's connected.
Ask "what hardware decisions were made and why?" and vector search returns files that all mention GPUs. It can't traverse from a decision → the person who made it → the project it affected → the lesson learned afterward.
Graph RAG fixes this. It builds a knowledge graph (entities + relationships) alongside your vector database, then searches both simultaneously.
| Aspect | Naive RAG (Default) | Graph RAG (LightRAG) |
|---|---|---|
| Indexes | Text chunks as vectors | Entities, relationships, AND text chunks |
| Retrieves | Similar text (cosine similarity) | Connected knowledge (graph traversal + similarity) |
| Answers | "Here's what the docs say about X" | "Here's how X relates to Y, who decided Z, and why" |
| Scales | Degrades at 500+ docs (too many partial matches) | Improves with more docs (richer graph) |
| Cost | Cheap (embedding only) | More expensive upfront (LLM extracts entities) but cheaper at query time |
LightRAG is an open-source graph RAG framework from HKU (EMNLP 2025 paper). It competes with Microsoft's GraphRAG at a fraction of the cost.
Why LightRAG over alternatives:
| Tool | Graph | Vector | Web UI | Self-Hosted | API | Cost |
|---|---|---|---|---|---|---|
| LightRAG | Yes | Yes | Yes | Yes | REST API | Free |
| Microsoft GraphRAG | Yes | Yes | No | Yes | No | 10-50x more |
| Graphiti + Neo4j | Yes | No (separate) | No (Neo4j browser) | Yes | Build your own | Free but manual |
| Plain vector search | No | Yes | No | Yes | Yes | Free |
LightRAG does vector DB + knowledge graph in parallel during ingestion. One system, both capabilities.
- Python 3.11+
- An LLM API key (for entity extraction during ingestion — OpenAI, Anthropic, or any OpenAI-compatible provider)
- An embedding API key (Fireworks recommended for high-quality 4096-dim embeddings, or use local Ollama)
# Create a dedicated directory
mkdir -p ~/.hermes/lightrag
cd ~/.hermes/lightrag
# Clone LightRAG
git clone https://github.com/HKUDS/LightRAG.git
cd LightRAG
# Install dependencies
pip install -e ".[api]"
Create ~/.hermes/lightrag/.env:
# LLM for entity extraction (during ingestion)
LLM_BINDING=openai
LLM_MODEL=gpt-4.1-mini
LLM_BINDING_API_KEY=<your-openai-api-key>
# Embedding model (for vector storage)
EMBEDDING_BINDING=fireworks
EMBEDDING_MODEL=accounts/fireworks/models/qwen3-embedding-8b
EMBEDDING_API_KEY=<your-fireworks-api-key>
# Or use local Ollama (free, no API key needed):
# EMBEDDING_BINDING=ollama
# EMBEDDING_MODEL=nomic-embed-text
Security tip: Set restrictive permissions on this file:
chmod 600 ~/.hermes/lightrag/.env
Tip: Use `gpt-4.1-mini` or `claude-sonnet-4-20250514` for entity extraction. It doesn't need to be your smartest model; it just needs to reliably identify entities and relationships. Cheaper models save money on ingestion.
Embedding quality matters. If you have a GPU with 8GB+ VRAM, run `nomic-embed-text` locally via Ollama for free. If you want the best quality, use Fireworks' Qwen3-Embedding-8B (4096 dimensions); the search accuracy difference is dramatic.
cd ~/.hermes/lightrag/LightRAG
# Start the API server (binds to localhost by default)
lightrag-server --host 127.0.0.1 --port 9623
The server starts on http://localhost:9623 with:
- REST API for ingestion and querying
- Web UI at `http://localhost:9623/webui` for browsing the knowledge graph
- Health check at `http://localhost:9623/health`
Security warning: The LightRAG REST API has no built-in authentication. Always bind to `127.0.0.1` (localhost only), never `0.0.0.0`. If you need remote access, put it behind a reverse proxy (nginx, Caddy) with authentication, or use SSH tunneling. Anyone who can reach this port can query, ingest, or delete your knowledge graph data.
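Before wiring anything else to it, it's worth confirming the server is actually answering on localhost. A minimal sketch, assuming the `/health` endpoint mentioned above responds with a small JSON body (the exact shape varies by LightRAG version):

```python
#!/usr/bin/env python3
"""Quick sanity check: is the LightRAG server answering on localhost?"""
import sys
import urllib.request

HEALTH_URL = "http://localhost:9623/health"  # health endpoint mentioned above

try:
    with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
        body = resp.read().decode()
    print(f"Server is up: {body[:200]}")  # show the start of whatever /health returns
except Exception as exc:
    print(f"LightRAG server not reachable at {HEALTH_URL}: {exc}")
    sys.exit(1)
```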
# Using nohup
nohup lightrag-server --port 9623 > ~/.hermes/lightrag/server.log 2>&1 &
# Or use hermes to manage it
hermes background "cd ~/.hermes/lightrag/LightRAG && lightrag-server --port 9623"
Document (markdown, text, PDF, etc.)
↓
Chunking (text split into segments)
↓
┌─────────────────┐ ┌──────────────────┐
│ Embedding Model │ │ LLM Entity │
│ (vector storage)│ │ Extraction │
└────────┬────────┘ └────────┬─────────┘
↓ ↓
Vector Database Knowledge Graph
(similarity search) (entity relationships)
For each document, LightRAG:
- Chunks the text and embeds it (standard vector RAG)
- Uses an LLM to extract entities (people, tools, projects, concepts) and relationships (who decided what, what depends on what)
- Stores both in parallel — vectors for similarity, graph for structure
# Ingest a single file
curl -X POST http://localhost:9623/documents/upload \
-F "file=@/path/to/your/document.md"
# Ingest a text string directly
curl -X POST http://localhost:9623/documents/text \
-H "Content-Type: application/json" \
-d '{"text": "Your knowledge content here...", "description": "Source description"}'
# Ingest all files in a directory
for file in ~/.hermes/memories/*.md; do
curl -X POST http://localhost:9623/documents/upload -F "file=@$file"
echo "Ingested: $file"
done
Feed LightRAG everything your agent needs to "know":
- Memory files — `~/.hermes/memories/*.md`
- Project docs — README files, design docs, decision logs
- Chat summaries — Exported conversation summaries
- Notes — Any markdown/text knowledge you want searchable
- Code comments — Extracted from important codebases
Start with your memory files and project docs. These give the graph the most value — decisions, people, projects, and their relationships.
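Not everything on that list lives on disk as a file. Exported chat summaries or ad-hoc notes can go in through the `/documents/text` endpoint shown above; here's a minimal sketch, with placeholder summary text and a hypothetical description string:

```python
#!/usr/bin/env python3
"""Sketch: push non-file knowledge (chat summaries, notes) into LightRAG as text."""
import json
import urllib.request

TEXT_URL = "http://localhost:9623/documents/text"

def ingest_text(text: str, description: str) -> None:
    """POST a raw text snippet, mirroring the /documents/text curl example."""
    payload = json.dumps({"text": text, "description": description}).encode()
    req = urllib.request.Request(
        TEXT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(f"{description}: HTTP {resp.status}")

if __name__ == "__main__":
    # Placeholder content: swap in a real exported conversation summary.
    ingest_text(
        "Decided to keep embeddings local on the 5090 box; lesson learned: "
        "pin the embedding model version before re-indexing.",
        "Chat summary: embeddings decision",
    )
```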
LightRAG has four query modes:
| Mode | Best For | How It Works |
|---|---|---|
| `naive` | Simple keyword lookups | Vector search only (like basic RAG) |
| `local` | Specific entity facts | Entity-focused graph traversal |
| `global` | Cross-document relationships | Relationship-focused traversal |
| `hybrid` | General questions (default) | Both local + global combined |
# Hybrid query (recommended default)
curl -X POST http://localhost:9623/query \
-H "Content-Type: application/json" \
-d '{
"query": "What infrastructure decisions were made and why?",
"mode": "hybrid",
"only_need_context": false
}'
# Local mode — specific entity facts
curl -X POST http://localhost:9623/query \
-H "Content-Type: application/json" \
-d '{
"query": "Tell me about the 5090 PC setup",
"mode": "local"
}'
# Global mode — relationship discovery
curl -X POST http://localhost:9623/query \
-H "Content-Type: application/json" \
-d '{
"query": "How do the different projects relate to each other?",
"mode": "global"
}'
curl -X POST http://localhost:9623/query \
-H "Content-Type: application/json" \
-d '{
"query": "What models are running on what hardware?",
"mode": "hybrid",
"only_need_context": true
}'
This returns the raw context chunks without generating an answer — useful for feeding into your own pipeline or Hermes' LLM.
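That two-step pattern (pull context, then synthesize with whatever model you already use) looks roughly like the sketch below. It assumes the context comes back under a `response` or `data` key, the same assumption the skill script later in this section makes; the final LLM call is left as a stub because it depends on your provider.

```python
#!/usr/bin/env python3
"""Sketch: pull raw graph context from LightRAG, then hand it to your own LLM."""
import json
import urllib.request

QUERY_URL = "http://localhost:9623/query"

def fetch_context(question: str, mode: str = "hybrid") -> str:
    """Ask LightRAG for context only (no generated answer)."""
    payload = json.dumps({
        "query": question,
        "mode": mode,
        "only_need_context": True,
    }).encode()
    req = urllib.request.Request(
        QUERY_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        result = json.loads(resp.read())
    # Assumption: context is returned under "response" (or "data" on some versions).
    return result.get("response") or result.get("data") or str(result)

if __name__ == "__main__":
    question = "What models are running on what hardware?"
    context = fetch_context(question)
    prompt = (
        "Answer using only the context below. Cite entity names.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    print(prompt)  # feed this to Hermes' LLM or any chat-completions client
```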
Create ~/.hermes/skills/research/lightrag/SKILL.md:
---
name: lightrag
description: Query the LightRAG knowledge graph for past decisions, infrastructure, projects, and lessons learned. Use before saying "I don't remember."
---
# LightRAG Knowledge Graph
Query the LightRAG knowledge graph for past decisions, infrastructure, projects, and lessons learned.
## When To Use
- User asks about past work, decisions, or "what happened with X"
- Need context on projects, hardware, or configurations
- Remembering lessons learned or past issues
- Any question where you'd say "I don't remember" — use this FIRST
## Usage
```bash
curl -s -X POST http://localhost:9623/query \
-H "Content-Type: application/json" \
-d '{"query": "YOUR QUERY", "mode": "hybrid", "only_need_context": true}'hybrid(default): Combined vector + graph searchlocal: Entity-focused (specific facts)global: Relationship-focused (how things connect)naive: Vector-only (simple lookups)
- ALWAYS search this before saying "I don't remember"
- Results supersede general knowledge about the setup
- Reference entity names when citing results
### Query from a Script
Create `~/.hermes/skills/research/lightrag/scripts/lightrag_search.py`:
```python
#!/usr/bin/env python3
"""LightRAG search script for Hermes skill integration."""
import json
import sys
import urllib.request


def search(query: str, mode: str = "hybrid") -> str:
    url = "http://localhost:9623/query"
    payload = json.dumps({
        "query": query,
        "mode": mode,
        "only_need_context": True
    }).encode()
    req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            result = json.loads(resp.read())
            return result.get("response", result.get("data", str(result)))
    except Exception as e:
        return f"LightRAG query failed: {e}"


if __name__ == "__main__":
    query = " ".join(sys.argv[1:]) if len(sys.argv) > 1 else ""
    if not query:
        print("Usage: lightrag_search.py <query>")
        sys.exit(1)
    print(search(query))
```
The quality of your graph depends on entity extraction. In LightRAG's config:
# More entities = richer graph, slower ingestion
entity_extract_max_gleaning: 5 # Default: 3. Higher = more thorough
# Chunk size affects entity density
chunk_token_size: 1200 # Default: 1200. Smaller = more entities per doc
chunk_overlap_token_size: 100 # Default: 100
Embedding quality directly impacts vector search accuracy:
| Model | Dimensions | Quality | Cost |
|---|---|---|---|
| nomic-embed-text (Ollama) | 768 | Good | Free (local) |
| Qwen3-Embedding-8B (Fireworks) | 4096 | Excellent | ~$0.001/1K tokens |
| text-embedding-3-large (OpenAI) | 3072 | Very Good | ~$0.00013/1K tokens |
If search quality matters, use 4096-dimension embeddings. The difference between 768 and 4096 dims is like the difference between 720p and 4K — you catch details you'd otherwise miss.
After ingesting a large batch of new documents:
# Check entity count
curl http://localhost:9623/graph/label/list | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'{len(d)} entities')"
Don't always default to hybrid. Use:
- `local` when asking about a specific thing ("Tell me about the GPU setup")
- `global` when asking about connections ("How do the projects relate?")
- `hybrid` for general questions ("What decisions were made last week?")
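As a concrete illustration, here is a small sketch that sends each example question with the mode it suits, against the same `/query` endpoint (the questions are placeholders; swap in your own):

```python
#!/usr/bin/env python3
"""Sketch: pick the query mode to match the shape of the question."""
import json
import urllib.request

def ask(question: str, mode: str) -> str:
    payload = json.dumps({"query": question, "mode": mode}).encode()
    req = urllib.request.Request(
        "http://localhost:9623/query",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        # Assumption: the generated answer is returned under "response".
        return json.loads(resp.read()).get("response", "")

# Specific entity -> local, connections -> global, general -> hybrid
print(ask("Tell me about the GPU setup", "local"))
print(ask("How do the projects relate?", "global"))
print(ask("What decisions were made last week?", "hybrid"))
```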
The Web UI at http://localhost:9623/webui lets you:
- Browse the knowledge graph visually
- See entity relationships
- Identify orphaned or redundant entities
Once the server is running, open http://localhost:9623/webui in your browser. You can:
- Search the graph with any query mode
- Visualize entity relationships as a network graph
- Browse all entities and their connections
- Inspect raw chunks and their source documents
If queries fail with connection errors, the server isn't running. Start it:
cd ~/.hermes/lightrag/LightRAG && lightrag-server --port 9623
If ingestion is slow, remember that entity extraction is LLM-bound. Speed it up:
- Use a faster model for ingestion (GPT-4.1-mini, Claude Haiku)
- Process documents in parallel batches (see the sketch after this list)
- Use a local model if you have GPU capacity
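Here's a hedged sketch of the parallel-batches idea: a small thread pool posting files to the same `/documents/upload` endpoint used earlier. It assumes the `requests` package is installed and that the server tolerates a few concurrent uploads; keep the worker count modest, since each document still triggers LLM extraction.

```python
#!/usr/bin/env python3
"""Sketch: ingest a directory of documents with a few concurrent uploads."""
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
import requests  # third-party: pip install requests

UPLOAD_URL = "http://localhost:9623/documents/upload"
DOCS_DIR = Path.home() / ".hermes" / "memories"  # adjust to your corpus
MAX_WORKERS = 4  # modest parallelism; extraction is the real bottleneck

def upload(path: Path) -> tuple[Path, bool]:
    """Upload one file as multipart form data, mirroring: curl -F "file=@..." """
    with path.open("rb") as fh:
        resp = requests.post(UPLOAD_URL, files={"file": fh}, timeout=300)
    return path, resp.ok

if __name__ == "__main__":
    files = sorted(DOCS_DIR.glob("*.md"))
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = [pool.submit(upload, f) for f in files]
        for fut in as_completed(futures):
            path, ok = fut.result()
            print(f"{'ok  ' if ok else 'FAIL'} {path.name}")
```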
If search results are poor:
- Check that documents were actually ingested (Web UI → entities)
- Try different query modes (`local` vs `global` vs `hybrid`)
- Rephrase your query — be more specific about entities
- Check that the embedding model is actually running (`curl http://localhost:11434/api/tags` for Ollama)
LightRAG merges similar entities automatically, but exact duplicates can happen. Use the Web UI to manually clean up, or reindex from scratch:
# Nuclear option: wipe and reingest
rm -rf ~/.hermes/lightrag/LightRAG/rag_storage/*
# Then re-ingest your documents
- Need mobile access? → Part 4: Telegram Setup
- Want the agent to self-improve? → Part 5: On-the-Fly Skills