A production-grade system that ingests everything you read and write, builds a knowledge graph over it, and lets you query your own mind with hybrid RAG retrieval.
```mermaid
graph TD
A[Chrome Extension] -->|Capture Context| B[FastAPI /ingest]
B -->|Async Task| C[Celery Worker]
C --> D[(Redis - Dedup)]
C --> P[(PostgreSQL - Meta)]
C --> E[Chunk & Embed]
E --> F[(Qdrant - Vectors)]
C --> G[spaCy NER]
G --> H[(Neo4j - Graph)]
```
```mermaid
graph TD
I[Extension UI] -->|Query| J[FastAPI /query]
J --> K[LangGraph Agent]
K -->|Tool: Web| L[DuckDuckGo Search]
K -->|Tool: Personal DB| M[Hybrid Search]
M --> N[Vector Search]
M --> O[Graph Search]
N --> Q[RRF & Cross-Encoder]
O --> Q
Q --> R[Parent Assembly]
R --> K
K --> S[Gemini Synthesis]
S --> T[Answer + Citations]
```
What it captures:
- Page content as you browse (reader-mode cleaned HTML)
- Selected text and highlights
- Tab metadata (URL, title, time spent)
- YouTube transcripts via transcript API
- PDFs opened in the browser
How it works:
A content script is injected into every page. When the tab closes or the user triggers a save, it extracts cleaned text with Readability.js and POSTs it to your local FastAPI server on port 8000.
chrome.tabs API + content scripts + background service worker
Smart triggers — don't capture everything:
- Time-on-page threshold: 30 seconds minimum
- Scroll depth: 40% minimum
- Or explicit "save this" hotkey
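The trigger rules above reduce to a small predicate. The sketch below is written in Python for consistency with the backend, although the real check would live in the extension's service worker; it assumes both thresholds must be met unless the hotkey was pressed, and `TabActivity` and `should_capture` are hypothetical names.

```python
from dataclasses import dataclass

MIN_SECONDS_ON_PAGE = 30   # time-on-page threshold
MIN_SCROLL_DEPTH = 0.40    # fraction of the page scrolled


@dataclass
class TabActivity:
    seconds_on_page: float
    scroll_depth: float          # 0.0 (top) to 1.0 (bottom)
    hotkey_pressed: bool = False


def should_capture(activity: TabActivity) -> bool:
    """Capture on the explicit hotkey, or when both engagement thresholds are met."""
    if activity.hotkey_pressed:
        return True
    return (
        activity.seconds_on_page >= MIN_SECONDS_ON_PAGE
        and activity.scroll_depth >= MIN_SCROLL_DEPTH
    )
```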
Noise is the enemy. Capturing everything makes retrieval worthless.
Block the following (this is harder than capturing):
- Social media feeds
- Email inboxes
- Shopping pages
- Pages under 200 words
- Pages visited under 15 seconds
- Duplicate URLs within 7 days
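A server-side version of the blocklist might look like this. Recognizing categories such as "social media feed" or "shopping page" is approximated here by a hypothetical domain list, which is an assumption; `is_blocked` and the `last_seen` mapping (which the real system would more likely keep in Redis) are illustrative names.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical domain list approximating feeds, inboxes, and shopping sites.
BLOCKED_DOMAINS = {
    "twitter.com", "x.com", "facebook.com", "instagram.com",
    "mail.google.com", "outlook.live.com", "amazon.com", "ebay.com",
}
MIN_WORDS = 200
MIN_SECONDS = 15
DEDUP_WINDOW = timedelta(days=7)


def is_blocked(
    url: str,
    word_count: int,
    seconds_on_page: float,
    last_seen: Dict[str, datetime],
    now: datetime,
) -> Optional[str]:
    """Return why a page should be skipped, or None if it may be ingested."""
    host = urlparse(url).hostname or ""
    # Block the listed domain itself and any subdomain (e.g. www.amazon.com).
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        return "blocked domain"
    if word_count < MIN_WORDS:
        return "too short"
    if seconds_on_page < MIN_SECONDS:
        return "visit too brief"
    previous = last_seen.get(url)
    if previous is not None and now - previous < DEDUP_WINDOW:
        return "duplicate within 7 days"
    return None
```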
All steps run async in the background via Celery. The ingestion endpoint returns immediately with a job ID.
Raw HTML → Readability.js (browser) or trafilatura (server) → clean text.
Strip boilerplate, navigation, ads. Extract structured metadata:
- Title
- Author
- Publish date
- Domain
- Word count
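trafilatura handles this step robustly in practice. As a rough, dependency-free illustration of what cleaning produces, the hypothetical `extract` function below drops boilerplate elements with the standard-library `HTMLParser` and collects the listed metadata; reading the publish date from an `article:published_time` meta tag is an assumption about how pages expose it.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Elements whose contents are treated as boilerplate and dropped.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside a boilerplate element
        self.in_title = False
        self.title = ""
        self.parts: List[str] = []
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True
        elif tag == "meta":
            a = dict(attrs)
            key = a.get("name") or a.get("property")
            if key and a.get("content"):
                self.meta[key] = a["content"]

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif not self.skip_depth and data.strip():
            self.parts.append(data.strip())


@dataclass
class DocumentMeta:
    title: str
    author: Optional[str]
    publish_date: Optional[str]
    domain: str
    word_count: int
    text: str


def extract(html: str, url: str) -> DocumentMeta:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.parts)
    return DocumentMeta(
        title=parser.title.strip(),
        author=parser.meta.get("author"),
        publish_date=parser.meta.get("article:published_time"),
        domain=urlparse(url).hostname or "",
        word_count=len(text.split()),
        text=text,
    )
```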
Use parent-child chunking:
- Large parent chunks: 1024 tokens — for context
- Small child chunks: 256 tokens — for precise retrieval
- Store both, link child → parent
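Parent-child chunking can be sketched as follows. Token counts are approximated by whitespace-separated words, which is an assumption (a real pipeline would count with the embedding model's tokenizer), and the `doc:pN:cM` ID scheme is hypothetical. Each child keeps its `parent_id`, so a retrieved child can later be expanded to its parent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PARENT_TOKENS = 1024
CHILD_TOKENS = 256


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str] = None   # set only on child chunks


def parent_child_chunks(
    doc_id: str,
    text: str,
    parent_size: int = PARENT_TOKENS,
    child_size: int = CHILD_TOKENS,
) -> Tuple[List[Chunk], List[Chunk]]:
    # Whitespace-separated words approximate model tokens here.
    tokens = text.split()
    parents: List[Chunk] = []
    children: List[Chunk] = []
    for p_idx, p_start in enumerate(range(0, len(tokens), parent_size)):
        parent_tokens = tokens[p_start:p_start + parent_size]
        parent_id = f"{doc_id}:p{p_idx}"
        parents.append(Chunk(parent_id, " ".join(parent_tokens)))
        # Children tile their parent, so every child maps to exactly one parent.
        for c_idx, c_start in enumerate(range(0, len(parent_tokens), child_size)):
            child_text = " ".join(parent_tokens[c_start:c_start + child_size])
            children.append(Chunk(f"{parent_id}:c{c_idx}", child_text, parent_id))
    return parents, children
```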
For long documents, use semantic chunking: split on topic boundaries, not character count.
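One common way to find topic boundaries is to embed each sentence and start a new chunk wherever similarity between neighbouring sentences drops below a threshold. In this sketch `embed` is any sentence-embedding function you pass in, and the 0.5 default threshold is an arbitrary placeholder to tune.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(
    sentences: List[str],
    embed: Callable[[str], Vector],
    threshold: float = 0.5,
) -> List[str]:
    """Start a new chunk wherever adjacent sentences are less similar than threshold."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    groups = [[sentences[0]]]
    for prev_vec, cur_vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, cur_vec) < threshold:
            groups.append([sentence])      # topic boundary
        else:
            groups[-1].append(sentence)
    return [" ".join(g) for g in groups]
```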
Embed child chunks.
Store vectors in Qdrant with payload:
Use Claude or spaCy to extract named entities and relationships from each document. Write them as nodes and edges to Neo4j.
This is what makes it a "knowledge OS" rather than a plain vector store.
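Writing extracted entities to Neo4j comes down to `MERGE` statements, so re-ingesting a document does not duplicate nodes. The sketch assumes spaCy-style `(text, label)` pairs and, as a simplification, links entities that co-occur in the same document; real relationship extraction (for example with an LLM) would produce typed edges instead. Each `(query, params)` tuple would be sent with the official Neo4j Python driver as `session.run(query, params)`; `graph_writes` and the `CO_OCCURS_WITH` relationship type are hypothetical.

```python
from itertools import combinations
from typing import Any, Dict, List, Tuple

# MERGE is idempotent: nodes and edges are created only if missing.
MERGE_MENTION = (
    "MERGE (d:Document {id: $doc_id}) "
    "MERGE (e:Entity {name: $name}) SET e.label = $label "
    "MERGE (d)-[:MENTIONS]->(e)"
)
MERGE_CO_OCCURRENCE = (
    "MATCH (a:Entity {name: $a}), (b:Entity {name: $b}) "
    "MERGE (a)-[r:CO_OCCURS_WITH]-(b) "
    "ON CREATE SET r.count = 1 ON MATCH SET r.count = r.count + 1"
)


def graph_writes(
    doc_id: str, entities: List[Tuple[str, str]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn (text, label) entity pairs into parameterized Cypher statements.

    With spaCy: entities = [(ent.text, ent.label_) for ent in nlp(text).ents]
    """
    unique = sorted(set(entities))
    writes = [
        (MERGE_MENTION, {"doc_id": doc_id, "name": name, "label": label})
        for name, label in unique
    ]
    names = sorted({name for name, _ in unique})
    for a, b in combinations(names, 2):
        writes.append((MERGE_CO_OCCURRENCE, {"a": a, "b": b}))
    return writes
```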
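Compute a content fingerprint before storing, using a similarity-preserving hash such as MinHash or SimHash (an exact hash such as SHA-256 only catches byte-identical copies). If the estimated similarity with an existing document exceeds 0.92, skip or merge.
This prevents the same article, republished across N different sources, from polluting retrieval.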
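A similarity threshold needs a fingerprint whose overlap tracks content overlap. The sketch below uses MinHash over word 5-grams, which is one option among several; the shingle size, the 128 hash functions, and the function names are assumptions, while the 0.92 threshold comes from the text above. The `datasketch` library offers production versions (`MinHash`, `MinHashLSH`).

```python
import hashlib
import re
from typing import Iterable, List, Set

NUM_HASHES = 128
SHINGLE_SIZE = 5
DUP_THRESHOLD = 0.92


def shingles(text: str, k: int = SHINGLE_SIZE) -> Set[str]:
    """Overlapping k-word windows of the lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(text: str, num_hashes: int = NUM_HASHES) -> List[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")   # the salt acts as the per-function seed
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode(), digest_size=8, salt=salt).digest(), "big"
            )
            for g in grams
        ))
    return signature


def similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching slots estimates the Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def is_near_duplicate(
    text: str, existing: Iterable[List[int]], threshold: float = DUP_THRESHOLD
) -> bool:
    sig = minhash(text)
    return any(similarity(sig, other) > threshold for other in existing)
```

Comparing against every stored signature is linear in corpus size; an LSH index avoids that scan once the collection grows.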
Steps 2–5 run as Celery tasks. This keeps the Chrome extension feeling snappy — ingestion returns instantly, heavy work happens in background workers.
Celery + Redis broker → worker pool → status streamed via SSE
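Streaming job status to the extension over Server-Sent Events only requires emitting text in the SSE wire format. The generator below is a sketch: the `statuses` iterable stands in for polling job state from Redis, and in FastAPI the generator could be wrapped in `StreamingResponse(..., media_type="text/event-stream")`. `sse_event` and `stream_job_status` are hypothetical names.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def sse_event(
    data: Dict[str, Any], event: Optional[str] = None, event_id: Optional[int] = None
) -> str:
    """One SSE message: optional id/event fields, a single data line, then a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")   # json.dumps emits no newlines by default
    return "\n".join(lines) + "\n\n"


def stream_job_status(job_id: str, statuses: Iterable[str]) -> Iterator[str]:
    """Yield an SSE message per status update, stopping at a terminal state."""
    for i, status in enumerate(statuses):
        yield sse_event({"job_id": job_id, "status": status}, event="status", event_id=i)
        if status in ("done", "failed"):
            return
```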
| Tier | Store | Read speed | What lives here |
|---|---|---|---|
| Tier 1 — Ephemeral | Python dict (in-process) | ~0ms | Current session context, conversation history with query agent. Cleared on restart. |
| Tier 2 — Session | Redis | ~1ms | Recent ingestion queue, dedup fingerprints, job status, user preferences, search cache (TTL 24h). |
| Tier 3 — Persistent | Qdrant + Neo4j + Postgres | ~5–50ms | Vectors, knowledge graph, document metadata, full text, eval logs. |
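The three tiers can be read through in order, falling back to slower storage on a miss. In this sketch an in-memory `TTLStore` stands in for Redis (with redis-py the equivalent calls would be `r.get(key)` and `r.set(key, value, ex=86400)`), and `load_persistent` stands in for the Qdrant/Neo4j/Postgres lookup; `TTLStore` and `TieredCache` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLStore:
    """In-memory stand-in for Redis: entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[Any, Tuple[Any, float]] = {}

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]       # lazily evict expired entries
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


class TieredCache:
    """Tier 1 (process dict), then Tier 2 (TTL store), then Tier 3 (persistent loader)."""

    def __init__(self, session_store: TTLStore, load_persistent: Callable[[Any], Any]):
        self.ephemeral: Dict[Any, Any] = {}   # lost on restart, like Tier 1
        self.session = session_store
        self.load_persistent = load_persistent

    def get(self, key):
        if key in self.ephemeral:
            return self.ephemeral[key]
        value = self.session.get(key)
        if value is None:
            value = self.load_persistent(key)
            self.session.set(key, value)
        self.ephemeral[key] = value
        return value
```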
Classify the query type and route to the appropriate retrieval strategy:
| Query type | Example | Strategy |
|---|---|---|
| Factual lookup | "What is HyDE?" | BM25 + vector |
| Exploratory | "What do I know about RAG?" | HyDE + vector |
| Entity-centric | "What did I read about Sam Altman?" | Graph traversal first |
| Temporal | "What did I read last week about X?" | Postgres filter + vector |
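How queries are classified is not specified here; an LLM call is a common choice. The keyword heuristics below are a hypothetical, dependency-free stand-in that routes the four example queries in the table to their strategies; the patterns and the `QueryType` names are assumptions.

```python
import re
from enum import Enum


class QueryType(Enum):
    FACTUAL = "bm25+vector"
    EXPLORATORY = "hyde+vector"
    ENTITY = "graph_first"
    TEMPORAL = "postgres_filter+vector"


TEMPORAL_PATTERN = re.compile(
    r"\b(today|yesterday|last (week|month|year)|this (week|month)|since \w+|in \d{4})\b",
    re.IGNORECASE,
)
EXPLORATORY_PATTERN = re.compile(r"^\s*what do i know\b|\boverview\b", re.IGNORECASE)
# Two or more consecutive capitalized words, e.g. "Sam Altman".
PROPER_NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def classify(query: str) -> QueryType:
    # Order matters: a temporal query may also mention a named entity.
    if TEMPORAL_PATTERN.search(query):
        return QueryType.TEMPORAL
    if EXPLORATORY_PATTERN.search(query):
        return QueryType.EXPLORATORY
    if PROPER_NAME_PATTERN.search(query):
        return QueryType.ENTITY
    return QueryType.FACTUAL
```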
Generate 3 hypothetical answers to the query, embed them, and use them as additional search vectors.
This improves recall for vague or exploratory queries, because a hypothetical answer is phrased more like the stored documents than the short question is.
LangChain: HypotheticalDocumentEmbedder (LlamaIndex: HyDEQueryTransform)
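The HyDE step can be sketched with two injected callables: `generate` (any LLM call, ideally sampled at a non-zero temperature so the three answers differ) and `embed` (the same model used for the stored child chunks). Both, and the prompt wording, are assumptions. Each returned vector would be searched separately and the result lists fused; the original HyDE formulation instead averages the embeddings into a single vector.

```python
from typing import Callable, List

Vector = List[float]


def hyde_query_vectors(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Vector],
    n: int = 3,
) -> List[Vector]:
    """Return the query's own embedding followed by embeddings of n hypothetical answers."""
    prompt = (
        "Write a short passage that plausibly answers the question below. "
        f"It may be wrong; it is only used for retrieval.\n\nQuestion: {query}"
    )
    hypotheticals = [generate(prompt) for _ in range(n)]
    return [embed(query)] + [embed(h) for h in hypotheticals]
```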
Run all three retrievers (BM25, vector, and graph) concurrently.
Merge their results with Reciprocal Rank Fusion (RRF).
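Concurrent retrieval and fusion can be sketched with `asyncio.gather` and the standard RRF formula, score(d) = Σ 1 / (k + rank), where k = 60 is a commonly used default. Each retriever is assumed to be an async function returning a ranked list of document IDs; `hybrid_search` is a hypothetical name.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, Sequence

Retriever = Callable[[str], Awaitable[List[str]]]


def reciprocal_rank_fusion(ranked_lists: Sequence[List[str]], k: int = 60) -> List[str]:
    """score(d) = sum over lists of 1 / (k + rank of d), with ranks starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


async def hybrid_search(query: str, retrievers: Sequence[Retriever], k: int = 60) -> List[str]:
    """Run every retriever concurrently, then fuse the ranked ID lists."""
    results = await asyncio.gather(*(retrieve(query) for retrieve in retrievers))
    return reciprocal_rank_fusion(results, k)
```

Because RRF uses only ranks, it needs no score normalization across BM25, cosine similarity, and graph results.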
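Re-rank the top 20 fused candidates with a cross-encoder and keep the top 5.
A cross-encoder scores the query and each passage jointly, so it is more precise than embedding similarity; keeping only the best five keeps loosely related passages out of the LLM's context and leaves less room for unsupported answers.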
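The re-ranking step is sketched with an injected scoring function so it runs without a model. With sentence-transformers, `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` fits the `score_pairs` slot, since it takes a list of `(query, passage)` pairs and returns one score per pair; that model name is an example, not necessarily what this system uses.

```python
from typing import Callable, List, Sequence, Tuple

PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(
    query: str,
    candidates: List[Tuple[str, str]],
    score_pairs: PairScorer,
    pool: int = 20,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Re-score the first `pool` fused candidates against the query; keep the best `top_n`.

    candidates: (doc_id, passage_text) pairs in RRF order.
    """
    pool_items = candidates[:pool]
    scores = score_pairs([(query, text) for _, text in pool_items])
    ranked = sorted(zip(pool_items, scores), key=lambda item: item[1], reverse=True)
    return [(doc_id, float(score)) for (doc_id, _), score in ranked[:top_n]]
```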
Query execution is orchestrated by a stateful LangGraph AI agent.
The agent acts as a router, deciding autonomously whether to use the search_personal_knowledge tool (which triggers steps 1-4) or a search_live_web tool (using DuckDuckGo) for current events.
It loops through tool execution until satisfied, then synthesizes a final answer with strict source citations.
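The agent's control flow reduces to a loop that lets the model either call a tool or answer. This is a plain-Python sketch of that loop, not LangGraph's API: `decide` stands in for the LLM call, the `Decision` and `AgentState` types are hypothetical, and the `max_steps` cap is an assumed guard against an agent that never declares itself satisfied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    tool: Optional[str] = None   # name of the tool to call; None means "answer now"
    tool_input: str = ""
    answer: str = ""


@dataclass
class AgentState:
    question: str
    observations: List[str] = field(default_factory=list)


def run_agent(
    question: str,
    decide: Callable[[AgentState], Decision],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Let the model pick a tool or answer; append each tool result to the state."""
    state = AgentState(question)
    for _ in range(max_steps):
        decision = decide(state)
        if decision.tool is None:
            return decision.answer
        result = tools[decision.tool](decision.tool_input)
        state.observations.append(f"[{decision.tool}] {result}")
    return "Stopped: step limit reached before the agent produced an answer."
```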
Child chunk → fetch parent → Tool Result → Agent State → LLM Synthesis
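Parent assembly can be sketched as mapping each ranked child hit to its parent, skipping parents already included, and numbering the survivors so the LLM can cite them. The lookup dictionaries stand in for Postgres or Qdrant payload reads, and the `[n]` citation format and `assemble_context` name are assumptions.

```python
from typing import Dict, List


def assemble_context(
    child_hits: List[str],
    child_to_parent: Dict[str, str],
    parents: Dict[str, str],
    sources: Dict[str, str],
) -> str:
    """Map ranked child chunk IDs to parents, drop repeat parents, number citations."""
    seen = set()
    blocks: List[str] = []
    for child_id in child_hits:
        parent_id = child_to_parent[child_id]
        if parent_id in seen:        # two children of one parent: include it once
            continue
        seen.add(parent_id)
        n = len(blocks) + 1
        blocks.append(f"[{n}] {parents[parent_id]}\n(source: {sources[parent_id]})")
    return "\n\n".join(blocks)
```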