Personal Knowledge OS — Full Architecture

A production-grade system that ingests everything you read and write, builds a knowledge graph over it, and lets you query your own mind with hybrid retrieval-augmented generation (RAG).

Architecture Flow

1. Ingestion Pipeline

graph TD
    A[Chrome Extension] -->|Capture Context| B[FastAPI /ingest]
    B -->|Async Task| C[Celery Worker]
    C --> D[(Redis - Dedup)]
    C --> P[(PostgreSQL - Meta)]
    C --> E[Chunk & Embed]
    E --> F[(Qdrant - Vectors)]
    C --> G[spaCy NER]
    G --> H[(Neo4j - Graph)]

2. Agentic Retrieval Pipeline

graph TD
    I[Extension UI] -->|Query| J[FastAPI /query]
    J --> K[LangGraph Agent]
    
    K -->|Tool: Web| L[DuckDuckGo Search]
    K -->|Tool: Personal DB| M[Hybrid Search]
    
    M --> N[Vector Search]
    M --> O[Graph Search]
    N --> Q[RRF & Cross-Encoder]
    O --> Q
    Q --> R[Parent Assembly]
    R --> K
    
    K --> S[Gemini Synthesis]
    S --> T[Answer + Citations]

Ingestion Layer
Processing Pipeline
Memory Architecture
Query & Retrieval
Design Decisions

1. Ingestion Layer

Two capture clients are planned: a Chrome extension and a desktop app. For now the focus is on the Chrome extension.

Chrome Extension

What it captures:

Page content as you browse (reader-mode cleaned HTML)
Selected text and highlights
Tab metadata (URL, title, time spent)
YouTube transcripts via transcript API
PDFs opened in the browser

How it works:

A content script injects into every page. On tab close or user trigger, it extracts cleaned text via Readability.js, then POSTs to your local FastAPI server on port 8000.

chrome.tabs API + content scripts + background service worker

Smart triggers — don't capture everything:

Time-on-page threshold: 30 seconds minimum
Scroll depth: 40% minimum
Or explicit "save this" hotkey

Noise is the enemy. Capturing everything makes retrieval worthless.
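
The trigger rules above can be sketched as a single predicate. This is a minimal sketch, assuming the time and scroll thresholds must both be met (the list does not say whether they combine with AND or OR) and that the hotkey overrides both. The names `PageVisit` and `should_capture` are hypothetical, and in practice the check would probably live in the extension's JavaScript rather than in Python.

```python
from dataclasses import dataclass

MIN_SECONDS_ON_PAGE = 30   # time-on-page threshold from the list above
MIN_SCROLL_DEPTH = 0.40    # 40% scroll depth, expressed as a fraction


@dataclass
class PageVisit:
    """Hypothetical summary of one tab visit reported by the extension."""
    seconds_on_page: float
    scroll_depth: float          # 0.0 (top of page) to 1.0 (bottom)
    hotkey_pressed: bool = False


def should_capture(visit: PageVisit) -> bool:
    """Return True if the visit passes the smart-trigger rules."""
    if visit.hotkey_pressed:
        # An explicit "save this" always wins (assumption).
        return True
    return (visit.seconds_on_page >= MIN_SECONDS_ON_PAGE
            and visit.scroll_depth >= MIN_SCROLL_DEPTH)
```

For example, `should_capture(PageVisit(45, 0.5))` passes, while a 45-second visit that never scrolls past 10% does not.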

What you explicitly do NOT capture (noise control)

Block the following (deciding what to exclude is harder than capturing):

Social media feeds
Email inboxes
Shopping pages
Pages under 200 words
Pages visited under 15 seconds
Duplicate URLs within 7 days
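
A server-side filter for the rules above might look like the following. It is a sketch under stated assumptions: the `BLOCKED_DOMAINS` entries are placeholder examples and not a list from this project, the function name `is_noise` is hypothetical, and the caller is assumed to look up when the URL was last captured.

```python
from datetime import datetime, timedelta
from typing import Optional
from urllib.parse import urlparse

# Hypothetical blocklist; the real set of feeds, inboxes and shops is up to you.
BLOCKED_DOMAINS = {"twitter.com", "mail.google.com", "amazon.com"}
MIN_WORDS = 200
MIN_SECONDS = 15
DUPLICATE_WINDOW = timedelta(days=7)


def is_noise(url: str, text: str, seconds_on_page: float,
             last_captured: Optional[datetime], now: datetime) -> bool:
    """Return True if the page should be dropped before ingestion."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in BLOCKED_DOMAINS:
        return True
    if len(text.split()) < MIN_WORDS:
        return True
    if seconds_on_page < MIN_SECONDS:
        return True
    # The same URL captured within the last 7 days counts as a duplicate.
    if last_captured is not None and now - last_captured < DUPLICATE_WINDOW:
        return True
    return False
```

Keeping the thresholds as module-level constants makes them easy to tune once you see how much noise still gets through.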

2. Processing Pipeline

All steps run async in the background via Celery. The ingestion endpoint returns immediately with a job ID.

Step 1 — Ingest & Clean

Raw HTML → Readability.js (browser) or trafilatura (server) → clean text.

Strip boilerplate, navigation, ads. Extract structured metadata:

Title
Author
Publish date
Domain
Word count

Step 2 — Chunk Strategy

Use parent-child chunking:

Large parent chunks: 1024 tokens — for context
Small child chunks: 256 tokens — for precise retrieval
Store both, link child → parent
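
Parent-child chunking can be sketched as below. This is an illustration only: whitespace-separated words stand in for real tokenizer tokens, and `Chunk` and `chunk_document` are hypothetical names, not part of this project's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PARENT_TOKENS = 1024
CHILD_TOKENS = 256


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str] = None  # set on child chunks only


def chunk_document(doc_id: str, text: str) -> Tuple[List[Chunk], List[Chunk]]:
    """Split text into parent chunks and child chunks linked to their parent."""
    tokens = text.split()  # stand-in for a real tokenizer (assumption)
    parents: List[Chunk] = []
    children: List[Chunk] = []
    for p_idx, p_start in enumerate(range(0, len(tokens), PARENT_TOKENS)):
        parent_tokens = tokens[p_start:p_start + PARENT_TOKENS]
        parent_id = f"{doc_id}:p{p_idx}"
        parents.append(Chunk(parent_id, " ".join(parent_tokens)))
        # Children are cut from inside their parent, so the link is unambiguous.
        for c_idx, c_start in enumerate(range(0, len(parent_tokens), CHILD_TOKENS)):
            child_tokens = parent_tokens[c_start:c_start + CHILD_TOKENS]
            children.append(
                Chunk(f"{parent_id}:c{c_idx}", " ".join(child_tokens), parent_id))
    return parents, children
```

Only the children would be embedded; the `parent_id` on each child is what lets retrieval swap a matched child for its larger parent later.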

For long documents, use semantic chunking: split on topic boundaries, not character count.

Step 3 — Embed

Embed child chunks.

Store vectors in Qdrant with payload:

Step 4 — Entity Extraction → Knowledge Graph

Use Claude or spaCy to extract named entities and relationships from each document. Write them as nodes and edges to Neo4j.

This is what makes it a "knowledge OS" rather than a plain vector store.

Step 5 — Deduplication

Compute a hash-based content fingerprint before storing. If its similarity to an existing document's fingerprint exceeds 0.92, skip or merge the new document.

Prevents the same article from N different sources polluting retrieval.
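
One way to get a similarity score out of hashing is to hash overlapping word shingles and compare the resulting sets with Jaccard similarity. The source does not say which fingerprinting scheme is used, so treat this as one possible sketch: the shingle size of 5 and all function names are assumptions.

```python
import hashlib
from typing import Set

SIMILARITY_THRESHOLD = 0.92
SHINGLE_SIZE = 5  # words per shingle (assumption)


def fingerprint(text: str, k: int = SHINGLE_SIZE) -> Set[str]:
    """Hash every k-word shingle of the lowercased text into a set."""
    words = text.lower().split()
    if len(words) < k:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles}


def similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(new_text: str, existing_text: str) -> bool:
    """True if the two texts are similar enough to skip or merge."""
    return similarity(fingerprint(new_text),
                      fingerprint(existing_text)) > SIMILARITY_THRESHOLD
```

Comparing against every stored document pairwise will not scale far; a MinHash or SimHash signature with an index is the usual next step if the corpus grows.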

Step 6 — Async Task Queue

Steps 2–5 run as Celery tasks. This keeps the Chrome extension feeling snappy — ingestion returns instantly, heavy work happens in background workers.

Celery + Redis broker → worker pool → status streamed via SSE
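
For the SSE leg of that chain, each status update has to be serialized in the Server-Sent Events wire format. A minimal sketch follows; the event name `job_status` and the payload fields are hypothetical, not taken from this project.

```python
import json


def sse_event(job_id: str, status: str, progress: float) -> str:
    """Format one job-status update as a Server-Sent Events message."""
    payload = json.dumps(
        {"job_id": job_id, "status": status, "progress": progress})
    # An SSE message is a group of "field: value" lines ended by a blank line.
    return f"event: job_status\ndata: {payload}\n\n"
```

A streaming endpoint would yield one such string per update, served with the `text/event-stream` content type.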

3. Memory Architecture

3-tier memory model

Tier	Store	Read speed	What lives here
Tier 1 — Ephemeral	Python dict (in-process)	~0ms	Current session context, conversation history with query agent. Cleared on restart.
Tier 2 — Session	Redis	~1ms	Recent ingestion queue, dedup fingerprints, job status, user preferences, search cache (TTL 24h).
Tier 3 — Persistent	Qdrant + Neo4j + Postgres	~5–50ms	Vectors, knowledge graph, document metadata, full text, eval logs.
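
The 24-hour search cache in Tier 2 relies on per-key expiry. The sketch below shows that behavior with an in-process stand-in for Redis; `TTLCache` is a hypothetical class for illustration, and a real deployment would lean on Redis key expiry instead.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

SEARCH_CACHE_TTL = 24 * 60 * 60  # 24 hours, in seconds


class TTLCache:
    """In-process stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl: float = SEARCH_CACHE_TTL,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._items: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: evict lazily on read
            return None
        return value
```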

4. Query & Retrieval

Hybrid retrieval pipeline

Step 1 — Query Analysis

Classify the query type and route to the appropriate retrieval strategy:

Query type	Example	Strategy
Factual lookup	"What is HyDE?"	BM25 + vector
Exploratory	"What do I know about RAG?"	HyDE + vector
Entity-centric	"What did I read about Sam Altman?"	Graph traversal first
Temporal	"What did I read last week about X?"	Postgres filter + vector
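
The routing in the table above could start as a few rules before graduating to an LLM classifier. The patterns below are illustrative assumptions rather than the project's actual classifier, and they are deliberately crude: the entity check only spots runs of two or more capitalized words.

```python
import re

TEMPORAL_PATTERN = re.compile(
    r"\b(today|yesterday|last (week|month|year)|this (week|month|year))\b",
    re.IGNORECASE)
# Two or more consecutive capitalized words, e.g. a person's full name.
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")
FACTUAL_PATTERN = re.compile(
    r"^\s*(what is|what are|who is|define)\b", re.IGNORECASE)

# Strategy names copied from the table above.
STRATEGIES = {
    "temporal": "Postgres filter + vector",
    "entity": "Graph traversal first",
    "factual": "BM25 + vector",
    "exploratory": "HyDE + vector",
}


def classify_query(query: str) -> str:
    """Return the query type; rules are checked from most to least specific."""
    if TEMPORAL_PATTERN.search(query):
        return "temporal"
    if ENTITY_PATTERN.search(query):
        return "entity"
    if FACTUAL_PATTERN.search(query):
        return "factual"
    return "exploratory"
```

`STRATEGIES[classify_query(q)]` then names the retrieval strategy to run for query `q`.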

Step 2 — Multi-query Expansion (HyDE)

Generate 3 hypothetical answers to the query, embed them, and use them as additional search vectors.

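This improves recall for vague or exploratory queries, because a hypothetical answer usually lies closer in embedding space to relevant passages than the short question does.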

LangChain: HypotheticalDocumentEmbedder (LlamaIndex: HyDEQueryTransform)
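
Independent of any framework, the expansion step reduces to a few lines. In this sketch `generate_answer` and `embed` are placeholders for an LLM call and an embedding model; both are assumptions supplied by the caller, and the function name is hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def hyde_query_vectors(
    query: str,
    generate_answer: Callable[[str], str],
    embed: Callable[[str], Vector],
    n_hypotheses: int = 3,
) -> List[Vector]:
    """Return the query embedding plus one embedding per hypothetical answer."""
    vectors: List[Vector] = [embed(query)]
    for _ in range(n_hypotheses):
        hypothetical = generate_answer(query)  # an LLM call in a real system
        vectors.append(embed(hypothetical))
    return vectors
```

Each returned vector is then issued as its own vector search, and the result lists are merged downstream.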

Step 3 — Parallel Retrieval (vector + graph + BM25)

Run all three retrievers in parallel.

Merge results using Reciprocal Rank Fusion (RRF)
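
Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in. A minimal sketch follows, assuming each retriever returns document IDs best-first; k = 60 is the commonly used constant and is an assumption here, not a value from this project.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]],
                           k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities, BM25 scores and graph hop counts are not on comparable scales.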

Step 4 — Re-ranking

Cross-encoder re-rank top-20 candidates down to top-5.
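
The top-20 to top-5 cut can be sketched with the scoring model passed in as a function. Here `score_pair` is a placeholder for a cross-encoder that scores one (query, passage) pair at a time; the function name and signature are assumptions.

```python
from typing import Callable, List, Sequence


def rerank(query: str, candidates: Sequence[str],
           score_pair: Callable[[str, str], float],
           top_n_in: int = 20, top_n_out: int = 5) -> List[str]:
    """Score the first top_n_in candidates and keep the best top_n_out."""
    pool = list(candidates[:top_n_in])
    ranked = sorted(pool,
                    key=lambda passage: score_pair(query, passage),
                    reverse=True)
    return ranked[:top_n_out]
```

With the sentence-transformers library, a `CrossEncoder` model's `predict` method scores a list of (query, passage) pairs in one batch, which is usually faster than scoring pair by pair as this sketch does.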

Because the LLM then sees only the most relevant passages, this single step should noticeably reduce hallucinations.

Step 5 — Agentic Synthesis (LangGraph)

Query execution is orchestrated by a stateful LangGraph AI agent.

The agent acts as a router, deciding autonomously whether to use the search_personal_knowledge tool (which triggers steps 1-4) or a search_live_web tool (using DuckDuckGo) for current events.

It loops through tool execution until satisfied, then synthesizes a final answer with strict source citations.

Child chunk → fetch parent → Tool Result → Agent State → LLM Synthesis
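
The first two hops of that chain, child chunk to parent, can be sketched as follows. The two dictionaries stand in for lookups against the vector store payload and the document store; their shape and the function name are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def assemble_parents(child_hits: Sequence[str],
                     child_to_parent: Dict[str, str],
                     parent_text: Dict[str, str]) -> List[str]:
    """Map ranked child IDs to parent texts, keeping rank order, no repeats."""
    seen = set()
    contexts: List[str] = []
    for child_id in child_hits:
        parent_id = child_to_parent.get(child_id)
        if parent_id is None or parent_id in seen:
            continue  # unknown child, or parent already included
        seen.add(parent_id)
        contexts.append(parent_text[parent_id])
    return contexts
```

Dropping repeated parents matters because several top-ranked children often come from the same parent, and sending it twice wastes context window.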
