**CLAUDE.md** (+6 −5)
@@ -4,7 +4,7 @@ Local knowledge graph + intelligence layer for Obsidian vaults. Rust CLI + MCP s
## Architecture
-Single binary with 22 modules behind a lib crate:
+Single binary with 23 modules behind a lib crate:
- `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`. Includes `intelligence: Option<bool>`, `[models]` section for model overrides, `[obsidian]` section (CLI path, enabled flag), and `[agents]` section (registered AI agent names). `Config::save()` writes back to disk.
- `chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
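The break-point scoring idea can be sketched as a small scoring function over candidate lines. This is an illustrative sketch only — the weights, the function name, and the fence handling are assumptions, not engraph's actual tuning:

```rust
/// Score a candidate split point: higher means a better place to break.
/// Weights are illustrative, not engraph's real constants.
fn break_score(line: &str, inside_code_fence: bool) -> i32 {
    if inside_code_fence {
        return 0; // never split inside a code fence
    }
    let t = line.trim_start();
    if t.starts_with('#') {
        // headings are the strongest break points; shallower = stronger
        let level = t.chars().take_while(|&c| c == '#').count() as i32;
        100 - 10 * (level - 1)
    } else if t == "---" || t == "***" {
        60 // thematic break
    } else if t.is_empty() {
        30 // blank line
    } else {
        0 // mid-paragraph: avoid splitting here
    }
}
```

A chunker built on this would scan a window near the token budget and split at the highest-scoring line in that window.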
@@ -27,13 +27,14 @@ Single binary with 22 modules behind a lib crate:
- `profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Role detection for people/daily/archive folders by content patterns (not just folder names). Writes/loads `vault.toml`
- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved), `llm_cache` (orchestrator result cache), `cli_events` (audit log for CLI operations). `vec_chunks` virtual table (sqlite-vec) for KNN search. Dynamic embedding dimension stored in meta. `has_dimension_mismatch()` and `reset_for_reindex()` for migration. Enhanced `resolve_file()` with fuzzy Levenshtein matching as final fallback
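The fuzzy fallback rests on Levenshtein edit distance. A minimal single-row DP implementation (the function name and layout are mine, not engraph's code):

```rust
/// Classic dynamic-programming Levenshtein distance between two strings,
/// the kind of metric a fuzzy file-resolution fallback can rank candidates by.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between a[..i] and b[..j] from the previous row
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let sub = prev[j] + if ca == cb { 0 } else { 1 }; // substitution
            cur.push(sub.min(prev[j + 1] + 1).min(cur[j] + 1)); // vs delete/insert
        }
        prev = cur;
    }
    prev[b.len()]
}
```

A resolver would compute this against every known filename and accept the nearest match under some threshold.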
- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan. Dimension migration on model change.
-- `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 3-lane retrieval per expansion → RRF pass 1 → reranker 4th lane → RRF pass 2. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent.
+- `temporal.rs` — temporal search lane. Extracts note dates from frontmatter `date:` field or `YYYY-MM-DD` filename patterns. Heuristic date parsing for natural language ("today", "yesterday", "last week", "this month", "recent", month names, ISO dates, date ranges). Smooth decay scoring for files near but outside the target date range. Provides `extract_note_date()` for indexing and `score_temporal()` + `parse_date_range_heuristic()` for search
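Smooth decay scoring can be sketched like this: score 1.0 inside the target range, falling off with distance outside it. The exponential shape and the 7-day half-life are assumptions for illustration, not engraph's actual curve or signature:

```rust
/// Temporal lane score for a note, given day numbers (e.g. days since epoch).
/// 1.0 inside [range_start, range_end]; outside, halves every 7 days away.
/// The half-life constant is an assumed value, not engraph's.
fn score_temporal(note_day: i64, range_start: i64, range_end: i64) -> f64 {
    let days_outside = if note_day < range_start {
        (range_start - note_day) as f64
    } else if note_day > range_end {
        (note_day - range_end) as f64
    } else {
        return 1.0; // inside the target range
    };
    0.5f64.powf(days_outside / 7.0)
}
```

The point of the smooth tail is that a note written two days before "last week" still surfaces, just below in-range notes.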
+- `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 5-lane RRF retrieval (semantic + FTS5 + graph + reranker + temporal) per expansion → two-pass RRF fusion. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent, including temporal (1.5 weight for time-aware queries). Results display normalized confidence percentages (0-100%) instead of raw RRF scores.
-`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `serve` (MCP stdio server with file watcher + intelligence).
+`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state + date coverage stats), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `serve` (MCP stdio server with file watcher + intelligence).
## Key patterns
-- **4-lane hybrid search:** Queries run through up to four lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), graph (wikilink expansion), and cross-encoder reranking. A research orchestrator classifies query intent and sets adaptive lane weights. Two-pass RRF: 3-lane retrieval → reranker scores top 30 → 4-lane fusion. When intelligence is off, falls back to heuristic intent classification with 3-lane search (v0.7 behavior)
+- **5-lane hybrid search:** Queries run through up to five lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), graph (wikilink expansion), cross-encoder reranking, and temporal (date-range scoring). A research orchestrator classifies query intent and sets adaptive lane weights. Two-pass RRF: retrieval lanes → reranker scores top 30 → 5-lane fusion. When intelligence is off, falls back to heuristic intent classification. Temporal intent detection works with both heuristic and LLM orchestrators
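Reciprocal Rank Fusion itself is simple: each lane contributes `weight / (k + rank)` for every document it returns, and the sums are sorted. A minimal sketch — the signature is hypothetical, and `k = 60` follows the convention of the original RRF paper rather than a confirmed engraph constant:

```rust
use std::collections::HashMap;

/// Fuse per-lane rankings with weighted Reciprocal Rank Fusion.
/// Each lane is (weight, ids ordered best-first); ranks are 1-based.
fn rrf_fuse(lanes: &[(f64, Vec<&str>)], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (weight, ranking) in lanes {
        for (rank, id) in ranking.iter().enumerate() {
            // weight / (k + rank): high ranks dominate, long tails still count
            *scores.entry((*id).to_string()).or_insert(0.0) +=
                weight / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```

A document that appears in several lanes accumulates several terms, which is why agreement across lanes beats a single first-place finish.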
- **Vault graph:** `edges` table stores bidirectional wikilink edges and mention edges. Built during indexing after all files are written. People detection scans for person name/alias mentions using notes from the configured People folder
- **Graph agent:** Expands seed results by following wikilinks 1-2 hops. Decay: 0.8x for 1-hop, 0.5x for 2-hop. Relevance filter: must contain query term (FTS5) or share tags with seed. Multi-parent merge takes highest score
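The hop decay and multi-parent merge rules can be expressed as one small scoring step. A sketch with an assumed signature, using the decay factors stated above:

```rust
/// Merge a graph-expanded candidate into its running score.
/// 1-hop neighbors inherit 0.8x the seed's score, 2-hop 0.5x; a note
/// reached from several seeds keeps the highest resulting score.
fn merge_hop_score(current: Option<f64>, seed_score: f64, hops: u8) -> Option<f64> {
    let decay = match hops {
        1 => 0.8,
        2 => 0.5,
        _ => return current, // deeper hops are not followed
    };
    let candidate = seed_score * decay;
    Some(current.map_or(candidate, |c| c.max(candidate)))
}
```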
**README.md** (+14 −12)
@@ -16,7 +16,7 @@ engraph turns your markdown vault into a searchable knowledge graph that AI agen
Plain vector search treats your notes as isolated documents. But knowledge isn't flat — your notes link to each other, share tags, reference the same people and projects. engraph understands these connections.
-- **4-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent.
+- **5-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking + temporal scoring, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent. Time-aware queries like "what happened last week" or "March 2026 notes" activate the temporal lane automatically.
- **MCP server for AI agents** — `engraph serve` exposes 19 tools (search, read, section-level editing, frontmatter mutations, vault health, context bundles, note creation) that Claude, Cursor, or any MCP client can call directly.
- **Section-level editing** — AI agents can read, replace, prepend, or append to specific sections by heading. Full note rewriting with frontmatter preservation. Granular frontmatter mutations (set/remove fields, add/remove tags and aliases).
- **Vault health diagnostics** — detect orphan notes, broken wikilinks, stale content, and tag hygiene issues. Available as MCP tool and CLI command.
@@ -65,7 +65,7 @@ Your vault (markdown files)
1. **Index** — walks your vault, chunks markdown by headings, embeds with a local GGUF model via llama.cpp (Metal GPU on macOS), stores everything in SQLite with FTS5 + sqlite-vec + a wikilink graph
-2. **Search** — an orchestrator classifies the query and sets lane weights, then runs up to four lanes (semantic KNN, BM25 keyword, graph expansion, cross-encoder reranking), fused via RRF
+2. **Search** — an orchestrator classifies the query and sets lane weights, then runs up to five lanes (semantic KNN, BM25 keyword, graph expansion, cross-encoder reranking, temporal scoring), fused via RRF
3. **Serve** — starts an MCP server that AI agents connect to, with a file watcher that re-indexes changes in real time
## Quick start
@@ -98,13 +98,13 @@ engraph search "how does the auth system work"
+- Temporal search: natural language date queries ("last week", "March 2026", "recent"), date extraction from frontmatter and filenames, smooth decay scoring
+- Confidence % display: search results show normalized 0-100% confidence instead of raw RRF scores
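Date extraction from `YYYY-MM-DD` filenames could look roughly like this — a simplified sketch with a hypothetical function name; the real `extract_note_date()` also reads frontmatter and surely differs in detail:

```rust
/// Pull a leading YYYY-MM-DD date from a filename like "2026-03-14 standup.md".
/// Returns (year, month, day), or None if the prefix isn't a plausible date.
fn date_from_filename(name: &str) -> Option<(i32, u32, u32)> {
    // "YYYY-MM-DD" is exactly 10 bytes; bail out on short or non-ASCII prefixes
    if name.len() < 10 || !name.is_char_boundary(10) {
        return None;
    }
    let mut parts = name[..10].split('-');
    let year: i32 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let day: u32 = parts.next()?.parse().ok()?;
    if (1..=12).contains(&month) && (1..=31).contains(&day) {
        Some((year, month, day))
    } else {
        None
    }
}
```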
- LLM research orchestrator: query intent classification + query expansion + adaptive lane weights
- llama.cpp inference via Rust bindings (GGUF models, Metal GPU on macOS, CUDA on Linux)
- Intelligence opt-in: heuristic fallback when disabled, LLM-powered when enabled
@@ -281,7 +283,7 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s
- Enhanced file resolution with fuzzy Levenshtein matching fallback
- Content-based folder role detection (people, daily, archive) by content patterns
- Configurable model overrides for multilingual support
-- 318 unit tests, CI on macOS + Ubuntu
+- 361 unit tests, CI on macOS + Ubuntu
## Roadmap
@@ -290,7 +292,7 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s
- [x] ~~MCP edit/rewrite tools — full note editing for AI agents~~ (v1.1)
- [x] ~~Vault health monitor — orphan notes, broken links, stale content, tag hygiene~~ (v1.1)
- [x] ~~Obsidian CLI integration — auto-detect and delegate with circuit breaker~~ (v1.1)
-- [ ] Temporal search — find notes by time period, detect trends (v1.2)
+- [x] ~~Temporal search — find notes by time period, date-aware queries~~ (v1.2)
- [ ] HTTP/REST API — complement MCP with a standard web API (v1.3)
- [ ] Multi-vault — search across multiple vaults (v1.4)
@@ -326,7 +328,7 @@ All data stored in `~/.engraph/` — single SQLite database (~10MB typical), GGU
## Development
```bash
-cargo test --lib  # 318 unit tests, no network (requires CMake for llama.cpp)
+cargo test --lib  # 361 unit tests, no network (requires CMake for llama.cpp)
cargo clippy -- -D warnings
cargo fmt --check
```
@@ -338,7 +340,7 @@ cargo test --test integration -- --ignored
Contributions welcome. Please open an issue first to discuss what you'd like to change.
-The codebase is 22 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.
+The codebase is 23 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.