20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,25 @@
# Changelog

## v1.2.0 — Temporal Search (2026-03-26)

### Added
- **Temporal search lane** (`temporal.rs`) — 5th RRF lane for time-aware queries
- **Date extraction** — from frontmatter `date:` field or `YYYY-MM-DD` filename pattern
- **Heuristic date parsing** — "today", "yesterday", "last week", "this month", "recent", month names, ISO dates, date ranges
- **LLM date extraction** — orchestrator detects temporal intent and extracts date ranges from natural language
- **Temporal scoring** — smooth decay function for files near but outside the target date range
- **Temporal candidate injection** — date-matched files enter candidate pool as graph seeds
- **Confidence % display** — search results show normalized confidence (0-100%) instead of raw RRF scores
- **Date coverage stats** — `engraph status` shows how many files have extractable dates

### Changed
- `QueryIntent` gains `Temporal` variant with custom lane weights (temporal: 1.5)
- `OrchestrationResult` gains `date_range` field (backward-compatible serde)
- `LaneWeights` gains `temporal` field (0.0 for non-temporal intents)
- `insert_file` signature extended with `note_date` parameter
- Module count: 22 → 23
- Test count: 318 → 361
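As an illustration of the `QueryIntent` and `LaneWeights` entries above, the following Rust sketch shows one way the new variant and field could fit together. Only the `Temporal` variant, the `temporal` field, and the values 1.5 and 0.0 come from the changelog. The other field names, the `for_intent` constructor, and the placeholder weight of 1.0 are assumptions, and the real enum presumably has more variants than the two shown.

```rust
/// Two intents only, for illustration; `Conceptual` appears in the README's
/// `--explain` output, the rest of the real enum is not shown in this diff.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum QueryIntent {
    Conceptual,
    Temporal,
}

/// Per-lane multipliers applied during fusion. Field names other than
/// `temporal` are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LaneWeights {
    pub semantic: f64,
    pub keyword: f64,
    pub graph: f64,
    pub reranker: f64,
    pub temporal: f64,
}

impl LaneWeights {
    /// Hypothetical constructor: the temporal lane is weighted 1.5 for
    /// temporal queries and switched off (0.0) for every other intent.
    pub fn for_intent(intent: QueryIntent) -> Self {
        let base = LaneWeights {
            semantic: 1.0, // placeholder value, not from the source
            keyword: 1.0,  // placeholder value, not from the source
            graph: 1.0,    // placeholder value, not from the source
            reranker: 1.0, // placeholder value, not from the source
            temporal: 0.0,
        };
        match intent {
            QueryIntent::Temporal => LaneWeights { temporal: 1.5, ..base },
            QueryIntent::Conceptual => base,
        }
    }
}
```

With a weight of 0.0 the temporal lane contributes nothing to fusion, which is consistent with non-temporal queries behaving as they did before this release.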

## [1.1.0] - 2026-03-26 — Complete Vault Gateway

### Added
11 changes: 6 additions & 5 deletions CLAUDE.md
@@ -4,7 +4,7 @@ Local knowledge graph + intelligence layer for Obsidian vaults. Rust CLI + MCP s

## Architecture

Single binary with 22 modules behind a lib crate:
Single binary with 23 modules behind a lib crate:

- `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`. Includes `intelligence: Option<bool>`, `[models]` section for model overrides, `[obsidian]` section (CLI path, enabled flag), and `[agents]` section (registered AI agent names). `Config::save()` writes back to disk.
- `chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
@@ -27,13 +27,14 @@ Single binary with 22 modules behind a lib crate:
- `profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Content-based role detection for people/daily/archive folders by content patterns (not just names). Writes/loads `vault.toml`
- `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved), `llm_cache` (orchestrator result cache), `cli_events` (audit log for CLI operations). `vec_chunks` virtual table (sqlite-vec) for KNN search. Dynamic embedding dimension stored in meta. `has_dimension_mismatch()` and `reset_for_reindex()` for migration. Enhanced `resolve_file()` with fuzzy Levenshtein matching as final fallback
- `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan. Dimension migration on model change.
- `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 3-lane retrieval per expansion → RRF pass 1 → reranker 4th lane → RRF pass 2. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent.
- `temporal.rs` — temporal search lane. Extracts note dates from frontmatter `date:` field or `YYYY-MM-DD` filename patterns. Heuristic date parsing for natural language ("today", "yesterday", "last week", "this month", "recent", month names, ISO dates, date ranges). Smooth decay scoring for files near but outside target date range. Provides `extract_note_date()` for indexing and `score_temporal()` + `parse_date_range_heuristic()` for search
- `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 5-lane RRF retrieval (semantic + FTS5 + graph + reranker + temporal) per expansion → two-pass RRF fusion. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent including temporal (1.5 weight for time-aware queries). Results display normalized confidence percentages (0-100%) instead of raw RRF scores.
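The "smooth decay scoring" mentioned for `temporal.rs` is not spelled out in this diff. One plausible shape, shown here as a sketch only, gives full score inside the target range and halves the score every `half_life_days` outside it. The function names `days_from_civil` and `decay_score_sketch`, the half-life parameter, and the exponential form are all assumptions. The real `score_temporal()` may use a different curve.

```rust
/// Convert a proleptic Gregorian date to days since 1970-01-01
/// (the well-known days-from-civil algorithm). Hypothetical helper.
pub fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Shift the year so it starts in March; leap days then fall at year end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y - era * 400; // 0..=399
    let shifted_month = (month + 9) % 12; // March = 0, February = 11
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Assumed scoring curve: 1.0 inside [start_day, end_day], then an
/// exponential falloff that halves every `half_life_days` outside the range.
pub fn decay_score_sketch(note_day: i64, start_day: i64, end_day: i64, half_life_days: f64) -> f64 {
    let distance = if note_day < start_day {
        start_day - note_day
    } else if note_day > end_day {
        note_day - end_day
    } else {
        0
    };
    if half_life_days <= 0.0 {
        // Degenerate setting: behave as a hard range filter.
        return if distance == 0 { 1.0 } else { 0.0 };
    }
    0.5f64.powf(distance as f64 / half_life_days)
}
```

Under these assumptions, a note dated three days after a "last week" range scores 0.5 with a three-day half-life, so it can still enter the candidate pool but ranks below notes inside the range.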

`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `serve` (MCP stdio server with file watcher + intelligence).
`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state + date coverage stats), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `serve` (MCP stdio server with file watcher + intelligence).

## Key patterns

- **4-lane hybrid search:** Queries run through up to four lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), graph (wikilink expansion), and cross-encoder reranking. A research orchestrator classifies query intent and sets adaptive lane weights. Two-pass RRF: 3-lane retrieval → reranker scores top 30 → 4-lane fusion. When intelligence is off, falls back to heuristic intent classification with 3-lane search (v0.7 behavior)
- **5-lane hybrid search:** Queries run through up to five lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), graph (wikilink expansion), cross-encoder reranking, and temporal (date-range scoring). A research orchestrator classifies query intent and sets adaptive lane weights. Two-pass RRF: retrieval lanes → reranker scores top 30 → 5-lane fusion. When intelligence is off, falls back to heuristic intent classification. Temporal intent detection works with both heuristic and LLM orchestrators
- **Vault graph:** `edges` table stores bidirectional wikilink edges and mention edges. Built during indexing after all files are written. People detection scans for person name/alias mentions using notes from the configured People folder
- **Graph agent:** Expands seed results by following wikilinks 1-2 hops. Decay: 0.8x for 1-hop, 0.5x for 2-hop. Relevance filter: must contain query term (FTS5) or share tags with seed. Multi-parent merge takes highest score
- **Smart chunking:** Break-point scoring algorithm assigns scores to potential split points (headings 50-100, code fences 80, thematic breaks 60, blank lines 20). Code fence protection prevents splitting inside code blocks
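To make the lane-weighted fusion described in the search pattern above concrete, here is a single-pass weighted Reciprocal Rank Fusion sketch in Rust. It is an illustration, not the `search.rs` implementation: the function name `weighted_rrf`, the input shape, and the constant `k = 60` used in the usage note are assumptions (60 is the value commonly used with RRF), and the two-pass structure with the reranker is left out.

```rust
use std::collections::HashMap;

/// Fuse ranked lists with weighted RRF: each lane adds
/// `weight / (k + rank)` to a document's score, with rank starting at 1.
/// Lanes with weight 0.0 are skipped entirely.
pub fn weighted_rrf(lanes: &[(f64, Vec<&str>)], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for (weight, ranking) in lanes {
        if *weight == 0.0 {
            continue;
        }
        for (index, doc) in ranking.iter().enumerate() {
            let rank = index as f64 + 1.0;
            *scores.entry(*doc).or_insert(0.0) += weight / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(doc, score)| (doc.to_string(), score))
        .collect();
    // Highest score first; ties broken by name so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For example, with `k = 60`, a semantic lane ranking `a, b, c`, a keyword lane ranking `b, a`, and a temporal lane ranking only `c` at weight 1.5, the note `c` moves to the top. With the temporal weight set to 0.0 it falls back to last place, which is the effect the adaptive weights are meant to have.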
@@ -73,7 +74,7 @@ Single vault only. Re-indexing a different vault path triggers a confirmation pr

## Testing

- Unit tests in each module (`cargo test --lib`) — 318 tests, no network required
- Unit tests in each module (`cargo test --lib`) — 361 tests, no network required
- Integration tests (`cargo test --test integration -- --ignored`) — require GGUF model download
- Build requires CMake (for llama.cpp C++ compilation)

12 changes: 12 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -28,7 +28,7 @@ indicatif = "0.17"
sqlite-vec = "0.1.8-alpha.1"
zerocopy = { version = "0.7", features = ["derive"] }
rayon = "1"
time = "0.3"
time = { version = "0.3", features = ["parsing", "formatting", "macros"] }
strsim = "0.11"
ignore = "0.4"
rmcp = { version = "1.2", features = ["transport-io"] }
26 changes: 14 additions & 12 deletions README.md
@@ -16,7 +16,7 @@ engraph turns your markdown vault into a searchable knowledge graph that AI agen

Plain vector search treats your notes as isolated documents. But knowledge isn't flat — your notes link to each other, share tags, reference the same people and projects. engraph understands these connections.

- **4-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent.
- **5-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking + temporal scoring, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent. Time-aware queries like "what happened last week" or "March 2026 notes" activate the temporal lane automatically.
- **MCP server for AI agents** — `engraph serve` exposes 19 tools (search, read, section-level editing, frontmatter mutations, vault health, context bundles, note creation) that Claude, Cursor, or any MCP client can call directly.
- **Section-level editing** — AI agents can read, replace, prepend, or append to specific sections by heading. Full note rewriting with frontmatter preservation. Granular frontmatter mutations (set/remove fields, add/remove tags and aliases).
- **Vault health diagnostics** — detect orphan notes, broken wikilinks, stale content, and tag hygiene issues. Available as MCP tool and CLI command.
@@ -65,7 +65,7 @@ Your vault (markdown files)
```

1. **Index** — walks your vault, chunks markdown by headings, embeds with a local GGUF model via llama.cpp (Metal GPU on macOS), stores everything in SQLite with FTS5 + sqlite-vec + a wikilink graph
2. **Search** — an orchestrator classifies the query and sets lane weights, then runs up to four lanes (semantic KNN, BM25 keyword, graph expansion, cross-encoder reranking), fused via RRF
2. **Search** — an orchestrator classifies the query and sets lane weights, then runs up to five lanes (semantic KNN, BM25 keyword, graph expansion, cross-encoder reranking, temporal scoring), fused via RRF
3. **Serve** — starts an MCP server that AI agents connect to, with a file watcher that re-indexes changes in real time

## Quick start
@@ -98,13 +98,13 @@ engraph search "how does the auth system work"
```

```
1. [0.04] 02-Areas/Development/Auth-Architecture.md > # Auth Architecture #6e1b70
1. [97%] 02-Areas/Development/Auth-Architecture.md > # Auth Architecture #6e1b70
OAuth 2.0 with PKCE for all client types. Session tokens stored in HTTP-only cookies...

2. [0.04] 01-Projects/API-Design.md > # API Design #e3e350
2. [95%] 01-Projects/API-Design.md > # API Design #e3e350
All endpoints require Bearer token authentication. Tokens are issued by the OAuth 2.0...

3. [0.04] 03-Resources/People/Sarah-Chen.md > # Sarah Chen #4adb39
3. [91%] 03-Resources/People/Sarah-Chen.md > # Sarah Chen #4adb39
Senior Backend Engineer. Tech lead for authentication and security systems...
```
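The diff does not say how raw RRF scores become the percentages shown above. One possible normalization, sketched here purely as an assumption, divides the raw score by the best score a document could reach (rank 1 in every active lane) and rounds to a whole percent. The function name `confidence_percent` and its parameters are hypothetical.

```rust
/// Assumed normalization: raw fused score divided by the theoretical
/// maximum (rank 1 in every active lane), expressed as 0..=100.
pub fn confidence_percent(raw_score: f64, active_weights: &[f64], k: f64) -> u8 {
    // Best possible score: every lane contributes weight / (k + 1).
    let max_score: f64 = active_weights.iter().map(|w| w / (k + 1.0)).sum();
    if max_score <= 0.0 {
        return 0; // no active lanes, nothing to normalize against
    }
    ((raw_score / max_score).clamp(0.0, 1.0) * 100.0).round() as u8
}
```

A bounded scale like this is easier to read than raw RRF values, which cluster tightly (all three results in the old output printed as `0.04`).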

@@ -145,7 +145,7 @@ engraph configure --enable-intelligence
engraph search "how does authentication work" --explain
```
```
1. [0.04] 01-Projects/API-Design.md > # API Design #e3e350
1. [97%] 01-Projects/API-Design.md > # API Design #e3e350
All endpoints require Bearer token authentication...

Intent: Conceptual
@@ -248,7 +248,7 @@ Returns orphan notes (no links in or out), broken wikilinks, stale notes, and ta

| | engraph | Basic RAG (vector-only) | Obsidian search |
|---|---|---|---|
| Search method | 4-lane RRF (semantic + BM25 + graph + reranker) | Vector similarity only | Keyword only |
| Search method | 5-lane RRF (semantic + BM25 + graph + reranker + temporal) | Vector similarity only | Keyword only |
| Query understanding | LLM orchestrator classifies intent, adapts weights | None | None |
| Understands note links | Yes (wikilink graph traversal) | No | Limited (backlinks panel) |
| AI agent access | MCP server (19 tools) | Custom API needed | No |
@@ -262,7 +262,9 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s

## Current capabilities

- 4-lane hybrid search (semantic + FTS5 + graph + cross-encoder reranker) with two-pass RRF fusion
- 5-lane hybrid search (semantic + FTS5 + graph + cross-encoder reranker + temporal) with two-pass RRF fusion
- Temporal search: natural language date queries ("last week", "March 2026", "recent"), date extraction from frontmatter and filenames, smooth decay scoring
- Confidence % display: search results show normalized 0-100% confidence instead of raw RRF scores
- LLM research orchestrator: query intent classification + query expansion + adaptive lane weights
- llama.cpp inference via Rust bindings (GGUF models, Metal GPU on macOS, CUDA on Linux)
- Intelligence opt-in: heuristic fallback when disabled, LLM-powered when enabled
@@ -281,7 +283,7 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s
- Enhanced file resolution with fuzzy Levenshtein matching fallback
- Content-based folder role detection (people, daily, archive) by content patterns
- Configurable model overrides for multilingual support
- 318 unit tests, CI on macOS + Ubuntu
- 361 unit tests, CI on macOS + Ubuntu

## Roadmap

@@ -290,7 +292,7 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s
- [x] ~~MCP edit/rewrite tools — full note editing for AI agents~~ (v1.1)
- [x] ~~Vault health monitor — orphan notes, broken links, stale content, tag hygiene~~ (v1.1)
- [x] ~~Obsidian CLI integration — auto-detect and delegate with circuit breaker~~ (v1.1)
- [ ] Temporal search — find notes by time period, detect trends (v1.2)
- [x] ~~Temporal search — find notes by time period, date-aware queries~~ (v1.2)
- [ ] HTTP/REST API — complement MCP with a standard web API (v1.3)
- [ ] Multi-vault — search across multiple vaults (v1.4)

@@ -326,7 +328,7 @@ All data stored in `~/.engraph/` — single SQLite database (~10MB typical), GGU
## Development

```bash
cargo test --lib # 318 unit tests, no network (requires CMake for llama.cpp)
cargo test --lib # 361 unit tests, no network (requires CMake for llama.cpp)
cargo clippy -- -D warnings
cargo fmt --check

@@ -338,7 +340,7 @@ cargo test --test integration -- --ignored

Contributions welcome. Please open an issue first to discuss what you'd like to change.

The codebase is 22 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.
The codebase is 23 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.

## License
