
Commit c703fad

Merge pull request #14 from devwhodevs/feature/v1.2-temporal-search
feat: v1.2 — Temporal Search
2 parents 2b80951 + 98aba42 commit c703fad

20 files changed: +1360 -126 lines changed

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -1,5 +1,25 @@
 # Changelog

+## v1.2.0 — Temporal Search (2026-03-26)
+
+### Added
+- **Temporal search lane** (`temporal.rs`) — 5th RRF lane for time-aware queries
+- **Date extraction** — from frontmatter `date:` field or `YYYY-MM-DD` filename pattern
+- **Heuristic date parsing** — "today", "yesterday", "last week", "this month", "recent", month names, ISO dates, date ranges
+- **LLM date extraction** — orchestrator detects temporal intent and extracts date ranges from natural language
+- **Temporal scoring** — smooth decay function for files near but outside the target date range
+- **Temporal candidate injection** — date-matched files enter candidate pool as graph seeds
+- **Confidence % display** — search results show normalized confidence (0-100%) instead of raw RRF scores
+- **Date coverage stats** — `engraph status` shows how many files have extractable dates
+
+### Changed
+- `QueryIntent` gains `Temporal` variant with custom lane weights (temporal: 1.5)
+- `OrchestrationResult` gains `date_range` field (backward-compatible serde)
+- `LaneWeights` gains `temporal` field (0.0 for non-temporal intents)
+- `insert_file` signature extended with `note_date` parameter
+- Module count: 22 → 23
+- Test count: 318 → 361
+
 ## [1.1.0] - 2026-03-26 — Complete Vault Gateway

 ### Added

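The changelog entry for temporal scoring describes a smooth decay for files near but outside the target date range. A minimal sketch of one such function follows. The exponential shape, the `half_life_days` parameter, the day-ordinal representation of dates, and the name `temporal_score_sketch` are all assumptions made for illustration; the diff does not show how `temporal.rs` actually computes this.

```rust
/// Score a note against an inclusive target date range.
///
/// Dates are plain day ordinals (days since an arbitrary epoch), which is a
/// hypothetical representation chosen to keep the sketch dependency-free.
/// `half_life_days` is an assumed tuning knob, not a value taken from engraph.
fn temporal_score_sketch(note_day: i64, start: i64, end: i64, half_life_days: f64) -> f64 {
    // Distance in days to the nearest edge of the range; zero when inside it.
    let distance = if note_day < start {
        start - note_day
    } else if note_day > end {
        note_day - end
    } else {
        0
    };
    // Exponential decay: 1.0 inside the range, 0.5 one half-life outside it.
    0.5_f64.powf(distance as f64 / half_life_days)
}
```

With a seven-day half-life, a note dated one week after the range ends scores 0.5, and the score keeps falling smoothly rather than dropping to zero at the range boundary.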
CLAUDE.md

Lines changed: 6 additions & 5 deletions
@@ -4,7 +4,7 @@ Local knowledge graph + intelligence layer for Obsidian vaults. Rust CLI + MCP s

 ## Architecture

-Single binary with 22 modules behind a lib crate:
+Single binary with 23 modules behind a lib crate:

 - `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`. Includes `intelligence: Option<bool>`, `[models]` section for model overrides, `[obsidian]` section (CLI path, enabled flag), and `[agents]` section (registered AI agent names). `Config::save()` writes back to disk.
 - `chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. `split_oversized_chunks()` handles token-aware secondary splitting with overlap
@@ -27,13 +27,14 @@ Single binary with 22 modules behind a lib crate:
 - `profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Content-based role detection for people/daily/archive folders by content patterns (not just names). Writes/loads `vault.toml`
 - `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved), `llm_cache` (orchestrator result cache), `cli_events` (audit log for CLI operations). `vec_chunks` virtual table (sqlite-vec) for KNN search. Dynamic embedding dimension stored in meta. `has_dimension_mismatch()` and `reset_for_reindex()` for migration. Enhanced `resolve_file()` with fuzzy Levenshtein matching as final fallback
 - `indexer.rs` — orchestrates vault walking (via `ignore` crate for `.gitignore` support), diffing, chunking, embedding, writes to store + sqlite-vec + FTS5, vault graph edge building (wikilinks + people detection), and folder centroid computation. Exposes `index_file`, `remove_file`, `rename_file` as public per-file functions. `run_index_shared` accepts external store/embedder for watcher FullRescan. Dimension migration on model change.
-- `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 3-lane retrieval per expansion → RRF pass 1 → reranker 4th lane → RRF pass 2. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent.
+- `temporal.rs` — temporal search lane. Extracts note dates from frontmatter `date:` field or `YYYY-MM-DD` filename patterns. Heuristic date parsing for natural language ("today", "yesterday", "last week", "this month", "recent", month names, ISO dates, date ranges). Smooth decay scoring for files near but outside target date range. Provides `extract_note_date()` for indexing and `score_temporal()` + `parse_date_range_heuristic()` for search
+- `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 5-lane RRF retrieval (semantic + FTS5 + graph + reranker + temporal) per expansion → two-pass RRF fusion. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent including temporal (1.5 weight for time-aware queries). Results display normalized confidence percentages (0-100%) instead of raw RRF scores.

-`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `serve` (MCP stdio server with file watcher + intelligence).
+`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state + date coverage stats), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `serve` (MCP stdio server with file watcher + intelligence).

 ## Key patterns

-- **4-lane hybrid search:** Queries run through up to four lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), graph (wikilink expansion), and cross-encoder reranking. A research orchestrator classifies query intent and sets adaptive lane weights. Two-pass RRF: 3-lane retrieval → reranker scores top 30 → 4-lane fusion. When intelligence is off, falls back to heuristic intent classification with 3-lane search (v0.7 behavior)
+- **5-lane hybrid search:** Queries run through up to five lanes — semantic (sqlite-vec KNN embeddings), keyword (FTS5 BM25), graph (wikilink expansion), cross-encoder reranking, and temporal (date-range scoring). A research orchestrator classifies query intent and sets adaptive lane weights. Two-pass RRF: retrieval lanes → reranker scores top 30 → 5-lane fusion. When intelligence is off, falls back to heuristic intent classification. Temporal intent detection works with both heuristic and LLM orchestrators
 - **Vault graph:** `edges` table stores bidirectional wikilink edges and mention edges. Built during indexing after all files are written. People detection scans for person name/alias mentions using notes from the configured People folder
 - **Graph agent:** Expands seed results by following wikilinks 1-2 hops. Decay: 0.8x for 1-hop, 0.5x for 2-hop. Relevance filter: must contain query term (FTS5) or share tags with seed. Multi-parent merge takes highest score
 - **Smart chunking:** Break-point scoring algorithm assigns scores to potential split points (headings 50-100, code fences 80, thematic breaks 60, blank lines 20). Code fence protection prevents splitting inside code blocks
@@ -73,7 +74,7 @@ Single vault only. Re-indexing a different vault path triggers a confirmation pr

 ## Testing

-- Unit tests in each module (`cargo test --lib`) — 318 tests, no network required
+- Unit tests in each module (`cargo test --lib`) — 361 tests, no network required
 - Integration tests (`cargo test --test integration -- --ignored`) — require GGUF model download
 - Build requires CMake (for llama.cpp C++ compilation)

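The `temporal.rs` description in the CLAUDE.md diff above says note dates come from a frontmatter `date:` field or a `YYYY-MM-DD` filename pattern. The sketch below shows one way that lookup order could work using only the standard library. The function names, the `(year, month, day)` tuple return type, and the choice to prefer frontmatter over the filename are assumptions; the real `extract_note_date()` signature is not visible in this diff.

```rust
/// Hypothetical stand-in for the date extraction described above.
/// Tries the frontmatter `date:` field first, then the filename.
fn extract_date_sketch(filename: &str, content: &str) -> Option<(i32, u32, u32)> {
    frontmatter_date(content).or_else(|| find_iso_date(filename))
}

/// Look for a `date:` key inside a leading `---` ... `---` frontmatter block.
fn frontmatter_date(content: &str) -> Option<(i32, u32, u32)> {
    let mut lines = content.lines();
    if lines.next()?.trim() != "---" {
        return None; // no frontmatter block at the top of the note
    }
    for line in lines {
        let line = line.trim();
        if line == "---" {
            break; // end of frontmatter
        }
        if let Some(value) = line.strip_prefix("date:") {
            return find_iso_date(value);
        }
    }
    None
}

/// Return the first `YYYY-MM-DD` substring with a plausible month and day.
/// Day-of-month is only range-checked (1..=31), not validated per month.
fn find_iso_date(text: &str) -> Option<(i32, u32, u32)> {
    let bytes = text.as_bytes();
    if bytes.len() < 10 {
        return None;
    }
    for i in 0..=bytes.len() - 10 {
        let window = &bytes[i..i + 10];
        let shape_ok = window.iter().enumerate().all(|(j, b)| {
            if j == 4 || j == 7 { *b == b'-' } else { b.is_ascii_digit() }
        });
        if !shape_ok {
            continue;
        }
        // The window is pure ASCII here, so the conversion cannot fail.
        let s = std::str::from_utf8(window).ok()?;
        let year: i32 = s[0..4].parse().ok()?;
        let month: u32 = s[5..7].parse().ok()?;
        let day: u32 = s[8..10].parse().ok()?;
        if (1..=12).contains(&month) && (1..=31).contains(&day) {
            return Some((year, month, day));
        }
    }
    None
}
```

Scanning for the pattern rather than parsing the whole value means quoted frontmatter values such as `date: "2026-03-26"` and filenames such as `2026-03-20-standup.md` are handled by the same helper.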
Cargo.lock

Lines changed: 12 additions & 0 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ indicatif = "0.17"
 sqlite-vec = "0.1.8-alpha.1"
 zerocopy = { version = "0.7", features = ["derive"] }
 rayon = "1"
-time = "0.3"
+time = { version = "0.3", features = ["parsing", "formatting", "macros"] }
 strsim = "0.11"
 ignore = "0.4"
 rmcp = { version = "1.2", features = ["transport-io"] }

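The `parsing`, `formatting`, and `macros` features enabled above are presumably there to support date handling in `temporal.rs`; that is an inference, since the diff does not show the code. As a dependency-free illustration of the heuristic parsing the changelog lists ("today", "yesterday", "last week", "recent"), the sketch below maps a few phrases to an inclusive range of day ordinals. The function name, the day-ordinal representation, and the exact windows chosen for "last week" and "recent" are assumptions, not engraph's `parse_date_range_heuristic()`.

```rust
/// Map a few relative phrases to an inclusive (start, end) range of day ordinals.
/// `today` is a day ordinal supplied by the caller (hypothetical representation).
fn parse_relative_range_sketch(query: &str, today: i64) -> Option<(i64, i64)> {
    let q = query.to_lowercase();
    if q.contains("yesterday") {
        Some((today - 1, today - 1))
    } else if q.contains("today") {
        Some((today, today))
    } else if q.contains("last week") {
        // Assumed meaning: the seven days before today.
        Some((today - 7, today - 1))
    } else if q.contains("recent") {
        // Assumed meaning: a rolling 14-day window ending today.
        Some((today - 13, today))
    } else {
        // No temporal phrase recognized; the caller would skip the temporal lane.
        None
    }
}
```

Returning `None` for queries without a recognized phrase lets the caller leave the temporal lane inactive, which matches the changelog's note that the temporal weight is 0.0 for non-temporal intents.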
README.md

Lines changed: 14 additions & 12 deletions
@@ -16,7 +16,7 @@ engraph turns your markdown vault into a searchable knowledge graph that AI agen

 Plain vector search treats your notes as isolated documents. But knowledge isn't flat — your notes link to each other, share tags, reference the same people and projects. engraph understands these connections.

-- **4-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent.
+- **5-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking + temporal scoring, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent. Time-aware queries like "what happened last week" or "March 2026 notes" activate the temporal lane automatically.
 - **MCP server for AI agents** — `engraph serve` exposes 19 tools (search, read, section-level editing, frontmatter mutations, vault health, context bundles, note creation) that Claude, Cursor, or any MCP client can call directly.
 - **Section-level editing** — AI agents can read, replace, prepend, or append to specific sections by heading. Full note rewriting with frontmatter preservation. Granular frontmatter mutations (set/remove fields, add/remove tags and aliases).
 - **Vault health diagnostics** — detect orphan notes, broken wikilinks, stale content, and tag hygiene issues. Available as MCP tool and CLI command.
@@ -65,7 +65,7 @@ Your vault (markdown files)
 ```

 1. **Index** — walks your vault, chunks markdown by headings, embeds with a local GGUF model via llama.cpp (Metal GPU on macOS), stores everything in SQLite with FTS5 + sqlite-vec + a wikilink graph
-2. **Search** — an orchestrator classifies the query and sets lane weights, then runs up to four lanes (semantic KNN, BM25 keyword, graph expansion, cross-encoder reranking), fused via RRF
+2. **Search** — an orchestrator classifies the query and sets lane weights, then runs up to five lanes (semantic KNN, BM25 keyword, graph expansion, cross-encoder reranking, temporal scoring), fused via RRF
 3. **Serve** — starts an MCP server that AI agents connect to, with a file watcher that re-indexes changes in real time

 ## Quick start
@@ -98,13 +98,13 @@ engraph search "how does the auth system work"
 ```

 ```
-1. [0.04] 02-Areas/Development/Auth-Architecture.md > # Auth Architecture #6e1b70
+1. [97%] 02-Areas/Development/Auth-Architecture.md > # Auth Architecture #6e1b70
 OAuth 2.0 with PKCE for all client types. Session tokens stored in HTTP-only cookies...

-2. [0.04] 01-Projects/API-Design.md > # API Design #e3e350
+2. [95%] 01-Projects/API-Design.md > # API Design #e3e350
 All endpoints require Bearer token authentication. Tokens are issued by the OAuth 2.0...

-3. [0.04] 03-Resources/People/Sarah-Chen.md > # Sarah Chen #4adb39
+3. [91%] 03-Resources/People/Sarah-Chen.md > # Sarah Chen #4adb39
 Senior Backend Engineer. Tech lead for authentication and security systems...
 ```

@@ -145,7 +145,7 @@ engraph configure --enable-intelligence
 engraph search "how does authentication work" --explain
 ```
 ```
-1. [0.04] 01-Projects/API-Design.md > # API Design #e3e350
+1. [97%] 01-Projects/API-Design.md > # API Design #e3e350
 All endpoints require Bearer token authentication...

 Intent: Conceptual
@@ -248,7 +248,7 @@ Returns orphan notes (no links in or out), broken wikilinks, stale notes, and ta

 | | engraph | Basic RAG (vector-only) | Obsidian search |
 |---|---|---|---|
-| Search method | 4-lane RRF (semantic + BM25 + graph + reranker) | Vector similarity only | Keyword only |
+| Search method | 5-lane RRF (semantic + BM25 + graph + reranker + temporal) | Vector similarity only | Keyword only |
 | Query understanding | LLM orchestrator classifies intent, adapts weights | None | None |
 | Understands note links | Yes (wikilink graph traversal) | No | Limited (backlinks panel) |
 | AI agent access | MCP server (19 tools) | Custom API needed | No |
@@ -262,7 +262,9 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s

 ## Current capabilities

-- 4-lane hybrid search (semantic + FTS5 + graph + cross-encoder reranker) with two-pass RRF fusion
+- 5-lane hybrid search (semantic + FTS5 + graph + cross-encoder reranker + temporal) with two-pass RRF fusion
+- Temporal search: natural language date queries ("last week", "March 2026", "recent"), date extraction from frontmatter and filenames, smooth decay scoring
+- Confidence % display: search results show normalized 0-100% confidence instead of raw RRF scores
 - LLM research orchestrator: query intent classification + query expansion + adaptive lane weights
 - llama.cpp inference via Rust bindings (GGUF models, Metal GPU on macOS, CUDA on Linux)
 - Intelligence opt-in: heuristic fallback when disabled, LLM-powered when enabled
@@ -281,7 +283,7 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s
 - Enhanced file resolution with fuzzy Levenshtein matching fallback
 - Content-based folder role detection (people, daily, archive) by content patterns
 - Configurable model overrides for multilingual support
-- 318 unit tests, CI on macOS + Ubuntu
+- 361 unit tests, CI on macOS + Ubuntu

 ## Roadmap

@@ -290,7 +292,7 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s
 - [x] ~~MCP edit/rewrite tools — full note editing for AI agents~~ (v1.1)
 - [x] ~~Vault health monitor — orphan notes, broken links, stale content, tag hygiene~~ (v1.1)
 - [x] ~~Obsidian CLI integration — auto-detect and delegate with circuit breaker~~ (v1.1)
-- [ ] Temporal search — find notes by time period, detect trends (v1.2)
+- [x] ~~Temporal search — find notes by time period, date-aware queries~~ (v1.2)
 - [ ] HTTP/REST API — complement MCP with a standard web API (v1.3)
 - [ ] Multi-vault — search across multiple vaults (v1.4)

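The README changes above describe lanes fused via Reciprocal Rank Fusion with per-intent weights, and results shown as a 0-100% confidence instead of raw RRF scores. The sketch below illustrates weighted RRF and one possible normalization. The constant `k = 60` comes from the linked RRF paper; whether engraph uses that value, and how it actually normalizes confidence, is not stated in this diff. Here confidence is assumed to be the score as a percentage of the best achievable score (rank 1 in every non-empty lane), and all type and function names are hypothetical.

```rust
use std::collections::HashMap;

/// RRF constant from the original paper; engraph's actual value is not shown here.
const RRF_K: f64 = 60.0;

/// One retrieval lane: a weight plus document ids ordered best first.
struct Lane<'a> {
    weight: f64,
    ranked: Vec<&'a str>,
}

/// Weighted RRF: each lane contributes weight / (k + rank), with ranks starting at 1.
fn fuse_sketch(lanes: &[Lane<'_>]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for lane in lanes {
        for (i, doc) in lane.ranked.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry(doc.to_string()).or_insert(0.0) += lane.weight / (RRF_K + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by document id for a stable order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

/// Assumed normalization: percentage of the best possible fused score,
/// i.e. a document ranked first in every non-empty lane.
fn confidence_pct_sketch(score: f64, lanes: &[Lane<'_>]) -> f64 {
    let max: f64 = lanes
        .iter()
        .filter(|lane| !lane.ranked.is_empty())
        .map(|lane| lane.weight / (RRF_K + 1.0))
        .sum();
    if max == 0.0 {
        0.0
    } else {
        (100.0 * score / max).clamp(0.0, 100.0)
    }
}
```

Under this assumption a document reaches 100% only when every active lane ranks it first, which would be consistent with the top result in the README example showing 97% rather than 100%. A lane with weight 1.5, as the changelog gives for the temporal lane under `Temporal` intent, simply counts one and a half times as much as a lane with weight 1.0.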
@@ -326,7 +328,7 @@ All data stored in `~/.engraph/` — single SQLite database (~10MB typical), GGU
 ## Development

 ```bash
-cargo test --lib # 318 unit tests, no network (requires CMake for llama.cpp)
+cargo test --lib # 361 unit tests, no network (requires CMake for llama.cpp)
 cargo clippy -- -D warnings
 cargo fmt --check

@@ -338,7 +340,7 @@ cargo test --test integration -- --ignored

 Contributions welcome. Please open an issue first to discuss what you'd like to change.

-The codebase is 22 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.
+The codebase is 23 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development.

 ## License
