diff --git a/CHANGELOG.md b/CHANGELOG.md index edc2718..d39c8e2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,22 @@ # Changelog +## v1.4.0 — PARA Migration (2026-03-26) + +### Added +- **PARA migration engine** (`migrate.rs`) — AI-assisted vault restructuring into Projects/Areas/Resources/Archive +- **Heuristic classification** — priority-ordered rules detect Projects (tasks, active status), Areas (recurring topics), Resources (people, reference), Archive (done, inactive) +- **Preview-then-apply workflow** — generates markdown + JSON preview for review before moving files +- **Migration rollback** — `engraph migrate para --undo` reverses the last migration +- **3 new MCP tools** — `migrate_preview`, `migrate_apply`, `migrate_undo` +- **3 new HTTP endpoints** — `POST /api/migrate/preview`, `/apply`, `/undo` +- **Migration log** — SQLite table tracks all moves for rollback support + +### Changed +- Module count: 24 → 25 +- MCP tools: 19 → 22 +- HTTP endpoints: 20 → 23 +- Test count: 385 → 417 + ## v1.3.0 — HTTP/REST Transport (2026-03-26) ### Added diff --git a/CLAUDE.md b/CLAUDE.md index 62b366b..31e96b6 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -4,7 +4,7 @@ Local knowledge graph + intelligence layer for Obsidian vaults. Rust CLI + MCP s ## Architecture -Single binary with 24 modules behind a lib crate: +Single binary with 25 modules behind a lib crate: - `config.rs` — loads `~/.engraph/config.toml` and `vault.toml`, merges CLI args, provides `data_dir()`. Includes `intelligence: Option`, `[models]` section for model overrides, `[obsidian]` section (CLI path, enabled flag), and `[agents]` section (registered AI agent names). `Config::save()` writes back to disk. - `chunker.rs` — smart chunking with break-point scoring algorithm. Finds optimal split points considering headings, code fences, blank lines, and thematic breaks. 
`split_oversized_chunks()` handles token-aware secondary splitting with overlap @@ -13,6 +13,7 @@ Single binary with 24 modules behind a lib crate: - `fts.rs` — FTS5 full-text search support. Re-exports `FtsResult` from store. BM25-ranked keyword search - `fusion.rs` — Reciprocal Rank Fusion (RRF) engine. Merges semantic + FTS5 + graph + reranker results. Supports per-lane weighting, `--explain` output with intent + per-lane detail - `markdown.rs` — section parser. Heading detection (ATX `#` headings with level tracking), section extraction by heading text, frontmatter splitting (YAML block between `---` fences). Powers section-level reading and editing +- `migrate.rs` — PARA migration engine. Heuristic classification of vault notes into Projects/Areas/Resources/Archive using priority-ordered rules (tasks, active status, recurring topics, people, reference, done/inactive). Preview-then-apply workflow generates markdown + JSON preview for review before moving files. Rollback support via `engraph migrate para --undo` reverses the last migration using SQLite migration log. Three MCP tools (`migrate_preview`, `migrate_apply`, `migrate_undo`) and three HTTP endpoints (`POST /api/migrate/preview`, `/apply`, `/undo`) - `obsidian.rs` — Obsidian CLI wrapper. Process detection (checks if Obsidian is running), circuit breaker state machine (Closed/Degraded/Open) for resilient CLI delegation, async subprocess execution with timeout. Falls back gracefully when Obsidian is unavailable - `health.rs` — vault health diagnostics. Orphan detection (notes with no incoming or outgoing wikilinks), broken link detection (wikilinks pointing to nonexistent notes), stale note detection (notes not modified within configurable threshold), tag hygiene (unused/rare tags). Returns structured health report - `context.rs` — context engine. 
Seven functions: `read` (full note content + metadata), `read_section` (targeted section extraction by heading), `list` (filtered note listing with `created_by` filter), `vault_map` (structure overview), `who` (person context bundle), `project` (project context bundle), `context_topic` (rich topic context with budget trimming). Pure functions taking `ContextParams` — no model loading except `context_topic` which reuses `search_internal` @@ -22,8 +23,8 @@ Single binary with 24 modules behind a lib crate: - `placement.rs` — folder placement engine. Uses folder centroids (online mean of embeddings per folder) to suggest the best folder for new notes. Falls back to inbox when confidence is low. Includes placement correction detection (`detect_correction_from_frontmatter`) and frontmatter stripping for moved files - `writer.rs` — write pipeline orchestrator. 5-step pipeline: resolve tags (fuzzy match + register new), discover links (exact + fuzzy), place in folder, atomic file write (temp + rename), and index update. Supports create, append, update_metadata, move_note, archive, unarchive, edit (section-level replace/prepend/append), rewrite (full content with frontmatter preservation), edit_frontmatter (granular set/remove/add_tag/remove_tag/add_alias/remove_alias ops), and delete (soft archive or hard permanent) operations with mtime-based conflict detection and crash recovery via temp file cleanup - `watcher.rs` — file watcher for `engraph serve`. OS thread producer (notify-debouncer-full, 2s debounce) sends `Vec` over tokio::mpsc to async consumer task. Two-pass batch processing: mutations (index_file/remove_file/rename_file) then edge rebuild. Move detection via content hash matching. Placement correction on file moves. Centroid adjustment on file add/remove. Startup reconciliation via `run_index_shared`. 
`recent_writes` map coordination with MCP server to prevent double re-indexing of files written through the write pipeline -- `serve.rs` — MCP stdio server via rmcp SDK. Exposes 19 tools: 8 read (search, read, read_section, list, vault_map, who, project, context) + 10 write (create, append, update_metadata, move_note, archive, unarchive, edit, rewrite, edit_frontmatter, delete) + 1 diagnostic (health). `edit_frontmatter` replaces `update_metadata` for granular frontmatter mutations. EngraphServer struct with Arc+Mutex wrapping for async handlers. Loads intelligence models (orchestrator + reranker) when enabled, wires into `search_with_intelligence`. Spawns file watcher on startup. CLI events table provides audit log for write operations. `recent_writes` map prevents double re-indexing of MCP-written files -- `http.rs` — axum-based HTTP REST API server, enabled via `engraph serve --http`. 20 REST endpoints mirroring all 19 MCP tools + update-metadata. API key authentication with `eg_` prefixed keys and read/write permission levels. Per-key token bucket rate limiting (configurable requests/minute). CORS with configurable allowed origins for web-based agents. `--no-auth` mode for local development (127.0.0.1 only). Graceful shutdown via `CancellationToken` coordinating MCP + HTTP + watcher exit +- `serve.rs` — MCP stdio server via rmcp SDK. Exposes 22 tools: 8 read (search, read, read_section, list, vault_map, who, project, context) + 10 write (create, append, update_metadata, move_note, archive, unarchive, edit, rewrite, edit_frontmatter, delete) + 1 diagnostic (health) + 3 migrate (migrate_preview, migrate_apply, migrate_undo). `edit_frontmatter` replaces `update_metadata` for granular frontmatter mutations. EngraphServer struct with Arc+Mutex wrapping for async handlers. Loads intelligence models (orchestrator + reranker) when enabled, wires into `search_with_intelligence`. Spawns file watcher on startup. CLI events table provides audit log for write operations. 
`recent_writes` map prevents double re-indexing of MCP-written files +- `http.rs` — axum-based HTTP REST API server, enabled via `engraph serve --http`. 23 REST endpoints mirroring all 22 MCP tools + update-metadata. API key authentication with `eg_` prefixed keys and read/write permission levels. Per-key token bucket rate limiting (configurable requests/minute). CORS with configurable allowed origins for web-based agents. `--no-auth` mode for local development (127.0.0.1 only). Graceful shutdown via `CancellationToken` coordinating MCP + HTTP + watcher exit - `graph.rs` — vault graph agent. Extracts wikilink targets, expands search results by following graph connections 1-2 hops. Relevance filtering via FTS5 term check and shared tags - `profile.rs` — vault profile detection. Auto-detects PARA/Folders/Flat structure, vault type (Obsidian/Logseq/Plain), wikilinks, frontmatter, tags. Content-based role detection for people/daily/archive folders by content patterns (not just names). Writes/loads `vault.toml` - `store.rs` — SQLite persistence. Tables: `meta`, `files` (with docid, created_by), `chunks` (with vector BLOBs), `chunks_fts` (FTS5), `edges` (vault graph), `tombstones`, `tag_registry`, `folder_centroids`, `placement_corrections`, `link_skiplist` (reserved), `llm_cache` (orchestrator result cache), `cli_events` (audit log for CLI operations). `vec_chunks` virtual table (sqlite-vec) for KNN search. Dynamic embedding dimension stored in meta. `has_dimension_mismatch()` and `reset_for_reindex()` for migration. Enhanced `resolve_file()` with fuzzy Levenshtein matching as final fallback @@ -31,7 +32,7 @@ Single binary with 24 modules behind a lib crate: - `temporal.rs` — temporal search lane. Extracts note dates from frontmatter `date:` field or `YYYY-MM-DD` filename patterns. Heuristic date parsing for natural language ("today", "yesterday", "last week", "this month", "recent", month names, ISO dates, date ranges). 
Smooth decay scoring for files near but outside target date range. Provides `extract_note_date()` for indexing and `score_temporal()` + `parse_date_range_heuristic()` for search - `search.rs` — hybrid search orchestrator. `search_with_intelligence()` runs the full pipeline: orchestrate (intent + expansions) → 5-lane RRF retrieval (semantic + FTS5 + graph + reranker + temporal) per expansion → two-pass RRF fusion. `search_internal()` is a thin wrapper without intelligence models. Adaptive lane weights per query intent including temporal (1.5 weight for time-aware queries). Results display normalized confidence percentages (0-100%) instead of raw RRF scores. -`main.rs` is a thin clap CLI (async via `#[tokio::main]`). Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state + date coverage stats), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`, `--add-api-key`, `--list-api-keys`, `--revoke-api-key`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `serve` (MCP stdio server with file watcher + intelligence + optional `--http`/`--port`/`--host`/`--no-auth` for HTTP REST API). +`main.rs` is a thin clap CLI (async via `#[tokio::main]`). 
Subcommands: `index` (with progress bar), `search` (with `--explain`, loads intelligence models when enabled), `status` (shows intelligence state + date coverage stats), `clear`, `init` (intelligence onboarding prompt, detects Obsidian CLI + AI agents), `configure` (`--enable-intelligence`, `--disable-intelligence`, `--model`, `--obsidian-cli`, `--no-obsidian-cli`, `--agent`, `--add-api-key`, `--list-api-keys`, `--revoke-api-key`), `models`, `graph` (show/stats), `context` (read/list/vault-map/who/project/topic), `write` (create/append/update-metadata/move/edit/rewrite/edit-frontmatter/delete), `migrate` (para subcommand; previews by default, `--apply`/`--undo` for PARA vault restructuring), `serve` (MCP stdio server with file watcher + intelligence + optional `--http`/`--port`/`--host`/`--no-auth` for HTTP REST API). ## Key patterns @@ -80,7 +81,7 @@ Single vault only. Re-indexing a different vault path triggers a confirmation pr ## Testing -- Unit tests in each module (`cargo test --lib`) — 385 tests, no network required +- Unit tests in each module (`cargo test --lib`) — 417 tests, no network required - Integration tests (`cargo test --test integration -- --ignored`) — require GGUF model download - Build requires CMake (for llama.cpp C++ compilation) diff --git a/Cargo.lock b/Cargo.lock index 3fe1ad7..bb880a0 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -708,6 +708,7 @@ dependencies = [ "tracing", "tracing-subscriber", "ureq", + "uuid", "zerocopy 0.7.35", ] @@ -2785,6 +2786,17 @@ version = "0.2.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821" +[[package]] +name = "uuid" +version = "1.22.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a68d3c8f01c0cfa54a75291d83601161799e4a89a39e0929f4b0354d88757a37" +dependencies = [ + "getrandom 0.4.2", + "js-sys", + "wasm-bindgen", +] + [[package]] name = "valuable" version = "0.1.1" diff --git a/Cargo.toml
b/Cargo.toml index d53b0a4..e7158e6 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -41,6 +41,7 @@ shimmytok = "0.7" axum = "0.8" tower-http = { version = "0.6", features = ["cors"] } tower = "0.5" +uuid = { version = "1", features = ["v4"] } rand = "0.9" tokio-util = "0.7" diff --git a/README.md b/README.md index 5a1dc04..658277c 100644 --- a/README.md +++ b/README.md @@ -17,8 +17,8 @@ engraph turns your markdown vault into a searchable knowledge graph that AI agen Plain vector search treats your notes as isolated documents. But knowledge isn't flat — your notes link to each other, share tags, reference the same people and projects. engraph understands these connections. - **5-lane hybrid search** — semantic embeddings + BM25 full-text + graph expansion + cross-encoder reranking + temporal scoring, fused via [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf). An LLM orchestrator classifies queries and adapts lane weights per intent. Time-aware queries like "what happened last week" or "March 2026 notes" activate the temporal lane automatically. -- **MCP server for AI agents** — `engraph serve` exposes 19 tools (search, read, section-level editing, frontmatter mutations, vault health, context bundles, note creation) that Claude, Cursor, or any MCP client can call directly. -- **HTTP REST API** — `engraph serve --http` adds an axum-based HTTP server alongside MCP with 20 REST endpoints, API key authentication, rate limiting, and CORS. Web-based agents and scripts can query your vault with simple `curl` calls. +- **MCP server for AI agents** — `engraph serve` exposes 22 tools (search, read, section-level editing, frontmatter mutations, vault health, context bundles, note creation, PARA migration) that Claude, Cursor, or any MCP client can call directly. +- **HTTP REST API** — `engraph serve --http` adds an axum-based HTTP server alongside MCP with 23 REST endpoints, API key authentication, rate limiting, and CORS. 
Web-based agents and scripts can query your vault with simple `curl` calls. - **Section-level editing** — AI agents can read, replace, prepend, or append to specific sections by heading. Full note rewriting with frontmatter preservation. Granular frontmatter mutations (set/remove fields, add/remove tags and aliases). - **Vault health diagnostics** — detect orphan notes, broken wikilinks, stale content, and tag hygiene issues. Available as MCP tool and CLI command. - **Obsidian CLI integration** — auto-detects running Obsidian and delegates compatible operations. Circuit breaker (Closed/Degraded/Open) ensures graceful fallback. @@ -57,7 +57,7 @@ Your vault (markdown files) │ Search: Orchestrator → 4-lane retrieval │ │ → Reranker → Two-pass RRF fusion │ │ │ -│ 19 MCP tools + 20 REST endpoints │ +│ 22 MCP tools + 23 REST endpoints │ └─────────────────────────────────────────────┘ │ ▼ @@ -264,7 +264,7 @@ Returns orphan notes (no links in or out), broken wikilinks, stale notes, and ta `engraph serve --http` adds a full REST API alongside the MCP server, exposing the same capabilities over HTTP for web agents, scripts, and integrations. -**20 endpoints:** +**23 endpoints:** | Method | Endpoint | Permission | Description | |--------|----------|------------|-------------| @@ -288,6 +288,9 @@ Returns orphan notes (no links in or out), broken wikilinks, stale notes, and ta | POST | `/api/unarchive` | write | Restore archived note | | POST | `/api/update-metadata` | write | Update note metadata | | POST | `/api/delete` | write | Delete note (soft or hard) | +| POST | `/api/migrate/preview` | write | Preview PARA migration (classify + suggest moves) | +| POST | `/api/migrate/apply` | write | Apply PARA migration (move files) | +| POST | `/api/migrate/undo` | write | Undo last PARA migration | **Authentication:** @@ -335,6 +338,42 @@ key = "eg_..." 
permission = "write" ``` +## PARA Migration + +`engraph migrate para` restructures your vault into the [PARA method](https://fortelabs.com/blog/para/) (Projects, Areas, Resources, Archive) using heuristic classification. The workflow is non-destructive: preview first, review the plan, then apply. + +**Workflow:** + +```bash +# 1. Preview (the default action) — classify notes and generate a migration plan +engraph migrate para +# Outputs: markdown summary + JSON plan saved to ~/.engraph/ + +# 2. Review the plan (edit if needed) +cat ~/.engraph/migration-preview.md + +# 3. Apply — move files according to the plan +engraph migrate para --apply + +# 4. Undo — reverse the last migration if something looks wrong +engraph migrate para --undo +``` + +**Classification signals:** + +| Category | Detection signals | +|----------|-------------------| +| **Projects** | Open tasks (`- [ ]`), active/in-progress status in frontmatter, project tags | +| **Areas** | Recurring topic keywords (health, finance, career, learning), area-related tags | +| **Resources** | People notes (People folder, person-like content), reference material, articles, code snippets | +| **Archive** | Done/completed/inactive status, no incoming or outgoing wikilinks, stale content | + +Notes that don't match any signal with sufficient confidence stay in place. Daily notes (`YYYY-MM-DD.md`) and templates are always skipped. + +**MCP tools:** `migrate_preview`, `migrate_apply`, `migrate_undo` — available in `engraph serve` for AI-assisted migration. + +**HTTP endpoints:** `POST /api/migrate/preview`, `/api/migrate/apply`, `/api/migrate/undo` — available via `engraph serve --http`. + +## Use cases + +**AI-assisted knowledge work** — Give Claude or Cursor deep access to your personal knowledge base. Instead of copy-pasting context, the agent searches, reads, and cross-references your notes directly.
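As an editorial aside on the classification signals above: the rules are priority-ordered and first-match-wins, so signal ordering matters when a note carries conflicting signals. The following is a simplified, hypothetical Rust sketch (not the actual `migrate.rs` implementation; categories and rules are reduced) illustrating why a note with open tasks classifies as a Project even if its frontmatter says `status: done` — the task rule runs before the archive rule.

```rust
// Hypothetical, simplified sketch of priority-ordered classification:
// the first rule that matches wins, so later rules never see the note.
#[derive(Debug, PartialEq)]
enum Category {
    Project,
    Archive,
    Resource,
    Uncertain,
}

fn classify(content: &str, frontmatter: &str, edge_count: usize) -> Category {
    if frontmatter.contains("status: active") {
        return Category::Project; // active status outranks everything below
    }
    if content.contains("- [ ]") {
        return Category::Project; // open tasks fire before the done/archive rule
    }
    if frontmatter.contains("status: done") {
        return Category::Archive;
    }
    if frontmatter.contains("- person") {
        return Category::Resource; // person-tagged notes become Resources
    }
    if edge_count == 0 {
        return Category::Archive; // unlinked notes drift to Archive
    }
    Category::Uncertain // nothing matched: leave the note in place
}

fn main() {
    // An open task beats "status: done" because the task rule runs first.
    assert_eq!(classify("- [ ] ship it", "status: done", 2), Category::Project);
    // A note with no edges and no other signals falls through to Archive.
    assert_eq!(classify("old note", "", 0), Category::Archive);
    println!("ok");
}
```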
@@ -352,7 +391,7 @@ permission = "write" | Search method | 5-lane RRF (semantic + BM25 + graph + reranker + temporal) | Vector similarity only | Keyword only | | Query understanding | LLM orchestrator classifies intent, adapts weights | None | None | | Understands note links | Yes (wikilink graph traversal) | No | Limited (backlinks panel) | -| AI agent access | MCP server (19 tools) + HTTP REST API (20 endpoints) | Custom API needed | No | +| AI agent access | MCP server (22 tools) + HTTP REST API (23 endpoints) | Custom API needed | No | | Write capability | Create/edit/rewrite/delete with smart filing | No | Manual | | Vault health | Orphans, broken links, stale notes, tag hygiene | No | Limited | | Real-time sync | File watcher, 2s debounce | Manual re-index | N/A | @@ -369,8 +408,8 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s - LLM research orchestrator: query intent classification + query expansion + adaptive lane weights - llama.cpp inference via Rust bindings (GGUF models, Metal GPU on macOS, CUDA on Linux) - Intelligence opt-in: heuristic fallback when disabled, LLM-powered when enabled -- MCP server with 19 tools (8 read, 10 write, 1 diagnostic) via stdio -- HTTP REST API with 20 endpoints, API key auth (`eg_` prefix), rate limiting, CORS — enabled via `engraph serve --http` +- MCP server with 22 tools (8 read, 10 write, 1 diagnostic, 3 migrate) via stdio +- HTTP REST API with 23 endpoints, API key auth (`eg_` prefix), rate limiting, CORS — enabled via `engraph serve --http` - Section-level reading and editing: target specific headings with replace/prepend/append modes - Full note rewriting with automatic frontmatter preservation - Granular frontmatter mutations: set/remove fields, add/remove tags and aliases @@ -384,8 +423,9 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s - Placement correction learning from user file moves - Enhanced file resolution with fuzzy Levenshtein matching 
fallback - Content-based folder role detection (people, daily, archive) by content patterns +- PARA migration: AI-assisted vault restructuring into Projects/Areas/Resources/Archive with preview, apply, and undo workflow - Configurable model overrides for multilingual support -- 385 unit tests, CI on macOS + Ubuntu +- 417 unit tests, CI on macOS + Ubuntu ## Roadmap @@ -396,7 +436,8 @@ engraph is not a replacement for Obsidian — it's the intelligence layer that s - [x] ~~Obsidian CLI integration — auto-detect and delegate with circuit breaker~~ (v1.1) - [x] ~~Temporal search — find notes by time period, date-aware queries~~ (v1.2) - [x] ~~HTTP/REST API — complement MCP with a standard web API~~ (v1.3) -- [ ] Multi-vault — search across multiple vaults (v1.4) +- [x] ~~PARA migration — AI-assisted vault restructuring with preview/apply/undo~~ (v1.4) +- [ ] Multi-vault — search across multiple vaults (v1.5) ## Configuration @@ -430,7 +471,7 @@ All data stored in `~/.engraph/` — single SQLite database (~10MB typical), GGU ## Development ```bash -cargo test --lib # 385 unit tests, no network (requires CMake for llama.cpp) +cargo test --lib # 417 unit tests, no network (requires CMake for llama.cpp) cargo clippy -- -D warnings cargo fmt --check @@ -442,7 +483,7 @@ cargo test --test integration -- --ignored Contributions welcome. Please open an issue first to discuss what you'd like to change. -The codebase is 24 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development. +The codebase is 25 Rust modules behind a lib crate. `CLAUDE.md` in the repo root has detailed architecture documentation for AI-assisted development. 
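One detail of the PARA migration feature listed above worth illustrating: daily notes are excluded by a strict filename check rather than frontmatter. A minimal self-contained sketch mirroring the shape of the `YYYY-MM-DD` byte-pattern test in `migrate.rs` (simplified here; the real function also handles the surrounding skip rules):

```rust
// Sketch of the daily-note filename check used by the migration skip rules:
// a basename counts as a daily note only if it is exactly "YYYY-MM-DD".
fn is_daily_note(basename: &str) -> bool {
    let b = basename.as_bytes();
    b.len() == 10
        && b[4] == b'-'
        && b[7] == b'-'
        && b[0..4].iter().all(|c| c.is_ascii_digit())
        && b[5..7].iter().all(|c| c.is_ascii_digit())
        && b[8..10].iter().all(|c| c.is_ascii_digit())
}

fn main() {
    assert!(is_daily_note("2026-03-26"));
    assert!(!is_daily_note("2026-3-26")); // wrong length: not a daily note
    assert!(!is_daily_note("meeting-notes"));
    println!("ok");
}
```

The fixed-length byte check avoids a regex dependency and rejects near-misses like single-digit months.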
## License diff --git a/src/http.rs b/src/http.rs index 7fce4b6..b92bd04 100644 --- a/src/http.rs +++ b/src/http.rs @@ -376,6 +376,10 @@ pub fn build_router(state: ApiState) -> Router { .route("/api/unarchive", post(handle_unarchive)) .route("/api/update-metadata", post(handle_update_metadata)) .route("/api/delete", post(handle_delete)) + // Migration endpoints + .route("/api/migrate/preview", post(handle_migrate_preview)) + .route("/api/migrate/apply", post(handle_migrate_apply)) + .route("/api/migrate/undo", post(handle_migrate_undo)) .layer(cors) .with_state(state) } @@ -834,6 +838,52 @@ async fn handle_update_metadata( Ok(Json(serde_json::json!(result))) } +// --------------------------------------------------------------------------- +// Migration endpoint handlers +// --------------------------------------------------------------------------- + +async fn handle_migrate_preview( + State(state): State, + headers: HeaderMap, +) -> Result { + authorize(&headers, &state, true)?; + let store = state.store.lock().await; + let profile_ref = state.profile.as_ref().as_ref(); + let preview = crate::migrate::generate_preview(&store, &state.vault_path, profile_ref) + .map_err(|e| ApiError::internal(&format!("{e:#}")))?; + Ok(Json(serde_json::to_value(&preview).unwrap())) +} + +#[derive(Deserialize)] +struct MigrateApplyBody { + preview: serde_json::Value, +} + +async fn handle_migrate_apply( + State(state): State, + headers: HeaderMap, + Json(body): Json, +) -> Result { + authorize(&headers, &state, true)?; + let store = state.store.lock().await; + let preview: crate::migrate::MigrationPreview = serde_json::from_value(body.preview) + .map_err(|e| ApiError::bad_request(&format!("Invalid preview: {e}")))?; + let result = crate::migrate::apply_preview(&preview, &store, &state.vault_path) + .map_err(|e| ApiError::internal(&format!("{e:#}")))?; + Ok(Json(serde_json::to_value(&result).unwrap())) +} + +async fn handle_migrate_undo( + State(state): State, + headers: HeaderMap, +) 
-> Result { + authorize(&headers, &state, true)?; + let store = state.store.lock().await; + let result = crate::migrate::undo_last(&store, &state.vault_path) + .map_err(|e| ApiError::internal(&format!("{e:#}")))?; + Ok(Json(serde_json::to_value(&result).unwrap())) +} + async fn handle_delete( State(state): State, headers: HeaderMap, diff --git a/src/lib.rs b/src/lib.rs index 4dc4d67..f2a24b0 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -11,6 +11,7 @@ pub mod indexer; pub mod links; pub mod llm; pub mod markdown; +pub mod migrate; pub mod obsidian; pub mod placement; pub mod profile; diff --git a/src/main.rs b/src/main.rs index a7e3928..89f513d 100644 --- a/src/main.rs +++ b/src/main.rs @@ -158,6 +158,12 @@ enum Command { #[command(subcommand)] action: WriteAction, }, + + /// Migrate vault structure. + Migrate { + #[command(subcommand)] + action: MigrateAction, + }, } #[derive(Subcommand, Debug)] @@ -307,6 +313,19 @@ enum ModelsAction { Info { name: String }, } +#[derive(Subcommand, Debug)] +enum MigrateAction { + /// Classify notes and generate PARA migration preview. + Para { + /// Apply a previously generated preview. + #[arg(long)] + apply: bool, + /// Undo the last migration. + #[arg(long, conflicts_with = "apply")] + undo: bool, + }, +} + /// Prompt user to enable intelligence, download models if yes. fn prompt_intelligence(data_dir: &std::path::Path) -> Result { eprint!( @@ -1421,6 +1440,73 @@ async fn main() -> Result<()> { } } + Command::Migrate { action } => { + let data_dir = Config::data_dir()?; + if !index_exists(&data_dir) { + eprintln!("No index found. Run 'engraph index ' first."); + std::process::exit(1); + } + let db_path = data_dir.join("engraph.db"); + let store = store::Store::open(&db_path)?; + let vault_path_str = store + .get_meta("vault_path")? 
+ .expect("no vault path in index"); + let vault_path = PathBuf::from(&vault_path_str); + let profile = Config::load_vault_profile().ok().flatten(); + + match action { + MigrateAction::Para { apply, undo } => { + if undo { + let result = engraph::migrate::undo_last(&store, &vault_path)?; + println!( + "Migration {} undone: {} files restored", + result.migration_id, result.restored + ); + if !result.errors.is_empty() { + eprintln!("Errors:"); + for e in &result.errors { + eprintln!(" {}", e); + } + } + } else if apply { + let preview = engraph::migrate::load_preview(&data_dir)?; + let result = + engraph::migrate::apply_preview(&preview, &store, &vault_path)?; + println!( + "Migration {} applied: {} files moved", + result.migration_id, result.moved + ); + if !result.errors.is_empty() { + eprintln!("Errors:"); + for e in &result.errors { + eprintln!(" {}", e); + } + } + } else { + // Generate preview + println!("Scanning vault for PARA classification..."); + let preview = engraph::migrate::generate_preview( + &store, + &vault_path, + profile.as_ref(), + )?; + engraph::migrate::save_preview(&preview, &data_dir)?; + println!(); + println!("Preview generated:"); + println!(" Files to move: {}", preview.files.len()); + println!(" Uncertain: {}", preview.uncertain.len()); + println!(" Skipped: {}", preview.skipped); + println!(); + println!("Preview saved to:"); + println!(" {}", data_dir.join("migration-preview.md").display()); + println!(" {}", data_dir.join("migration-preview.json").display()); + println!(); + println!("Review the preview, then run: engraph migrate para --apply"); + } + } + } + } + Command::Models { action } => { let defaults = engraph::llm::ModelDefaults::default(); match action { diff --git a/src/migrate.rs b/src/migrate.rs new file mode 100644 index 0000000..b210e63 --- /dev/null +++ b/src/migrate.rs @@ -0,0 +1,924 @@ +//! Heuristic PARA classification engine for vault migration. +//! +//! 
Classifies notes into PARA categories (Project, Area, Resource, Archive) +//! using priority-ordered heuristic rules, generates migration previews, +//! and formats them as markdown for user review. + +use std::path::Path; + +use anyhow::Result; +use serde::{Deserialize, Serialize}; +use time::OffsetDateTime; + +use crate::markdown::split_frontmatter; +use crate::profile::VaultProfile; +use crate::store::Store; + +// ── Core types ───────────────────────────────────────────────── + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub enum Category { + Project, + Area, + Resource, + Archive, + Skip, + Uncertain, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct Classification { + pub category: Category, + pub confidence: f64, + pub signal: String, + pub suggested_path: Option, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct FileClassification { + pub path: String, + pub classification: Classification, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct MigrationPreview { + pub migration_id: String, + pub files: Vec, + pub uncertain: Vec, + pub skipped: usize, +} + +#[derive(Debug, Serialize)] +pub struct MigrationResult { + pub migration_id: String, + pub moved: usize, + pub skipped: usize, + pub errors: Vec, +} + +#[derive(Debug, Serialize)] +pub struct UndoResult { + pub migration_id: String, + pub restored: usize, + pub errors: Vec, +} + +// ── Heuristic classifier ─────────────────────────────────────── + +/// Classify a note using heuristic rules only (no LLM). +/// Rules run in priority order — first match wins. 
+///
+/// Parameters:
+/// - content: full note content
+/// - filename: relative path (e.g., "07-Daily/2026-03-26.md")
+/// - frontmatter_str: raw frontmatter YAML (without --- delimiters), or None
+/// - edge_count: incoming + outgoing edges from the store
+/// - has_recent_mentions: whether the note was mentioned in notes from the last 30 days
+pub fn classify_heuristic(
+    content: &str,
+    filename: &str,
+    frontmatter_str: Option<&str>,
+    edge_count: usize,
+    has_recent_mentions: bool,
+) -> Classification {
+    // Extract basename (without extension) for pattern matching
+    let basename = std::path::Path::new(filename)
+        .file_stem()
+        .and_then(|s| s.to_str())
+        .unwrap_or("");
+
+    // Rule 1: Daily note — basename matches YYYY-MM-DD pattern
+    if is_daily_note(basename) {
+        return Classification {
+            category: Category::Skip,
+            confidence: 1.0,
+            signal: "daily note filename pattern".into(),
+            suggested_path: None,
+        };
+    }
+
+    // Rule 2: Template — path contains "template" (case-insensitive)
+    if filename.to_lowercase().contains("template") {
+        return Classification {
+            category: Category::Skip,
+            confidence: 1.0,
+            signal: "template path".into(),
+            suggested_path: None,
+        };
+    }
+
+    // Rule 3: Canvas — filename ends with .canvas
+    if filename.ends_with(".canvas") {
+        return Classification {
+            category: Category::Skip,
+            confidence: 1.0,
+            signal: "canvas file".into(),
+            suggested_path: None,
+        };
+    }
+
+    let fm = frontmatter_str.unwrap_or("");
+
+    // Rule 4: Status active/in-progress → Project (90%)
+    if fm.contains("status: active") || fm.contains("status: in-progress") {
+        return Classification {
+            category: Category::Project,
+            confidence: 0.9,
+            signal: "frontmatter status active/in-progress".into(),
+            suggested_path: Some("01-Projects/".into()),
+        };
+    }
+
+    // Rule 5: Unchecked tasks → Project (80%)
+    if content.contains("- [ ]") {
+        return Classification {
+            category: Category::Project,
+            confidence: 0.8,
+            signal: "unchecked tasks found".into(),
+            suggested_path: Some("01-Projects/".into()),
+        };
+    }
+
+    // Rule 6: Status done/completed → Archive (85%)
+    if fm.contains("status: done") || fm.contains("status: completed") {
+        return Classification {
+            category: Category::Archive,
+            confidence: 0.85,
+            signal: "frontmatter status done/completed".into(),
+            suggested_path: Some("04-Archive/".into()),
+        };
+    }
+
+    // Rule 7: Person tag → Resource (90%)
+    if fm.contains("- person") || fm.contains("- people") {
+        return Classification {
+            category: Category::Resource,
+            confidence: 0.9,
+            signal: "person/people tag in frontmatter".into(),
+            suggested_path: Some("03-Resources/People/".into()),
+        };
+    }
+
+    // Rule 8: No edges + no recent mentions → Archive (75%)
+    if edge_count == 0 && !has_recent_mentions {
+        return Classification {
+            category: Category::Archive,
+            confidence: 0.75,
+            signal: "no edges and no recent mentions".into(),
+            suggested_path: Some("04-Archive/".into()),
+        };
+    }
+
+    // Rule 9: High edges + no tasks → Resource (70%)
+    if edge_count >= 3 && !content.contains("- [ ]") {
+        return Classification {
+            category: Category::Resource,
+            confidence: 0.7,
+            signal: "high edge count with no open tasks".into(),
+            suggested_path: Some("03-Resources/".into()),
+        };
+    }
+
+    // Rule 10: Area keywords in filename or first 200 chars of content
+    let area_keywords = [
+        "health",
+        "finance",
+        "career",
+        "learning",
+        "fitness",
+        "nutrition",
+        "budget",
+    ];
+    let filename_lower = filename.to_lowercase();
+    let content_prefix: String = content.chars().take(200).collect::<String>().to_lowercase();
+    for keyword in &area_keywords {
+        if filename_lower.contains(keyword) || content_prefix.contains(keyword) {
+            return Classification {
+                category: Category::Area,
+                confidence: 0.6,
+                signal: format!("area keyword '{keyword}' found"),
+                suggested_path: Some("02-Areas/".into()),
+            };
+        }
+    }
+
+    // Rule 11: Nothing matched → Uncertain
+    Classification {
+        category: Category::Uncertain,
+        confidence: 0.0,
+        signal: "no heuristic rules matched".into(),
+        suggested_path: None,
+    }
+}
+
+/// Check if a basename matches the YYYY-MM-DD date pattern.
+fn is_daily_note(basename: &str) -> bool {
+    let bytes = basename.as_bytes();
+    if bytes.len() != 10 {
+        return false;
+    }
+    // Check format: DDDD-DD-DD where D is a digit
+    bytes[4] == b'-'
+        && bytes[7] == b'-'
+        && bytes[0..4].iter().all(|b| b.is_ascii_digit())
+        && bytes[5..7].iter().all(|b| b.is_ascii_digit())
+        && bytes[8..10].iter().all(|b| b.is_ascii_digit())
+}
+
+// ── Path suggestion ─────────────────────────────────────────────
+
+/// Suggest a PARA-compliant destination path for a classified note.
+///
+/// Uses the `VaultProfile` folder mappings if available, otherwise falls
+/// back to standard PARA folder names. Returns the current path unchanged
+/// if the category is Skip/Uncertain, or if the file is already under the
+/// correct PARA folder.
+fn suggest_path(current_path: &str, category: &Category, profile: Option<&VaultProfile>) -> String {
+    let basename = std::path::Path::new(current_path)
+        .file_name()
+        .and_then(|s| s.to_str())
+        .unwrap_or(current_path);
+
+    let folder = match category {
+        Category::Project => profile
+            .and_then(|p| p.structure.folders.projects.as_deref())
+            .unwrap_or("01-Projects"),
+        Category::Area => profile
+            .and_then(|p| p.structure.folders.areas.as_deref())
+            .unwrap_or("02-Areas"),
+        Category::Resource => profile
+            .and_then(|p| p.structure.folders.resources.as_deref())
+            .unwrap_or("03-Resources"),
+        Category::Archive => profile
+            .and_then(|p| p.structure.folders.archive.as_deref())
+            .unwrap_or("04-Archive"),
+        _ => return current_path.to_string(), // Skip/Uncertain don't move
+    };
+
+    let trimmed = folder.trim_end_matches('/');
+
+    // If the file is already under the target folder, keep it where it is.
+    if current_path.starts_with(&format!("{}/", trimmed))
+        || current_path.starts_with(&format!("{}/", folder))
+    {
+        return current_path.to_string();
+    }
+
+    format!("{}/{}", trimmed, basename)
+}
+
+// ── Preview generation ──────────────────────────────────────────
+
+/// Generate a migration preview by classifying all indexed files.
+///
+/// Reads file content from disk, runs heuristic classification, computes
+/// suggested paths, and partitions results into confident moves vs
+/// uncertain notes that need manual review.
+pub fn generate_preview(
+    store: &Store,
+    vault_path: &Path,
+    profile: Option<&VaultProfile>,
+) -> Result<MigrationPreview> {
+    let migration_id = uuid::Uuid::new_v4().to_string();
+    let all_files = store.get_all_files()?;
+    let mut files = Vec::new();
+    let mut uncertain = Vec::new();
+    let mut skipped = 0;
+
+    let thirty_days_ago = std::time::SystemTime::now()
+        .duration_since(std::time::UNIX_EPOCH)
+        .unwrap_or_default()
+        .as_secs() as i64
+        - 30 * 86400;
+
+    for file in &all_files {
+        let full_path = vault_path.join(&file.path);
+        let content = match std::fs::read_to_string(&full_path) {
+            Ok(c) => c,
+            Err(_) => {
+                skipped += 1;
+                continue; // skip unreadable files
+            }
+        };
+
+        let (fm, _body) = split_frontmatter(&content);
+        let edge_count = store.edge_count_for_file(file.id).unwrap_or(0);
+
+        // A note is "recently active" if it has a note_date within 30 days or has edges.
+        let has_recent = file
+            .note_date
+            .map(|d| d >= thirty_days_ago)
+            .unwrap_or(false)
+            || edge_count > 0;
+
+        let mut classification =
+            classify_heuristic(&content, &file.path, fm.as_deref(), edge_count, has_recent);
+
+        if classification.category == Category::Skip {
+            skipped += 1;
+            continue;
+        }
+
+        // Compute suggested path.
+        let suggested = suggest_path(&file.path, &classification.category, profile);
+
+        // If the file is already in the right place, skip it.
+        if suggested == file.path {
+            skipped += 1;
+            continue;
+        }
+
+        classification.suggested_path = Some(suggested);
+
+        let fc = FileClassification {
+            path: file.path.clone(),
+            classification,
+        };
+
+        if fc.classification.category == Category::Uncertain {
+            uncertain.push(fc);
+        } else {
+            files.push(fc);
+        }
+    }
+
+    // Sort by confidence descending.
+    files.sort_by(|a, b| {
+        b.classification
+            .confidence
+            .partial_cmp(&a.classification.confidence)
+            .unwrap_or(std::cmp::Ordering::Equal)
+    });
+
+    Ok(MigrationPreview {
+        migration_id,
+        files,
+        uncertain,
+        skipped,
+    })
+}
+
+// ── Markdown formatting ─────────────────────────────────────────
+
+/// Extract the filename (last path component) from a path string.
+fn basename(path: &str) -> &str {
+    std::path::Path::new(path)
+        .file_name()
+        .and_then(|s| s.to_str())
+        .unwrap_or(path)
+}
+
+/// Extract the parent folder from a path string, or "-" if at root.
+fn folder(path: &str) -> &str {
+    std::path::Path::new(path)
+        .parent()
+        .and_then(|p| p.to_str())
+        .filter(|s| !s.is_empty())
+        .unwrap_or("-")
+}
+
+/// Format a `MigrationPreview` as a markdown document for user review.
+///
+/// Groups files by category in tables showing current path, proposed path,
+/// confidence, and the heuristic signal that triggered the classification.
+pub fn format_preview_markdown(preview: &MigrationPreview) -> String {
+    let now = OffsetDateTime::now_utc();
+    let date_str = format!(
+        "{:04}-{:02}-{:02}",
+        now.year(),
+        now.month() as u8,
+        now.day()
+    );
+
+    let mut out = String::new();
+    out.push_str(&format!(
+        "# PARA Migration Preview\n\n\
+         Generated: {} | Files to move: {} | Uncertain: {} | Skipped: {}\n\n",
+        date_str,
+        preview.files.len(),
+        preview.uncertain.len(),
+        preview.skipped,
+    ));
+
+    // Group files by category.
+    for category in &[
+        Category::Project,
+        Category::Area,
+        Category::Resource,
+        Category::Archive,
+    ] {
+        let cat_files: Vec<_> = preview
+            .files
+            .iter()
+            .filter(|f| f.classification.category == *category)
+            .collect();
+        if cat_files.is_empty() {
+            continue;
+        }
+
+        out.push_str(&format!(
+            "## {:?} ({} files)\n\n",
+            category,
+            cat_files.len()
+        ));
+        out.push_str("| File | Current | Proposed | Confidence | Signal |\n");
+        out.push_str("|------|---------|----------|------------|--------|\n");
+        for f in &cat_files {
+            out.push_str(&format!(
+                "| {} | {} | {} | {:.0}% | {} |\n",
+                basename(&f.path),
+                folder(&f.path),
+                f.classification.suggested_path.as_deref().unwrap_or("?"),
+                f.classification.confidence * 100.0,
+                f.classification.signal,
+            ));
+        }
+        out.push('\n');
+    }
+
+    if !preview.uncertain.is_empty() {
+        out.push_str(&format!(
+            "## Uncertain ({} files)\n\n",
+            preview.uncertain.len()
+        ));
+        out.push_str("| File | Current | Best Guess | Signal |\n");
+        out.push_str("|------|---------|------------|--------|\n");
+        for f in &preview.uncertain {
+            out.push_str(&format!(
+                "| {} | {} | ? | {} |\n",
+                basename(&f.path),
+                folder(&f.path),
+                f.classification.signal,
+            ));
+        }
+        out.push('\n');
+    }
+
+    out
+}
+
+// ── Apply / Undo / Persistence ──────────────────────────────────
+
+/// Execute a migration preview: move each file to its suggested path.
+///
+/// Skips files with no suggested path or that are already in the correct
+/// location. Logs each successful move to the store's migration log so it
+/// can be undone later.
+pub fn apply_preview(
+    preview: &MigrationPreview,
+    store: &Store,
+    vault_path: &Path,
+) -> Result<MigrationResult> {
+    let mut moved = 0;
+    let mut errors = Vec::new();
+
+    for fc in &preview.files {
+        let target = match &fc.classification.suggested_path {
+            Some(p) => p,
+            None => continue,
+        };
+        // Skip if already in correct location
+        if fc.path == *target {
+            continue;
+        }
+
+        // Extract target folder from the suggested path
+        let folder = std::path::Path::new(target)
+            .parent()
+            .and_then(|p| p.to_str())
+            .unwrap_or("");
+
+        match crate::writer::move_note(&fc.path, folder, store, vault_path) {
+            Ok(_) => {
+                store.log_migration(
+                    &preview.migration_id,
+                    &fc.path,
+                    target,
+                    &format!("{:?}", fc.classification.category),
+                    fc.classification.confidence,
+                )?;
+                moved += 1;
+            }
+            Err(e) => errors.push(format!("{}: {e:#}", fc.path)),
+        }
+    }
+
+    Ok(MigrationResult {
+        migration_id: preview.migration_id.clone(),
+        moved,
+        skipped: errors.len(),
+        errors,
+    })
+}
+
+/// Rollback the most recent migration by moving files back to their
+/// original locations and deleting the migration log entries.
+pub fn undo_last(store: &Store, vault_path: &Path) -> Result<UndoResult> {
+    let migration_id = store
+        .get_last_migration_id()?
+        .ok_or_else(|| anyhow::anyhow!("No migration to undo"))?;
+    let entries = store.get_migration(&migration_id)?;
+
+    let mut restored = 0;
+    let mut errors = Vec::new();
+
+    // Reverse order to undo correctly
+    for entry in entries.iter().rev() {
+        let old_folder = std::path::Path::new(&entry.old_path)
+            .parent()
+            .and_then(|p| p.to_str())
+            .filter(|s| !s.is_empty())
+            .unwrap_or(".");
+        match crate::writer::move_note(&entry.new_path, old_folder, store, vault_path) {
+            Ok(_) => restored += 1,
+            Err(e) => errors.push(format!("{}: {e:#}", entry.new_path)),
+        }
+    }
+
+    store.delete_migration(&migration_id)?;
+
+    Ok(UndoResult {
+        migration_id,
+        restored,
+        errors,
+    })
+}
+
+/// Write a migration preview to disk as both JSON and markdown files.
+pub fn save_preview(preview: &MigrationPreview, data_dir: &Path) -> Result<()> {
+    let json = serde_json::to_string_pretty(preview)?;
+    std::fs::write(data_dir.join("migration-preview.json"), json)?;
+    let md = format_preview_markdown(preview);
+    std::fs::write(data_dir.join("migration-preview.md"), md)?;
+    Ok(())
+}
+
+/// Load a previously saved migration preview from disk.
+pub fn load_preview(data_dir: &Path) -> Result<MigrationPreview> {
+    let json = std::fs::read_to_string(data_dir.join("migration-preview.json"))?;
+    let preview: MigrationPreview = serde_json::from_str(&json)?;
+    Ok(preview)
+}
+
+// ── Tests ───────────────────────────────────────────────────────
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_classify_project_by_status() {
+        let c = classify_heuristic(
+            "---\nstatus: active\n---\n# Sprint 6\n",
+            "sprint-6.md",
+            Some("status: active"),
+            5,
+            true,
+        );
+        assert_eq!(c.category, Category::Project);
+        assert!(c.confidence >= 0.9);
+    }
+
+    #[test]
+    fn test_classify_project_by_tasks() {
+        let c = classify_heuristic(
+            "# Todo\n- [ ] Fix bug\n- [x] Done\n",
+            "todo.md",
+            None,
+            2,
+            true,
+        );
+        assert_eq!(c.category, Category::Project);
+        assert!(c.confidence >= 0.8);
+    }
+
+    #[test]
+    fn test_classify_archive_by_status() {
+        let c = classify_heuristic(
+            "---\nstatus: done\n---\n# Old\n",
+            "old.md",
+            Some("status: done"),
+            0,
+            false,
+        );
+        assert_eq!(c.category, Category::Archive);
+    }
+
+    #[test]
+    fn test_classify_resource_person() {
+        let c = classify_heuristic(
+            "---\ntags:\n - person\n---\n# John\n",
+            "john.md",
+            Some("tags:\n - person"),
+            3,
+            true,
+        );
+        assert_eq!(c.category, Category::Resource);
+    }
+
+    #[test]
+    fn test_classify_area_keywords() {
+        let c = classify_heuristic(
+            "# Health\n\nTreadmill training\n",
+            "health.md",
+            None,
+            2,
+            true,
+        );
+        assert_eq!(c.category, Category::Area);
+    }
+
+    #[test]
+    fn test_skip_daily_note() {
+        let c = classify_heuristic("# Daily\n", "2026-03-26.md", None, 0, true);
+        assert_eq!(c.category, Category::Skip);
+    }
+
+    #[test]
+    fn test_skip_daily_note_in_folder() {
+        let c = classify_heuristic("# Daily\n", "07-Daily/2026-03-26.md", None, 0, true);
+        assert_eq!(c.category, Category::Skip);
+    }
+
+    #[test]
+    fn test_classify_archive_no_edges() {
+        let c = classify_heuristic("# Random\nSome content\n", "random.md", None, 0, false);
+        assert_eq!(c.category, Category::Archive);
+    }
+
+    #[test]
+    fn test_uncertain_when_ambiguous() {
+        // Has edges and recent mentions, but no tasks, no status, no person tag, no area keywords.
+        // edge_count=2 avoids Rule 9 (high edges >= 3 → Resource).
+        let c = classify_heuristic(
+            "# Meeting notes\nDiscussed roadmap\n",
+            "meeting.md",
+            None,
+            2,
+            true,
+        );
+        assert_eq!(c.category, Category::Uncertain);
+    }
+
+    #[test]
+    fn test_skip_template() {
+        let c = classify_heuristic("# Template\n", "05-Templates/Daily Note.md", None, 0, false);
+        assert_eq!(c.category, Category::Skip);
+    }
+
+    // ── suggest_path tests ──────────────────────────────────────
+
+    #[test]
+    fn test_suggest_path_project() {
+        let path = suggest_path("random/sprint.md", &Category::Project, None);
+        assert_eq!(path, "01-Projects/sprint.md");
+    }
+
+    #[test]
+    fn test_suggest_path_area() {
+        let path = suggest_path("misc/health.md", &Category::Area, None);
+        assert_eq!(path, "02-Areas/health.md");
+    }
+
+    #[test]
+    fn test_suggest_path_resource() {
+        let path = suggest_path("notes/article.md", &Category::Resource, None);
+        assert_eq!(path, "03-Resources/article.md");
+    }
+
+    #[test]
+    fn test_suggest_path_archive() {
+        let path = suggest_path("old/done.md", &Category::Archive, None);
+        assert_eq!(path, "04-Archive/done.md");
+    }
+
+    #[test]
+    fn test_suggest_path_already_correct() {
+        let path = suggest_path("01-Projects/sprint.md", &Category::Project, None);
+        assert_eq!(path, "01-Projects/sprint.md");
+    }
+
+    #[test]
+    fn test_suggest_path_skip_unchanged() {
+        let path = suggest_path("some/note.md", &Category::Skip, None);
+        assert_eq!(path, "some/note.md");
+    }
+
+    #[test]
+    fn test_suggest_path_uncertain_unchanged() {
+        let path = suggest_path("some/note.md", &Category::Uncertain, None);
+        assert_eq!(path, "some/note.md");
+    }
+
+    #[test]
+    fn test_suggest_path_with_profile() {
+        use crate::profile::*;
+        let profile = VaultProfile {
+            vault_path: std::path::PathBuf::from("/test"),
+            vault_type: VaultType::Obsidian,
+            structure: StructureDetection {
+                method: StructureMethod::Para,
+                folders: FolderMap {
+                    inbox: None,
+                    projects: Some("Projects".into()),
+                    areas: Some("Areas".into()),
+                    resources: Some("Resources".into()),
+                    archive: Some("Archive".into()),
+                    templates: None,
+                    daily: None,
+                    people: None,
+                },
+            },
+            stats: VaultStats::default(),
+        };
+        let path = suggest_path("random/sprint.md", &Category::Project, Some(&profile));
+        assert_eq!(path, "Projects/sprint.md");
+    }
+
+    #[test]
+    fn test_suggest_path_root_file() {
+        let path = suggest_path("todo.md", &Category::Project, None);
+        assert_eq!(path, "01-Projects/todo.md");
+    }
+
+    // ── basename / folder tests ─────────────────────────────────
+
+    #[test]
+    fn test_basename_simple() {
+        assert_eq!(basename("01-Projects/sprint.md"), "sprint.md");
+    }
+
+    #[test]
+    fn test_basename_root() {
+        assert_eq!(basename("note.md"), "note.md");
+    }
+
+    #[test]
+    fn test_folder_nested() {
+        assert_eq!(folder("01-Projects/Work/sprint.md"), "01-Projects/Work");
+    }
+
+    #[test]
+    fn test_folder_root() {
+        assert_eq!(folder("note.md"), "-");
+    }
+
+    // ── format_preview_markdown tests ───────────────────────────
+
+    #[test]
+    fn test_format_preview_markdown_structure() {
+        let preview = MigrationPreview {
+            migration_id: "test".into(),
+            files: vec![FileClassification {
+                path: "todo.md".into(),
+                classification: Classification {
+                    category: Category::Project,
+                    confidence: 0.8,
+                    signal: "has tasks".into(),
+                    suggested_path: Some("01-Projects/todo.md".into()),
+                },
+            }],
+            uncertain: vec![],
+            skipped: 2,
+        };
+        let md = format_preview_markdown(&preview);
+        assert!(md.contains("# PARA Migration Preview"));
+        assert!(md.contains("Project (1 files)"));
+        assert!(md.contains("todo.md"));
+        assert!(md.contains("80%"));
+        assert!(md.contains("has tasks"));
+        assert!(md.contains("Skipped: 2"));
+    }
+
+    #[test]
+    fn test_format_preview_markdown_multiple_categories() {
+        let preview = MigrationPreview {
+            migration_id: "test2".into(),
+            files: vec![
+                FileClassification {
+                    path: "sprint.md".into(),
+                    classification: Classification {
+                        category: Category::Project,
+                        confidence: 0.9,
+                        signal: "status active".into(),
+                        suggested_path: Some("01-Projects/sprint.md".into()),
+                    },
+                },
+                FileClassification {
+                    path: "old/done.md".into(),
+                    classification: Classification {
+                        category: Category::Archive,
+                        confidence: 0.85,
+                        signal: "status done".into(),
+                        suggested_path: Some("04-Archive/done.md".into()),
+                    },
+                },
+            ],
+            uncertain: vec![FileClassification {
+                path: "mystery.md".into(),
+                classification: Classification {
+                    category: Category::Uncertain,
+                    confidence: 0.0,
+                    signal: "no heuristic rules matched".into(),
+                    suggested_path: None,
+                },
+            }],
+            skipped: 5,
+        };
+        let md = format_preview_markdown(&preview);
+        assert!(md.contains("Project (1 files)"));
+        assert!(md.contains("Archive (1 files)"));
+        assert!(md.contains("Uncertain (1 files)"));
+        assert!(md.contains("Files to move: 2"));
+        assert!(md.contains("Uncertain: 1"));
+        assert!(md.contains("Skipped: 5"));
+    }
+
+    #[test]
+    fn test_format_preview_markdown_empty() {
+        let preview = MigrationPreview {
+            migration_id: "empty".into(),
+            files: vec![],
+            uncertain: vec![],
+            skipped: 10,
+        };
+        let md = format_preview_markdown(&preview);
+        assert!(md.contains("# PARA Migration Preview"));
+        assert!(md.contains("Files to move: 0"));
+        assert!(md.contains("Skipped: 10"));
+        // Should NOT contain any category section headers.
+        assert!(!md.contains("## Project"));
+        assert!(!md.contains("## Uncertain"));
+    }
+
+    // ── apply / undo / save+load tests ──────────────────────────
+
+    #[test]
+    fn test_apply_and_undo_roundtrip() {
+        let tmp = tempfile::tempdir().unwrap();
+        let root = tmp.path().to_path_buf();
+        let store = crate::store::Store::open_memory().unwrap();
+
+        // Create directory structure
+        std::fs::create_dir_all(root.join("01-Projects")).unwrap();
+
+        // Create a file at root level
+        std::fs::write(root.join("todo.md"), "# Todo\n- [ ] task\n").unwrap();
+        store
+            .insert_file("todo.md", "hash1", 100, &[], "tod123", None, None)
+            .unwrap();
+
+        // Build a preview manually
+        let preview = MigrationPreview {
+            migration_id: "test-mig-001".into(),
+            files: vec![FileClassification {
+                path: "todo.md".into(),
+                classification: Classification {
+                    category: Category::Project,
+                    confidence: 0.8,
+                    signal: "has tasks".into(),
+                    suggested_path: Some("01-Projects/todo.md".into()),
+                },
+            }],
+            uncertain: vec![],
+            skipped: 0,
+        };
+
+        // Apply
+        let result = apply_preview(&preview, &store, &root).unwrap();
+        assert_eq!(result.moved, 1);
+        assert!(result.errors.is_empty());
+        assert!(!root.join("todo.md").exists());
+        assert!(root.join("01-Projects/todo.md").exists());
+
+        // Undo
+        let undo = undo_last(&store, &root).unwrap();
+        assert_eq!(undo.restored, 1);
+        assert!(undo.errors.is_empty());
+        assert!(root.join("todo.md").exists());
+        assert!(!root.join("01-Projects/todo.md").exists());
+    }
+
+    #[test]
+    fn test_undo_no_migration() {
+        let store = crate::store::Store::open_memory().unwrap();
+        let tmp = tempfile::tempdir().unwrap();
+        let result = undo_last(&store, tmp.path());
+        assert!(result.is_err());
+    }
+
+    #[test]
+    fn test_save_and_load_preview() {
+        let tmp = tempfile::tempdir().unwrap();
+        let preview = MigrationPreview {
+            migration_id: "test-001".into(),
+            files: vec![],
+            uncertain: vec![],
+            skipped: 5,
+        };
+        save_preview(&preview, tmp.path()).unwrap();
+        let loaded = load_preview(tmp.path()).unwrap();
+        assert_eq!(loaded.migration_id, "test-001");
+        assert_eq!(loaded.skipped, 5);
+    }
+}
diff --git a/src/serve.rs b/src/serve.rs
index 9e0a9fd..f82c344 100644
--- a/src/serve.rs
+++ b/src/serve.rs
@@ -134,6 +134,18 @@ pub struct ReadSectionParams {
 #[derive(Debug, Deserialize, JsonSchema)]
 pub struct HealthParams {}
 
+#[derive(Debug, Deserialize, JsonSchema)]
+pub struct MigratePreviewParams {}
+
+#[derive(Debug, Deserialize, JsonSchema)]
+pub struct MigrateApplyParams {
+    /// Migration preview JSON (from migrate_preview).
+    pub preview: serde_json::Value,
+}
+
+#[derive(Debug, Deserialize, JsonSchema)]
+pub struct MigrateUndoParams {}
+
 #[derive(Debug, Deserialize, JsonSchema)]
 pub struct EditParams {
     /// Target note: file path, basename, or #docid.
@@ -685,6 +697,51 @@ impl EngraphServer {
         to_json_result(&result)
     }
 
+    #[tool(
+        name = "migrate_preview",
+        description = "Generate PARA migration preview. Classifies all notes into Projects/Areas/Resources/Archive and returns proposed moves with confidence scores."
+    )]
+    async fn migrate_preview(
+        &self,
+        _params: Parameters<MigratePreviewParams>,
+    ) -> Result<CallToolResult, McpError> {
+        let store = self.store.lock().await;
+        let profile_ref = self.profile.as_ref().as_ref();
+        let preview = crate::migrate::generate_preview(&store, &self.vault_path, profile_ref)
+            .map_err(|e| mcp_err(&e))?;
+        to_json_result(&preview)
+    }
+
+    #[tool(
+        name = "migrate_apply",
+        description = "Apply a PARA migration preview. Moves files to their classified PARA locations. Reversible via migrate_undo."
+    )]
+    async fn migrate_apply(
+        &self,
+        params: Parameters<MigrateApplyParams>,
+    ) -> Result<CallToolResult, McpError> {
+        let store = self.store.lock().await;
+        let preview: crate::migrate::MigrationPreview = serde_json::from_value(params.0.preview)
+            .map_err(|e| mcp_err(&anyhow::anyhow!("Invalid preview JSON: {e}")))?;
+        let result = crate::migrate::apply_preview(&preview, &store, &self.vault_path)
+            .map_err(|e| mcp_err(&e))?;
+        to_json_result(&result)
+    }
+
+    #[tool(
+        name = "migrate_undo",
+        description = "Undo the most recent PARA migration, restoring all moved files to their original locations."
+    )]
+    async fn migrate_undo(
+        &self,
+        _params: Parameters<MigrateUndoParams>,
+    ) -> Result<CallToolResult, McpError> {
+        let store = self.store.lock().await;
+        let result =
+            crate::migrate::undo_last(&store, &self.vault_path).map_err(|e| mcp_err(&e))?;
+        to_json_result(&result)
+    }
+
     #[tool(
         name = "delete",
         description = "Delete a note. Soft mode (default) moves it to the archive folder. Hard mode permanently removes it from disk and index."
@@ -725,7 +782,8 @@ impl rmcp::handler::server::ServerHandler for EngraphServer {
         Read: vault_map to orient, search to find, read/read_section for content, who/project for context bundles, health for vault diagnostics. \
         Write: create for new notes, append to add content, edit to modify a section, rewrite to replace body, \
         edit_frontmatter for tags/properties, update_metadata for bulk tag/alias replacement. \
-        Lifecycle: move_note to relocate, archive to soft-delete, unarchive to restore, delete for permanent removal.",
+        Lifecycle: move_note to relocate, archive to soft-delete, unarchive to restore, delete for permanent removal. \
+        Migration: migrate_preview to classify notes into PARA folders, migrate_apply to execute the migration, migrate_undo to revert.",
         )
     }
 }
diff --git a/src/store.rs b/src/store.rs
index 3eea4f0..42423ab 100644
--- a/src/store.rs
+++ b/src/store.rs
@@ -47,6 +47,18 @@ pub struct EdgeStats {
     pub isolated_file_count: usize,
 }
 
+/// A record of a PARA migration operation (batch file moves).
+#[derive(Debug, Clone)]
+pub struct MigrationEntry {
+    pub id: i64,
+    pub migration_id: String,
+    pub old_path: String,
+    pub new_path: String,
+    pub category: String,
+    pub confidence: f64,
+    pub migrated_at: String,
+}
+
 /// A record representing a CLI event (for observability/analytics).
 #[derive(Debug, Clone)]
 pub struct CliEvent {
@@ -322,6 +334,20 @@ impl Store {
             CREATE INDEX IF NOT EXISTS idx_unresolved_source ON unresolved_links(source_file);",
         )?;
 
+        // Migration log table — records PARA migration batch operations.
+        self.conn.execute_batch(
+            "CREATE TABLE IF NOT EXISTS migration_log (
+                id INTEGER PRIMARY KEY,
+                migration_id TEXT NOT NULL,
+                old_path TEXT NOT NULL,
+                new_path TEXT NOT NULL,
+                category TEXT NOT NULL,
+                confidence REAL NOT NULL,
+                migrated_at TEXT NOT NULL DEFAULT (datetime('now'))
+            );
+            CREATE INDEX IF NOT EXISTS idx_migration_id ON migration_log(migration_id);",
+        )?;
+
         Ok(())
     }
 
@@ -1504,6 +1530,68 @@ impl Store {
         Ok(results)
     }
 
+    // ── Migration Log ───────────────────────────────────────────
+
+    /// Record a single file move as part of a named migration batch.
+    pub fn log_migration(
+        &self,
+        migration_id: &str,
+        old_path: &str,
+        new_path: &str,
+        category: &str,
+        confidence: f64,
+    ) -> Result<()> {
+        self.conn.execute(
+            "INSERT INTO migration_log (migration_id, old_path, new_path, category, confidence)
+             VALUES (?1, ?2, ?3, ?4, ?5)",
+            params![migration_id, old_path, new_path, category, confidence],
+        )?;
+        Ok(())
+    }
+
+    /// Retrieve all entries for a migration, ordered by insertion order.
+    pub fn get_migration(&self, migration_id: &str) -> Result<Vec<MigrationEntry>> {
+        let mut stmt = self.conn.prepare(
+            "SELECT id, migration_id, old_path, new_path, category, confidence, migrated_at
+             FROM migration_log WHERE migration_id = ?1 ORDER BY id ASC",
+        )?;
+        let rows = stmt.query_map(params![migration_id], |row| {
+            Ok(MigrationEntry {
+                id: row.get(0)?,
+                migration_id: row.get(1)?,
+                old_path: row.get(2)?,
+                new_path: row.get(3)?,
+                category: row.get(4)?,
+                confidence: row.get(5)?,
+                migrated_at: row.get(6)?,
+            })
+        })?;
+        let results: Result<Vec<_>, _> = rows.collect();
+        Ok(results?)
+    }
+
+    /// Return the migration_id of the most recently created migration, if any.
+    pub fn get_last_migration_id(&self) -> Result<Option<String>> {
+        let result = self
+            .conn
+            .query_row(
+                "SELECT migration_id FROM migration_log ORDER BY migrated_at DESC, id DESC LIMIT 1",
+                [],
+                |row| row.get::<_, String>(0),
+            )
+            .optional()?;
+        Ok(result)
+    }
+
+    /// Delete all entries for a migration (for undo / rollback support).
+    pub fn delete_migration(&self, migration_id: &str) -> Result<()> {
+        self.conn.execute(
+            "DELETE FROM migration_log WHERE migration_id = ?1",
+            params![migration_id],
+        )?;
+        Ok(())
+    }
+
     // ── Helpers ─────────────────────────────────────────────────
 
     pub fn next_vector_id(&self) -> Result {
@@ -3310,4 +3398,53 @@ mod tests {
             .unwrap();
         assert_eq!(store.count_files_with_dates().unwrap(), 2);
     }
+
+    #[test]
+    fn test_migration_log_insert_and_query() {
+        let store = Store::open_memory().unwrap();
+        store
+            .log_migration(
+                "mig-001",
+                "old/note.md",
+                "01-Projects/note.md",
+                "project",
+                0.9,
+            )
+            .unwrap();
+        store
+            .log_migration(
+                "mig-001",
+                "old/ref.md",
+                "03-Resources/ref.md",
+                "resource",
+                0.85,
+            )
+            .unwrap();
+        let entries = store.get_migration("mig-001").unwrap();
+        assert_eq!(entries.len(), 2);
+        assert_eq!(entries[0].old_path, "old/note.md");
+    }
+
+    #[test]
+    fn test_migration_log_get_last() {
+        let store = Store::open_memory().unwrap();
+        store
+            .log_migration("mig-001", "a.md", "01-Projects/a.md", "project", 0.9)
+            .unwrap();
+        store
+            .log_migration("mig-002", "b.md", "02-Areas/b.md", "area", 0.8)
+            .unwrap();
+        let last_id = store.get_last_migration_id().unwrap();
+        assert_eq!(last_id.as_deref(), Some("mig-002"));
+    }
+
+    #[test]
+    fn test_migration_log_delete() {
+        let store = Store::open_memory().unwrap();
+        store
+            .log_migration("mig-001", "a.md", "01-Projects/a.md", "project", 0.9)
+            .unwrap();
+        store.delete_migration("mig-001").unwrap();
+        assert!(store.get_migration("mig-001").unwrap().is_empty());
+    }
+}