diff --git a/CHANGELOG.md b/CHANGELOG.md
index c34fcd9..e22afb0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,7 +24,7 @@ PUBLISHING PROCEDURE:
5. After publishing, the next PR author will add a new "## Unreleased" section
-->
-## Unreleased
+## 0.6.1 (2026-05-20)
### Changed
diff --git a/Cargo.lock b/Cargo.lock
index 4187e88..d96d956 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -5097,7 +5097,7 @@ dependencies = [
[[package]]
name = "monodex"
-version = "0.6.0"
+version = "0.6.1"
dependencies = [
"anyhow",
"arrow-array",
diff --git a/Cargo.toml b/Cargo.toml
index 9ceac75..781f257 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "monodex"
-version = "0.6.0"
+version = "0.6.1"
edition = "2024"
rust-version = "1.93"
description = "Fast, accurate code search for large Rush monorepos"
diff --git a/README.md b/README.md
index 21f517d..cfdb281 100644
--- a/README.md
+++ b/README.md
@@ -361,7 +361,7 @@ monodex dump-chunks --file ./src/JsonFile.ts --debug
monodex dump-chunks --file ./src/JsonFile.ts --target-size 4000
# Audit chunking quality across multiple files (AST-only mode)
-monodex audit-chunks --count 20 --dir /path/to/project
+monodex audit-chunks --count 20 --folder /path/to/project
```
**Chunk Quality Score**: 0-100%, higher is better. Scores below 95% may indicate chunking issues. Note: `dump-chunks` and `audit-chunks` use AST-only mode (fallback disabled) to accurately measure partitioner quality.
@@ -460,7 +460,7 @@ RUST_LOG=debug ./target/release/monodex crawl --catalog sparo --label main --com
The crawl behavior (which files to index and how to chunk them) can be customized via configuration files.
-For the full inventory of files Monodex reads or writes (config-folder state, the database directory layout, repo-local config files), see [docs/design/monodex_files.md](https://github.com/microsoft/monodex/blob/main/docs/design/monodex_files.md).
+For the full inventory of files Monodex reads or writes (config-folder state, the database folder layout, repo-local config files), see [docs/design/monodex_files.md](https://github.com/microsoft/monodex/blob/main/docs/design/monodex_files.md).
### Config Discovery
@@ -474,7 +474,7 @@ No merging occurs. Exactly one config is used.
### Config Schema
-JSON schemas are available in the `schemas/` directory for IDE autocomplete and validation. Reference the appropriate schema in your config file via the `$schema` field:
+JSON schemas are available in the `schemas/` folder for IDE autocomplete and validation. Reference the appropriate schema in your config file via the `$schema` field:
| Config File | Schema File |
| ------------------------- | ----------------------------- |
@@ -530,12 +530,12 @@ shouldCrawl = matchesFileType && (matchesPatternsToKeep || !matchesPatternsToExc
- `fileTypes` is the primary filter. Unsupported file types are never crawled.
- `patternsToKeep` overrides `patternsToExclude` (useful for keeping test files in `src/`)
-- Directory patterns (ending in `/`) match anywhere in the path
+- Folder patterns (ending in `/`) match anywhere in the path
**Pattern syntax:**
- Glob patterns use the standard syntax: `**` for recursive, `*` for wildcard
-- Directory patterns end with `/` (e.g., `node_modules/`)
+- Folder patterns end with `/` (e.g., `node_modules/`)
- Example: `**/*.test.ts` matches test files at any depth
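
The filter rule above can be sketched in Rust as follows. This is an illustration, not Monodex's implementation: `should_crawl` and `matches_folder_pattern` are hypothetical names, glob matching is left out entirely, and the folder-pattern helper assumes a single-segment pattern such as `node_modules/` that matches any folder component of a `/`-separated path.

```rust
/// Combine the three filter results exactly as the rule states:
/// shouldCrawl = matchesFileType && (matchesPatternsToKeep || !matchesPatternsToExclude)
pub fn should_crawl(
    matches_file_type: bool,
    matches_patterns_to_keep: bool,
    matches_patterns_to_exclude: bool,
) -> bool {
    matches_file_type && (matches_patterns_to_keep || !matches_patterns_to_exclude)
}

/// Hypothetical helper: test a folder pattern (one ending in `/`) against a
/// `/`-separated relative path. Assumes the pattern names a single folder.
pub fn matches_folder_pattern(pattern: &str, path: &str) -> bool {
    // Anything without a trailing slash is not a folder pattern.
    let Some(folder) = pattern.strip_suffix('/') else {
        return false;
    };
    let mut components: Vec<&str> = path.split('/').collect();
    // The last component is the file name, not a folder.
    components.pop();
    components.iter().any(|component| *component == folder)
}
```

Under these assumptions, `should_crawl(true, true, true)` is `true` because `patternsToKeep` wins over `patternsToExclude`, and `matches_folder_pattern("node_modules/", "packages/a/node_modules/x/index.js")` is `true` while the same pattern does not match `src/node_modules.ts`.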
## Status
diff --git a/docs/backlog.md b/docs/backlog.md
index aeff9f8..3486408 100644
--- a/docs/backlog.md
+++ b/docs/backlog.md
@@ -44,7 +44,7 @@ For official feature requests, create a GitHub issue. If an issue needs higher p
-**BL51 `monodex init` command, with `examples/` rename.** Generate `/monodex-config.json`, `/monodex-crawl-config.json`, and `/monodex-state.json` from the templates currently under `examples/`, with `$schema` URLs set to the published locations. Removes a setup step for new users. Implementation: `include_bytes!` to embed templates at compile time, plus a small command handler with the standard "file already exists" handling. Depends on the templates being embedded (trivial) and ideally on schema publication (otherwise `$schema` URLs are placeholders). The directory should be renamed from `examples/` to `config-templates/` as part of this work, since the current name is a misnomer.
+**BL51 `monodex init` command, with `examples/` rename.** Generate `/monodex-config.json`, `/monodex-crawl-config.json`, and `/monodex-state.json` from the templates currently under `examples/`, with `$schema` URLs set to the published locations. Removes a setup step for new users. Implementation: `include_bytes!` to embed templates at compile time, plus a small command handler with the standard "file already exists" handling. Depends on the templates being embedded (trivial) and ideally on schema publication (otherwise `$schema` URLs are placeholders). The folder should be renamed from `examples/` to `config-templates/` as part of this work, since the current name is a misnomer.
(severity=feature, work=small)
@@ -90,7 +90,7 @@ Items with at least one non-obvious insight worth recording, but no commitment t
-**BL52 Orphan reclamation garbage collection.** Three orphan kinds, swept by one `monodex gc` command: chunk-row orphans (rows in `chunks` with `active_label_ids = []`, typically from interrupted crawls; reclaimed by deleting the row), vector-payload orphans (non-NULL `vector` on a row no in-selection vector method points at; reclaimed by setting `vector = NULL`, row stays), Tantivy-directory orphans (a directory under `/fts///` for a label whose selection no longer includes FTS, or no longer exists; reclaimed by deleting the directory). All three share the same conceptual structure (content unreferenced by any in-selection label state) and operational constraint (requires the database to be quiescent for a full scan). One feature, offline command, not continuous background work. Workaround until the verb exists: `purge` and rebuild from scratch. Revisit once databases live long enough that orphan accumulation matters in practice. Implementation note: an internal `null_vectors_for_row_ids` primitive already exists, which nulls vector columns while preserving the rows. It may be the right mechanism for vector-only invalidation or orphan cleanup.
+**BL52 Orphan reclamation garbage collection.** Three orphan kinds, swept by one `monodex gc` command: chunk-row orphans (rows in `chunks` with `active_label_ids = []`, typically from interrupted crawls; reclaimed by deleting the row), vector-payload orphans (non-NULL `vector` on a row no in-selection vector method points at; reclaimed by setting `vector = NULL`, row stays), Tantivy-folder orphans (a folder under `/fts///` for a label whose selection no longer includes FTS, or no longer exists; reclaimed by deleting the folder). All three share the same conceptual structure (content unreferenced by any in-selection label state) and operational constraint (requires the database to be quiescent for a full scan). One feature, offline command, not continuous background work. Workaround until the verb exists: `purge` and rebuild from scratch. Revisit once databases live long enough that orphan accumulation matters in practice. Implementation note: an internal `null_vectors_for_row_ids` primitive already exists, which nulls vector columns while preserving the rows. It may be the right mechanism for vector-only invalidation or orphan cleanup.
(severity=feature, work=large)
@@ -126,7 +126,7 @@ Items with at least one non-obvious insight worth recording, but no commitment t
-**BL104 Batch the per-row writes in `remove_label_from_chunks`.** The label-reassignment cleanup phase at the end of every successful crawl does one LanceDB write per orphaned chunk (`src/engine/storage/chunks/storage.rs:706` and `:710`) while holding the commit mutex for the whole loop. Fine for typical crawls; pathological for large refactors (directory renames, package moves, mass file deletions) where orphans run into the thousands and the held mutex blocks other writers for the duration. The work splits cleanly: bulk-delete rows that go to zero `active_label_ids` via `delete` with a `row_id IN (...)` predicate in `UPSERT_BATCH_SIZE` chunks; apply non-empty label-list shrinks via `merge_insert` (LanceDB's `update` is per-predicate, not vectorized over different per-row values, so `merge_insert` is the natural batched primitive). Adjacent to but distinct from BL52 (orphan GC): BL52 reclaims rows whose `active_label_ids` is already empty; this gets them to empty more efficiently.
+**BL104 Batch the per-row writes in `remove_label_from_chunks`.** The label-reassignment cleanup phase at the end of every successful crawl does one LanceDB write per orphaned chunk (`src/engine/storage/chunks/storage.rs:706` and `:710`) while holding the commit mutex for the whole loop. Fine for typical crawls; pathological for large refactors (folder renames, package moves, mass file deletions) where orphans run into the thousands and the held mutex blocks other writers for the duration. The work splits cleanly: bulk-delete rows that go to zero `active_label_ids` via `delete` with a `row_id IN (...)` predicate in `UPSERT_BATCH_SIZE` chunks; apply non-empty label-list shrinks via `merge_insert` (LanceDB's `update` is per-predicate, not vectorized over different per-row values, so `merge_insert` is the natural batched primitive). Adjacent to but distinct from BL52 (orphan GC): BL52 reclaims rows whose `active_label_ids` is already empty; this gets them to empty more efficiently.
(severity=performance, work=medium)
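
A minimal sketch of the bulk-delete half of BL104, assuming `row_id` is a string column. It only builds the batched `row_id IN (...)` predicate strings and makes no LanceDB calls. `batched_row_id_predicates` and `quote_sql_str` are hypothetical names, and the batch size is a parameter because the value of `UPSERT_BATCH_SIZE` is not stated here.

```rust
/// Quote a value as a SQL string literal, doubling embedded single quotes.
fn quote_sql_str(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Build one `row_id IN (...)` predicate per batch of at most `batch_size` IDs.
/// Panics if `batch_size` is zero.
pub fn batched_row_id_predicates(row_ids: &[String], batch_size: usize) -> Vec<String> {
    assert!(batch_size > 0, "batch_size must be positive");
    row_ids
        .chunks(batch_size)
        .map(|batch| {
            let quoted: Vec<String> = batch.iter().map(|id| quote_sql_str(id)).collect();
            format!("row_id IN ({})", quoted.join(", "))
        })
        .collect()
}
```

Each returned predicate would feed one `delete` call, so the number of writes under the commit mutex scales with the number of batches rather than the number of orphaned rows.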
@@ -150,7 +150,7 @@ Items with at least one non-obvious insight worth recording, but no commitment t
-**BL68 Orphaned per-catalog lockfile cleanup command.** Per-catalog lockfiles get created lazily and never deleted; the lockfile directory grows monotonically as catalogs come and go. Bounded and tiny per the design's framing in `concurrency.md:134` and `:168`, but a real loose end with no current owner. A future maintenance command can sweep orphaned per-catalog lockfiles for catalogs no longer in `monodex-config.json`.
+**BL68 Orphaned per-catalog lockfile cleanup command.** Per-catalog lockfiles get created lazily and never deleted; the lockfile folder grows monotonically as catalogs come and go. Bounded and tiny per the design's framing in `concurrency.md:134` and `:168`, but a real loose end with no current owner. A future maintenance command can sweep orphaned per-catalog lockfiles for catalogs no longer in `monodex-config.json`.
(severity=hygiene, work=small)
diff --git a/docs/code_organization_policy.md b/docs/code_organization_policy.md
index 44dcf6c..19b96c4 100644
--- a/docs/code_organization_policy.md
+++ b/docs/code_organization_policy.md
@@ -145,14 +145,29 @@ There is no fixed threshold. The judgment is relative: tag the largest contribut
## Naming
+### File and folder names
+
- Command handlers: named after the CLI subcommand (`purge.rs`, `search.rs`). Use `use_cmd.rs` for `use` (reserved keyword).
- Engine submodule directories: named after the concept (`partitioner/`, `storage/`).
- Type-only files: `types.rs` or `models.rs`.
- Test files: `tests.rs` (singular).
+- No semantically vapid filenames. `utilities.rs`, `helpers.rs`, `common.rs`, `misc.rs` are free to write and tell the next reader nothing; half the codebase is "utilities" of some sort. The work of naming is finding what the functions actually have in common, and that shared trait is usually a better name: `formatting.rs` if the trait is formatting, `test_mocks.rs` or `test_fixtures.rs` if the trait is test setup. `test_helpers.rs` is acceptable only when no narrower trait is visible. Pick the narrowest accurate name today; rename when contents change.
+
+### Folder vs directory
+
+Prefer "folder" in identifiers, prose, doc comments, error messages, and clap help text.
+
+The cases that stay "directory":
+
+- **Established compounds**, in their established meaning: "working directory" when it means the Git enlistment folder or `std::env::current_dir()`; "current directory" / "current working directory" for `std::env::current_dir()`; "root directory" for a filesystem root; "home directory" for `$HOME`.
+- **Vendor and standard-library API surface**: type names, trait names, function names, error variants, and terms-of-art from third-party documentation. `std::fs::read_dir`, `std::fs::create_dir_all`, Tantivy's `Directory` trait, `MmapDirectory`, `OpenDirectoryError`, LanceDB's "directory-based table format" all stay as-is; renaming them would prevent readers from finding the underlying documentation.
+
+The two cases above identify objects that keep the word "directory". Prose specifically describing one of those objects inherits the word, so the sentence agrees with the symbol it names. A doc comment on `MmapDirectory::open` says "opens the directory at the given path"; a sentence about a function called `parse_working_directory_arg` says "parses the working directory argument." This is a derived rule, not a third independent criterion.
+
+Counter-examples: a loop variable iterating folders is `current_folder`, not `current_directory` (the first rule requires the literal `current_dir()` meaning). A test fixture holding a `TempDir` may keep `_tmp_dir`; it mirrors the crate's type, not a Monodex concept.
## Banned patterns
-- No semantically vapid filenames. `utilities.rs`, `helpers.rs`, `common.rs`, `misc.rs` are free to write and tell the next reader nothing; half the codebase is "utilities" of some sort. The work of naming is finding what the functions actually have in common, and that shared trait is usually a better name: `formatting.rs` if the trait is formatting, `test_mocks.rs` or `test_fixtures.rs` if the trait is test setup. `test_helpers.rs` is acceptable only when no narrower trait is visible. Pick the narrowest accurate name today; rename when contents change.
- No wildcard re-exports (`pub use submodule::*`). List re-exports explicitly.
- No putting unrelated items together just because they're small.
- No structural splits in the same change as feature or fix work. Splits are their own change unless explicitly authorized by the maintainer or the planned reorganization being applied.
diff --git a/docs/design/architecture.md b/docs/design/architecture.md
index aa8553c..832f832 100644
--- a/docs/design/architecture.md
+++ b/docs/design/architecture.md
@@ -12,14 +12,14 @@ The README introduces the database/catalog/label/chunk hierarchy. Three refineme
## Data model
-The database is a directory containing two LanceDB tables and a tree of per-label Tantivy index directories. The on-disk layout (the `.lance/` table convention, `monodex-meta.json`, the schema-versioning rule, the FTS directory tree) is in [monodex_files.md](./monodex_files.md).
+The database is a folder containing two LanceDB tables and a tree of per-label Tantivy index folders. The on-disk layout (the `.lance/` table convention, `monodex-meta.json`, the schema-versioning rule, the FTS folder tree) is in [monodex_files.md](./monodex_files.md).
The two LanceDB tables:
- `chunks`: one row per indexed chunk. Carries the chunk text, embedding vector (nullable, so a chunk row can exist without an embedding when only FTS is in selection), identity fields, path/package context, and label membership. Schema in `src/engine/schema.rs`; the typed Rust row struct (`ChunkRow`) in `src/engine/storage/rows.rs`.
- `label_metadata`: one row per label. Carries the catalog, the bare label name, the qualified `label_id`, the source kind, and per-retrieval-method state: a nullable source column (commit OID, working-directory sentinel, or NULL when the method is out of the label's retrieval selection) and a completion boolean per method. The set of in-selection methods for a label is derived from which method-source columns are non-NULL.
-FTS state lives outside the LanceDB tables, in a per-label Tantivy index directory at `/fts///`. Each label gets its own Tantivy index because BM25 statistics are computed per-corpus at index time; sharing one Tantivy index across labels would mix statistics from chunks that don't belong to the queried label. The per-label structure makes label-scoped FTS queries correct by construction.
+FTS state lives outside the LanceDB tables, in a per-label Tantivy index folder at `/fts///`. Each label gets its own Tantivy index because BM25 statistics are computed per-corpus at index time; sharing one Tantivy index across labels would mix statistics from chunks that don't belong to the queried label. The per-label structure makes label-scoped FTS queries correct by construction.
The chunk row carries enough path/package/breadcrumb context to be displayed without a working tree or Git checkout. Search results stand alone.
@@ -56,7 +56,7 @@ A crawl run, end to end:
1. **Label upsert**: Resolve `--commit` to a full SHA (or note that this is a `--working-dir` run); update the label's retrieval selection from `--retrieval` (set per-method `source` columns to the resolved commit, NULL out methods being dropped) and mark each in-selection method's `complete` flag false.
2. **Tree visitor**: Enumerate files from the commit tree or from the working-directory blob map (`git ls-files` + `git status`).
-3. **Package indexing**: Build the package index, a map from directory paths to package names, by reading every `package.json` in the source (for commit mode, all `package.json` files in the commit tree; for working-directory mode, all `package.json` files in Git's working-tree view).
+3. **Package indexing**: Build the package index, a map from folder paths to package names, by reading every `package.json` in the source (for commit mode, all `package.json` files in the commit tree; for working-directory mode, all `package.json` files in Git's working-tree view).
4. **File processing**: For each file: compute `file_id`, check the sentinel, and either skip-with-label-add or read-chunk-embed-upsert.
5. **Label reassignment**: After all files succeed, scan chunks tagged with this label, drop the label from any whose `file_id` wasn't touched, and delete chunks whose `active_label_ids` becomes empty.
6. **FTS phase**: If `fts` is in the new retrieval selection, batch-reconcile the per-label Tantivy index against the label's current chunks. Derive the currently indexed set from Tantivy's term dictionary, apply additions and removals, commit once. Schema/tokenizer ID mismatch on existing FTS state triggers a per-label rebuild.
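
Step 5 (label reassignment) can be illustrated with an in-memory sketch. The `Chunk` struct and `reassign_label` function below are hypothetical stand-ins for the LanceDB-backed rows and storage operations; only the `file_id` and `active_label_ids` field names come from the text above.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory stand-in for a row of the `chunks` table.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub file_id: String,
    pub active_label_ids: Vec<String>,
}

/// Drop `label_id` from every chunk whose file was not touched by this crawl,
/// and delete chunks whose label list becomes empty as a result.
pub fn reassign_label(
    chunks: Vec<Chunk>,
    label_id: &str,
    touched_file_ids: &HashSet<String>,
) -> Vec<Chunk> {
    chunks
        .into_iter()
        .filter_map(|mut chunk| {
            let tagged = chunk.active_label_ids.iter().any(|l| l == label_id);
            if tagged && !touched_file_ids.contains(&chunk.file_id) {
                chunk.active_label_ids.retain(|l| l != label_id);
                if chunk.active_label_ids.is_empty() {
                    // No label references this chunk any more: delete it.
                    return None;
                }
            }
            Some(chunk)
        })
        .collect()
}
```

Chunks that belong to other labels survive with a shorter label list; only chunks left with no labels at all are removed.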
@@ -96,7 +96,7 @@ Monodex reads and writes files in three places: the user's config folder (`~/.mo
## Source tree
-Module-organization rules — file size targets, where new code goes, banned patterns — are in the [code organization policy](../code_organization_policy.md). What follows is a one-or-two-line description of every non-test source file. Pure export-only `mod.rs` files are omitted; `mod.rs` files with substantive implementation are listed. Section headings are repo-relative directory paths, so `cli.rs` under `### src/app/` lives at `src/app/cli.rs`.
+Module-organization rules — file size targets, where new code goes, banned patterns — are in the [code organization policy](../code_organization_policy.md). What follows is a one-or-two-line description of every non-test source file. Pure export-only `mod.rs` files are omitted; `mod.rs` files with substantive implementation are listed. Section headings are repo-relative folder paths, so `cli.rs` under `### src/app/` lives at `src/app/cli.rs`.
### src/
@@ -122,7 +122,7 @@ Application-layer code, CLI-specific. Not reusable as a library.
CLI subcommand handlers, each in its own file or subdirectory. Most are thin: parse args, call into the engine, format output.
-- `audit_chunks.rs`: `audit-chunks`: sample TypeScript files from a directory and report aggregate chunk-quality scores. AST-only mode.
+- `audit_chunks.rs`: `audit-chunks`: sample TypeScript files from a folder and report aggregate chunk-quality scores. AST-only mode.
- `crawl.rs`: `crawl`: enumerate files (commit tree or working dir), drive the embed/upload pipeline, run label reassignment after success.
- `debug_fts.rs`: `debug-fts`: print tokens for a chunk and optionally explain query ranking. Diagnostic for FTS tokenization issues.
- `dump_chunks.rs`: `dump-chunks`: visualize partitioner output for a single file. Supports debug, visualize, and with-fallback modes.
@@ -168,14 +168,14 @@ Reusable indexing engine. Does not depend on `src/app/`.
### src/engine/fts/
-Tantivy-based full-text search. Per-label index directories under `/fts///`. See [concurrency.md](./concurrency.md) for the writer contract and [monodex_files.md](./monodex_files.md) for the on-disk layout.
+Tantivy-based full-text search. Per-label index folders under `/fts///`. See [concurrency.md](./concurrency.md) for the writer contract and [monodex_files.md](./monodex_files.md) for the on-disk layout.
Keep the direct `tantivy` dependency aligned with the version resolved through LanceDB. After dependency changes, `cargo tree -i tantivy` should show a single Tantivy version; two side-by-side versions in the dep graph mean `tantivy::Index` from our crate and from LanceDB's are different types, with real binary-size cost.
-- `error.rs`: Helpers for typed discrimination of Tantivy NotFound-style errors. Used by FTS read paths to normalize directory disappearance to absent-index outcomes (`open_existing` returns `FtsOpenExistingOutcome::NoIndex`, `fts_search` returns `FtsSearchOutcome::NoIndex`) instead of propagating raw IO errors.
-- `index.rs`: Open and create per-label Tantivy indexes. Owns the `FtsIndex` handle and the heap-budget constant. Write paths use `open_or_create`; read paths use `open_existing` so a missing FTS directory has no mkdir side effect.
+- `error.rs`: Helpers for typed discrimination of Tantivy NotFound-style errors. Used by FTS read paths to normalize folder disappearance to absent-index outcomes (`open_existing` returns `FtsOpenExistingOutcome::NoIndex`, `fts_search` returns `FtsSearchOutcome::NoIndex`) instead of propagating raw IO errors.
+- `index.rs`: Open and create per-label Tantivy indexes. Owns the `FtsIndex` handle and the heap-budget constant. Write paths use `open_or_create`; read paths use `open_existing` so a missing FTS folder has no mkdir side effect.
- `indexing.rs`: `index_chunks_for_fts` and `FtsIndexingStats`. Reads the label's chunks from LanceDB, derives the currently indexed set from Tantivy's term dictionary, applies additions and removals, commits once, writes the manifest. See [crawl.md](./crawl.md) for the indexing flow.
-- `manifest.rs`: Per-label FTS compatibility metadata at `/fts///manifest.json`. Stores `FTS_SCHEMA_ID` and `FTS_TOKENIZER_ID` for stale-index detection after upgrade. Read result is the typed `ManifestRead` enum (`Missing`, `Present`, `IdMismatch`, `Unreadable`); the four cases dispatch differently.
+- `manifest.rs`: Per-label FTS compatibility metadata at `/fts///manifest.json`. Stores `FTS_SCHEMA_ID` and `FTS_TOKENIZER_ID` for stale-index detection after upgrade. Read result is the typed `ManifestRead` enum (`Missing`, `Present`, `IdMismatch`, `Unreadable`); the four cases dispatch differently.
- `schema.rs`: Tantivy schema for the FTS index (`row_id` as `STRING | STORED` for hit hydration; `text` as `TEXT` not stored). Distinct from `engine/schema.rs` (the LanceDB Arrow schema for the chunks/labels tables).
- `search.rs`: `fts_search` and the `FtsHit` / `FtsSearchOutcome` result types (`Found`, `NoIndex`, `Stale`, `ParseError`). `Stale` carries an `FtsStaleReason` indicating why the index cannot be queried safely; the app layer emits a warning and skips Tantivy. Builds a Tantivy query parser bound to the monodex tokenizer.
- `tokenizer.rs`: Custom tokenizer for source code. Splits on case transitions, underscores, dots, digit boundaries, ASCII whitespace and punctuation; keeps both the original token and the splits; the upper-to-lower transition keeps the last uppercase letter with the following word (`HTTPServer` → `httpserver`, `http`, `server`). Jieba word-segmentation for CJK runs, loaded once per process via `OnceLock`.
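
The identifier-splitting rule described for `tokenizer.rs` can be sketched roughly as below. This is not the Monodex tokenizer: `split_identifier` is a hypothetical function, every non-alphanumeric character is treated as a separator (a simplification of the separator list above), and CJK segmentation is not covered. It does reproduce the `HTTPServer` example.

```rust
/// Return the lowercased original token followed by its lowercased splits.
pub fn split_identifier(token: &str) -> Vec<String> {
    if token.is_empty() {
        return Vec::new();
    }
    let chars: Vec<char> = token.chars().collect();
    let mut parts: Vec<String> = Vec::new();
    let mut current = String::new();

    for (i, &c) in chars.iter().enumerate() {
        if !c.is_alphanumeric() {
            // Separator (underscore, dot, punctuation, whitespace): close the part.
            if !current.is_empty() {
                parts.push(std::mem::take(&mut current));
            }
            continue;
        }
        if !current.is_empty() {
            // `current` is non-empty, so the previous character was alphanumeric.
            let prev = chars[i - 1];
            let next_is_lower = chars.get(i + 1).is_some_and(|n| n.is_lowercase());
            let lower_to_upper = prev.is_lowercase() && c.is_uppercase();
            // In an uppercase run, the last capital starts the following word.
            let acronym_end = prev.is_uppercase() && c.is_uppercase() && next_is_lower;
            let digit_boundary = prev.is_ascii_digit() != c.is_ascii_digit();
            if lower_to_upper || acronym_end || digit_boundary {
                parts.push(std::mem::take(&mut current));
            }
        }
        current.push(c);
    }
    if !current.is_empty() {
        parts.push(current);
    }

    let whole = token.to_lowercase();
    let mut tokens = vec![whole.clone()];
    for part in parts {
        let lowered = part.to_lowercase();
        // Skip a split identical to the whole token (nothing was split).
        if lowered != whole {
            tokens.push(lowered);
        }
    }
    tokens
}
```

With this sketch, `split_identifier("HTTPServer")` yields `httpserver`, `http`, `server`, matching the example above.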
@@ -204,7 +204,7 @@ TypeScript/TSX AST-based chunking. See [chunker.md](./chunker.md) for the algori
LanceDB storage layer. Typed operations on the two tables.
-- `database.rs`: Open a database directory, validate `monodex-meta.json` schema version, expose table handles. Single source of database-open errors.
+- `database.rs`: Open a database folder, validate `monodex-meta.json` schema version, expose table handles. Single source of database-open errors.
- `labels.rs`: Read, upsert, and delete `label_metadata` rows. Handles the per-method retrieval-selection and completion lifecycle.
- `locks.rs`: OS-level file-locking primitives (database, catalog, commit mutex) backing the writer-lock taxonomy. Watchdog thread for long-acquisition progress reporting. See [concurrency.md](./concurrency.md).
- `predicate.rs`: LanceDB SQL predicate builders (`eq_str`, `in_quoted_strs`, etc.) used across the storage layer. Callers must pre-validate inputs: catalog names by `validate_catalog`, label IDs by `LabelId::parse`.
@@ -233,6 +233,6 @@ Every `.md` file in the repo, with a one-line description. Add an entry when add
- [`docs/design/crawl.md`](./crawl.md): Crawl pipeline in detail: package index implementation, working-directory identity model, label reassignment, FTS phase reconciliation, per-file vector-presence invariant.
- [`docs/design/search.md`](./search.md): Search-side behavior: retrieval methods, decision rules, RRF fusion, tokenizer, output format, debug-fts.
- [`docs/design/chunker.md`](./chunker.md): Chunking algorithms: TypeScript AST partitioning (the "two worlds model"), markdown splitting, quality markers and scoring.
-- [`docs/design/concurrency.md`](./concurrency.md): Writer lock taxonomy (database, catalog, commit mutex), reader-lock-free contract, interaction with LanceDB MVCC and Tantivy's per-directory locks.
+- [`docs/design/concurrency.md`](./concurrency.md): Writer lock taxonomy (database, catalog, commit mutex), reader-lock-free contract, interaction with LanceDB MVCC and Tantivy's per-folder locks.
- [`docs/design/monodex_files.md`](./monodex_files.md): Inventory of files monodex reads or writes: config-folder state, repo-local config files monodex reads from the indexed repo, editor-consumed schemas, init templates.
- [`schemas/editing.md`](../../schemas/editing.md): Cross-reference back to the Rust structs that mirror these schemas, plus a policy reminder that these files are publicly published artifacts.
diff --git a/docs/design/chunker.md b/docs/design/chunker.md
index 1cf996c..be86c99 100644
--- a/docs/design/chunker.md
+++ b/docs/design/chunker.md
@@ -114,7 +114,7 @@ The final score is `100 * (1 - count_badness)^α * (1 - micro_badness)^β` with
Two CLI commands exist for chunker development:
- **`monodex dump-chunks --file `.** Runs the partitioner on a single file and prints the resulting chunks with sizes, breadcrumbs, and quality markers. Supports `--debug` for verbose split-decision logging, `--visualize` for full chunk contents, `--with-fallback` to enable line-based fallback (off by default in this command), and `--target-size` to override the default 6000-character target.
-- **`monodex audit-chunks --count --dir `.** Samples N TypeScript files from a directory, runs the partitioner on each (AST-only mode, no fallback), and reports aggregate quality scores. Useful for measuring the effect of partitioner changes across a real codebase.
+- **`monodex audit-chunks --count --folder `.** Samples N TypeScript files from a folder, runs the partitioner on each (AST-only mode, no fallback), and reports aggregate quality scores. Useful for measuring the effect of partitioner changes across a real codebase.
Both commands run the partitioner without writing to the database, so they're safe to use during development.
diff --git a/docs/design/concurrency.md b/docs/design/concurrency.md
index b3b976d..f94fdc0 100644
--- a/docs/design/concurrency.md
+++ b/docs/design/concurrency.md
@@ -6,7 +6,7 @@ The core property: **writers are serialized; readers are lock-free.** Two writer
## Why writers need coordination
-A Monodex database holds two LanceDB tables (`chunks`, `label_metadata`) and a per-label tree of Tantivy index directories under `fts/`. The chunks table is a single physical dataset shared across all catalogs; isolation between catalogs is logical, enforced by `catalog == X` predicates on every read and write. Tantivy's directories are per-catalog and per-label, physically separate.
+A Monodex database holds two LanceDB tables (`chunks`, `label_metadata`) and a per-label tree of Tantivy index folders under `fts/`. The chunks table is a single physical dataset shared across all catalogs; isolation between catalogs is logical, enforced by `catalog == X` predicates on every read and write. Tantivy's folders are per-catalog and per-label, physically separate.
Three failure modes motivate the lock design:
@@ -43,7 +43,7 @@ The three primitives compose. A typical writer holds shared(database) + exclusiv
Acquisition order is always database -> catalog -> commit mutex. Release is the reverse order, governed by Rust's drop order. No operation reaches past a level it doesn't need to acquire.
-`use` writes `~/.monodex/monodex-state.json` in the user's config folder, not the database directory, so it does not interact with the lock taxonomy at all.
+`use` writes `~/.monodex/monodex-state.json` in the user's config folder, not the database folder, so it does not interact with the lock taxonomy at all.
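
The acquisition and release order described above relies on Rust dropping local variables in reverse declaration order. The sketch below demonstrates that property with a hypothetical `LockGuard` type that only records events; it takes no real OS-level locks and does not mirror the types in `locks.rs`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type EventLog = Rc<RefCell<Vec<String>>>;

/// Hypothetical guard: logs on acquire, logs again when dropped.
struct LockGuard {
    name: &'static str,
    log: EventLog,
}

impl LockGuard {
    fn acquire(name: &'static str, log: &EventLog) -> Self {
        log.borrow_mut().push(format!("acquire {name}"));
        LockGuard { name, log: Rc::clone(log) }
    }
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Acquire database -> catalog -> commit mutex, then let scope exit release them.
pub fn run_writer() -> Vec<String> {
    let log: EventLog = Rc::new(RefCell::new(Vec::new()));
    {
        let _database = LockGuard::acquire("database", &log);
        let _catalog = LockGuard::acquire("catalog", &log);
        let _commit = LockGuard::acquire("commit mutex", &log);
        // The write would happen here, with all three guards held.
    } // Guards drop here in reverse declaration order.
    let events = log.borrow().clone();
    events
}
```

The recorded events show the commit mutex released first and the database lock last, without any explicit release calls.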
## Contention behavior
@@ -76,7 +76,7 @@ This is the correct UX answer in the absence of a global write barrier. The sear
A reader during `purge --all` sees pre-purge data until the purge commits, then post-purge data. Because purge is not a multi-table transaction, a reader could in principle observe a state where chunks have been truncated but `label_metadata` has not (or vice versa). Current `search` and `view` paths do not consult `label_metadata` for retrieval, so this gap is not user-visible today; it would become visible if those paths gain a per-method completion consultation step in the future.
-A reader querying FTS state during a concurrent `purge --catalog X` may encounter directory-disappearance errors as the purge unlinks the `fts//` tree. The search-path implementation routes these through the "absent state" warning (yellow message naming the catalog and the crawl command that would rebuild) rather than surfacing a raw IO error. The exact failure point varies by platform: on POSIX the unlink succeeds and the reader's open mappings stay valid against the unreferenced inodes (it just sees pre-purge data until close), while on Windows the purger's `DeleteFile` against an mmapped segment fails with a sharing violation and the deletion stalls until the reader releases. Either is a defensible "user is destroying data they're querying" outcome; neither needs additional locking on the reader side.
+A reader querying FTS state during a concurrent `purge --catalog X` may encounter folder-disappearance errors as the purge unlinks the `fts//` tree. The search-path implementation routes these through the "absent state" warning (yellow message naming the catalog and the crawl command that would rebuild) rather than surfacing a raw IO error. The exact failure point varies by platform: on POSIX the unlink succeeds and the reader's open mappings stay valid against the unreferenced inodes (it just sees pre-purge data until close), while on Windows the purger's `DeleteFile` against an mmapped segment fails with a sharing violation and the deletion stalls until the reader releases. Either is a defensible "user is destroying data they're querying" outcome; neither needs additional locking on the reader side.
## Storage-layer integration
@@ -88,11 +88,11 @@ The methods covered by this rule include every LanceDB-mutating storage operatio
Operations that report a row count (such as `delete_by_catalog`'s "deleted N chunks" message) compute the count from rows matching the operation's predicate, under the same commit-mutex acquisition that performs the delete. Counting total table rows before and after the delete and subtracting would race against concurrent writes to other catalogs and report the wrong number; predicate-scoped counts under the mutex are accurate.
-Tantivy writes follow a different shape. A Tantivy `IndexWriter` is held for the duration of an FTS phase, accumulates document additions and deletions in memory, and commits once at the end. There is no per-write contention point analogous to LanceDB's `merge_insert`. The protection Tantivy's writes need is provided by the per-catalog lock that the surrounding crawl already holds: no other Monodex writer can have an `IndexWriter` open against the same catalog's directories.
+Tantivy writes follow a different shape. A Tantivy `IndexWriter` is held for the duration of an FTS phase, accumulates document additions and deletions in memory, and commits once at the end. There is no per-write contention point analogous to LanceDB's `merge_insert`. The protection Tantivy's writes need is provided by the per-catalog lock that the surrounding crawl already holds: no other Monodex writer can have an `IndexWriter` open against the same catalog's folders.
-Tantivy's own per-directory lockfile (the `INDEX_WRITER_LOCK` it acquires when an `IndexWriter` opens) is redundant under our discipline. It guards Tantivy state from same-directory concurrent writers within Tantivy's own model, but that scenario can't occur if our per-catalog lock is held: only one Monodex process is in the FTS phase for catalog X at a time. Redundant but harmless: we don't disable it, and we accept that a panic mid-FTS can leave a Tantivy lockfile that needs manual cleanup before the next crawl proceeds. Stale-lockfile recovery is a Tantivy concern, not ours.
+Tantivy's own per-folder lockfile (the `INDEX_WRITER_LOCK` it acquires when an `IndexWriter` opens) is redundant under our discipline. It guards Tantivy state from same-folder concurrent writers within Tantivy's own model, but that scenario can't occur if our per-catalog lock is held: only one Monodex process is in the FTS phase for catalog X at a time. Redundant but harmless: we don't disable it, and we accept that a panic mid-FTS can leave a Tantivy lockfile that needs manual cleanup before the next crawl proceeds. Stale-lockfile recovery is a Tantivy concern, not ours.
-The asymmetry between LanceDB's per-write commit mutex and Tantivy's per-phase locking reflects the asymmetry in their physical layouts. LanceDB has one shared dataset across all catalogs; Tantivy has per-catalog directories. The commit mutex is for cross-catalog physical contention on the shared dataset, and Tantivy doesn't need an analog because it doesn't have that contention.
+The asymmetry between LanceDB's per-write commit mutex and Tantivy's per-phase locking reflects the asymmetry in their physical layouts. LanceDB has one shared dataset across all catalogs; Tantivy has per-catalog folders. The commit mutex is for cross-catalog physical contention on the shared dataset, and Tantivy doesn't need an analog because it doesn't have that contention.
**Acquisition timing.** The database lock and catalog lock are acquired at the start of the writer operation, before any database I/O. For CLI use, "the start of the writer operation" is the synchronous entry point of the command handler, before `block_on(...)`; for a future long-lived host (such as an MCP server), it is the start of the request handler that performs the write. Either way, the lock is acquired before commit resolution or other expensive setup that should not be repeated after waiting for the lock. Lock acquisition is per-operation, not per-process: a long-lived host acquires and releases for each request rather than holding any lock across requests.
@@ -104,9 +104,9 @@ The asymmetry between LanceDB's per-write commit mutex and Tantivy's per-phase l
**`init-db` short-circuit and under-lock recheck.** An idempotent re-run of `init-db` against an already-initialized database should not block waiting for in-flight crawls. The implementation does an existence check on `monodex-meta.json` before acquiring the database-exclusive lock; if the database is already initialized, `init-db` returns immediately with the "already initialized" message. If the pre-lock check sees an uninitialized state, the lock is acquired, and the existence check is repeated under the lock. Two concurrent `init-db` invocations could both pass the pre-lock check, but only one will pass the under-lock check; the other observes the just-completed initialization and returns. This is the standard double-checked pattern.
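The double-checked flow above can be sketched as follows. This is a minimal illustration, not the project's implementation: `init_db`, `InitOutcome`, and the `{}` placeholder contents are hypothetical, and a `std::sync::Mutex` stands in for the OS-level database-exclusive file lock so the sketch stays self-contained.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::Mutex;

/// Outcome of an `init-db` attempt (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum InitOutcome {
    AlreadyInitialized,
    Initialized,
}

/// Double-checked initialization.
/// ASSUMPTION: `db_lock` is an in-process stand-in for the database-exclusive
/// lock; the design uses an OS-level file lock instead.
pub fn init_db(db_root: &Path, db_lock: &Mutex<()>) -> io::Result<InitOutcome> {
    let meta = db_root.join("monodex-meta.json");

    // Pre-lock check: an idempotent re-run returns without waiting on the lock.
    if meta.exists() {
        return Ok(InitOutcome::AlreadyInitialized);
    }

    // Slow path: take the exclusive lock, then repeat the check under it.
    // A concurrent invocation may have finished initializing while we waited.
    let _guard = db_lock.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    if meta.exists() {
        return Ok(InitOutcome::AlreadyInitialized);
    }

    fs::create_dir_all(db_root)?;
    fs::write(&meta, b"{}")?; // placeholder contents, not the real metadata
    Ok(InitOutcome::Initialized)
}
```

Calling `init_db` twice against the same folder returns `Initialized` the first time and `AlreadyInitialized` the second; the second call never touches the lock.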
-**`init-db` empty-directory tolerance.** The existing `init-db` treats a database directory containing only `.monodex.lock` as "empty enough to initialize." Under this design the same logic must accept a `locks/` subdirectory and its contents as ignorable initialization detritus, since a crash after creating `/locks/database.lock` but before writing `monodex-meta.json` would otherwise be misreported as "not a monodex database."
+**`init-db` empty-folder tolerance.** The existing `init-db` treats a database folder containing only `.monodex.lock` as "empty enough to initialize." Under this design the same logic must accept a `locks/` subfolder and its contents as ignorable initialization detritus, since a crash after creating `/locks/database.lock` but before writing `monodex-meta.json` would otherwise be misreported as "not a monodex database."
-**Old lockfile.** The existing `init_db.rs` creates `/.monodex.lock` at the database root. Under this design, that file is detritus: the new code creates and uses `/locks/database.lock` instead. Per the project's pre-1.0 schema-migration policy, no cleanup of the old file is performed; it sits inert in the database directory and harms nothing.
+**Old lockfile.** The existing `init_db.rs` creates `/.monodex.lock` at the database root. Under this design, that file is detritus: the new code creates and uses `/locks/database.lock` instead. Per the project's pre-1.0 schema-migration policy, no cleanup of the old file is performed; it sits inert in the database folder and harms nothing.
## Lockfile lifecycle and on-disk layout
@@ -125,7 +125,7 @@ The asymmetry between LanceDB's per-write commit mutex and Tantivy's per-phase l
Lockfiles are persistent. Once created, they are not deleted on lock release. The OS-level file lock is what carries the locked-or-not state, tracked by the kernel against the file descriptor; the file on disk is purely a rendezvous point. This is the same discipline the existing `init_db.rs` uses for its lockfile (which becomes `locks/database.lock` under this design).
-Each helper function ensures its lockfile's parent directory exists before opening the lockfile, using `fs::create_dir_all`. The cost is one extra syscall per acquisition (microseconds, dominated by the open and lock syscalls themselves), and the result is that no caller has to remember an ordering invariant about which operation creates the `locks/` tree.
+Each helper function ensures its lockfile's parent folder exists before opening the lockfile, using `fs::create_dir_all`. The cost is one extra syscall per acquisition (microseconds, dominated by the open and lock syscalls themselves), and the result is that no caller has to remember an ordering invariant about which operation creates the `locks/` tree.
The lockfile contents are empty. Nothing is written into them. They exist only as named handles for `flock` / `LockFileEx`.
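A minimal sketch of such a helper, using only the standard library. The name `open_lockfile` is hypothetical, and the actual lock acquisition (`flock` / `LockFileEx`) is deliberately left out; the sketch only shows the "create the parent, then open without truncating" step.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::Path;

/// Opens a lockfile, creating it and its parent folder if needed.
/// The file is never written to or truncated; it is only a named handle.
pub fn open_lockfile(path: &Path) -> io::Result<File> {
    // Ensure the parent folder exists so callers need no ordering invariant
    // about which operation creates the `locks/` tree.
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)
}
```

Because the file is persistent and empty, opening it a second time yields the same rendezvous point; the locked-or-not state would live in the kernel, not in the file.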
@@ -151,7 +151,7 @@ Network filesystems (NFS, SMB, etc.) are out of scope: the README explicitly dis
## What the lock design does not do
-- It does not prevent corruption from filesystem-level mishandling: simultaneous database access from multiple machines via a network filesystem, `rm -rf` against in-use directories, or copying a database directory while it's being written are all undefined.
+- It does not prevent corruption from filesystem-level mishandling: simultaneous database access from multiple machines via a network filesystem, `rm -rf` against in-use folders, or copying a database folder while it's being written are all undefined.
- It does not coordinate across Monodex versions. A database written by `monodex 0.5` and accessed by `monodex 0.6` proceeds under the schema-version check, not under any lock-version compatibility scheme.
- It does not provide fairness guarantees beyond what the kernel provides. POSIX `flock` and Windows `LockFileEx` are typically fair on uncontended-then-contended sequences, but neither documents FIFO ordering. POSIX `flock` is specifically not writer-preferring, so in theory a steady stream of shared-lock acquisitions can starve a pending exclusive request; in practice Monodex's workload (15-30 minute crawls, infrequent operations) does not produce the high-frequency contention pattern that would manifest as starvation.
- It does not aim to maximize parallelism. Two crawls against the same catalog serialize on the catalog lock; two crawls against different catalogs run mostly in parallel but serialize briefly at LanceDB commit points. Workloads that need finer-grained parallelism than this provides are out of scope; the right answer there would be a different storage layout, not finer locks.
@@ -164,6 +164,6 @@ The lock taxonomy is intended to support several future extensions without redes
- **Watch mode** holds a long-lived `IndexWriter` per actively-watched label. Under this design, watch mode would hold the per-catalog lock for the duration of the watch session. Per-command acquisition is what's implemented today; long-held acquisition is a lifetime change, not a model change.
- **Long-lived host process** (such as an MCP server). Acquires locks per-request rather than per-process: each request that writes acquires the relevant locks, runs, releases. The lock taxonomy is unchanged. Two implementation surfaces matter: blocking lock acquisitions move to `spawn_blocking` to avoid stalling tokio runtime workers, and the progress callback installs a route through the host's protocol surface rather than printing. Both points are noted in the storage-layer integration section. Reader-side, a long-lived host holding open `Database` handles must be robust to a concurrent purge invalidating its read state, the same way `monodex search` is.
-- **Schema upgrade** (`monodex upgrade-db`, planned for once users have databases large or long-lived enough that recrawl becomes painful). The shape of the upgrade operation is undecided: it could be in-place rewrite of the existing database directory, or a friendlier "delete and recrawl" that automates what users do manually today under the current schema-bump policy. The lock implications are different. In-place rewrite acquires the database-exclusive lock against the existing database, blocks readers from opening it during the rewrite (the lock-free reader contract here assumes no destructive in-place rewrites are happening behind readers' backs, which has to be revisited if the upgrade goes this way), and faces the same Windows-mmap-during-write platform difference noted for purge above. Recrawl-into-fresh-directory doesn't really need the lock; it operates on a path that no reader has open. Picking between these is upgrade-design work, not lock-design work; the lock taxonomy supports either, but the shape of upgrade affects what the reader contract has to promise.
+- **Schema upgrade** (`monodex upgrade-db`, planned for once users have databases large or long-lived enough that recrawl becomes painful). The shape of the upgrade operation is undecided: it could be in-place rewrite of the existing database folder, or a friendlier "delete and recrawl" that automates what users do manually today under the current schema-bump policy. The lock implications are different. In-place rewrite acquires the database-exclusive lock against the existing database, blocks readers from opening it during the rewrite (the lock-free reader contract here assumes no destructive in-place rewrites are happening behind readers' backs, which has to be revisited if the upgrade goes this way), and faces the same Windows-mmap-during-write platform difference noted for purge above. Recrawl-into-fresh-folder doesn't really need the lock; it operates on a path that no reader has open. Picking between these is upgrade-design work, not lock-design work; the lock taxonomy supports either, but the shape of upgrade affects what the reader contract has to promise.
- **Orphaned-lockfile cleanup** (a future maintenance command) scans `locks/per-catalog/`, cross-references against `monodex-config.json`, and deletes lockfiles for catalogs no longer in config. Acquires each lockfile briefly with `try_lock` before deleting, to avoid removing actively-held files.
- **Future FTS storage changes.** The catalog-level writer contract is the stable interface; future internal restructuring of FTS state should preserve it.
diff --git a/docs/design/crawl.md b/docs/design/crawl.md
index 12c69fc..7875cc6 100644
--- a/docs/design/crawl.md
+++ b/docs/design/crawl.md
@@ -20,7 +20,7 @@ Two enumeration paths, depending on the source:
**Commit mode:** Use `gix` to walk the commit tree recursively. The walker emits a sequence of `(blob_id, relative_path)` pairs for every blob in the tree. Non-blob entries (submodules, symlinks under some repo configurations) are filtered out. Monodex doesn't follow submodule pointers and doesn't materialize symlink targets.
-**Working-directory mode:** Build a map of Git's working-tree view using `git ls-files` (for tracked files) and `git status -u` (for untracked non-ignored files). The map contains `(relative_path, blob_id)` pairs for all files in Git's working-tree view: tracked files at their current working-tree contents (including local modifications), plus untracked non-ignored files. Deleted files are removed from the view. Files under hidden directories (`.github/`, `.vscode/`, etc.) are included because the enumeration is driven by the Git-derived blob map, not a filesystem walk. Blob IDs are computed by shelling out to `git hash-object`. `.gitignore`-excluded files are not included even when present on disk. The minimum required Git version is 2.35.0 (for `git ls-files --format`).
+**Working-directory mode:** Build a map of Git's working-tree view using `git ls-files` (for tracked files) and `git status -u` (for untracked non-ignored files). The map contains `(relative_path, blob_id)` pairs for all files in Git's working-tree view: tracked files at their current working-tree contents (including local modifications), plus untracked non-ignored files. Deleted files are removed from the view. Files under hidden folders (`.github/`, `.vscode/`, etc.) are included because the enumeration is driven by the Git-derived blob map, not a filesystem walk. Blob IDs are computed by shelling out to `git hash-object`. `.gitignore`-excluded files are not included even when present on disk. The minimum required Git version is 2.35.0 (for `git ls-files --format`).
The blob-ID compatibility between the two modes is load-bearing: it's what makes a `--working-dir` re-crawl over an unchanged repo skip every file via the sentinel check, with no re-embedding. Earlier versions used a SHA-256 content hash for working-dir mode, which produced different `file_id` values from commit mode and broke incremental skipping. The current implementation uses `git ls-files`, `git status`, and `git hash-object --stdin-paths` so that `.gitattributes`, clean filters, and other repo-specific settings are respected and the resulting blob IDs match what `git` would compute on commit.
@@ -32,11 +32,11 @@ Build a `HashMap` covering every `package.json` in
For commit mode, the strategy is two batched Git operations: `git ls-tree -r -z ` to find every `package.json`, then `git cat-file --batch` over a single long-lived process to read all the blobs. This avoids per-file fork overhead and keeps the build to one focused tree enumeration plus one stream of blob reads.
-For working-directory mode, the package index is built by iterating the Git-aware blob map (the same working-tree view used by working-directory file enumeration; see Step 2) and reading each `package.json` from the filesystem. The blob map includes both tracked and untracked non-ignored files; `package.json` files under hidden directories are included.
+For working-directory mode, the package index is built by iterating the Git-aware blob map (the same working-tree view used by working-directory file enumeration; see Step 2) and reading each `package.json` from the filesystem. The blob map includes both tracked and untracked non-ignored files; `package.json` files under hidden folders are included.
-For each `package.json`, the `"name"` field is parsed out and stored under the directory's repo-relative path as the key. Repo-root `package.json` is keyed by the empty string `""`.
+For each `package.json`, the `"name"` field is parsed out and stored under the folder's repo-relative path as the key. Repo-root `package.json` is keyed by the empty string `""`.
-Lookup happens later, during file processing: given a file at `libraries/lib1/src/Example.ts`, the index is queried for ancestor directories in this order:
+Lookup happens later, during file processing: given a file at `libraries/lib1/src/Example.ts`, the index is queried for ancestor folders in this order:
1. `libraries/lib1/src`
2. `libraries/lib1`
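As a rough sketch of this nearest-ancestor lookup: the function name `nearest_package` and the `HashMap<String, String>` shape (repo-relative folder to package name) are assumptions for illustration, and the walk is assumed to continue upward until it reaches the repo root key `""`.

```rust
use std::collections::HashMap;

/// Returns the package name of the nearest ancestor folder that has an
/// entry in the index, or `None` if no ancestor (including the repo root,
/// keyed by "") has one.
pub fn nearest_package<'a>(
    index: &'a HashMap<String, String>,
    file_path: &str,
) -> Option<&'a str> {
    // Start from the file's containing folder ("" for a file at the repo root).
    let mut folder = match file_path.rfind('/') {
        Some(pos) => &file_path[..pos],
        None => "",
    };
    loop {
        if let Some(name) = index.get(folder) {
            return Some(name.as_str());
        }
        if folder.is_empty() {
            return None; // already checked the repo root
        }
        // Step up one level; the parent of a top-level folder is "".
        folder = match folder.rfind('/') {
            Some(pos) => &folder[..pos],
            None => "",
        };
    }
}
```

With an index holding `libraries/lib1` and `""`, a file at `libraries/lib1/src/Example.ts` resolves to the `libraries/lib1` package, while a file elsewhere in the repo falls through to the repo-root package.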
@@ -80,15 +80,15 @@ The "only after success" rule matters because the touched set is only complete o
## Step 6: FTS phase
-This step runs only if `fts` is in the new retrieval selection. It is a batch reconciliation against the per-label Tantivy index at `/fts///`, not a per-file fast path.
+This step runs only if `fts` is in the new retrieval selection. It is a batch reconciliation against the per-label Tantivy index at `/fts///`, not a per-file fast path.
The phase reads the label's chunks from LanceDB (via `get_chunks_for_label`) and derives the currently indexed `row_id` set from Tantivy's term dictionary (using the alive-bitset filter, so tombstoned-but-not-yet-merged docs do not appear). The diff is computed as a set difference of `row_id`s: chunks present in LanceDB but not in Tantivy are added; chunks present in Tantivy but no longer in LanceDB are removed via `delete_term`. After all additions and deletions are queued, the phase calls `commit()` once. For commit-mode crawls, the phase then calls `wait_merging_threads()` to consolidate; for working-dir crawls, it skips this and accepts fragmentation as the cost of speed (full re-crawl will clean up).
-After `commit()` succeeds, the manifest at `/fts///manifest.json` is written with the `FTS_SCHEMA_ID` and `FTS_TOKENIZER_ID` constants the index was built with. The manifest stores only compatibility metadata; it does not track row_ids.
+After `commit()` succeeds, the manifest at `/fts///manifest.json` is written with the `FTS_SCHEMA_ID` and `FTS_TOKENIZER_ID` constants the index was built with. The manifest stores only compatibility metadata; it does not track row_ids.
**Schema and tokenizer ID mismatch.** The schema and tokenizer behavior are versioned by `FTS_SCHEMA_ID` and `FTS_TOKENIZER_ID` constants in `src/engine/identity.rs`. Mismatch is detected via the manifest's stored IDs, not by introspecting Tantivy's on-disk schema:
-- If the manifest's IDs do not match the current constants: delete the per-label FTS directory and rebuild from scratch. The intent is to recover automatically from version bumps.
+- If the manifest's IDs do not match the current constants: delete the per-label FTS folder and rebuild from scratch. The intent is to recover automatically from version bumps.
- If Tantivy fails to open with the manifest's IDs matching, or if the manifest is unreadable while Tantivy state exists: error out with a clear message. This is corruption and should reach a human, not be papered over by a silent rebuild.
These IDs do not participate in `row_id`. Chunk identity is a chunk-storage concept; FTS has its own invalidation surface, and a tokenizer tweak does not force re-embedding.
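The reconciliation in Step 6 is a pair of set differences over `row_id`s. The sketch below shows only that computation; `FtsDiff` and `diff_row_ids` are hypothetical names, `row_id` is assumed to be representable as a `String`, and no Tantivy or LanceDB calls are made.

```rust
use std::collections::HashSet;

/// Work to queue against the per-label index before the single `commit()`.
#[derive(Debug, PartialEq, Eq)]
pub struct FtsDiff {
    /// Present in LanceDB but not yet indexed: add these documents.
    pub to_add: Vec<String>,
    /// Indexed but no longer in LanceDB: remove these (via `delete_term`).
    pub to_delete: Vec<String>,
}

/// Computes the additions and deletions as set differences of row_ids.
/// ASSUMPTION: `indexed` already excludes tombstoned documents.
pub fn diff_row_ids(in_lancedb: &HashSet<String>, indexed: &HashSet<String>) -> FtsDiff {
    let mut to_add: Vec<String> = in_lancedb.difference(indexed).cloned().collect();
    let mut to_delete: Vec<String> = indexed.difference(in_lancedb).cloned().collect();
    // Sort for deterministic ordering; the sets themselves are unordered.
    to_add.sort();
    to_delete.sort();
    FtsDiff { to_add, to_delete }
}
```

When the two sets are equal, both lists are empty and the phase has nothing to queue, which matches the batch-reconciliation framing: the cost scales with the difference, not with per-file bookkeeping.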
diff --git a/docs/design/monodex_files.md b/docs/design/monodex_files.md
index 33a1023..83de354 100644
--- a/docs/design/monodex_files.md
+++ b/docs/design/monodex_files.md
@@ -1,11 +1,11 @@
# Files Monodex reads and writes
-This document inventories every file involved in Monodex's runtime contract: config-folder state, the database directory, repo-local files Monodex reads from the indexed repository, and the schema and template files that ship with the project. It is the central reference for "what is this file, who owns it, what writes it, and is it safe to modify by hand."
+This document inventories every file involved in Monodex's runtime contract: config-folder state, the database folder, repo-local files Monodex reads from the indexed repository, and the schema and template files that ship with the project. It is the central reference for "what is this file, who owns it, what writes it, and is it safe to modify by hand."
Two placeholders are used throughout:
- ``: the Monodex config folder. Defaults to `~/.monodex/`, overridable via the `MONODEX_CONFIG_FOLDER` environment variable or `--config-folder` CLI flag. Resolution logic in `src/paths.rs`. A relative path is resolved against the current working directory at process start; empty or whitespace-only values are treated as unset.
-- ``: the database directory. Defaults to `/default-db/`, relocatable via the `database.path` field in `monodex-config.json`. Must be an absolute path on a local filesystem.
+- ``: the database folder. Defaults to `/default-db/`, relocatable via the `database.path` field in `monodex-config.json`. Must be an absolute path on a local filesystem.
## A note on validation
@@ -20,7 +20,7 @@ One distinction worth knowing: `monodex-config.json` and `monodex-crawl-config.j
## Config folder
-The config folder contains three user-facing JSON files plus the default database directory.
+The config folder contains three user-facing JSON files plus the default database folder.
### `/monodex-config.json`
@@ -36,32 +36,32 @@ User-editable, optional. The user-global crawl config: file-type-to-strategy map
This file is not auto-created by current Monodex. Auto-creation on first run, with a starter template seeded from `examples/monodex-crawl-config.json`, is part of the planned `monodex init` flow.
-## Database directory
+## Database folder
-`` contains a metadata file and the LanceDB tables. It is not designed to be edited by hand. Every file in it is tool-managed except where noted.
+`` contains a metadata file and the LanceDB tables. It is not designed to be edited by hand. Every file in it is tool-managed except where noted.
-The database location must be on a local filesystem. Network filesystems and synced cloud folders (NFS, SMB, Dropbox, OneDrive, iCloud, Google Drive, etc.) are not supported. The writer-lock layer that coordinates concurrent operations against this directory is described in [concurrency.md](./concurrency.md); its lockfiles live under `/locks/`.
+The database location must be on a local filesystem. Network filesystems and synced cloud folders (NFS, SMB, Dropbox, OneDrive, iCloud, Google Drive, etc.) are not supported. The writer-lock layer that coordinates concurrent operations against this folder is described in [concurrency.md](./concurrency.md); its lockfiles live under `/locks/`.
-### `/monodex-meta.json`
+### `/monodex-meta.json`
Records the schema version, creation timestamp, the binary version that created the database, and the Lance format version at creation time. Written by `monodex init-db`; read on every database open. Defined in `src/engine/storage/database.rs`.
The `monodex_schema_version` field is the load-bearing one. Every database open reads it and compares it to the `MONODEX_SCHEMA_VERSION` constant in `src/engine/schema.rs`. A mismatch fails the open with a clear error rather than attempting silent migration. Bumping the schema version is a breaking change to existing databases (users have to rebuild), and any change to the schema's column shape requires a bump. This includes adding columns, even though LanceDB itself can store rows with unset columns: an older binary running against a newer database has no contract that says "blank cells in this column are OK," so the safe rule is to treat any shape change as breaking. The compatibility cost of avoiding a bump (writing code to read schemas with unfamiliar columns, deciding what to do with new columns when writing rows) is not worth absorbing without a concrete need.
-The current remedy for a schema-mismatch error is `monodex init-db --delete-everything`, which deletes the entire `` and recreates it. The schema-mismatch error message points at this command directly. All catalogs must be re-crawled afterward; this is acceptable while recrawl remains cheap relative to the migration-code-and-coordination cost of supporting cross-version databases. A `monodex upgrade-db` verb is in the backlog as the eventual replacement once users have databases large or long-lived enough that recrawl becomes painful.
+The current remedy for a schema-mismatch error is `monodex init-db --delete-everything`, which deletes the entire `` and recreates it. The schema-mismatch error message points at this command directly. All catalogs must be re-crawled afterward; this is acceptable while recrawl remains cheap relative to the migration-code-and-coordination cost of supporting cross-version databases. A `monodex upgrade-db` verb is in the backlog as the eventual replacement once users have databases large or long-lived enough that recrawl becomes painful.
-### `/chunks.lance/` and `/label_metadata.lance/`
+### `/chunks.lance/` and `/label_metadata.lance/`
-LanceDB tables. The `.lance/` suffix is LanceDB's directory-based table format: every LanceDB table is a directory with that suffix containing data files, transaction logs, and index files. The suffix is a LanceDB convention, not a Monodex one; that's why the LanceDB tables are sibling directories under `` rather than nested inside a `vectordb/` subdirectory. Schema definitions live in `src/engine/schema.rs`; row types in `src/engine/storage/rows.rs`.
+LanceDB tables. The `.lance/` suffix is LanceDB's directory-based table format: every LanceDB table is a directory with that suffix containing data files, transaction logs, and index files. The suffix is a LanceDB convention, not a Monodex one; that's why the LanceDB tables are sibling directories under `` rather than nested inside a `vectordb/` subdirectory. Schema definitions live in `src/engine/schema.rs`; row types in `src/engine/storage/rows.rs`.
-The naming convention for `` siblings is: any directory ending in `.lance/` is a LanceDB table; everything else is something else. The Tantivy FTS state lives at `/fts/`, described in its own section below.
+The naming convention for `` siblings is: any directory ending in `.lance/` is a LanceDB table; everything else is something else. The Tantivy FTS state lives at `/fts/`, described in its own section below.
-### `/fts/`
+### `/fts/`
-Per-label Tantivy index directories for full-text search. Tool-managed; not designed to be edited by hand. Layout:
+Per-label Tantivy index folders for full-text search. Tool-managed; not designed to be edited by hand. Layout:
```
-/fts/
+/fts/
/
/
meta.json (Tantivy's; tracks which segments belong to this index)
@@ -72,15 +72,15 @@ Per-label Tantivy index directories for full-text search. Tool-managed; not desi
Each label gets its own Tantivy index because BM25 statistics are computed per-corpus at index time. Sharing one Tantivy index across labels would mix statistics from chunks that don't belong to the queried label.
-`/fts/` is created by `monodex init-db`. Per-catalog and per-label subdirectories are created lazily on first FTS write for that label. The colon-form qualified label_id (`catalog:label`) is for in-memory use; the on-disk form uses nested directories to avoid colons (Windows hostility).
+`/fts/` is created by `monodex init-db`. Per-catalog and per-label subfolders are created lazily on first FTS write for that label. The colon-form qualified label_id (`catalog:label`) is for in-memory use; the on-disk form uses nested folders to avoid colons (Windows hostility).
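One way to picture the mapping from the colon-form label_id to nested folders is the sketch below. The helper name `fts_label_dir` is hypothetical, and splitting at the first colon is an assumption about how catalog and label are separated; the real code may validate names differently.

```rust
use std::path::{Path, PathBuf};

/// Maps an in-memory `catalog:label` id to its on-disk FTS folder under
/// the database root, avoiding colons in path components.
/// Returns `None` if the id has no colon or either part is empty.
pub fn fts_label_dir(db_root: &Path, qualified_label_id: &str) -> Option<PathBuf> {
    // ASSUMPTION: the first colon separates catalog from label.
    let (catalog, label) = qualified_label_id.split_once(':')?;
    if catalog.is_empty() || label.is_empty() {
        return None;
    }
    Some(db_root.join("fts").join(catalog).join(label))
}
```

For example, `fts_label_dir(Path::new("/db"), "sparo:main")` yields the path `/db/fts/sparo/main`, which contains no colon and is therefore safe on Windows.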
The Monodex-side `manifest.json` stores FTS compatibility metadata: the `FTS_SCHEMA_ID` and `FTS_TOKENIZER_ID` constants the index was built with. When these don't match the current binary's constants, the index is stale and must be rebuilt. The manifest does not track row_ids; the currently indexed set is derived from Tantivy's term dictionary at crawl time. See [crawl.md](./crawl.md) for the indexing flow and [search.md](./search.md) for tokenizer behavior.
-Post-purge invariant: after `monodex purge --all` succeeds, `/fts/` exists and is empty, regardless of whether it existed before. After `monodex purge --catalog `, `/fts//` is removed entirely; sibling catalogs are untouched.
+Post-purge invariant: after `monodex purge --all` succeeds, `/fts/` exists and is empty, regardless of whether it existed before. After `monodex purge --catalog `, `/fts//` is removed entirely; sibling catalogs are untouched.
-### `/locks/`
+### `/locks/`
-Lockfiles used by the writer-lock layer (see [concurrency.md](./concurrency.md)). Contents are empty; the file's role is as a named handle for OS-level file locking (`flock` on POSIX, `LockFileEx` on Windows). The directory contains `database.lock`, `commit.lock`, and a `per-catalog/` subdirectory holding one lockfile per catalog. Lockfiles are persistent: they are not deleted on lock release, and `rm -rf locks/` is safe when no Monodex process is running.
+Lockfiles used by the writer-lock layer (see [concurrency.md](./concurrency.md)). Contents are empty; the file's role is as a named handle for OS-level file locking (`flock` on POSIX, `LockFileEx` on Windows). The folder contains `database.lock`, `commit.lock`, and a `per-catalog/` subfolder holding one lockfile per catalog. Lockfiles are persistent: they are not deleted on lock release, and `rm -rf locks/` is safe when no Monodex process is running.
## Repo-local files
@@ -92,7 +92,7 @@ User-editable, optional. If present at the root of the indexed repo, this overri
### `package.json` files anywhere in the repo
-Read during the package-indexing step of every crawl (see [crawl.md](./crawl.md)). Monodex reads only the `"name"` field; other fields are ignored. The package index is built by enumerating every `package.json` in the commit tree (commit-mode crawls) or the working directory (working-dir crawls) and resolving each indexed file to its nearest-ancestor package by directory. Monodex does not write to `package.json` files.
+Read during the package-indexing step of every crawl (see [crawl.md](./crawl.md)). Monodex reads only the `"name"` field; other fields are ignored. The package index is built by enumerating every `package.json` in the commit tree (commit-mode crawls) or the working directory (working-dir crawls) and resolving each indexed file to its nearest-ancestor package by folder. Monodex does not write to `package.json` files.
## Shipped artifacts
@@ -120,4 +120,4 @@ Each is a fully-commented example of the corresponding format with sensible defa
JSON-with-comments is the format Rush Stack uses for user-editable JSON. Comments serve two purposes: as ambient documentation that survives editing (the user keeps the comments when they tweak a value, so the next time they open the file they remember what each field does), and as an upgrade vector. When the comment guidance changes, Monodex can offer to upgrade the comments in a user's existing file while preserving their values, analogous to how Debian package upgrades present new versions of `/etc` config files for diff-and-merge. This is not a settled industry convention; calling the format JSONC is misleading because several different specifications use that name. The format is JSON-with-comments. It is not JSON5 (a JavaScript subset much broader than JSON-with-comments).
-The current directory name `examples/` is a misnomer: these files are templates first and examples second. A future rename to something like `config-templates/` is a candidate for the backlog.
+The current folder name `examples/` is a misnomer: these files are templates first and examples second. A future rename to something like `config-templates/` is a candidate for the backlog.
diff --git a/docs/design/search.md b/docs/design/search.md
index 2856be0..49d69f3 100644
--- a/docs/design/search.md
+++ b/docs/design/search.md
@@ -9,7 +9,7 @@ The relevant source files are `src/app/commands/search.rs` (top-level command ha
Monodex ships two retrieval methods: `vector` (semantic similarity over chunk embeddings) and `fts` (lexical search over chunk text via Tantivy). Each is queried independently and produces ranked `row_id` results scoped to a label. They expose engine APIs as peers:
- `vector_search(label, embedding_query, limit)` over the `chunks` table's `vector` column.
-- `fts_search(label, text_query, limit)` over the per-label Tantivy index at `/fts///`.
+- `fts_search(label, text_query, limit)` over the per-label Tantivy index at `/fts///`.
Each label carries a **retrieval selection**: the set of methods built for it. The selection is set at crawl time via `--retrieval` (see [crawl.md](./crawl.md)) and consulted at search time. A method not in the selection cannot be queried for the label.
@@ -110,13 +110,13 @@ At very large `--limit` the rule degenerates: with `--limit >= 50`, `candidate_l
Either retriever can fail or degrade. The rules:
- **FTS `ParseError`**: hard error. The user typed something with FTS-meaningful syntax (a quote, a colon, a field-prefix); silently degrading to vector-only would surface results that miss the user's evident intent.
-- **FTS `NoIndex`** (the directory genuinely doesn't exist; most likely a concurrent `purge --catalog` between metadata read and FTS open): warn and degrade to vector-only. Fires only when `fts_complete = true`; the `fts_complete = false` case is covered by the upstream incomplete-method warning and would be a duplicate.
+- **FTS `NoIndex`** (the folder genuinely doesn't exist; most likely a concurrent `purge --catalog` between metadata read and FTS open): warn and degrade to vector-only. Fires only when `fts_complete = true`; the `fts_complete = false` case is covered by the upstream incomplete-method warning and would be a duplicate.
- **FTS returns zero hits**: not a failure. Fusion proceeds with vector-only candidates; results show `[v]` markers.
- **Vector embedder/backend failure**: hard error. Vector failures are infrastructure problems (model load, ONNX runtime, LanceDB I/O); silently degrading would mask them.
- **Vector returns zero hits**: not a failure. This is reachable only when the label's chunk set is empty for vector (vector search is nearest-neighbor; non-empty corpus always returns the nearest chunks regardless of relevance).
- **Both methods return zero hits**: print `No results.` regardless of preceding warnings. The `No results.` line is a load-bearing tool signal: agents and machine consumers rely on its presence to mean "zero results" and its absence to mean "results follow."
-Under single-method search (`--retrieval fts` against a label whose `fts_complete = true` but the on-disk directory is missing), the same load-bearing rule applies: a NoIndex warning fires, then `No results.` follows. There is no fallback path in single-method mode, so the warning makes the missing-state visible and `No results.` carries the zero-results signal that consumers rely on.
+Under single-method search (`--retrieval fts` against a label whose `fts_complete = true` but the on-disk folder is missing), the same load-bearing rule applies: a NoIndex warning fires, then `No results.` follows. There is no fallback path in single-method mode, so the warning makes the missing-state visible and `No results.` carries the zero-results signal that consumers rely on.
The `NoIndex` rule is the most asymmetric. It exists because `NoIndex` is the one failure mode that genuinely is "the data is gone, but the user's query is fine"; every other failure either reflects a user-input problem or an infrastructure problem.
@@ -205,10 +205,10 @@ Stderr is preserved for output that is outside the renderer's scope: crawl-time
The arguments diverge meaningfully across retrieval methods (FTS-debug needs `--query` parsed into Tantivy's query AST; a hypothetical vector-debug would need different inputs entirely), so this is a standalone command rather than a subcommand of a generic `debug`.
-The `tantivy-cli` crate is also usable against the on-disk FTS index for advanced introspection. The directory layout is in [monodex_files.md](./monodex_files.md).
+The `tantivy-cli` crate is also usable against the on-disk FTS index for advanced introspection. The folder layout is in [monodex_files.md](./monodex_files.md).
## Concurrency
Search acquires no Monodex locks. A reader runs concurrently with a writing `monodex crawl` in another process. Tantivy supports multiple `IndexReader`s alongside one `IndexWriter`, with readers seeing the last-committed snapshot. LanceDB readers see committed table state.
-A reader during a concurrent `purge --catalog X` may encounter directory-disappearance errors as the purge unlinks `/fts//`. The FTS read paths use typed-error discrimination: a `NotFound` from any Tantivy operation (open, search, segment access) on a per-label FTS path normalizes to `FtsSearchOutcome::NoIndex`, which surfaces as the "FTS state missing" warning rather than a raw IO error. Other Tantivy errors (corruption, mmap failures that are not `NotFound`) remain hard errors. See [concurrency.md](./concurrency.md) for the full reader contract.
+A reader during a concurrent `purge --catalog X` may encounter folder-disappearance errors as the purge unlinks `/fts//`. The FTS read paths use typed-error discrimination: a `NotFound` from any Tantivy operation (open, search, segment access) on a per-label FTS path normalizes to `FtsSearchOutcome::NoIndex`, which surfaces as the "FTS state missing" warning rather than a raw IO error. Other Tantivy errors (corruption, mmap failures that are not `NotFound`) remain hard errors. See [concurrency.md](./concurrency.md) for the full reader contract.
diff --git a/docs/smoke_test.md b/docs/smoke_test.md
index c0b2045..2b73975 100644
--- a/docs/smoke_test.md
+++ b/docs/smoke_test.md
@@ -39,7 +39,7 @@ If `~/.monodex/monodex-config.json` does not exist, create it. Use the absolute
./target/release/monodex init-db
```
-The command should complete without error and create `~/.monodex/default-db/` containing `monodex-meta.json`, `chunks.lance/`, `label_metadata.lance/`, an empty `fts/` directory (per-label Tantivy indexes are created lazily on first FTS write), and a `locks/` directory used by the writer-lock layer.
+The command should complete without error and create `~/.monodex/default-db/` containing `monodex-meta.json`, `chunks.lance/`, `label_metadata.lance/`, an empty `fts/` folder (per-label Tantivy indexes are created lazily on first FTS write), and a `locks/` folder used by the writer-lock layer.
This command is idempotent; running it again on an existing database is safe.
@@ -127,7 +127,7 @@ After this, subsequent `search` and `view` commands can omit `--catalog` and `--
The procedure above runs against `~/.monodex/`, which is shared with the user's normal Monodex installation. For most verification work this is fine: the purge in step 1 ensures the test starts fresh, and re-using the same catalog and database between runs saves time.
-A clean-slate variant runs the same test against a completely fresh config folder, with no shared state. Set `MONODEX_CONFIG_FOLDER` to a temporary directory before any of the steps:
+A clean-slate variant runs the same test against a completely fresh config folder, with no shared state. Set `MONODEX_CONFIG_FOLDER` to a temporary folder before any of the steps:
```
export MONODEX_CONFIG_FOLDER=/tmp/monodex-smoke-test
diff --git a/schemas/config.schema.json b/schemas/config.schema.json
index 59cb54f..0d97985 100644
--- a/schemas/config.schema.json
+++ b/schemas/config.schema.json
@@ -17,7 +17,7 @@
"properties": {
"path": {
"type": "string",
- "description": "Path to the database directory. Must be an absolute path; tilde (~), environment variables ($VAR), and relative paths are not supported. Omit to use the default location.",
+ "description": "Path to the database folder. Must be an absolute path; tilde (~), environment variables ($VAR), and relative paths are not supported. Omit to use the default location.",
"examples": ["/absolute/path/to/db"]
}
}
@@ -70,7 +70,7 @@
},
"path": {
"type": "string",
- "description": "Path to the catalog root directory. Must be an absolute path; tilde (~), environment variables ($VAR), and relative paths are not supported.",
+ "description": "Path to the catalog root folder. Must be an absolute path; tilde (~), environment variables ($VAR), and relative paths are not supported.",
"examples": ["/path/to/monorepo"]
}
}
diff --git a/schemas/editing.md b/schemas/editing.md
index 67d78b9..aab60f8 100644
--- a/schemas/editing.md
+++ b/schemas/editing.md
@@ -1,6 +1,6 @@
# Editing the JSON Schemas
-The `.schema.json` files in this directory are JSON Schemas for Monodex's user-editable config files. They serve two purposes: editor integration (autocomplete, validation, inline documentation via `$schema` URLs) and as a published release artifact hosted on a Microsoft-managed schema server.
+The `.schema.json` files in this folder are JSON Schemas for Monodex's user-editable config files. They serve two purposes: editor integration (autocomplete, validation, inline documentation via `$schema` URLs) and as a published release artifact hosted on a Microsoft-managed schema server.
These files must remain strict, commentless JSON. Do not add comments, even in JSON-with-comments form. Do not reference repo-internal paths inside the schemas. Editors may consume them via `$schema` URL fetch and may not tolerate non-standard JSON.
diff --git a/src/app/cli.rs b/src/app/cli.rs
index 55fb663..c89f5dc 100644
--- a/src/app/cli.rs
+++ b/src/app/cli.rs
@@ -46,7 +46,7 @@ pub enum Commands {
/// Creates database tables for chunks and label metadata.
/// Idempotent: safe to run on an existing database.
InitDb {
- /// Delete the entire database directory and recreate it from scratch.
+ /// Delete the entire database folder and recreate it from scratch.
/// Use this to recover from a schema mismatch error. WARNING: destroys
/// all indexed data; you will need to re-crawl every catalog/label.
#[arg(long)]
@@ -168,9 +168,9 @@ pub enum Commands {
#[arg(long, default_value = "20")]
count: usize,
- /// Directory to sample from
+ /// Folder to sample from
#[arg(long)]
- dir: String,
+ folder: String,
},
/// Diagnose tokenization or ranking for a single chunk.
diff --git a/src/app/commands/audit_chunks.rs b/src/app/commands/audit_chunks.rs
index 4aac845..fb623c9 100644
--- a/src/app/commands/audit_chunks.rs
+++ b/src/app/commands/audit_chunks.rs
@@ -8,18 +8,18 @@ use std::path::PathBuf;
use crate::app::number_format::format_count;
use crate::engine::partitioner::{ChunkQualityReport, PartitionConfig, partition_typescript};
-pub fn run_audit_chunks(count: usize, dir: String) -> Result<()> {
+pub fn run_audit_chunks(count: usize, folder: String) -> Result<()> {
use rand::seq::IndexedRandom;
println!(
"📊 Sampling {} TypeScript files from: {}",
format_count(count as u64),
- dir
+ folder
);
println!();
// Collect all TypeScript files
- let ts_files: Vec = walkdir::WalkDir::new(&dir)
+ let ts_files: Vec = walkdir::WalkDir::new(&folder)
.into_iter()
.filter_map(|e| e.ok())
.filter(|e| {
@@ -73,14 +73,14 @@ pub fn run_audit_chunks(count: usize, dir: String) -> Result<()> {
println!("\n=== Quality Scores (worst first) ===\n");
for (i, (path, report, _)) in results.iter().enumerate() {
- let rel_path = path.strip_prefix(&dir).unwrap_or(path);
+ let rel_path = path.strip_prefix(&folder).unwrap_or(path);
println!("{}. {} {}", i + 1, report.format(), rel_path.display());
}
// Show top 3 worst for investigation
println!("\n=== Top 3 Worst Files ===\n");
for (path, report, chunks) in results.iter().take(3) {
- let rel_path = path.strip_prefix(&dir).unwrap_or(path);
+ let rel_path = path.strip_prefix(&folder).unwrap_or(path);
println!("--- {} ---", rel_path.display());
println!("{}", report.format());
diff --git a/src/app/commands/dump_chunks.rs b/src/app/commands/dump_chunks.rs
index 47603e6..0da06a3 100644
--- a/src/app/commands/dump_chunks.rs
+++ b/src/app/commands/dump_chunks.rs
@@ -177,7 +177,7 @@ pub fn run_dump_chunks(
/// Find the package name for a given source file.
///
-/// This walks upwards from the file's directory to find the nearest package.json
+/// This walks upwards from the file's folder to find the nearest package.json
/// and extracts the "name" field. If no package.json is found, it uses
/// the relative folder path from the repo root as a fallback identifier.
///
@@ -192,7 +192,7 @@ pub fn run_dump_chunks(
fn find_package_name(file_path: &str, repo_root: &str) -> String {
let path = Path::new(file_path);
- // Start from the file's directory
+ // Start from the file's folder
let mut current = path.parent().unwrap_or(path);
// Walk upwards looking for package.json
@@ -238,10 +238,10 @@ fn strip_to_relative_path(file_path: &str, repo_root: &str) -> String {
// Try to strip the repo root
if let Ok(rel) = file_path.strip_prefix(repo_path) {
- // Get the directory part only (remove the filename)
- let dir = rel.parent().unwrap_or(rel);
+ // Get the folder part only (remove the filename)
+ let folder = rel.parent().unwrap_or(rel);
// Convert to string, replace backslashes with forward slashes
- dir.to_string_lossy().replace('\\', "/")
+ folder.to_string_lossy().replace('\\', "/")
} else {
// Couldn't strip - use just the folder name
file_path
diff --git a/src/app/commands/init_db/mod.rs b/src/app/commands/init_db/mod.rs
index 7aff441..be040f4 100644
--- a/src/app/commands/init_db/mod.rs
+++ b/src/app/commands/init_db/mod.rs
@@ -1,7 +1,7 @@
-//! init-db command directory.
+//! init-db command folder.
//!
//! Purpose: Facade for the init-db command; re-exports the public entry point.
-//! Edit here when: Adding or renaming init-db submodules, or changing the public surface re-exported from this directory.
+//! Edit here when: Adding or renaming init-db submodules, or changing the public surface re-exported from this folder.
//! Do not edit here for: init-db behavior (see `run.rs`), tests (see `tests.rs`).
mod run;
diff --git a/src/app/commands/init_db/run.rs b/src/app/commands/init_db/run.rs
index f01ffc3..b21f764 100644
--- a/src/app/commands/init_db/run.rs
+++ b/src/app/commands/init_db/run.rs
@@ -19,7 +19,7 @@ use crate::engine::storage::{
// Existing database state enum
// ============================================================================
-/// Represents the state of an existing database directory.
+/// Represents the state of an existing database folder.
///
/// Used by `read_existing_database_state` to classify what's on disk,
/// then consumed by `check_existing_database_pre_lock` and `check_existing_database`
@@ -29,24 +29,24 @@ enum ExistingDatabaseState {
Missing,
/// Path exists and contains only lockfile/locks/fts detritus.
EmptyDirectory,
- /// Meta file and both table directories exist; schema version matches.
+ /// Meta file and both table folders exist; schema version matches.
Complete(MetaFile),
- /// Meta file is missing, but at least one table directory exists.
+ /// Meta file is missing, but at least one table folder exists.
/// Pre-lock: tolerate (concurrent init may be in progress).
/// Under-lock: bail (partial state is corruption).
TablesWithoutMeta,
- /// Meta file exists but at least one table directory is missing.
+ /// Meta file exists but at least one table folder is missing.
/// Always a bail at both call sites today.
MetaWithoutTables,
/// Meta file's recorded schema version does not match the current binary's.
IncompatibleSchema { recorded: u32, current: u32 },
/// Meta file is unreadable / corrupt.
CorruptMeta,
- /// Path exists, not empty, no meta file, no table directories.
+ /// Path exists, not empty, no meta file, no table folders.
NonMonodexDirectory,
}
-/// Read and classify the state of an existing database directory.
+/// Read and classify the state of an existing database folder.
///
/// This function is infallible - it returns a state classification,
/// not a Result. Callers decide what policy to apply to each state.
@@ -74,7 +74,7 @@ fn read_existing_database_state(db_path: &Path) -> ExistingDatabaseState {
};
}
- // Check that table directories exist
+ // Check that table folders exist
if !chunks_path.exists() || !labels_path.exists() {
return ExistingDatabaseState::MetaWithoutTables;
}
@@ -86,7 +86,7 @@ fn read_existing_database_state(db_path: &Path) -> ExistingDatabaseState {
return ExistingDatabaseState::TablesWithoutMeta;
}
- // Check if directory is empty (ignoring lockfile, locks/, and fts/ detritus)
+ // Check if folder is empty (ignoring lockfile, locks/, and fts/ detritus)
let is_empty = db_path
.read_dir()
.map(|mut entries| {
@@ -117,7 +117,7 @@ fn read_existing_database_state(db_path: &Path) -> ExistingDatabaseState {
/// Format the "parent missing" error with the database path.
pub(super) fn err_parent_missing(db_path: &Path) -> String {
format!(
- "Cannot create database at {}: parent directory does not exist.",
+ "Cannot create database at {}: parent folder does not exist.",
db_path.display()
)
}
@@ -162,11 +162,11 @@ pub fn run_init_db(config: &Config, delete_everything: bool) -> Result<()> {
// Step 1: Resolve database path from config
let db_path = resolve_database_path(config)?;
- // Step 2: Validate parent directory exists (with exception for default-db)
- // This must happen BEFORE any directory creation.
+ // Step 2: Validate parent folder exists (with exception for default-db)
+ // This must happen BEFORE any folder creation.
validate_parent_directory(&config.paths, &db_path)?;
- // Step 3: Create the database root directory (if it doesn't exist)
+ // Step 3: Create the database root folder (if it doesn't exist)
// For default-db, we can create config_folder if needed. For custom paths, parent must exist.
let config_folder = &config.paths.config_folder;
let default_db_path = config_folder.join("default-db");
@@ -196,7 +196,7 @@ pub fn run_init_db(config: &Config, delete_everything: bool) -> Result<()> {
e.ok()
.map(|e| {
let name = e.file_name();
- // Ignore locks directory - we hold a lock under it
+ // Ignore locks folder - we hold a lock under it
name != "locks"
})
.unwrap_or(false)
@@ -264,7 +264,7 @@ pub fn run_init_db(config: &Config, delete_everything: bool) -> Result<()> {
/// Returns None if the path doesn't exist, is empty (ignoring lockfile/locks detritus),
/// or is in a transient mid-init state (tables exist but meta missing).
/// Returns error for terminal conditions that no concurrent writer could resolve:
-/// corrupt meta, schema mismatch, meta present but tables missing, or non-empty non-monodex directory.
+/// corrupt meta, schema mismatch, meta present but tables missing, or non-empty non-monodex folder.
///
/// The pre-lock check is tolerant of the "tables exist, meta missing" state because another
/// process might be mid-init. The caller should acquire the exclusive lock and recheck with
@@ -287,7 +287,7 @@ pub(super) fn check_existing_database_pre_lock(db_path: &Path) -> Result Result
}
}
-/// Validate that the parent directory exists, with exception for default-db.
+/// Validate that the parent folder exists, with exception for default-db.
fn validate_parent_directory(paths: &crate::paths::Paths, db_path: &Path) -> Result<()> {
// Special case: if the path is exactly the default-db path under config_folder,
// we can create config_folder itself.
@@ -327,18 +327,18 @@ fn validate_parent_directory(paths: &crate::paths::Paths, db_path: &Path) -> Res
Ok(())
}
-/// Delete all contents of the database directory except the locks/ subdirectory.
+/// Delete all contents of the database folder except the locks/ subfolder.
///
/// This is used by `init-db --delete-everything` to wipe the database clean
/// while still holding the lock under locks/.
fn delete_database_contents(db_path: &Path) -> Result<()> {
let entries: Vec<_> = db_path
.read_dir()
- .map_err(|e| anyhow!("Failed to read database directory: {}", e))?
+ .map_err(|e| anyhow!("Failed to read database folder: {}", e))?
.filter_map(|e| e.ok())
.filter(|e| {
let name = e.file_name();
- // Don't delete the locks directory - we hold a lock under it
+ // Don't delete the locks folder - we hold a lock under it
name != "locks"
})
.collect();
@@ -347,7 +347,7 @@ fn delete_database_contents(db_path: &Path) -> Result<()> {
let path = entry.path();
if path.is_dir() {
fs::remove_dir_all(&path)
- .map_err(|e| anyhow!("Failed to remove directory {}: {}", path.display(), e))?;
+ .map_err(|e| anyhow!("Failed to remove folder {}: {}", path.display(), e))?;
} else {
fs::remove_file(&path)
.map_err(|e| anyhow!("Failed to remove file {}: {}", path.display(), e))?;
@@ -357,7 +357,7 @@ fn delete_database_contents(db_path: &Path) -> Result<()> {
Ok(())
}
-/// Create the database directory and initialize LanceDB tables.
+/// Create the database folder and initialize LanceDB tables.
async fn create_database(db_path: &Path) -> Result<()> {
// Open LanceDB connection
let conn = lancedb::connect(db_path.to_str().unwrap())
@@ -377,10 +377,10 @@ async fn create_database(db_path: &Path) -> Result<()> {
.await
.map_err(|e| anyhow!("Failed to create label_metadata table: {}", e))?;
- // Create fts directory for Tantivy indexes (populated lazily per label)
- let fts_dir = db_path.join("fts");
- std::fs::create_dir_all(&fts_dir)
- .map_err(|e| anyhow!("Failed to create fts directory: {}", e))?;
+ // Create fts folder for Tantivy indexes (populated lazily per label)
+ let fts_folder = db_path.join("fts");
+ std::fs::create_dir_all(&fts_folder)
+ .map_err(|e| anyhow!("Failed to create fts folder: {}", e))?;
// Write meta file using shared implementation (with fsync)
let meta = MetaFile::new();
diff --git a/src/app/commands/init_db/tests.rs b/src/app/commands/init_db/tests.rs
index 91769e1..1c0e8f9 100644
--- a/src/app/commands/init_db/tests.rs
+++ b/src/app/commands/init_db/tests.rs
@@ -47,20 +47,17 @@ fn test_happy_path_creates_database() {
// Verify structure
let db_path = temp_dir.path().join("default-db");
- assert!(db_path.exists(), "Database directory should exist");
+ assert!(db_path.exists(), "Database folder should exist");
assert!(
db_path.join(META_FILE).exists(),
"monodex-meta.json should exist"
);
- // Verify locks directory was created
- assert!(
- db_path.join("locks").exists(),
- "locks directory should exist"
- );
+ // Verify locks folder was created
+ assert!(db_path.join("locks").exists(), "locks folder should exist");
- // Verify fts directory was created
- assert!(db_path.join("fts").exists(), "fts directory should exist");
+ // Verify fts folder was created
+ assert!(db_path.join("fts").exists(), "fts folder should exist");
}
#[test]
@@ -112,7 +109,7 @@ fn test_parent_missing_non_default_db() {
fn test_path_exists_but_not_monodex_database() {
let temp_dir = TempDir::new().unwrap();
- // Create a directory with a stray file (not a monodex database)
+ // Create a folder with a stray file (not a monodex database)
let db_path = temp_dir.path().join("my-db");
fs::create_dir_all(&db_path).unwrap();
std::fs::File::create(db_path.join("stray-file.txt"))
@@ -196,7 +193,7 @@ fn test_schema_version_mismatch() {
fn test_meta_exists_tables_missing() {
let temp_dir = TempDir::new().unwrap();
- // Create database directory with meta file but no tables
+ // Create database folder with meta file but no tables
let db_path = temp_dir.path().join("default-db");
fs::create_dir_all(&db_path).unwrap();
let meta = MetaFile::new();
@@ -219,7 +216,7 @@ fn test_meta_exists_tables_missing() {
fn test_tables_exist_meta_missing() {
let temp_dir = TempDir::new().unwrap();
- // Create database directory with tables but no meta
+ // Create database folder with tables but no meta
let db_path = temp_dir.path().join("default-db");
fs::create_dir_all(&db_path).unwrap();
fs::create_dir_all(db_path.join("chunks.lance")).unwrap();
@@ -239,10 +236,10 @@ fn test_tables_exist_meta_missing() {
#[test]
fn test_empty_directory_with_locks_dir_succeeds() {
- // Test that a directory containing only locks/ is treated as empty
+ // Test that a folder containing only locks/ is treated as empty
let temp_dir = TempDir::new().unwrap();
- // Create database directory with only locks/database.lock
+ // Create database folder with only locks/database.lock
let db_path = temp_dir.path().join("default-db");
fs::create_dir_all(db_path.join("locks")).unwrap();
std::fs::File::create(db_path.join("locks/database.lock")).unwrap();
@@ -267,10 +264,10 @@ fn test_empty_directory_with_locks_dir_succeeds() {
#[test]
fn test_empty_directory_with_fts_dir_succeeds() {
- // Test that a directory containing only fts/ is treated as empty
+ // Test that a folder containing only fts/ is treated as empty
let temp_dir = TempDir::new().unwrap();
- // Create database directory with only fts/
+ // Create database folder with only fts/
let db_path = temp_dir.path().join("default-db");
fs::create_dir_all(db_path.join("fts")).unwrap();
@@ -366,10 +363,10 @@ fn test_delete_everything_with_existing_database() {
!db_path.join("extra-file.txt").exists(),
"Extra file should be deleted"
);
- // Verify locks directory still exists (not deleted)
+ // Verify locks folder still exists (not deleted)
assert!(
db_path.join("locks").exists(),
- "locks directory should be preserved"
+ "locks folder should be preserved"
);
}
@@ -439,7 +436,7 @@ fn test_delete_everything_with_v3_database() {
let paths = Paths::for_test(temp_dir.path().into());
let config = load_config(paths).expect("Config should load");
- // Create a database directory with a hand-written v3 meta file
+ // Create a database folder with a hand-written v3 meta file
let db_path = temp_dir.path().join("default-db");
fs::create_dir_all(&db_path).unwrap();
@@ -449,7 +446,7 @@ fn test_delete_everything_with_v3_database() {
}"#;
std::fs::write(db_path.join(META_FILE), meta_content).unwrap();
- // Also create minimal table directories so it looks like a real database
+ // Also create minimal table folders so it looks like a real database
fs::create_dir_all(db_path.join("chunks.lance")).unwrap();
fs::create_dir_all(db_path.join("label_metadata.lance")).unwrap();
diff --git a/src/app/commands/purge.rs b/src/app/commands/purge.rs
index 693c6fb..13623ef 100644
--- a/src/app/commands/purge.rs
+++ b/src/app/commands/purge.rs
@@ -60,12 +60,12 @@ async fn run_purge_all_async(db_path: &std::path::Path) -> anyhow::Result<()> {
chunk_storage.truncate().await?;
label_storage.truncate().await?;
- // Delete and recreate FTS directory (always recreate, even if absent)
- let fts_dir = db_path.join("fts");
- if fts_dir.exists() {
- std::fs::remove_dir_all(&fts_dir)?;
+ // Delete and recreate FTS folder (always recreate, even if absent)
+ let fts_folder = db_path.join("fts");
+ if fts_folder.exists() {
+ std::fs::remove_dir_all(&fts_folder)?;
}
- std::fs::create_dir_all(&fts_dir)?;
+ std::fs::create_dir_all(&fts_folder)?;
println!("✅ Database purged successfully");
Ok(())
@@ -85,10 +85,10 @@ async fn run_purge_catalog_async(
let chunks_deleted = chunk_storage.delete_by_catalog(catalog_name).await?;
let labels_deleted = label_storage.delete_by_catalog(catalog_name).await?;
- // Delete FTS directory for this catalog
- let fts_catalog_dir = db_path.join("fts").join(catalog_name);
- if fts_catalog_dir.exists() {
- std::fs::remove_dir_all(&fts_catalog_dir)?;
+ // Delete FTS folder for this catalog
+ let fts_catalog_folder = db_path.join("fts").join(catalog_name);
+ if fts_catalog_folder.exists() {
+ std::fs::remove_dir_all(&fts_catalog_folder)?;
}
println!(
diff --git a/src/app/commands/search.rs b/src/app/commands/search.rs
index 05fb43e..05431d8 100644
--- a/src/app/commands/search.rs
+++ b/src/app/commands/search.rs
@@ -346,7 +346,7 @@ async fn run_single_method_search(
// Handle NoIndex case
let source_pointer = format_source_pointer(label_metadata);
let warning = if label_metadata.fts_complete {
- // FTS was complete but directory is gone
+ // FTS was complete but folder is gone
SearchWarning::FtsNoIndexNoFallback {
label: preamble.label.clone(),
source_pointer,
@@ -509,7 +509,7 @@ async fn run_hybrid_search(
// Incomplete - warning already emitted via decision_warnings
// Don't add to method_results, skip FTS
} else {
- // Complete but directory is gone - degrade with warning
+ // Complete but folder is gone - degrade with warning
let source_pointer = format_source_pointer(label_metadata);
search_warnings.push(SearchWarning::FtsNoIndexDegrade {
label: preamble.label.clone(),
diff --git a/src/app/commands/test_fixtures.rs b/src/app/commands/test_fixtures.rs
index 70a8026..798e33c 100644
--- a/src/app/commands/test_fixtures.rs
+++ b/src/app/commands/test_fixtures.rs
@@ -32,7 +32,7 @@ pub async fn create_test_db_with_chunks(
chunks: Vec,
labels: Vec,
) {
- // Create database directory
+ // Create database folder
fs::create_dir_all(db_path).unwrap();
// Create LanceDB tables
@@ -51,7 +51,7 @@ pub async fn create_test_db_with_chunks(
.await
.expect("Failed to create label_metadata table");
- // Create fts directory for Tantivy indexes
+ // Create fts folder for Tantivy indexes
fs::create_dir_all(db_path.join("fts")).unwrap();
// Write meta file
diff --git a/src/app/config.rs b/src/app/config.rs
index d13ac70..1bb2ceb 100644
--- a/src/app/config.rs
+++ b/src/app/config.rs
@@ -23,7 +23,7 @@ use crate::paths::Paths;
#[derive(Debug, serde::Deserialize, Clone)]
#[serde(deny_unknown_fields)]
pub struct DatabaseConfig {
- /// Optional path to the database directory.
+ /// Optional path to the database folder.
/// If not specified, defaults to /default-db.
/// Must be an absolute path; tilde (~) and environment variables ($VAR) are not supported.
pub path: Option,
diff --git a/src/app/context.rs b/src/app/context.rs
index a9075a7..c6b1d66 100644
--- a/src/app/context.rs
+++ b/src/app/context.rs
@@ -119,7 +119,7 @@ pub fn load_default_context(paths: &Paths) -> Option {
pub fn save_default_context(paths: &Paths, catalog: &str, label: &str) -> anyhow::Result<()> {
let path = paths.context_file();
- // Create parent directory if needed
+ // Create parent folder if needed
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent)?;
}
diff --git a/src/app/crawl/mod.rs b/src/app/crawl/mod.rs
index a207f18..a2d0eba 100644
--- a/src/app/crawl/mod.rs
+++ b/src/app/crawl/mod.rs
@@ -2,7 +2,7 @@
//!
//! Purpose: Export crawl submodules and public surface for command handlers.
//! Edit here when: Adding or removing crawl submodules, or changing the public surface
-//! re-exported from this directory.
+//! re-exported from this folder.
//! Do not edit here for: Phase orchestration (see `phases.rs`), embed/upload pipeline
//! (see `pipeline.rs`), crawl types (see `types.rs`), warning handling (see `warning.rs`),
//! shared preamble setup (see `preamble.rs`), summary/warning rendering (see `summary.rs`),
diff --git a/src/app/crawl/phases.rs b/src/app/crawl/phases.rs
index 8fcda7a..7f3a9b0 100644
--- a/src/app/crawl/phases.rs
+++ b/src/app/crawl/phases.rs
@@ -617,8 +617,8 @@ mod tests {
let temp_dir = TempDir::new().unwrap();
let db_path = temp_dir.path();
- // Create database directory
- std::fs::create_dir_all(db_path).expect("Failed to create db directory");
+ // Create database folder
+ std::fs::create_dir_all(db_path).expect("Failed to create db folder");
// Create LanceDB tables
let conn = connect(db_path.to_str().unwrap())
@@ -641,8 +641,8 @@ mod tests {
let meta_file = File::create(db_path.join(META_FILE)).expect("Failed to create meta file");
serde_json::to_writer_pretty(meta_file, &meta).expect("Failed to write meta file");
- // Create FTS directory (normally done by init-db)
- std::fs::create_dir_all(db_path.join("fts")).expect("Failed to create fts directory");
+ // Create FTS folder (normally done by init-db)
+ std::fs::create_dir_all(db_path.join("fts")).expect("Failed to create fts folder");
// Open database
let db = Database::open(db_path)
diff --git a/src/engine/crawl_config.rs b/src/engine/crawl_config.rs
index 9a1945f..6ada0f8 100644
--- a/src/engine/crawl_config.rs
+++ b/src/engine/crawl_config.rs
@@ -46,12 +46,12 @@ pub struct CrawlConfig {
pub file_types: HashMap,
/// Glob patterns for paths to exclude from crawling
- /// Directory patterns (ending in "/") match any path under that directory
+ /// Folder patterns (ending in "/") match any path under that folder
#[serde(rename = "patternsToExclude")]
pub patterns_to_exclude: Vec,
/// Glob patterns that override exclusion (higher precedence)
- /// Directory patterns (ending in "/") match any path under that directory
+ /// Folder patterns (ending in "/") match any path under that folder
#[serde(rename = "patternsToKeep")]
pub patterns_to_keep: Vec,
}
@@ -62,16 +62,16 @@ pub struct CompiledCrawlConfig {
/// Original config
pub config: CrawlConfig,
- /// Compiled exclusion patterns (non-directory patterns)
+ /// Compiled exclusion patterns (non-folder patterns)
exclude_set: GlobSet,
- /// Compiled keep patterns (non-directory patterns)
+ /// Compiled keep patterns (non-folder patterns)
keep_set: GlobSet,
- /// Directory prefixes for exclusion (patterns ending in "/")
+ /// Folder prefixes for exclusion (patterns ending in "/")
exclude_dirs: Vec,
- /// Directory prefixes for keep (patterns ending in "/")
+ /// Folder prefixes for keep (patterns ending in "/")
keep_dirs: Vec,
}
@@ -138,18 +138,18 @@ impl CrawlConfig {
})
}
- /// Compile a list of patterns into a GlobSet and directory list.
+ /// Compile a list of patterns into a GlobSet and folder list.
///
/// Returns `(glob_set, dirs)` where:
- /// - `glob_set` matches non-directory patterns
- /// - `dirs` contains directory patterns (ending in `/`) for prefix matching
+ /// - `glob_set` matches non-folder patterns
+ /// - `dirs` contains folder patterns (ending in `/`) for prefix matching
fn compile_patterns(patterns: &[String]) -> Result<(GlobSet, Vec)> {
let mut builder = GlobSetBuilder::new();
let mut dirs = Vec::new();
for pattern in patterns {
if pattern.ends_with('/') {
- // Directory pattern: store as prefix matcher
+ // Folder pattern: store as prefix matcher
dirs.push(pattern.clone());
} else {
builder.add(Glob::new(pattern)?);
@@ -195,17 +195,17 @@ impl CompiledCrawlConfig {
self.matches_patterns(&self.keep_set, &self.keep_dirs, path)
}
- /// Shared logic for matching against a GlobSet and directory list.
+ /// Shared logic for matching against a GlobSet and folder list.
fn matches_patterns(&self, glob_set: &GlobSet, dirs: &[String], path: &str) -> bool {
// Check glob patterns
if glob_set.is_match(path) {
return true;
}
- // Check directory prefixes
+ // Check folder prefixes
for dir in dirs {
- // Directory patterns match if:
- // 1. Path starts with the directory (e.g., "lib/example.ts" matches "lib/")
- // 2. Path contains the directory with leading slash (e.g., "path/to/lib/file.ts" matches "lib/")
+ // Folder patterns match if:
+ // 1. Path starts with the folder (e.g., "lib/example.ts" matches "lib/")
+ // 2. Path contains the folder with leading slash (e.g., "path/to/lib/file.ts" matches "lib/")
if path.starts_with(dir) || path.contains(&format!("/{}", dir)) {
return true;
}
diff --git a/src/engine/fts/error.rs b/src/engine/fts/error.rs
index c77ae64..262aeac 100644
--- a/src/engine/fts/error.rs
+++ b/src/engine/fts/error.rs
@@ -8,10 +8,10 @@ use std::io;
use tantivy::TantivyError;
use tantivy::directory::error::{OpenDirectoryError, OpenReadError};
-/// Returns true if the Tantivy error indicates a directory or file that does not exist.
+/// Returns true if the Tantivy error indicates a folder or file that does not exist.
///
/// This is the load-bearing piece of the lock-free reader contract: when a concurrent
-/// `purge --catalog` removes the FTS directory after a reader has opened it, the reader
+/// `purge --catalog` removes the FTS folder after a reader has opened it, the reader
/// should gracefully return `NoIndex` rather than propagating an error.
///
/// # Arguments
diff --git a/src/engine/fts/index.rs b/src/engine/fts/index.rs
index 9389c7b..7b64781 100644
--- a/src/engine/fts/index.rs
+++ b/src/engine/fts/index.rs
@@ -6,10 +6,10 @@
//!
//! ## Index layout
//!
-//! Each label has its own Tantivy index directory at:
+//! Each label has its own Tantivy index folder at:
//! `/fts///`
//!
-//! This directory contains:
+//! This folder contains:
//! - `meta.json`: Tantivy's index metadata
//! - Segment files: `*.idx`, `*.store`, `*.term`, `*.pos`, etc.
//! - `manifest.json`: Monodex's staleness manifest (managed by manifest.rs)
@@ -54,7 +54,7 @@ pub enum FtsStaleReason {
pub enum FtsOpenExistingOutcome {
/// Index exists, manifest is valid, and IDs match.
Open(FtsIndex),
- /// No FTS index exists for this label (directory absent or empty).
+ /// No FTS index exists for this label (folder absent or empty).
NoIndex,
/// Index exists but manifest indicates it cannot be queried safely.
Stale { reason: FtsStaleReason },
@@ -66,7 +66,7 @@ pub struct FtsIndex {
pub index: Index,
/// Schema field handles for convenient access.
pub fields: FtsSchemaFields,
- /// Path to the index directory.
+ /// Path to the index folder.
pub path: PathBuf,
}
@@ -82,20 +82,20 @@ impl FtsIndex {
/// Open an existing FTS index or create a new one.
///
/// This method implements a decision tree that handles:
- /// - Missing directory: create new index
- /// - Empty directory: create new index
+ /// - Missing folder: create new index
+ /// - Empty folder: create new index
/// - Existing index: open and validate
/// - Schema/tokenizer mismatch: rebuild from scratch
/// - Corrupted state: error (do not silently rebuild)
///
/// # Arguments
/// * `db_path` - Path to the Monodex database root
- /// * `label_id` - The label identifier (determines index directory path)
+ /// * `label_id` - The label identifier (determines index folder path)
///
/// # Returns
/// An `FtsIndex` wrapper with the index and field handles.
pub fn open_or_create(db_path: &Path, label_id: &LabelId) -> Result {
- let index_dir = fts_index_dir(db_path, label_id);
+ let index_dir = fts_index_folder(db_path, label_id);
// Step 1: Read the manifest first
let manifest_path = index_dir.join("manifest.json");
@@ -107,11 +107,10 @@ impl FtsIndex {
// Handle manifest results that require action before opening Tantivy
match &manifest_result {
ManifestRead::IdMismatch { .. } => {
- // Delete the entire per-label FTS directory and rebuild
+ // Delete the entire per-label FTS folder and rebuild
if index_dir.exists() {
- std::fs::remove_dir_all(&index_dir).map_err(|e| {
- anyhow!("Failed to remove FTS directory for rebuild: {}", e)
- })?;
+ std::fs::remove_dir_all(&index_dir)
+ .map_err(|e| anyhow!("Failed to remove FTS folder for rebuild: {}", e))?;
}
created_or_rebuilt = true;
}
@@ -119,9 +118,8 @@ impl FtsIndex {
// Missing manifest: check if Tantivy state exists
if has_tantivy_state(&index_dir) {
// Manifest missing but Tantivy state exists: rebuild
- std::fs::remove_dir_all(&index_dir).map_err(|e| {
- anyhow!("Failed to remove FTS directory for rebuild: {}", e)
- })?;
+ std::fs::remove_dir_all(&index_dir)
+ .map_err(|e| anyhow!("Failed to remove FTS folder for rebuild: {}", e))?;
created_or_rebuilt = true;
}
// No Tantivy state, treat as fresh create
@@ -143,23 +141,23 @@ impl FtsIndex {
// Step 2: Filesystem state check - open or create the index
let schema = fts_schema();
let index = if !index_dir.exists() {
- // Directory does not exist: create it and initialize a new index
+ // Folder does not exist: create it and initialize a new index
std::fs::create_dir_all(&index_dir)
- .map_err(|e| anyhow!("Failed to create FTS directory: {}", e))?;
+ .map_err(|e| anyhow!("Failed to create FTS folder: {}", e))?;
let directory = MmapDirectory::open(&index_dir)
.map_err(|e| anyhow!("Failed to open MmapDirectory: {}", e))?;
created_or_rebuilt = true;
Index::create(directory, schema.clone(), IndexSettings::default())
.map_err(|e| anyhow!("Failed to create Tantivy index: {}", e))?
} else if !has_tantivy_state(&index_dir) {
- // Directory exists but is empty: initialize a new index
+ // Folder exists but is empty: initialize a new index
let directory = MmapDirectory::open(&index_dir)
.map_err(|e| anyhow!("Failed to open MmapDirectory: {}", e))?;
created_or_rebuilt = true;
Index::create(directory, schema.clone(), IndexSettings::default())
.map_err(|e| anyhow!("Failed to create Tantivy index: {}", e))?
} else {
- // Directory exists and contains Tantivy state: open it
+ // Folder exists and contains Tantivy state: open it
let directory = MmapDirectory::open(&index_dir)
.map_err(|e| anyhow!("Failed to open MmapDirectory: {}", e))?;
Index::open(directory)
@@ -193,7 +191,7 @@ impl FtsIndex {
///
/// Consults the manifest before opening Tantivy to detect stale state.
/// Returns a typed outcome that distinguishes between:
- /// - `NoIndex`: No FTS index exists (directory absent or empty)
+ /// - `NoIndex`: No FTS index exists (folder absent or empty)
/// - `Stale`: Index exists but cannot be queried safely (manifest mismatch)
/// - `Open`: Index exists and is valid
///
@@ -204,9 +202,9 @@ impl FtsIndex {
/// * `db_path` - Path to the Monodex database root
/// * `label_id` - The label identifier
pub fn open_existing(db_path: &Path, label_id: &LabelId) -> Result {
- let index_dir = fts_index_dir(db_path, label_id);
+ let index_dir = fts_index_folder(db_path, label_id);
- // Step 1: Check if directory exists with Tantivy state
+ // Step 1: Check if folder exists with Tantivy state
if !has_tantivy_state(&index_dir) {
return Ok(FtsOpenExistingOutcome::NoIndex);
}
@@ -279,7 +277,7 @@ impl FtsIndex {
/// Get an IndexWriter for document updates.
///
- /// The writer holds a lock on the index directory. Only one writer can exist
+ /// The writer holds a lock on the index folder. Only one writer can exist
/// at a time per index. Under our per-catalog lock discipline, this is
/// guaranteed by the caller.
pub fn writer(&self) -> Result {
@@ -314,31 +312,31 @@ impl FtsIndex {
}
}
-/// Compute the FTS index directory path for a label.
+/// Compute the FTS index folder path for a label.
///
/// The path is: `/fts///`
-pub fn fts_index_dir(db_path: &Path, label_id: &LabelId) -> PathBuf {
+pub fn fts_index_folder(db_path: &Path, label_id: &LabelId) -> PathBuf {
db_path
.join("fts")
.join(label_id.catalog())
.join(label_id.label())
}
-/// Check if a directory contains Tantivy index state.
+/// Check if a folder contains Tantivy index state.
///
/// This is indicated by the presence of `meta.json` or any Tantivy segment files.
-fn has_tantivy_state(dir: &Path) -> bool {
- if !dir.exists() {
+fn has_tantivy_state(folder: &Path) -> bool {
+ if !folder.exists() {
return false;
}
// Check for meta.json
- if dir.join("meta.json").exists() {
+ if folder.join("meta.json").exists() {
return true;
}
// Check for any segment files
- if let Ok(entries) = std::fs::read_dir(dir) {
+ if let Ok(entries) = std::fs::read_dir(folder) {
for entry in entries.flatten() {
let name = entry.file_name();
let name = name.to_string_lossy();
@@ -371,8 +369,8 @@ mod tests {
let _fts_index = FtsIndex::open_or_create(db_path, &label_id).unwrap();
- // Verify directory was created
- let expected_dir = fts_index_dir(db_path, &label_id);
+ // Verify folder was created
+ let expected_dir = fts_index_folder(db_path, &label_id);
assert!(expected_dir.exists());
// Verify meta.json exists (Tantivy creates it)
@@ -404,12 +402,12 @@ mod tests {
}
#[test]
- fn test_fts_index_dir_path() {
+ fn test_fts_index_folder_path() {
let temp_dir = TempDir::new().unwrap();
let db_path = temp_dir.path();
let label_id = make_label_id("my-catalog", "my-label");
- let dir = fts_index_dir(db_path, &label_id);
+ let dir = fts_index_folder(db_path, &label_id);
assert_eq!(dir, db_path.join("fts").join("my-catalog").join("my-label"));
}
@@ -447,7 +445,7 @@ mod tests {
drop(fts_index);
// Write a manifest with mismatched schema ID
- let manifest_path = fts_index_dir(db_path, &label_id).join("manifest.json");
+ let manifest_path = fts_index_folder(db_path, &label_id).join("manifest.json");
let bad_manifest = FtsManifest {
fts_schema_id: "old-schema-id".to_string(),
fts_tokenizer_id: FTS_TOKENIZER_ID.to_string(),
@@ -476,7 +474,7 @@ mod tests {
drop(fts_index);
// Delete the manifest but leave Tantivy state
- let manifest_path = fts_index_dir(db_path, &label_id).join("manifest.json");
+ let manifest_path = fts_index_folder(db_path, &label_id).join("manifest.json");
std::fs::remove_file(&manifest_path).unwrap();
// Open existing should return Stale
@@ -504,7 +502,7 @@ mod tests {
drop(fts_index);
// Corrupt the manifest
- let manifest_path = fts_index_dir(db_path, &label_id).join("manifest.json");
+ let manifest_path = fts_index_folder(db_path, &label_id).join("manifest.json");
std::fs::write(&manifest_path, "not valid json").unwrap();
// Open existing should return Stale
@@ -520,38 +518,38 @@ mod tests {
}
}
- /// Test: open_existing does not treat missing directory as stale
+ /// Test: open_existing does not treat missing folder as stale
#[test]
fn test_open_existing_no_index_for_missing_directory() {
let temp_dir = TempDir::new().unwrap();
let db_path = temp_dir.path();
let label_id = make_label_id("test-catalog", "missing-label");
- // Don't create any FTS directory
+ // Don't create any FTS folder
let result = FtsIndex::open_existing(db_path, &label_id).unwrap();
assert!(
matches!(result, FtsOpenExistingOutcome::NoIndex),
- "Expected NoIndex for missing directory, got {:?}",
+ "Expected NoIndex for missing folder, got {:?}",
result
);
}
- /// Test: open_existing does not treat directory-exists-but-no-Tantivy-state as stale
+ /// Test: open_existing does not treat folder-exists-but-no-Tantivy-state as stale
#[test]
fn test_open_existing_no_index_for_empty_directory() {
let temp_dir = TempDir::new().unwrap();
let db_path = temp_dir.path();
let label_id = make_label_id("test-catalog", "test-label");
- // Create the FTS directory but no Tantivy state
- let index_dir = fts_index_dir(db_path, &label_id);
+ // Create the FTS folder but no Tantivy state
+ let index_dir = fts_index_folder(db_path, &label_id);
std::fs::create_dir_all(&index_dir).unwrap();
let result = FtsIndex::open_existing(db_path, &label_id).unwrap();
assert!(
matches!(result, FtsOpenExistingOutcome::NoIndex),
- "Expected NoIndex for empty directory, got {:?}",
+ "Expected NoIndex for empty folder, got {:?}",
result
);
}
@@ -570,7 +568,7 @@ mod tests {
drop(fts_index);
// Delete the manifest but leave Tantivy state
- let manifest_path = fts_index_dir(db_path, &label_id).join("manifest.json");
+ let manifest_path = fts_index_folder(db_path, &label_id).join("manifest.json");
std::fs::remove_file(&manifest_path).unwrap();
// Open or create should rebuild (delete and recreate)
@@ -603,7 +601,7 @@ mod tests {
drop(fts_index);
// Overwrite manifest with mismatched schema ID
- let manifest_path = fts_index_dir(db_path, &label_id).join("manifest.json");
+ let manifest_path = fts_index_folder(db_path, &label_id).join("manifest.json");
let bad_manifest = serde_json::json!({
"fts_schema_id": "bad-schema-id",
"fts_tokenizer_id": FTS_TOKENIZER_ID
diff --git a/src/engine/fts/indexing.rs b/src/engine/fts/indexing.rs
index 0f6e1d6..f29ce1c 100644
--- a/src/engine/fts/indexing.rs
+++ b/src/engine/fts/indexing.rs
@@ -177,7 +177,7 @@ mod tests {
// Note: Full integration tests require a real LanceDB setup which is complex.
// The core logic is tested via unit tests in the manifest and index modules.
- // End-to-end tests for indexing are in the tests/ directory.
+ // End-to-end tests for indexing are in the tests/ folder.
#[test]
fn test_tokenize_text_produces_tokens() {
diff --git a/src/engine/fts/manifest.rs b/src/engine/fts/manifest.rs
index 1d5fc69..2dbfa14 100644
--- a/src/engine/fts/manifest.rs
+++ b/src/engine/fts/manifest.rs
@@ -111,10 +111,10 @@ pub fn write_manifest(path: &Path, manifest: &FtsManifest) -> Result<()> {
let content = serde_json::to_string_pretty(manifest)
.map_err(|e| anyhow!("Failed to serialize manifest: {}", e))?;
- // Ensure parent directory exists
+ // Ensure parent folder exists
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent)
- .map_err(|e| anyhow!("Failed to create manifest directory: {}", e))?;
+ .map_err(|e| anyhow!("Failed to create manifest folder: {}", e))?;
}
std::fs::write(path, content).map_err(|e| anyhow!("Failed to write manifest: {}", e))?;
diff --git a/src/engine/fts/mod.rs b/src/engine/fts/mod.rs
index 0acc640..ec5d626 100644
--- a/src/engine/fts/mod.rs
+++ b/src/engine/fts/mod.rs
@@ -2,7 +2,7 @@
//!
//! Purpose: Export FTS submodules and public surface for the rest of the codebase.
//! Edit here when: Adding or removing FTS submodules, or changing the public surface
-//! re-exported from this directory.
+//! re-exported from this folder.
//! Do not edit here for: FTS indexing logic (see `indexing.rs`), tokenizer behavior
//! (see `tokenizer.rs`), schema (see `schema.rs`), search semantics (see `search.rs`),
//! manifest handling (see `manifest.rs`), index management (see `index.rs`),
diff --git a/src/engine/fts/search.rs b/src/engine/fts/search.rs
index bd48da1..ab70625 100644
--- a/src/engine/fts/search.rs
+++ b/src/engine/fts/search.rs
@@ -37,7 +37,7 @@ pub struct FtsHit {
pub enum FtsSearchOutcome {
/// Search ran successfully. Vec may be empty (no matches).
Found(Vec),
- /// FTS directory does not exist for this label.
+ /// FTS folder does not exist for this label.
/// Caller decides whether to warn or silently return empty.
NoIndex,
/// FTS index exists but is stale (manifest mismatch).
diff --git a/src/engine/fts/tests.rs b/src/engine/fts/tests.rs
index 0c77576..71a7e68 100644
--- a/src/engine/fts/tests.rs
+++ b/src/engine/fts/tests.rs
@@ -66,12 +66,12 @@ fn test_chunk_row(
}
}
-/// Create a test database with FTS directory structure.
+/// Create a test database with FTS folder structure.
async fn create_test_db_with_fts(db_path: &Path) -> Database {
use lancedb::connect;
- // Create database directory
- std::fs::create_dir_all(db_path).expect("Failed to create db directory");
+ // Create database folder
+ std::fs::create_dir_all(db_path).expect("Failed to create db folder");
// Create LanceDB tables
let conn = connect(db_path.to_str().unwrap())
@@ -94,8 +94,8 @@ async fn create_test_db_with_fts(db_path: &Path) -> Database {
let meta_file = File::create(db_path.join(META_FILE)).expect("Failed to create meta file");
serde_json::to_writer_pretty(meta_file, &meta).expect("Failed to write meta file");
- // Create FTS directory (normally done by init-db)
- std::fs::create_dir_all(db_path.join("fts")).expect("Failed to create fts directory");
+ // Create FTS folder (normally done by init-db)
+ std::fs::create_dir_all(db_path.join("fts")).expect("Failed to create fts folder");
// Open database (creates LanceDB tables)
Database::open(db_path)
@@ -210,12 +210,12 @@ fn test_manifest_id_mismatch_triggers_rebuild() -> Result<()> {
let db_path = temp_dir.path();
let label_id = make_label_id("test-catalog", "main");
- // Create FTS directory structure
+ // Create FTS folder structure
std::fs::create_dir_all(db_path.join("fts").join("test-catalog").join("main"))?;
// Create a manifest with mismatched IDs
- let manifest_dir = db_path.join("fts").join("test-catalog").join("main");
- let manifest_path = manifest_dir.join("manifest.json");
+ let manifest_folder = db_path.join("fts").join("test-catalog").join("main");
+ let manifest_path = manifest_folder.join("manifest.json");
let bad_manifest = FtsManifest {
fts_schema_id: "old-schema:v1".to_string(),
@@ -470,7 +470,7 @@ fn test_open_existing_returns_none_for_missing() -> Result<()> {
let db_path = temp_dir.path();
let label_id = make_label_id("test-catalog", "missing-label");
- // Don't create any FTS directory
+ // Don't create any FTS folder
use crate::engine::fts::index::FtsOpenExistingOutcome;
let result = FtsIndex::open_existing(db_path, &label_id)?;
diff --git a/src/engine/git_ops/mod.rs b/src/engine/git_ops/mod.rs
index 01739ad..41c8c9b 100644
--- a/src/engine/git_ops/mod.rs
+++ b/src/engine/git_ops/mod.rs
@@ -1,5 +1,5 @@
//! Purpose: Git-aware enumeration and blob reading for crawl sources.
-//! Edit here when: Adding or renaming git_ops submodules, or changing the public surface re-exported from this directory.
+//! Edit here when: Adding or renaming git_ops submodules, or changing the public surface re-exported from this folder.
//! Do not edit here for: the `BlobSource` trait or `FileEntry` (see `blob_source.rs`), package-index lookup and extraction (see `package_index.rs`), gix-based commit traversal (see `commit.rs`), subprocess-based working-tree reading (see `working_dir.rs`).
pub mod commit;
diff --git a/src/engine/git_ops/package_index.rs b/src/engine/git_ops/package_index.rs
index d8821a3..07e736b 100644
--- a/src/engine/git_ops/package_index.rs
+++ b/src/engine/git_ops/package_index.rs
@@ -6,10 +6,10 @@ use serde::Deserialize;
use std::collections::HashMap;
use std::path::Path;
-/// Lookup structure mapping directory paths to package names.
+/// Lookup structure mapping folder paths to package names.
///
/// Built from all package.json files in the source tree, used to resolve
-/// package names for files based on their containing directory.
+/// package names for files based on their containing folder.
pub struct PackageIndex {
package_name_by_dir: HashMap,
}
@@ -21,7 +21,7 @@ impl PackageIndex {
}
}
- /// Find the package name for a file by searching upward from its directory.
+ /// Find the package name for a file by searching upward from its folder.
pub fn find_package_name(&self, relative_path: &str) -> Option<&str> {
let path = Path::new(relative_path);
let mut current = path.parent().unwrap_or(path);
@@ -54,12 +54,12 @@ impl PackageIndex {
None
}
- /// Insert a package name for a directory path.
+ /// Insert a package name for a folder path.
pub(super) fn insert_package_name(&mut self, dir_path: String, name: String) {
self.package_name_by_dir.insert(dir_path, name);
}
- /// Get the package name for a specific directory (exact match, no upward search).
+ /// Get the package name for a specific folder (exact match, no upward search).
#[cfg(test)]
pub(super) fn get_package_name(&self, dir_path: &str) -> Option<&str> {
self.package_name_by_dir.get(dir_path).map(String::as_str)
diff --git a/src/engine/git_ops/tests.rs b/src/engine/git_ops/tests.rs
index af305fe..e25d8ed 100644
--- a/src/engine/git_ops/tests.rs
+++ b/src/engine/git_ops/tests.rs
@@ -122,7 +122,7 @@ fn test_enumerate_working_directory() {
enumerate_working_directory(&repo_path).expect("Failed to enumerate working directory");
assert!(!entries.is_empty(), "Should have found some files");
// README.md should be found (it's a regular file that should be found)
- // Note: Hidden files/directories (dot-prefixed) are skipped during enumeration
+ // Note: Hidden files/folders (dot-prefixed) are skipped during enumeration
assert!(entries.iter().any(|e| e.relative_path == "README.md"));
// All entries should have a 40-character hex blob_id
for entry in &entries {
@@ -147,7 +147,7 @@ fn test_file_id_identical_between_modes() {
use std::fs;
use tempfile::TempDir;
- // Create a temporary directory
+ // Create a temporary folder
let temp_dir = TempDir::new().expect("Failed to create temp dir");
let repo_path = temp_dir.path();
@@ -262,7 +262,7 @@ fn test_working_dir_blob_id_matches_commit() {
);
}
-/// Git-tracked files under hidden directories must be indexed.
+/// Git-tracked files under hidden folders must be indexed.
/// Previously, working-directory crawls skipped files under .github/, .vscode/, etc.
/// even when Git tracked them.
#[test]
@@ -302,7 +302,7 @@ fn test_repo_with_dot_basename_produces_output() {
use std::fs;
use tempfile::TempDir;
- // Create a temporary directory with a dot-prefixed name
+ // Create a temporary folder with a dot-prefixed name
let temp_base = TempDir::new().expect("Failed to create temp dir");
let dot_repo_path = temp_base.path().join(".my-repo");
fs::create_dir_all(&dot_repo_path).expect("Failed to create .my-repo dir");
@@ -459,7 +459,7 @@ fn test_untracked_non_ignored_file_appears_in_working_dir_enumeration() {
use std::fs;
use tempfile::TempDir;
- // Create a temporary directory
+ // Create a temporary folder
let temp_dir = TempDir::new().expect("Failed to create temp dir");
let repo_path = temp_dir.path();
@@ -540,7 +540,7 @@ fn test_untracked_package_json_resolved_in_working_dir_package_index() {
use std::fs;
use tempfile::TempDir;
- // Create a temporary directory
+ // Create a temporary folder
let temp_dir = TempDir::new().expect("Failed to create temp dir");
let repo_path = temp_dir.path();
@@ -587,7 +587,7 @@ fn test_untracked_package_json_resolved_in_working_dir_package_index() {
.expect("Failed to run git commit");
assert!(git_commit.status.success(), "git commit failed");
- // Create an untracked package directory with its own package.json and source file
+ // Create an untracked package folder with its own package.json and source file
let untracked_pkg_dir = repo_path.join("untracked-pkg");
fs::create_dir_all(&untracked_pkg_dir).expect("Failed to create untracked-pkg dir");
fs::write(
diff --git a/src/engine/git_ops/working_dir.rs b/src/engine/git_ops/working_dir.rs
index 80b3d58..74a196d 100644
--- a/src/engine/git_ops/working_dir.rs
+++ b/src/engine/git_ops/working_dir.rs
@@ -324,7 +324,7 @@ pub fn enumerate_working_directory(repo_path: &Path) -> Result> {
/// Build package index from the working directory.
/// Uses the Git-aware blob map to find all `package.json` files in Git's working-tree
-/// view (tracked plus untracked non-ignored), including those under hidden directories
+/// view (tracked plus untracked non-ignored), including those under hidden folders
/// like .github/ or .vscode/.
pub fn build_package_index_for_working_dir(repo_path: &Path) -> Result {
let mut index = PackageIndex::new();
diff --git a/src/engine/storage/chunks/mod.rs b/src/engine/storage/chunks/mod.rs
index 2a9e7bb..01c0974 100644
--- a/src/engine/storage/chunks/mod.rs
+++ b/src/engine/storage/chunks/mod.rs
@@ -3,7 +3,7 @@
//! Purpose: Re-export chunk storage types and operations from the chunks submodule.
//!
//! Edit here when: Adding or removing chunks submodules, or changing the public
-//! surface re-exported from this directory.
+//! surface re-exported from this folder.
//! Do not edit here for: Chunk storage operations (see storage.rs), Arrow encoding
//! and decoding for chunk rows (see arrow_encoding.rs).
diff --git a/src/engine/storage/database.rs b/src/engine/storage/database.rs
index 87f4074..4b3ec12 100644
--- a/src/engine/storage/database.rs
+++ b/src/engine/storage/database.rs
@@ -21,7 +21,7 @@ pub const META_FILE: &str = "monodex-meta.json";
pub struct Database {
/// The LanceDB connection
conn: Connection,
- /// Path to the database root directory
+ /// Path to the database root folder
path: PathBuf,
}
@@ -81,7 +81,7 @@ impl Database {
/// Open an existing monodex database.
///
/// Validates that:
- /// 1. The database directory exists
+ /// 1. The database folder exists
/// 2. `monodex-meta.json` exists and is valid
/// 3. The schema version matches `MONODEX_SCHEMA_VERSION`
///
@@ -124,12 +124,12 @@ impl Database {
);
}
- // Check that table directories exist
+ // Check that table folders exist
let chunks_path = path.join(format!("{}.lance", CHUNKS_TABLE));
let labels_path = path.join(format!("{}.lance", LABEL_METADATA_TABLE));
if !chunks_path.exists() || !labels_path.exists() {
bail!(
- "Database at '{}' is missing table directories. Manual cleanup required.",
+ "Database at '{}' is missing table folders. Manual cleanup required.",
path.display()
);
}
@@ -164,7 +164,7 @@ impl Database {
.map_err(|e| anyhow!("Failed to flush {}: {}", path.display(), e))?
.sync_all()?;
- // Sync the parent directory to ensure the file entry is durable
+ // Sync the parent folder to ensure the file entry is durable
#[cfg(unix)]
{
if let Some(parent) = path.parent() {
diff --git a/src/engine/storage/locks.rs b/src/engine/storage/locks.rs
index 2f663d6..97e96c2 100644
--- a/src/engine/storage/locks.rs
+++ b/src/engine/storage/locks.rs
@@ -118,7 +118,7 @@ fn acquire_file_lock(lockfile_path: &Path, mode: LockMode) -> Result Result {
let lockfile_path = db_path.join("locks").join("commit.lock");
@@ -183,7 +183,7 @@ pub fn acquire_commit_mutex(db_path: &Path) -> Result {
// Internal Helpers
// ============================================================================
-/// Ensures the parent directory for a lockfile exists.
+/// Ensures the parent folder for a lockfile exists.
fn ensure_lock_dir(lockfile_path: &Path) -> Result<()> {
if let Some(parent) = lockfile_path.parent() {
fs::create_dir_all(parent)?;
diff --git a/src/main.rs b/src/main.rs
index 467ba7a..b742df1 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -124,8 +124,8 @@ fn main() -> anyhow::Result<()> {
cli.debug,
)?;
}
- Commands::AuditChunks { count, dir } => {
- monodex::app::commands::run_audit_chunks(count, dir)?;
+ Commands::AuditChunks { count, folder } => {
+ monodex::app::commands::run_audit_chunks(count, folder)?;
}
Commands::DebugFts {
id,
diff --git a/tests/index_lifecycle.rs b/tests/index_lifecycle.rs
index 36a2100..ec28f1b 100644
--- a/tests/index_lifecycle.rs
+++ b/tests/index_lifecycle.rs
@@ -432,8 +432,8 @@ fn test_first_time_crawl_fts_only__quick_excluded() {
// =============================================================================
/// Test purge cleanup:
-/// - After a crawl producing FTS state, purge --catalog X removes FTS directory
-/// - purge --all removes entire fts/ directory
+/// - After a crawl producing FTS state, purge --catalog X removes FTS folder
+/// - purge --all removes entire fts/ folder
#[test]
#[allow(non_snake_case)]
fn test_purge_cleanup__quick_excluded() {
@@ -467,20 +467,20 @@ fn test_purge_cleanup__quick_excluded() {
let db_path = monodex::app::resolve_database_path(&config).unwrap();
let fts_catalog_path = db_path.join("fts").join("test-catalog");
- // Verify FTS directory exists after crawl
+ // Verify FTS folder exists after crawl
assert!(
fts_catalog_path.exists(),
- "FTS catalog directory should exist after crawl"
+ "FTS catalog folder should exist after crawl"
);
// Test purge --catalog
monodex::app::commands::purge::run_purge(&config, Some("test-catalog"), false, false)
.expect("purge --catalog failed");
- // Verify FTS catalog directory is gone after purge --catalog
+ // Verify FTS catalog folder is gone after purge --catalog
assert!(
!fts_catalog_path.exists(),
- "FTS catalog directory should be gone after purge --catalog"
+ "FTS catalog folder should be gone after purge --catalog"
);
// Crawl again to recreate FTS state
@@ -494,21 +494,21 @@ fn test_purge_cleanup__quick_excluded() {
)
.expect("second crawl failed");
- // Verify FTS directory exists again
+ // Verify FTS folder exists again
assert!(
fts_catalog_path.exists(),
- "FTS catalog directory should exist after second crawl"
+ "FTS catalog folder should exist after second crawl"
);
// Test purge --all
monodex::app::commands::purge::run_purge(&config, None, true, false)
.expect("purge --all failed");
- // Verify entire FTS directory exists and is empty after purge --all
+ // Verify entire FTS folder exists and is empty after purge --all
let fts_path = db_path.join("fts");
assert!(
fts_path.exists(),
- "FTS directory should exist after purge --all (implementation recreates it)"
+ "FTS folder should exist after purge --all (implementation recreates it)"
);
let entries: Vec<_> = std::fs::read_dir(&fts_path)
.unwrap()
@@ -516,7 +516,7 @@ fn test_purge_cleanup__quick_excluded() {
.collect();
assert!(
entries.is_empty(),
- "FTS directory should be empty after purge --all"
+ "FTS folder should be empty after purge --all"
);
(monodex_home, repo_dir)
@@ -786,12 +786,12 @@ fn test_first_time_crawl_vector_only__quick_excluded() {
)
.expect("crawl failed");
- // Verify: FTS directory should not exist
+ // Verify: FTS folder should not exist
let db_path = monodex::app::resolve_database_path(&config).unwrap();
- let fts_dir = db_path.join("fts").join("test-catalog").join("main");
+ let fts_folder = db_path.join("fts").join("test-catalog").join("main");
assert!(
- !fts_dir.exists(),
- "FTS directory should not exist after vector-only crawl"
+ !fts_folder.exists(),
+ "FTS folder should not exist after vector-only crawl"
);
// Search with --retrieval vector should succeed
diff --git a/tests/search_output.rs b/tests/search_output.rs
index 6db9cf4..cdafd86 100644
--- a/tests/search_output.rs
+++ b/tests/search_output.rs
@@ -402,7 +402,7 @@ fn test_fts_parse_error_under_hybrid__quick_excluded() {
/// Test that FTS NoIndex under hybrid degrades to vector-only with warning.
/// - Crawl with both methods
-/// - Manually delete the FTS directory
+/// - Manually delete the FTS folder
/// - Search with no flag (hybrid)
/// - Assert: Ok (degraded to vector-only)
#[test]
@@ -434,11 +434,11 @@ fn test_fts_noindex_degradation_under_hybrid__quick_excluded() {
)
.expect("crawl failed");
- // Resolve database path and delete FTS directory
+ // Resolve database path and delete FTS folder
let db_path = monodex::app::resolve_database_path(&config).unwrap();
- let fts_dir = db_path.join("fts").join("test-catalog").join("main");
- if fts_dir.exists() {
- std::fs::remove_dir_all(&fts_dir).expect("Failed to delete FTS directory");
+ let fts_folder = db_path.join("fts").join("test-catalog").join("main");
+ if fts_folder.exists() {
+ std::fs::remove_dir_all(&fts_folder).expect("Failed to delete FTS folder");
}
// Search with no flag (hybrid) - should degrade to vector-only