diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
index 1c1e65e..73896aa 100644
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@@ -55,5 +55,8 @@ jobs:
- name: Clippy
run: cargo clippy --workspace --all-targets --locked -- -D warnings
+ - name: Check facade integrity
+ run: ./scripts/check-facades.sh
+
- name: Test
run: cargo test --workspace --locked
diff --git a/docs/code_organization_policy.md b/docs/code_organization_policy.md
index 19b96c4..1b10af1 100644
--- a/docs/code_organization_policy.md
+++ b/docs/code_organization_policy.md
@@ -19,7 +19,7 @@ A coherent file at 750 lines is better than two incoherent files at 400. The thr
| Types-only file | any | 500 | 800 |
| Test-only file | any | 800 | 1200 |
-Lines = production code excluding `#[cfg(test)]` blocks and the module header. `mod.rs`, `lib.rs`, and `main.rs` files containing only declarations, re-exports, and small dispatch logic are not counted. (Substantive code in `mod.rs` is forbidden per the banned patterns below.)
+Lines = production code excluding `#[cfg(test)]` blocks and the module header. `mod.rs`, `lib.rs`, and `main.rs` files containing only declarations, re-exports, and small dispatch logic are not counted.
Production modules that don't fit a more specific row use the algorithm/engine row.
@@ -125,7 +125,7 @@ Within the integration-test layer itself, prefer fewer tests that exercise reali
### Purpose
-`just ci-quick` is the fast variant of `just ci`. Same fmt and clippy checks, but with the slowest tests filtered out at runtime. It exists for the developer inner loop, the moments between edits when fast feedback matters. Repository CI workflow selection is managed separately; this section only defines the local quick tier and the invariant that `just ci` remains the full gate.
+`just ci-quick` is the fast variant of `just ci`. Same fmt, clippy, and facade checks, but with the slowest tests filtered out at runtime. It exists for the developer inner loop, the moments between edits when fast feedback matters. Repository CI workflow selection is managed separately; this section only defines the local quick tier and the invariant that `just ci` remains the full gate.
### Mechanism
@@ -168,18 +168,26 @@ Counter-examples: a loop variable iterating folders is `current_folder`, not `cu
## Banned patterns
-- No wildcard re-exports (`pub use submodule::*`). List re-exports explicitly.
- No putting unrelated items together just because they're small.
- No structural splits in the same change as feature or fix work. Splits are their own change unless explicitly authorized by the maintainer or the planned reorganization being applied.
-- No substantive code in `mod.rs`. It's the directory's table of contents (module declarations, explicit re-exports, header), not its content. A small number of simple constants that are part of the directory's public surface is fine; move them out when they grow into an implementation vocabulary with its own edit intent.
## Module organization at the directory level
-A directory is an organizational unit: a name that predicts what's inside. The `mod.rs` is the directory's table of contents: module declarations, explicit re-exports, the header. Non-trivial code lives in named sibling files, never in `mod.rs`.
+A directory is an organizational unit: a name that predicts what's inside. The `mod.rs` is the directory's table of contents, holding module declarations, explicit re-exports, and the header; non-trivial code lives in named sibling files. A small number of simple constants that are part of the directory's public surface is fine; move them out when they grow into an implementation vocabulary with its own edit intent.
-**Cross-directory reach.** Prefer the shortest path through a directory's `mod.rs` re-exports. If an item is reachable only by a deep path that skips `mod.rs`, either it belongs in the directory's surface (add a `pub use` in `mod.rs`) or your use site is reaching past the directory's contract and the design should be reconsidered. Don't introduce a new deep-path use site to an item `mod.rs` already re-exports under a shorter name.
+### Facade integrity
-**Visibility keywords.** Items inside files default to `pub`; the directory boundary is what the policy maintains, not item-level discipline. Child `mod` declarations in `mod.rs` are `pub mod` when the child name is part of the intended navigation surface, and plain `mod` when external callers reach items via `mod.rs` re-exports instead. Both are valid: `engine/identifier.rs` is `pub mod` because `identifier` is itself the concept callers reach for; `app/crawl/phases.rs` is `pub mod` because `phases` is an intended decomposition unit, not an implementation detail; `engine/fts/index.rs` is plain `mod` because callers reach its items via `engine::fts::FtsIndex`, already re-exported. An existing `pub mod` is not itself a problem to fix; the trigger for action is a use site that bypasses a shorter path already exposed by `mod.rs`. Narrower keywords (`pub(super)`, `pub(crate)`) are available for sensitive seams but not required.
+A directory's `mod.rs` is its public boundary: what an outside caller can reach is decided by `mod.rs` re-exports, not by the caller writing a deep path. This is a structural rule about boundaries, not a rule about minimizing visibility keywords; the goal is that each directory has exactly one declared surface, so a reader knows where to look and a caller cannot quietly bypass it. The boundary rule has three parts.
+
+**Child modules are declared with plain `mod`.** In a directory `mod.rs`, child modules are declared `mod child;`, never `pub mod child;` or `pub(crate) mod child;`. Items that outside callers need are surfaced by explicit `pub use child::Item;` re-exports in `mod.rs`. The sole exception is a child that must currently be named from outside the library crate, from `tests/` or `main.rs` via a `monodex::...` path naming the child directly. Such a child stays `pub mod`, but only when listed in the facade-check allowlist (see below). An allowlisted `pub mod` is a deliberate, marked exception; plain `mod` is the default.
+
+**Cross-directory reach goes through the facade.** A use site outside `a/b/c/` that reaches `crate::a::b::c::internal` by deep path is a facade violation, whatever visibility keywords are involved. The fix is a re-export in the relevant `mod.rs`, never a wider `mod` declaration. Re-export at the item's own visibility (`pub use` for a `pub` item, `pub(crate) use` for a `pub(crate)` item); do not widen the item to make re-exports uniform. List re-exports explicitly, one item per name; wildcard re-exports (`pub use child::*`) are banned, because they make the facade's surface unreadable. A deep-path reach that looks intentional means the target belongs in the facade's surface, not that the rule should bend.
+
+**Sibling reach stays out of the facade.** An item used only by siblings in the same directory needs no `mod.rs` entry: mark it `pub(super)` and siblings reach it directly.
+
+Two scope clarifications. *Item visibility inside a file* is not governed here. A `pub` item that is externally unreachable carries no information but is not a violation, and need not be demoted. (The `unreachable_pub` lint flags such items. It is deliberately not adopted, because it enforces a stricter, different rule of item-level minimum visibility, and an agent silencing it may add an unwanted re-export rather than improve the boundary.) *Machine check:* `just check-facades` (run by both `just ci` and `just ci-quick`) scans every `src/` `mod.rs` and fails on any `pub mod` / `pub(crate) mod` child not in the allowlist. The check is source-tree-only; `mod.rs` files under `tests/` are not policed, and integration-test fixture facades may use their own local style. The allowlist is the set of children currently named from `tests/` or `main.rs` by a direct `monodex::...` path. That set is presently an accident of what the integration tests reach, and is to be redrawn deliberately once an intended crate API surface is designed.
+
+When the check fails on a new declaration, the allowlist is not the default remedy: re-export the needed item from the facade, or change the caller to a path the facade already exposes. Add to the allowlist only when a caller intentionally needs to name the child module itself and no facade path can serve, and call the addition out in the PR description.
**File layout.** This project uses `<name>/mod.rs`, not the `<name>.rs` + `<name>/` form Rust 2018 also permits.
diff --git a/justfile b/justfile
index 54a215b..bbee316 100644
--- a/justfile
+++ b/justfile
@@ -1,8 +1,8 @@
# Run all CI checks (format, clippy, all tests)
-ci: fmt-check clippy test
+ci: fmt-check clippy check-facades test
# Run quick CI checks (format, clippy, fast tests only)
-ci-quick: fmt-check clippy test-quick
+ci-quick: fmt-check clippy check-facades test-quick
# Format check
fmt-check:
@@ -31,3 +31,7 @@ build:
# Clean build artifacts
clean:
cargo clean
+
+# Enforce mod.rs facade integrity
+check-facades:
+ ./scripts/check-facades.sh
diff --git a/scripts/check-facades.sh b/scripts/check-facades.sh
new file mode 100755
index 0000000..342f37e
--- /dev/null
+++ b/scripts/check-facades.sh
@@ -0,0 +1,50 @@
+#!/usr/bin/env bash
+# Enforce the facade-integrity rule from docs/code_organization_policy.md:
+# directory mod.rs files declare child modules with plain `mod`, except for
+# an allowlist of children named from outside the library crate.
+set -euo pipefail
+
+# Allowlist: "<mod.rs path relative to src/> <child module name>" pairs. Children named directly by a
+# monodex::... path in tests/ or main.rs. Revisit when the crate API surface
+# is designed deliberately; see code_organization_policy.md.
+allow=$(cat <<'EOF'
+app/mod.rs commands
+app/mod.rs config
+app/commands/mod.rs crawl
+app/commands/mod.rs init_db
+app/commands/mod.rs purge
+app/commands/mod.rs search
+engine/mod.rs fts
+engine/mod.rs identifier
+engine/mod.rs identity
+engine/mod.rs retrieval
+engine/mod.rs schema
+engine/mod.rs storage
+EOF
+)
+
+violations=""
+while IFS= read -r -d '' modfile; do
+ rel=${modfile#src/}
+ while IFS= read -r line; do
+ # Extract the child name from the leading "pub[(...)] mod NAME".
+ # Anchored at the start so a later "mod" in a comment cannot mislead it.
+ decl=$(printf '%s\n' "$line" | sed -E 's/^[0-9]+:[[:space:]]*pub(\([^)]*\))?[[:space:]]+mod[[:space:]]+([A-Za-z_][A-Za-z0-9_]*).*/\2/')
+ if ! printf '%s\n' "$allow" | grep -qxF "$rel $decl"; then
+ violations+="$modfile: $line"$'\n'
+ fi
+ done < <(grep -nE '^[[:space:]]*pub(\([^)]*\))?[[:space:]]+mod[[:space:]]' "$modfile" || true)
+done < <(find src -name mod.rs -print0)
+
+if [[ -n "$violations" ]]; then
+ echo "Facade violation: directory mod.rs files must declare child modules with"
+ echo "plain 'mod', not 'pub mod' or 'pub(...) mod', unless allowlisted in this"
+ echo "script. See the Facade integrity section of docs/code_organization_policy.md."
+ echo
+ printf '%s' "$violations"
+ echo
+ echo "Fix by re-exporting the needed item from the directory mod.rs, or by"
+ echo "changing the caller to a path the facade already exposes. Add to the"
+ echo "allowlist only when a caller must name the child module itself."
+ exit 1
+fi
diff --git a/src/app/commands/audit_chunks.rs b/src/app/commands/audit_chunks.rs
index fb623c9..e48dd9c 100644
--- a/src/app/commands/audit_chunks.rs
+++ b/src/app/commands/audit_chunks.rs
@@ -6,7 +6,7 @@ use anyhow::Result;
use std::path::PathBuf;
use crate::app::number_format::format_count;
-use crate::engine::partitioner::{ChunkQualityReport, PartitionConfig, partition_typescript};
+use crate::engine::{ChunkQualityReport, PartitionConfig, partition_typescript};
pub fn run_audit_chunks(count: usize, folder: String) -> Result<()> {
use rand::seq::IndexedRandom;
diff --git a/src/app/commands/crawl.rs b/src/app/commands/crawl.rs
index f379ece..3de2d5d 100644
--- a/src/app/commands/crawl.rs
+++ b/src/app/commands/crawl.rs
@@ -11,22 +11,18 @@ use std::cell::Cell;
use std::collections::{BTreeSet, HashSet};
use std::sync::Arc;
-use crate::app::crawl::phases::{
+use crate::app::crawl::{
+ ChunkingOutput, CrawlInput, CrawlPreamble, CrawlSourceMetadata, PhaseResults,
add_label_to_existing_files, build_package_index, chunk_new_files, classify_files,
- enumerate_files, filter_files, open_storage, run_fts_phase, run_label_cleanup,
- update_final_metadata, write_in_progress_metadata,
+ create_warning_sink, enumerate_files, filter_files, open_storage, prepare_crawl_preamble,
+ print_narrowing_announcement, print_summary, print_warning_summary, run_fts_phase,
+ run_label_cleanup, update_final_metadata, write_in_progress_metadata,
};
-use crate::app::crawl::preamble::{
- CrawlInput, CrawlPreamble, prepare_crawl_preamble, print_narrowing_announcement,
-};
-use crate::app::crawl::summary::{print_summary, print_warning_summary};
-use crate::app::crawl::types::{CrawlSourceMetadata, PhaseResults};
-use crate::app::crawl::warning::create_warning_sink;
use crate::app::{Config, run_embed_upload_pipeline, run_upsert_without_vectors};
-use crate::engine::git_ops::{BlobSource, CommitBlobSource, WorkingDirBlobSource};
use crate::engine::identifier::LabelId;
use crate::engine::retrieval::RetrievalMethod;
use crate::engine::storage::{SOURCE_KIND_GIT_COMMIT, read_selection};
+use crate::engine::{BlobSource, CommitBlobSource, CompiledCrawlConfig, WorkingDirBlobSource};
/// Report from the post-chunking phases, used by print_summary.
///
@@ -260,7 +256,7 @@ async fn run_crawl_async(
label: &str,
repo_path: &std::path::Path,
label_id: &LabelId,
- crawl_config: &crate::engine::crawl_config::CompiledCrawlConfig,
+ crawl_config: &CompiledCrawlConfig,
db_path: &std::path::Path,
total_start: std::time::Instant,
debug: bool,
@@ -350,7 +346,7 @@ async fn run_crawl_async(
let has_existing_file_failures = !label_add_output.failures.is_empty();
// Destructure chunking_output so warning_files survives the helper call.
- let crate::app::crawl::phases::ChunkingOutput {
+ let ChunkingOutput {
chunks,
touched_file_ids,
warning_files,
diff --git a/src/app/commands/dump_chunks.rs b/src/app/commands/dump_chunks.rs
index 0da06a3..4388b98 100644
--- a/src/app/commands/dump_chunks.rs
+++ b/src/app/commands/dump_chunks.rs
@@ -5,10 +5,9 @@
use std::path::Path;
use crate::app::number_format::format_count;
-use crate::engine::SMALL_CHUNK_CHARS;
-use crate::engine::git_ops::extract_package_name_from_bytes;
-use crate::engine::partitioner::{
- ChunkQualityReport, PartitionConfig, PartitionDebug, partition_typescript,
+use crate::engine::{
+ ChunkQualityReport, PartitionConfig, PartitionDebug, SMALL_CHUNK_CHARS,
+ extract_package_name_from_bytes, partition_typescript,
};
/// Run chunking diagnostics on a TypeScript file
diff --git a/src/app/commands/mod.rs b/src/app/commands/mod.rs
index 743b8a5..4ac3a1c 100644
--- a/src/app/commands/mod.rs
+++ b/src/app/commands/mod.rs
@@ -2,15 +2,15 @@
//! Edit here when: Adding a new command file or modifying command dispatch wiring.
//! Do not edit here for: CLI argument definitions (see `../cli.rs`), individual command logic (see the per-command file).
-pub mod audit_chunks;
+mod audit_chunks;
pub mod crawl;
-pub mod debug_fts;
-pub mod dump_chunks;
+mod debug_fts;
+mod dump_chunks;
pub mod init_db;
pub mod purge;
pub mod search;
-pub mod use_cmd;
-pub mod view;
+mod use_cmd;
+mod view;
#[cfg(test)]
mod test_fixtures;
diff --git a/src/app/commands/search.rs b/src/app/commands/search.rs
index 05431d8..429f842 100644
--- a/src/app/commands/search.rs
+++ b/src/app/commands/search.rs
@@ -10,14 +10,12 @@ use crate::app::{
search::{self, EndMarker, Preamble, SearchRenderModel, SearchWarning, format_source_pointer},
};
use crate::engine::identifier::LabelId;
+use crate::engine::retrieval::format_selection;
use crate::engine::storage::ChunkRow;
+use crate::engine::storage::{Database, ScoredChunkRow};
use crate::engine::{
- ParallelConfig, ParallelEmbedder, RetrievalMethod,
- fts::{FtsSearchOutcome, fts_search},
- fusion::{FusedHit, MethodHit, RankedContribution, fuse},
- retrieval::format_selection,
- search_decision::{Decision, decide},
- storage::{Database, ScoredChunkRow},
+ Decision, DecisionError, FtsSearchOutcome, FusedHit, MethodHit, ParallelConfig,
+ ParallelEmbedder, RankedContribution, RetrievalMethod, decide, fts_search, fuse,
};
use anyhow::anyhow;
use std::collections::{BTreeSet, HashMap};
@@ -247,12 +245,12 @@ pub fn run_search(
/// Format a decision error into a user-facing error message.
fn format_decision_error(
- err: &crate::engine::search_decision::DecisionError,
+ err: &DecisionError,
metadata: &crate::engine::storage::LabelMetadataRow,
label: &str,
debug: bool,
) -> String {
- use crate::engine::search_decision::DecisionError;
+ use DecisionError;
let source_pointer = format_source_pointer(metadata);
match err {
@@ -729,7 +727,7 @@ mod tests {
#[test]
fn test_format_decision_error_empty_selection() {
let metadata = make_test_metadata();
- let err = crate::engine::search_decision::DecisionError::EmptySelection;
+ let err = DecisionError::EmptySelection;
let result = format_decision_error(&err, &metadata, "main", false);
assert_eq!(
result,
@@ -747,9 +745,7 @@ mod tests {
incomplete_methods.insert(RetrievalMethod::Fts);
incomplete_methods.insert(RetrievalMethod::Vector);
- let err = crate::engine::search_decision::DecisionError::AllInSelectionIncomplete {
- incomplete_methods,
- };
+ let err = DecisionError::AllInSelectionIncomplete { incomplete_methods };
let result = format_decision_error(&err, &metadata, "main", false);
// Default form should NOT contain schema details
@@ -774,9 +770,7 @@ mod tests {
incomplete_methods.insert(RetrievalMethod::Fts);
incomplete_methods.insert(RetrievalMethod::Vector);
- let err = crate::engine::search_decision::DecisionError::AllInSelectionIncomplete {
- incomplete_methods,
- };
+ let err = DecisionError::AllInSelectionIncomplete { incomplete_methods };
let result = format_decision_error(&err, &metadata, "main", true);
// Debug form SHOULD contain schema details
@@ -794,7 +788,7 @@ mod tests {
#[test]
fn test_format_decision_error_sources_disagree() {
let metadata = make_test_metadata();
- let err = crate::engine::search_decision::DecisionError::SourcesDisagree {
+ let err = DecisionError::SourcesDisagree {
vector_source: "commit-a".to_string(),
fts_source: "commit-b".to_string(),
};
@@ -816,7 +810,7 @@ mod tests {
let mut methods: BTreeSet<RetrievalMethod> = BTreeSet::new();
methods.insert(RetrievalMethod::Fts);
- let err = crate::engine::search_decision::DecisionError::MethodNotInSelection { methods };
+ let err = DecisionError::MethodNotInSelection { methods };
let result = format_decision_error(&err, &metadata, "main", false);
assert!(result.contains(
@@ -837,7 +831,7 @@ mod tests {
methods.insert(RetrievalMethod::Fts);
methods.insert(RetrievalMethod::Vector);
- let err = crate::engine::search_decision::DecisionError::MethodNotInSelection { methods };
+ let err = DecisionError::MethodNotInSelection { methods };
let result = format_decision_error(&err, &metadata, "main", false);
assert!(
diff --git a/src/app/config.rs b/src/app/config.rs
index 1bb2ceb..79b504e 100644
--- a/src/app/config.rs
+++ b/src/app/config.rs
@@ -13,7 +13,7 @@ use std::path::PathBuf;
use anyhow::anyhow;
use crate::engine::identifier::validate_catalog;
-use crate::engine::system_info::{
+use crate::engine::{
ResolvedEmbeddingConfig, compute_auto_embedding_config, estimate_ram_usage, format_bytes,
get_physical_core_count,
};
diff --git a/src/app/crawl/mod.rs b/src/app/crawl/mod.rs
index a2d0eba..0304a12 100644
--- a/src/app/crawl/mod.rs
+++ b/src/app/crawl/mod.rs
@@ -9,13 +9,22 @@
//! progress/time display vocabulary (see `progress_format.rs`),
//! or crawl command handlers (see `../commands/crawl.rs`).
-pub mod phases;
-pub mod pipeline;
-pub(crate) mod preamble;
+mod phases;
+mod pipeline;
+mod preamble;
mod progress_format;
-pub(crate) mod summary;
-pub mod types;
-pub mod warning;
+mod summary;
+mod types;
+mod warning;
+pub use phases::{
+ ChunkingOutput, add_label_to_existing_files, build_package_index, chunk_new_files,
+ classify_files, enumerate_files, filter_files, open_storage, run_fts_phase, run_label_cleanup,
+ update_final_metadata, write_in_progress_metadata,
+};
pub use pipeline::{run_embed_upload_pipeline, run_upsert_without_vectors};
-pub use types::{CrawlFailures, PhaseResults};
+pub use preamble::print_narrowing_announcement;
+pub(crate) use preamble::{CrawlInput, CrawlPreamble, prepare_crawl_preamble};
+pub use summary::{print_summary, print_warning_summary};
+pub use types::{CrawlFailures, CrawlSourceMetadata, PhaseResults};
+pub use warning::create_warning_sink;
diff --git a/src/app/crawl/phases.rs b/src/app/crawl/phases.rs
index 7f3a9b0..75302f4 100644
--- a/src/app/crawl/phases.rs
+++ b/src/app/crawl/phases.rs
@@ -10,15 +10,11 @@ use std::sync::Arc;
use crate::app::crawl::types::PhaseResults;
use crate::app::number_format::format_count;
+use crate::engine::identifier::LabelId;
+use crate::engine::storage::{ChunkStorage, LabelMetadataRow, LabelStorage};
use crate::engine::{
- TARGET_CHARS,
- chunker::{ChunkContext, chunk_content},
- crawl_config::CompiledCrawlConfig,
- git_ops::{BlobSource, FileEntry},
- identifier::LabelId,
- retrieval::RetrievalMethod,
- storage::{ChunkStorage, LabelMetadataRow, LabelStorage},
- warning::{CrawlWarning, WarningSink},
+ BlobSource, ChunkContext, CompiledCrawlConfig, CrawlWarning, FileEntry, PackageIndex,
+ RetrievalMethod, TARGET_CHARS, WarningSink, chunk_content,
};
/// Opens the database and returns storage handles.
@@ -95,9 +91,7 @@ pub fn enumerate_files(blob_source: &dyn BlobSource) -> Result<Vec<FileEntry>> {
}
/// Builds the package index from the blob source.
-pub fn build_package_index(
- blob_source: &dyn BlobSource,
-) -> Result<crate::engine::git_ops::PackageIndex> {
+pub fn build_package_index(blob_source: &dyn BlobSource) -> Result<PackageIndex> {
println!("📦 Building package index...");
let package_index = blob_source.build_package_index()?;
println!("Package index built successfully");
@@ -316,7 +310,7 @@ pub struct ChunkingOutput {
pub fn chunk_new_files(
new_files: &[FileEntry],
blob_source: &dyn BlobSource,
- package_index: &crate::engine::git_ops::PackageIndex,
+ package_index: &PackageIndex,
crawl_config: &CompiledCrawlConfig,
catalog_name: &str,
label_id: &LabelId,
@@ -715,9 +709,8 @@ mod tests {
/// with error "non-UTF-8 file contents" and be skipped, not crash the crawl.
#[test]
fn test_chunk_new_files_emits_warning_for_non_utf8() {
- use crate::engine::crawl_config::get_default_crawl_config;
- use crate::engine::git_ops::{BlobSource, FileEntry, PackageIndex};
use crate::engine::identifier::LabelId;
+ use crate::engine::{BlobSource, FileEntry, PackageIndex, get_default_crawl_config};
use std::cell::Cell;
use std::path::Path;
diff --git a/src/app/crawl/preamble.rs b/src/app/crawl/preamble.rs
index b63312a..1455982 100644
--- a/src/app/crawl/preamble.rs
+++ b/src/app/crawl/preamble.rs
@@ -12,14 +12,13 @@ use crate::app::config::Config;
use crate::app::crawl::types::CrawlSourceMetadata;
use crate::app::lock_progress::stderr_lock_progress;
use crate::app::{resolve_database_path, validate_config_path};
-use crate::engine::crawl_config::{CompiledCrawlConfig, load_compiled_crawl_config};
-use crate::engine::git_ops::resolve_commit_oid;
use crate::engine::identifier::LabelId;
use crate::engine::retrieval::RetrievalMethod;
use crate::engine::storage::{
CatalogLock, DatabaseLockShared, SOURCE_KIND_GIT_COMMIT, SOURCE_KIND_WORKING_DIRECTORY,
acquire_catalog_lock, acquire_database_shared,
};
+use crate::engine::{CompiledCrawlConfig, load_compiled_crawl_config, resolve_commit_oid};
/// Source discriminator for the crawl preamble.
#[derive(Clone, Copy)]
diff --git a/src/app/crawl/warning.rs b/src/app/crawl/warning.rs
index 78cfaf4..d269fd9 100644
--- a/src/app/crawl/warning.rs
+++ b/src/app/crawl/warning.rs
@@ -4,7 +4,7 @@
//! Edit here when: Changing warning message format, adding new warning renderers.
//! Do not edit here for: Warning type definitions (see `engine/warning.rs`), warning emission in phases (see `phases.rs`).
-use crate::engine::warning::CrawlWarning;
+use crate::engine::CrawlWarning;
use std::cell::Cell;
use std::io::Write;
diff --git a/src/app/mod.rs b/src/app/mod.rs
index 7764167..f80d1d0 100644
--- a/src/app/mod.rs
+++ b/src/app/mod.rs
@@ -4,14 +4,14 @@
mod chunk_display;
mod chunk_selector;
-pub mod cli;
+mod cli;
pub mod commands;
pub mod config;
-pub mod context;
-pub mod crawl;
+mod context;
+mod crawl;
mod lock_progress;
mod number_format;
-pub mod search;
+mod search;
mod terminal_output;
pub use cli::{Cli, Commands, CrawlSourceArgs};
diff --git a/src/app/search.rs b/src/app/search.rs
index 6354862..2ffe936 100644
--- a/src/app/search.rs
+++ b/src/app/search.rs
@@ -16,12 +16,8 @@
use std::io::{self, Write};
-use crate::engine::{
- retrieval::RetrievalMethod,
- storage::ChunkRow,
- warning::DecisionWarning,
- {fusion::FusedHit, storage::LabelMetadataRow},
-};
+use crate::engine::storage::{ChunkRow, LabelMetadataRow};
+use crate::engine::{DecisionWarning, FusedHit, RankedContribution, RetrievalMethod};
use std::collections::HashMap;
@@ -251,7 +247,7 @@ fn render_result_header(
/// Build provenance marker from contributors.
///
/// Returns "f", "v", or "f+v" (alphabetical order).
-fn build_provenance_marker(contributors: &[crate::engine::fusion::RankedContribution]) -> String {
+fn build_provenance_marker(contributors: &[RankedContribution]) -> String {
let has_fts = contributors
.iter()
.any(|c| c.method == RetrievalMethod::Fts);
diff --git a/src/app/search/tests.rs b/src/app/search/tests.rs
index e936b93..1ed6ee9 100644
--- a/src/app/search/tests.rs
+++ b/src/app/search/tests.rs
@@ -3,7 +3,7 @@
//! Do not edit here for: Search output rendering implementation (see `../search.rs`).
use super::*;
-use crate::engine::fusion::{FusedHit, RankedContribution};
+use crate::engine::{FusedHit, RankedContribution};
fn make_fused_hit(row_id: &str, rrf_score: f32, methods: &[RetrievalMethod]) -> FusedHit {
let contributors: Vec<RankedContribution> = methods
diff --git a/src/engine/breadcrumb.rs b/src/engine/breadcrumb.rs
index 7d9adeb..e52293b 100644
--- a/src/engine/breadcrumb.rs
+++ b/src/engine/breadcrumb.rs
@@ -10,7 +10,7 @@
/// # Example
///
/// ```
-/// use monodex::engine::breadcrumb::encode_path_component;
+/// use monodex::engine::encode_path_component;
///
/// assert_eq!(encode_path_component("weird:file.ts"), "weird%3Afile.ts");
/// assert_eq!(encode_path_component("@scope/pkg"), "%40scope/pkg");
diff --git a/src/engine/fts/mod.rs b/src/engine/fts/mod.rs
index ec5d626..3ad55c9 100644
--- a/src/engine/fts/mod.rs
+++ b/src/engine/fts/mod.rs
@@ -9,12 +9,12 @@
//! error types (see `error.rs`), or vector search (see `engine/storage/chunks/`).
mod error;
-pub mod index;
-pub mod indexing;
-pub mod manifest;
-pub mod schema;
-pub mod search;
-pub mod tokenizer;
+mod index;
+mod indexing;
+mod manifest;
+mod schema;
+mod search;
+mod tokenizer;
#[cfg(test)]
mod tests;
diff --git a/src/engine/git_ops/mod.rs b/src/engine/git_ops/mod.rs
index 41c8c9b..08ffd79 100644
--- a/src/engine/git_ops/mod.rs
+++ b/src/engine/git_ops/mod.rs
@@ -2,8 +2,8 @@
//! Edit here when: Adding or renaming git_ops submodules, or changing the public surface re-exported from this folder.
//! Do not edit here for: the `BlobSource` trait or `FileEntry` (see `blob_source.rs`), package-index lookup and extraction (see `package_index.rs`), gix-based commit traversal (see `commit.rs`), subprocess-based working-tree reading (see `working_dir.rs`).
-pub mod commit;
-pub mod working_dir;
+mod commit;
+mod working_dir;
mod blob_source;
mod package_index;
@@ -13,11 +13,5 @@ mod tests;
pub use blob_source::{BlobSource, CommitBlobSource, FileEntry, WorkingDirBlobSource};
pub use package_index::{PackageIndex, extract_package_name_from_bytes};
-// Re-export public API from submodules
-pub use self::commit::{
- build_package_index_for_commit, enumerate_commit_tree, read_blob_content, resolve_commit_oid,
-};
-pub use self::working_dir::{
- WorkingTreeBlobMap, build_package_index_for_working_dir, build_working_tree_blob_map,
- enumerate_working_directory, read_working_file_content,
-};
+// Re-export items used by app/ (via engine/mod.rs facade)
+pub use self::commit::resolve_commit_oid;
diff --git a/src/engine/git_ops/tests.rs b/src/engine/git_ops/tests.rs
index e25d8ed..e1b5583 100644
--- a/src/engine/git_ops/tests.rs
+++ b/src/engine/git_ops/tests.rs
@@ -2,6 +2,7 @@
//! Edit here when: Adding or modifying tests for commit reading, working-directory enumeration, or the package index.
//! Do not edit here for: Production code changes — edit the relevant submodule (`blob_source.rs`, `package_index.rs`, `commit.rs`, `working_dir.rs`).
+use super::working_dir::{build_package_index_for_working_dir, enumerate_working_directory};
use super::*;
use std::path::PathBuf;
use std::process::Command;
diff --git a/src/engine/mod.rs b/src/engine/mod.rs
index f6da3be..7ede6ca 100644
--- a/src/engine/mod.rs
+++ b/src/engine/mod.rs
@@ -2,34 +2,50 @@
//! Edit here when: Adding a new top-level engine submodule or convenience re-export.
//! Do not edit here for: App-level concerns such as CLI, config, commands (see `app/`); details inside individual engine submodules.
-pub mod breadcrumb;
-pub mod chunker;
-pub mod crawl_config;
+mod breadcrumb;
+mod chunker;
+mod crawl_config;
pub mod fts;
-pub mod fusion;
-pub mod git_ops;
+mod fusion;
+mod git_ops;
pub mod identifier;
pub mod identity;
-pub mod markdown_partitioner;
-pub mod parallel_embedder;
-pub mod partitioner;
+mod markdown_partitioner;
+mod parallel_embedder;
+mod partitioner;
pub mod retrieval;
pub mod schema;
-pub mod search_decision;
+mod search_decision;
pub mod storage;
-pub mod system_info;
-pub mod warning;
-pub mod working_dir_sentinel;
+mod system_info;
+mod warning;
+mod working_dir_sentinel;
// Re-export commonly used types for convenience
-pub use chunker::Chunk;
-pub use crawl_config::ChunkingStrategy;
+pub use breadcrumb::encode_path_component;
+pub use chunker::{Chunk, ChunkContext, chunk_content};
+pub use crawl_config::{
+ ChunkingStrategy, CompiledCrawlConfig, get_default_crawl_config, load_compiled_crawl_config,
+};
pub use fts::{
FtsHit, FtsIndex, FtsIndexingStats, FtsManifest, FtsSearchOutcome, fts_search,
index_chunks_for_fts,
};
-pub use parallel_embedder::ParallelConfig;
-pub use parallel_embedder::ParallelEmbedder;
-pub use partitioner::{SMALL_CHUNK_CHARS, TARGET_CHARS};
+pub use fusion::{FusedHit, MethodHit, RankedContribution, fuse};
+pub use git_ops::{
+ BlobSource, CommitBlobSource, FileEntry, PackageIndex, WorkingDirBlobSource,
+ extract_package_name_from_bytes, resolve_commit_oid,
+};
+pub use parallel_embedder::{ParallelConfig, ParallelEmbedder};
+pub use partitioner::{
+ ChunkQualityReport, PartitionConfig, PartitionDebug, SMALL_CHUNK_CHARS, TARGET_CHARS,
+ partition_typescript,
+};
pub use retrieval::RetrievalMethod;
+pub use search_decision::{Decision, DecisionError, decide};
+pub use system_info::{
+ ResolvedEmbeddingConfig, compute_auto_embedding_config, estimate_ram_usage, format_bytes,
+ get_physical_core_count,
+};
+pub use warning::{CrawlWarning, DecisionWarning, WarningSink};
pub use working_dir_sentinel::make_working_dir_source_sentinel;
diff --git a/src/engine/partitioner/mod.rs b/src/engine/partitioner/mod.rs
index a49f88b..8d5de76 100644
--- a/src/engine/partitioner/mod.rs
+++ b/src/engine/partitioner/mod.rs
@@ -58,7 +58,7 @@ mod types;
pub use debug::PartitionDebug;
pub use partition::partition_typescript;
-pub use scoring::{ChunkQualityReport, chunk_quality_score};
+pub use scoring::ChunkQualityReport;
pub use types::{
PartitionConfig, PartitionError, PartitionedChunk, SMALL_CHUNK_CHARS, TARGET_CHARS,
};