Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -55,5 +55,8 @@ jobs:
- name: Clippy
run: cargo clippy --workspace --all-targets --locked -- -D warnings

- name: Check facade integrity
run: ./scripts/check-facades.sh

- name: Test
run: cargo test --workspace --locked
22 changes: 15 additions & 7 deletions docs/code_organization_policy.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ A coherent file at 750 lines is better than two incoherent files at 400. The thr
| Types-only file | any | 500 | 800 |
| Test-only file | any | 800 | 1200 |

Lines = production code excluding `#[cfg(test)]` blocks and the module header. `mod.rs`, `lib.rs`, and `main.rs` files containing only declarations, re-exports, and small dispatch logic are not counted. (Substantive code in `mod.rs` is forbidden per the banned patterns below.)
Lines = production code excluding `#[cfg(test)]` blocks and the module header. `mod.rs`, `lib.rs`, and `main.rs` files containing only declarations, re-exports, and small dispatch logic are not counted.

Production modules that don't fit a more specific row use the algorithm/engine row.

Expand Down Expand Up @@ -125,7 +125,7 @@ Within the integration-test layer itself, prefer fewer tests that exercise reali

### Purpose

`just ci-quick` is the fast variant of `just ci`. Same fmt and clippy checks, but with the slowest tests filtered out at runtime. It exists for the developer inner loop, the moments between edits when fast feedback matters. Repository CI workflow selection is managed separately; this section only defines the local quick tier and the invariant that `just ci` remains the full gate.
`just ci-quick` is the fast variant of `just ci`. Same fmt, clippy, and facade checks, but with the slowest tests filtered out at runtime. It exists for the developer inner loop, the moments between edits when fast feedback matters. Repository CI workflow selection is managed separately; this section only defines the local quick tier and the invariant that `just ci` remains the full gate.

### Mechanism

Expand Down Expand Up @@ -168,18 +168,26 @@ Counter-examples: a loop variable iterating folders is `current_folder`, not `cu

## Banned patterns

- No wildcard re-exports (`pub use submodule::*`). List re-exports explicitly.
- No putting unrelated items together just because they're small.
- No structural splits in the same change as feature or fix work. Splits are their own change unless explicitly authorized by the maintainer or the planned reorganization being applied.
- No substantive code in `mod.rs`. It's the directory's table of contents (module declarations, explicit re-exports, header), not its content. A small number of simple constants that are part of the directory's public surface is fine; move them out when they grow into an implementation vocabulary with its own edit intent.

## Module organization at the directory level

A directory is an organizational unit: a name that predicts what's inside. The `mod.rs` is the directory's table of contents: module declarations, explicit re-exports, the header. Non-trivial code lives in named sibling files, never in `mod.rs`.
A directory is an organizational unit: a name that predicts what's inside. The `mod.rs` is the directory's table of contents, holding module declarations, explicit re-exports, and the header; non-trivial code lives in named sibling files. A small number of simple constants that are part of the directory's public surface is fine; move them out when they grow into an implementation vocabulary with its own edit intent.

**Cross-directory reach.** Prefer the shortest path through a directory's `mod.rs` re-exports. If an item is reachable only by a deep path that skips `mod.rs`, either it belongs in the directory's surface (add a `pub use` in `mod.rs`) or your use site is reaching past the directory's contract and the design should be reconsidered. Don't introduce a new deep-path use site to an item `mod.rs` already re-exports under a shorter name.
### Facade integrity

**Visibility keywords.** Items inside files default to `pub`; the directory boundary is what the policy maintains, not item-level discipline. Child `mod` declarations in `mod.rs` are `pub mod` when the child name is part of the intended navigation surface, and plain `mod` when external callers reach items via `mod.rs` re-exports instead. Both are valid: `engine/identifier.rs` is `pub mod` because `identifier` is itself the concept callers reach for; `app/crawl/phases.rs` is `pub mod` because `phases` is an intended decomposition unit, not an implementation detail; `engine/fts/index.rs` is plain `mod` because callers reach its items via `engine::fts::FtsIndex`, already re-exported. An existing `pub mod` is not itself a problem to fix; the trigger for action is a use site that bypasses a shorter path already exposed by `mod.rs`. Narrower keywords (`pub(super)`, `pub(crate)`) are available for sensitive seams but not required.
A directory's `mod.rs` is its public boundary: what an outside caller can reach is decided by `mod.rs` re-exports, not by the caller writing a deep path. This is a structural rule about boundaries, not a rule about minimizing visibility keywords; the goal is that each directory has exactly one declared surface, so a reader knows where to look and a caller cannot quietly bypass it. The boundary rule has three parts.

**Child modules are declared with plain `mod`.** In a directory `mod.rs`, child modules are declared `mod child;`, never `pub mod child;` or `pub(crate) mod child;`. Items that outside callers need are surfaced by explicit `pub use child::Item;` re-exports in `mod.rs`. The sole exception is a child that must currently be named from outside the library crate, from `tests/` or `main.rs` via a `monodex::...` path naming the child directly. Such a child stays `pub mod`, but only when listed in the facade-check allowlist (see below). An allowlisted `pub mod` is a deliberate, marked exception; plain `mod` is the default.

**Cross-directory reach goes through the facade.** A use site outside `a/b/c/` that reaches `crate::a::b::c::internal` by deep path is a facade violation, whatever visibility keywords are involved. The fix is a re-export in the relevant `mod.rs`, never a wider `mod` declaration. Re-export at the item's own visibility (`pub use` for a `pub` item, `pub(crate) use` for a `pub(crate)` item); do not widen the item to make re-exports uniform. List re-exports explicitly, one item per name; wildcard re-exports (`pub use child::*`) are banned, because they make the facade's surface unreadable. A deep-path reach that looks intentional means the target belongs in the facade's surface, not that the rule should bend.

**Sibling reach stays out of the facade.** An item used only by siblings in the same directory needs no `mod.rs` entry: mark it `pub(super)` and siblings reach it directly.

Two scope clarifications. *Item visibility inside a file* is not governed here. A `pub` item that is externally unreachable carries no information but is not a violation, and need not be demoted. (The `unreachable_pub` lint flags such items. It is deliberately not adopted, because it enforces a stricter, different rule of item-level minimum visibility, and an agent silencing it may add an unwanted re-export rather than improve the boundary.) *Machine check:* `just check-facades` (run by both `just ci` and `just ci-quick`) scans every `src/` `mod.rs` and fails on any `pub mod` / `pub(crate) mod` child not in the allowlist. The check is source-tree-only; `mod.rs` files under `tests/` are not policed, and integration-test fixture facades may use their own local style. The allowlist is the set of children currently named from `tests/` or `main.rs` by a direct `monodex::...` path. That set is presently an accident of what the integration tests reach, and is to be redrawn deliberately once an intended crate API surface is designed.

When the check fails on a new declaration, the allowlist is not the default remedy: re-export the needed item from the facade, or change the caller to a path the facade already exposes. Add to the allowlist only when a caller intentionally needs to name the child module itself and no facade path can serve, and call the addition out in the PR description.

**File layout.** This project uses `<dir>/mod.rs`, not the `<dir>.rs` + `<dir>/` form Rust 2018 also permits.

Expand Down
8 changes: 6 additions & 2 deletions justfile
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# Run all CI checks (format, clippy, all tests)
ci: fmt-check clippy test
ci: fmt-check clippy check-facades test

# Run quick CI checks (format, clippy, fast tests only)
ci-quick: fmt-check clippy test-quick
ci-quick: fmt-check clippy check-facades test-quick

# Format check
fmt-check:
Expand Down Expand Up @@ -31,3 +31,7 @@ build:
# Clean build artifacts
clean:
cargo clean

# Enforce mod.rs facade integrity
check-facades:
./scripts/check-facades.sh
50 changes: 50 additions & 0 deletions scripts/check-facades.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
#!/usr/bin/env bash
# Enforce the facade-integrity rule from docs/code_organization_policy.md:
# directory mod.rs files declare child modules with plain `mod`, except for
# an allowlist of children named from outside the library crate.
set -euo pipefail

# Allowlist: "<path-under-src> <child>" pairs. Children named directly by a
# monodex::... path in tests/ or main.rs. Revisit when the crate API surface
# is designed deliberately; see code_organization_policy.md.
allow=$(cat <<'EOF'
app/mod.rs commands
app/mod.rs config
app/commands/mod.rs crawl
app/commands/mod.rs init_db
app/commands/mod.rs purge
app/commands/mod.rs search
engine/mod.rs fts
engine/mod.rs identifier
engine/mod.rs identity
engine/mod.rs retrieval
engine/mod.rs schema
engine/mod.rs storage
EOF
)

violations=""
while IFS= read -r -d '' modfile; do
rel=${modfile#src/}
while IFS= read -r line; do
# Extract the child name from the leading "pub[(...)] mod NAME".
# Anchored at the start so a later "mod" in a comment cannot mislead it.
decl=$(printf '%s\n' "$line" | sed -E 's/^[0-9]+:[[:space:]]*pub(\([^)]*\))?[[:space:]]+mod[[:space:]]+([A-Za-z_][A-Za-z0-9_]*).*/\2/')
if ! printf '%s\n' "$allow" | grep -qxF "$rel $decl"; then
violations+="$modfile: $line"$'\n'
fi
done < <(grep -nE '^[[:space:]]*pub(\([^)]*\))?[[:space:]]+mod[[:space:]]' "$modfile" || true)
done < <(find src -name mod.rs -print0)

if [[ -n "$violations" ]]; then
echo "Facade violation: directory mod.rs files must declare child modules with"
echo "plain 'mod', not 'pub mod' or 'pub(...) mod', unless allowlisted in this"
echo "script. See the Facade integrity section of docs/code_organization_policy.md."
echo
printf '%s' "$violations"
echo
echo "Fix by re-exporting the needed item from the directory mod.rs, or by"
echo "changing the caller to a path the facade already exposes. Add to the"
echo "allowlist only when a caller must name the child module itself."
exit 1
fi
2 changes: 1 addition & 1 deletion src/app/commands/audit_chunks.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ use anyhow::Result;
use std::path::PathBuf;

use crate::app::number_format::format_count;
use crate::engine::partitioner::{ChunkQualityReport, PartitionConfig, partition_typescript};
use crate::engine::{ChunkQualityReport, PartitionConfig, partition_typescript};

pub fn run_audit_chunks(count: usize, folder: String) -> Result<()> {
use rand::seq::IndexedRandom;
Expand Down
20 changes: 8 additions & 12 deletions src/app/commands/crawl.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,22 +11,18 @@ use std::cell::Cell;
use std::collections::{BTreeSet, HashSet};
use std::sync::Arc;

use crate::app::crawl::phases::{
use crate::app::crawl::{
ChunkingOutput, CrawlInput, CrawlPreamble, CrawlSourceMetadata, PhaseResults,
add_label_to_existing_files, build_package_index, chunk_new_files, classify_files,
enumerate_files, filter_files, open_storage, run_fts_phase, run_label_cleanup,
update_final_metadata, write_in_progress_metadata,
create_warning_sink, enumerate_files, filter_files, open_storage, prepare_crawl_preamble,
print_narrowing_announcement, print_summary, print_warning_summary, run_fts_phase,
run_label_cleanup, update_final_metadata, write_in_progress_metadata,
};
use crate::app::crawl::preamble::{
CrawlInput, CrawlPreamble, prepare_crawl_preamble, print_narrowing_announcement,
};
use crate::app::crawl::summary::{print_summary, print_warning_summary};
use crate::app::crawl::types::{CrawlSourceMetadata, PhaseResults};
use crate::app::crawl::warning::create_warning_sink;
use crate::app::{Config, run_embed_upload_pipeline, run_upsert_without_vectors};
use crate::engine::git_ops::{BlobSource, CommitBlobSource, WorkingDirBlobSource};
use crate::engine::identifier::LabelId;
use crate::engine::retrieval::RetrievalMethod;
use crate::engine::storage::{SOURCE_KIND_GIT_COMMIT, read_selection};
use crate::engine::{BlobSource, CommitBlobSource, CompiledCrawlConfig, WorkingDirBlobSource};

/// Report from the post-chunking phases, used by print_summary.
///
Expand Down Expand Up @@ -260,7 +256,7 @@ async fn run_crawl_async(
label: &str,
repo_path: &std::path::Path,
label_id: &LabelId,
crawl_config: &crate::engine::crawl_config::CompiledCrawlConfig,
crawl_config: &CompiledCrawlConfig,
db_path: &std::path::Path,
total_start: std::time::Instant,
debug: bool,
Expand Down Expand Up @@ -350,7 +346,7 @@ async fn run_crawl_async(
let has_existing_file_failures = !label_add_output.failures.is_empty();

// Destructure chunking_output so warning_files survives the helper call.
let crate::app::crawl::phases::ChunkingOutput {
let ChunkingOutput {
chunks,
touched_file_ids,
warning_files,
Expand Down
7 changes: 3 additions & 4 deletions src/app/commands/dump_chunks.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,9 @@
use std::path::Path;

use crate::app::number_format::format_count;
use crate::engine::SMALL_CHUNK_CHARS;
use crate::engine::git_ops::extract_package_name_from_bytes;
use crate::engine::partitioner::{
ChunkQualityReport, PartitionConfig, PartitionDebug, partition_typescript,
use crate::engine::{
ChunkQualityReport, PartitionConfig, PartitionDebug, SMALL_CHUNK_CHARS,
extract_package_name_from_bytes, partition_typescript,
};

/// Run chunking diagnostics on a TypeScript file
Expand Down
10 changes: 5 additions & 5 deletions src/app/commands/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,15 +2,15 @@
//! Edit here when: Adding a new command file or modifying command dispatch wiring.
//! Do not edit here for: CLI argument definitions (see `../cli.rs`), individual command logic (see the per-command file).

pub mod audit_chunks;
mod audit_chunks;
pub mod crawl;
pub mod debug_fts;
pub mod dump_chunks;
mod debug_fts;
mod dump_chunks;
pub mod init_db;
pub mod purge;
pub mod search;
pub mod use_cmd;
pub mod view;
mod use_cmd;
mod view;

#[cfg(test)]
mod test_fixtures;
Expand Down
30 changes: 12 additions & 18 deletions src/app/commands/search.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,12 @@ use crate::app::{
search::{self, EndMarker, Preamble, SearchRenderModel, SearchWarning, format_source_pointer},
};
use crate::engine::identifier::LabelId;
use crate::engine::retrieval::format_selection;
use crate::engine::storage::ChunkRow;
use crate::engine::storage::{Database, ScoredChunkRow};
use crate::engine::{
ParallelConfig, ParallelEmbedder, RetrievalMethod,
fts::{FtsSearchOutcome, fts_search},
fusion::{FusedHit, MethodHit, RankedContribution, fuse},
retrieval::format_selection,
search_decision::{Decision, decide},
storage::{Database, ScoredChunkRow},
Decision, DecisionError, FtsSearchOutcome, FusedHit, MethodHit, ParallelConfig,
ParallelEmbedder, RankedContribution, RetrievalMethod, decide, fts_search, fuse,
};
use anyhow::anyhow;
use std::collections::{BTreeSet, HashMap};
Expand Down Expand Up @@ -247,12 +245,12 @@ pub fn run_search<W: Write>(

/// Format a decision error into a user-facing error message.
fn format_decision_error(
err: &crate::engine::search_decision::DecisionError,
err: &DecisionError,
metadata: &crate::engine::storage::LabelMetadataRow,
label: &str,
debug: bool,
) -> String {
use crate::engine::search_decision::DecisionError;
use DecisionError;
let source_pointer = format_source_pointer(metadata);

match err {
Expand Down Expand Up @@ -729,7 +727,7 @@ mod tests {
#[test]
fn test_format_decision_error_empty_selection() {
let metadata = make_test_metadata();
let err = crate::engine::search_decision::DecisionError::EmptySelection;
let err = DecisionError::EmptySelection;
let result = format_decision_error(&err, &metadata, "main", false);
assert_eq!(
result,
Expand All @@ -747,9 +745,7 @@ mod tests {
incomplete_methods.insert(RetrievalMethod::Fts);
incomplete_methods.insert(RetrievalMethod::Vector);

let err = crate::engine::search_decision::DecisionError::AllInSelectionIncomplete {
incomplete_methods,
};
let err = DecisionError::AllInSelectionIncomplete { incomplete_methods };
let result = format_decision_error(&err, &metadata, "main", false);

// Default form should NOT contain schema details
Expand All @@ -774,9 +770,7 @@ mod tests {
incomplete_methods.insert(RetrievalMethod::Fts);
incomplete_methods.insert(RetrievalMethod::Vector);

let err = crate::engine::search_decision::DecisionError::AllInSelectionIncomplete {
incomplete_methods,
};
let err = DecisionError::AllInSelectionIncomplete { incomplete_methods };
let result = format_decision_error(&err, &metadata, "main", true);

// Debug form SHOULD contain schema details
Expand All @@ -794,7 +788,7 @@ mod tests {
#[test]
fn test_format_decision_error_sources_disagree() {
let metadata = make_test_metadata();
let err = crate::engine::search_decision::DecisionError::SourcesDisagree {
let err = DecisionError::SourcesDisagree {
vector_source: "commit-a".to_string(),
fts_source: "commit-b".to_string(),
};
Expand All @@ -816,7 +810,7 @@ mod tests {
let mut methods: BTreeSet<RetrievalMethod> = BTreeSet::new();
methods.insert(RetrievalMethod::Fts);

let err = crate::engine::search_decision::DecisionError::MethodNotInSelection { methods };
let err = DecisionError::MethodNotInSelection { methods };
let result = format_decision_error(&err, &metadata, "main", false);

assert!(result.contains(
Expand All @@ -837,7 +831,7 @@ mod tests {
methods.insert(RetrievalMethod::Fts);
methods.insert(RetrievalMethod::Vector);

let err = crate::engine::search_decision::DecisionError::MethodNotInSelection { methods };
let err = DecisionError::MethodNotInSelection { methods };
let result = format_decision_error(&err, &metadata, "main", false);

assert!(
Expand Down
2 changes: 1 addition & 1 deletion src/app/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ use std::path::PathBuf;
use anyhow::anyhow;

use crate::engine::identifier::validate_catalog;
use crate::engine::system_info::{
use crate::engine::{
ResolvedEmbeddingConfig, compute_auto_embedding_config, estimate_ram_usage, format_bytes,
get_physical_core_count,
};
Expand Down
23 changes: 16 additions & 7 deletions src/app/crawl/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,22 @@
//! progress/time display vocabulary (see `progress_format.rs`),
//! or crawl command handlers (see `../commands/crawl.rs`).

pub mod phases;
pub mod pipeline;
pub(crate) mod preamble;
mod phases;
mod pipeline;
mod preamble;
mod progress_format;
pub(crate) mod summary;
pub mod types;
pub mod warning;
mod summary;
mod types;
mod warning;

pub use phases::{
ChunkingOutput, add_label_to_existing_files, build_package_index, chunk_new_files,
classify_files, enumerate_files, filter_files, open_storage, run_fts_phase, run_label_cleanup,
update_final_metadata, write_in_progress_metadata,
};
pub use pipeline::{run_embed_upload_pipeline, run_upsert_without_vectors};
pub use types::{CrawlFailures, PhaseResults};
pub use preamble::print_narrowing_announcement;
pub(crate) use preamble::{CrawlInput, CrawlPreamble, prepare_crawl_preamble};
pub use summary::{print_summary, print_warning_summary};
pub use types::{CrawlFailures, CrawlSourceMetadata, PhaseResults};
pub use warning::create_warning_sink;
Loading
Loading