- You are Claude Code. Actions that would be time consuming for a human — writing tests, building features, refactoring code — are fast and comparatively cheap for you.
- Conversation history gets compacted once the context window reaches its limit. Important details from earlier in the conversation — including plans, discoveries, and decisions — may be lost. Proactively write important information to files so it persists beyond context compression.
- Confirm before implementing: After writing a plan but before starting implementation, always present the plan to the user and ask if they have any questions or concerns. Do not begin coding until the user confirms.
- A "long-running test" is any test that takes >= 60 seconds. Many tests in this project run for 10–30 minutes (especially OSS pipeline benchmarks and integration tests). Never wait blindly for completion.
- Run long tests in the background using `run_in_background: true`, then poll the output at regular intervals (every 30–60 seconds) using `TaskOutput` with `block: false` to inspect incremental stderr/stdout.
- Act on partial output: errors, warnings, and progress lines appear long before the test finishes. If you see repeated failures (e.g., wrong paths, IPC errors, assertion failures), stop the test early, diagnose, and fix — don't wait for the full run to complete.
- After applying a fix, delete any cached DB files under `target/test-repos/*.db` before re-running the test so stale results don't mask the fix.
- Verify before deleting: Before deleting any files or folders, always verify they are not referenced elsewhere in the codebase using grep or other search tools. Never assume a file is unused.
- Verify assumptions: Before acting on any assumption about the codebase (API signatures, available methods, file locations, type constraints, etc.), read the relevant source. Use grep, glob, or file reads to confirm. Do not assume — check.
- Verify with builds and tests: After making changes, build the affected project and run existing tests to confirm nothing is broken. When the correct behaviour of a piece of logic is non-obvious, write a test to verify it — including temporary/throwaway tests if that is the fastest way to confirm an assumption. Remove temporary tests once they have served their purpose.
- Do not add comments that merely describe the changes made (e.g., "Modified this to fix bug X").
- Comments should be reserved for explaining the code and functionality themselves (the "how" and "why" of the logic), adhering to standard clean code practices.
- Use clear, descriptive names for all variables.
- Avoid obscure abbreviations (e.g., use `isCollection` instead of `isColl`).
- Write plans to a file before implementing: For non-trivial tasks, write the plan to a markdown file in the repo before starting implementation. Delete when done.
- Stop and reassess after repeated failures: If consecutive fix attempts fail to resolve an issue, stop and reconsider the approach rather than continuing to apply further fixes.
- Commits should be focused and well-delimited: each commit should represent one coherent, self-contained piece of work (e.g. a bug fix, a single new feature, a refactor, a docs update). Do not bundle unrelated changes into a single commit; review the diff before committing to confirm. When a file contains changes that belong in separate commits, use `git add -p` to stage specific hunks rather than editing the file, committing, and re-applying changes. Do not add any Claude attribution or co-author lines to commit messages.
- Keep documentation up to date: after implementing features, fixing bugs, or adding tests, update the following files to reflect the changes:
  - `CLAUDE.md` — architecture decisions, key conventions, layout tree, test counts.
  - `MEMORY.md` — current implementation state, test counts, new conventions, gotchas, and non-obvious decisions. Update at the end of every session.
  - `README.md` — user-facing project description and usage instructions.
  - `TESTS_IMPLEMENTATION_PLAN.md` — add new test entries with ✅ status, promote ⬜→✅ for newly implemented tests, and update phase-coverage summary counts. Every test row must have a unique T-NNN id.
  - `TEST_COVERAGE.md` — do not edit by hand. After any change to TESTS_IMPLEMENTATION_PLAN.md, run `pwsh .\scripts\Sync-TestCoverage.ps1` from the repo root to regenerate it.
- `IMPLEMENTATION_PLAN.md` — the full implementation plan for the Code Agent Platform, a local-first indexing and retrieval engine for C# and TypeScript/React codebases, exposed as an MCP server. It covers the complete architecture (SQLite graph schema, concurrency model, ingest pipeline, retrieval pipeline, and MCP tool surface) and defines a six-phase build roadmap spanning syntactic indexing (Phase 1), semantic enrichment via Roslyn/TS Language Service (Phase 2a), rename detection (Phase 2b), embeddings (Phase 3), hybrid retrieval and eval (Phase 4), MCP server (Phase 5), and hardening & observability (Phase 6), along with testing strategy and key risks.
- `INVARIANTS_CHECKLIST.md` — a rigorous specification of correctness invariants for the Code Agent Platform. It defines the rules the system must never violate around symbol identity stability, invalidation, storage schema, and MCP tool behavior, along with a phased test assertion checklist to verify compliance.
- `TESTS_IMPLEMENTATION_PLAN.md` — the single source of truth for all test entries. Every test row has a unique T-NNN id in the first column. It lists implemented (✅), planned (⬜), and deferred (🔁) tests for every phase (Phases 1–6), organised by module. Phase 5 covers MCP server tests. Update status from ⬜ to ✅ when a planned test is implemented, and update the phase-coverage summary table counts. Also the place to add newly planned tests. Do not use `|` characters in description text — the pipe is the Markdown table delimiter and breaks the sync script.
- `TEST_COVERAGE.md` — auto-generated from TESTS_IMPLEMENTATION_PLAN.md by `scripts/Sync-TestCoverage.ps1`. Run that script after any change to TESTS_IMPLEMENTATION_PLAN.md (adding, removing, or promoting a test entry) rather than editing TEST_COVERAGE.md by hand. The "Coverage Gaps" section at the bottom is preserved verbatim across regenerations.
- `scripts/Sync-TestCoverage.ps1` — PowerShell script that (1) assigns stable T-NNN ids to every test row in TESTS_IMPLEMENTATION_PLAN.md and (2) regenerates TEST_COVERAGE.md from the ✅ rows. Run with `pwsh .\scripts\Sync-TestCoverage.ps1` from the repo root; use `-DryRun` to preview without writing. Idempotent: existing ids are never renumbered.
- `MEMORY.md` — a living scratchpad that records the current implementation state, key conventions, known bugs (and their fixes), schema version history, and per-phase architecture notes. Keep it up to date at the end of every session — update the test count, add any new conventions discovered, and record any gotchas or non-obvious decisions so future sessions don't repeat the same mistakes.
- `MCP_SERVER_SPEC.md` — defines the MCP server tool surface, the universal API through which all clients (LLM agents, apps, IDE extensions) interact with the index engine. It specifies file system tools (list_directory, read_file, get_directory_tree), search and discovery tools, inspection and navigation tools, and engine management tools. The MCP server replaces the previously planned RLM orchestration layer; orchestration is now the client's responsibility.
- The LMGenie app is provided as a demo app for testing.
- LMGenie.Desktop contains the Tauri app. The CodeAgent React component will later make use of the indexing engine we're implementing.

The Rust workspace lives at codeagent-engine/ with two crates:
- `crates/codeagent-core` — library crate with all indexing logic
- `crates/codeagent-cli` — `codeagent` binary for debug inspection
Key architectural decisions already made and encoded in code:

- Single-writer SQLite: one dedicated OS thread owns an exclusive `rusqlite::Connection`; all writes go through a bounded `tokio::sync::mpsc` channel (depth 10). Readers use an `r2d2` connection pool. WAL mode. Never add a second writer connection.
- BLOB(16) for UUIDs, BLOB(32) for hashes: `NodeId`, `FileId`, `ProjectId` are `Uuid` wrappers; `ContentHash` is `[u8; 32]`. These types are enforced in every SQL query — never store them as TEXT.
- No `is_deleted` flag: deletion is always hard-delete, journaled in `deletion_log` first. The 4-step file deletion order (delete `node_spans` → process nodes → delete file node LAST) is an invariant — see `graph/deletion.rs`.
- `symbol_disambiguator NOT NULL DEFAULT ''`: never NULL, prevents SQLite UNIQUE index bypass. The identity key is `(language, project_id, symbol_key, symbol_disambiguator)`.
- Partial unique index for primary spans: `CREATE UNIQUE INDEX idx_node_spans_one_primary ON node_spans(node_id) WHERE is_primary = 1`. Never try to `UPDATE … SET is_primary = 1 … LIMIT 1`; bundled SQLite lacks `SQLITE_ENABLE_UPDATE_DELETE_LIMIT`. Use `span_id` to target a specific row instead.
- Self-referential root project: `PRAGMA defer_foreign_keys = ON` inside the transaction — see `graph/nodes.rs::ensure_root_project()`. Required because the synthetic root node has `project_id = node_id`.
- `MAX_NODES_PER_TX = 500`: callers must chunk before calling `upsert_nodes_in_tx()`.
- Cancellation before COMMIT: the writer thread checks `CancellationToken` before every COMMIT and rolls back on cancel — never bypass this.
- ChangeBatch ordering: creates/modifications are always processed before deletes. This is how file moves preserve node identity.
- Phase 1 TypeScript symbol_key includes `file_id`: all TS exported symbols are `ExportScope::Module` (conservative). The `file_id` is `SHA-256(path)[0..16]` — deterministic, not random. Phase 2a may promote to `ExportScope::Package`.
- Phase 1 C# symbol_key: `qualified_name:kind:param_count(param_types)<generic_arity>` — overload-safe without Roslyn.
- `LIMIT 1` in SELECT is fine; it's only banned in UPDATE/DELETE (SQLite compile flag issue).
- FTS5 tokenizer: `unicode61 tokenchars '_.$:@'` with `prefix = '2 3 4'`. Columns: `node_id UNINDEXED, name, qualified_name, parameter_signature, return_type`. Synced manually via `sync_fts_insert()`/`sync_fts_delete()` in `graph/nodes.rs` — not a content table.
- `upsert_node()` does NOT sync FTS automatically: `insert_node()`/`update_node()` write only to the `nodes` table. Callers (adapters) must call `sync_fts_insert()` separately after every `upsert_node()`. Tests that bypass adapters and call `upsert_node()` directly must also call `sync_fts_insert()` explicitly if they intend to query FTS.
- FTS5 MATCH with JOIN must use the table name, not the alias: in a query that aliases `fts_nodes` as `fts`, the MATCH clause must be `WHERE fts_nodes MATCH ?1` — NOT `WHERE fts MATCH ?1`. Using the alias produces the runtime error `"no such column: <alias>"`. Column access via the alias (`fts.rank`, `fts.node_id`) is fine.
- ORT embedding: `ort 2.0.0-rc.11` with `features = ["load-dynamic", "ndarray"]`. `Session::run()` requires `&mut self`, so `EmbeddingModel` wraps it in `Arc<Mutex<Session>>`. Imports are `ort::session::Session` and `ort::session::builder::GraphOptimizationLevel` — these are NOT re-exported at the `ort` crate root.
- InvalidationPlanner: all invalidation decisions go through `ingest/invalidation.rs::plan()` — matches the §6.6 decision matrix exactly. Never scatter invalidation logic elsewhere.
- `row_to_node()` column order: the 29-column SELECT order in `query/mod.rs` is documented in the function's doc comment. Any new query that returns node rows must use the same column order.
- Multi-span `chunk_hash`: SHA-256 of all `span_hash` values sorted by bytes ASC — not insertion order. See `compute_multi_span_chunk_hash()`.
Phase 2a: Semantic enrichment via language service child processes.
Phase 2b: Rename/move detection via fingerprinting.
494 tests pass across all workspace crates (417 core unit + 41 fixture + 31 MCP + 5 CLI; 2 ignored for Windows symlinks). 27 additional tests in the Rust extractor binary (outside the workspace). OSS integration tests are feature-gated. CURRENT_SCHEMA_VERSION is now 5.
Key architectural decisions added in Phase 2:
- IPC: newline-delimited JSON-RPC 2.0 over stdin/stdout: language service processes (C# Roslyn, TS LS) communicate with the Rust core via `tokio::process::Command` with piped stdio. Each line is one complete JSON message. Never use HTTP or named pipes.
- `writer.submit()` only returns `Result<()>`: the writer channel is typed to `()`. To return data from a write operation, use an `Arc<std::sync::Mutex<T>>` passed into the closure (see `detect_renames()` in `pipeline.rs`). Alternatively, write first and read back via `reader_pool.read()` (see `run_project_detection()`).
- `reader_pool.read()` is async: `DbReaderPool::read()` takes a `FnOnce(&Connection) -> Result<T>` closure and runs it in a `spawn_blocking` task. Never call `reader_pool.get()` — that method does not exist on `DbReaderPool`. Always use `reader_pool.read(|conn| { ... }).await`.
- `IpcManager` field on `IngestPipeline`: created in `new()` via `IpcManager::new(config, repo_root)`. Routes `analyze_file()` calls to the C# or TS child process based on language; falls back to `syntactic_only` on any error. Auto-respawns on `IpcChildExited`.
- Safe mode: `indexing.safe_mode = true` blocks MSBuild evaluation entirely (C# falls back to `syntactic_only`). TS Language Service strips `plugins` from `tsconfig.json` compiler options unconditionally before creating a `LanguageService` instance.
- Project detection runs first (Step 2 in pipeline): `detect_projects()` walks the repo for `.csproj` and `package.json` files; `ensure_projects_in_db()` upserts project nodes. This must complete before symbol indexing so `project_id` is correct. Only re-runs when project files appear in the batch or no real projects exist yet.
- Identity reconciliation: when Roslyn/TS LS provides a better `symbol_key`, try an in-place `UPDATE`. On `UNIQUE` constraint violation (another node already owns the key), fall back to delete-old + insert-new + journal in `node_identity_map`. See `graph/identity.rs`.
- Atomic semantic edge replace: `replace_semantic_edges_in_tx()` deletes all `confidence='exact'` edges for a set of source node IDs, then inserts new ones — both in a single transaction. Never mix stale and fresh edges.
- `InvalidationAction::RecomputeSemanticEdges`: returned by `InvalidationPlanner` for `SemanticContextChanged`. The pipeline acts on it in Step 7 by calling `SemanticContextRecomputer::recompute_project()`. The function takes `repo_root: &Path` to resolve repo-relative file paths to absolute paths for disk reads.
- Rename detection is 3-tier (Phase 2b):
  - Git (`git diff --find-renames --name-status HEAD`) — parses `R<score>\t<old>\t<new>` lines.
  - Fingerprint: reads `chunk_fingerprint` from `deletion_log`, computes the fingerprint of the new file, Jaccard similarity ≥ `rename_similarity_threshold` (default 0.80).
  - Symbol-level: same container + kind + arity match in `deletion_log` → inserts a `node_identity_map` entry.

  Rename detection runs as Step 3 (before deletes). Paths confirmed as renames are skipped in the delete pass.
- Token winnowing fingerprint: FNV-1a 64-bit hash of identifier-normalised k-grams (k=4, window=4). Identifiers replaced with the token `IDENT`; structural tokens and keywords preserved. Results stored as LE-encoded `u64` bytes. `jaccard_similarity()` runs in O(n+m) via sorted-set merge.
- `node_identity_map` table (Migration 002): tracks old `symbol_key` → new `node_id` mappings for identity reconciliation and symbol-level renames. Indexed on both old and new key for fast lookup.
- `normalize_path_lossy()` in project detection: project detection uses `normalize_path_lossy()` (returns `String`, falls back to the raw path on error), not `normalize_path()` (returns `Result<String>`), because project scanning should never hard-fail on an individual path.
- Solution-level prebuild (Step 2.5 in pipeline): `run_solution_prebuild()` detects the best .sln (or generates a synthetic one), runs `dotnet restore` (NOT `dotnet build` — MSBuildWorkspace only needs NuGet packages, not compiled assemblies), then loads the workspace in the C# extractor via the `load_csharp_solution` IPC call. Config: `solution_restore_timeout_ms` (default 600s, set to 0 to disable). Sub-phase timings tracked in `SolutionSubTimings` (scan, generate, restore, load).
- Solution-based batch analysis skips `WithDocumentText()`: `AnalyzeSingleFromSolution()` uses documents directly from the loaded solution (`solution.GetDocument(docId)`) instead of forking a new Solution snapshot per file. This allows all parallel workers to share Roslyn's internal semantic model cache. For initial indexing the files haven't changed since `OpenSolutionAsync()` loaded them.
- Batch-prefetched semantic enrichment: `prefetch_enrichment_lookups()` bulk-loads node locations from the DB before the write pass. `apply_enrichment_batch_prefetched()` uses the pre-fetched HashMap for identity reconciliation and edge replacement, avoiding per-file DB lookups.
- Minimal solution reload for incremental batches: after the initial index, IPC pools are shut down to free memory. On the next large incremental batch, `load_minimal_csharp_solution()` generates a synthetic solution containing only the touched projects (not the full repo), restores and loads it. This preserves cross-project type resolution without re-loading the entire workspace. The C# group merge in `build_semantic_groups()` checks `is_csharp_solution_loaded()` (actual IPC process state) before merging — not just the pipeline-level `solution_load` cache.
- PRAGMA safety in bulk writes: `process_parallel()` relaxes PRAGMAs (`synchronous=OFF`, `foreign_keys=OFF`) for bulk writes. The write result is captured without early return, PRAGMAs are unconditionally restored, then the error is propagated. This prevents the writer connection from being left in an unsafe state on failure/cancellation.
- Deterministic project file selection: `find_project_file_for()` sorts `read_dir` results before picking, ensuring consistent behavior across platforms when multiple `.csproj` files share a directory.
Phase A: Tree-sitter syntactic indexing for .rs files.
Phase B: Cargo workspace detection (Cargo.toml scanning, workspace member glob expansion).
Phase C: Core-side wiring (IPC pools, pipeline semantic groups, config) + LSP adapter extractor binary.
494 workspace tests + 27 extractor tests. All existing C#/TS tests unaffected.
Key architectural decisions for Rust support:
- `Language::Rust` variant: added to the `Language` enum in `types.rs`. `detect_language()` maps the `.rs` extension. Semantic context files: `Cargo.toml`, `Cargo.lock`, `rust-toolchain.toml`.
- Rust symbol_key format: `crate::module::container::name:kind`. Module path derived from the file path relative to `src/` (e.g., `src/foo/bar.rs` → `crate::foo::bar`). Inline `mod name { ... }` blocks push onto the module path. Kind suffixes: `:struct`, `:enum`, `:trait`, `:fn` (free function), `:method` (impl/trait method), `:mod`, `:const`, `:static`, `:field`, `:variant`.
- Impl block symbol keys: inherent `impl Config` → methods keyed as `crate::mod::Config::method:method`. Trait `impl Display for Config` → methods keyed as `crate::mod::Config.Display::fmt:method`.
- `RustAdapter` (tree-sitter, Phase A): follows the `LanguageAdapter` trait. Supports `index_file()`, `index_file_fresh()`, and `extract_file()` (parallel path). `parse_status = syntactic_only` for all nodes until semantic enrichment via the extractor.
- Cargo workspace detection (Phase B): `detect_rust_projects()` in `project_detection.rs` scans for `Cargo.toml` files. Parses `[workspace]` sections, expands `members` globs (e.g., `"crates/*"`), emits one `DetectedProject` per `[package]`. Virtual workspaces (no `[package]`) emit only member projects. Deterministic via sorted `read_dir`.
- IPC pool for Rust (Phase C): `IpcManager` owns a `rust_pool` with memory-aware concurrency (`LanguageMemoryProfile::RUST`: 150 MB/process, 200 MB spawn gate). Routes via the `analyze_rust()` method. Batch dispatch supported.
- `rust_extractor_path` config: `IndexingConfig.rust_extractor_path: Option<PathBuf>`. When absent, Rust semantic enrichment is unavailable (tree-sitter only).
- Pipeline integration (Phase C): `build_semantic_groups()` collects Rust groups alongside C#/TS. Groups processed in order: TS → C# → Rust. Rust IPC pool shut down before the write phase to free memory.
- Rust rename detection: `RUST_KEYWORDS` in `fingerprint.rs` — Rust keywords and structural tokens preserved during token normalization for fingerprint-based rename detection.
- `reconcile_rust_symbol_key()` in `identity.rs`: guards against stale data by checking that the old key matches. Attempts an in-place UPDATE, falls back to delete+create plus a `node_identity_map` entry on UNIQUE conflict.
- Rust extractor binary: standalone `codeagent-rust-extractor` crate (outside the Cargo workspace). Architecture: `codeagent-core ←JSON-RPC→ codeagent-rust-extractor ←LSP (Content-Length)→ rust-analyzer`. Uses `lsp-server` for message framing and `lsp-types` for protocol definitions. Spawns `rust-analyzer --stdio` on the first analysis request. Falls back to `syntactic_only` if rust-analyzer is not found on PATH. The `RUST_ANALYZER_PATH` env var overrides PATH lookup.
Extractor binaries (inside codeagent-engine/, outside the Rust workspace):
- `extractors/csharp/` — .NET 8 console app (CodeAgentExtractor.csproj). Uses `Microsoft.CodeAnalysis.Workspaces.MSBuild` + `Microsoft.Build.Locator`. Entry point: `src/Program.cs` (JSON-RPC loop). Extractor logic: `src/RoslynExtractor.cs`. Protocol types: `src/Protocol.cs`. Launched by Rust as `dotnet <dll>` or a native AOT binary.
- `extractors/typescript/` — Node.js app (package.json, TypeScript 5.4). Entry point: `src/index.ts` (JSON-RPC loop). Extractor logic: `src/extractor.ts` (`ProjectContext` with lazy tsconfig load, plugin stripping). Protocol types: `src/protocol.ts`. Launched by Rust as `node --max-old-space-size=2048 dist/index.js`.
- `extractors/rust/` — standalone Rust binary (`codeagent-rust-extractor`). LSP adapter: spawns `rust-analyzer --stdio`, translates `documentSymbol` responses into `SemanticNode`/`SemanticEdge`. Entry point: `src/main.rs` (JSON-RPC loop). LSP client: `src/lsp_client.rs` (Content-Length framing via `lsp-server`). Analyzer: `src/analyzer.rs`. Launched by the Rust core via the path in `config.indexing.rust_extractor_path`.
codeagent-engine/ layout (cumulative):
codeagent-engine/
Cargo.toml — workspace root
crates/
codeagent-core/ — library crate (all indexing logic)
src/
lib.rs — crate root
error.rs — CoreError enum (+ Ipc, IpcVersionMismatch, IpcTimeout, IpcChildExited, IdentityConflict)
types.rs — NodeId, FileId, ProjectId, Language, NodeType, EdgeType, …
config.rs — Config (+ IndexingConfig, EmbeddingConfig, RetrievalConfig, McpConfig, LoggingConfig)
path.rs — normalize_path(), normalize_path_lossy(), detect_language(), is_generated()
db/
schema.rs — MIGRATION_001..005 DDL (CURRENT_SCHEMA_VERSION = 5)
migrations.rs — run_migrations(), quick_check(), integrity_check()
connection.rs — DbWriterHandle (mpsc), DbReaderPool (r2d2 + async read()), start_writer_thread()
graph/
nodes.rs — upsert_node(), ensure_root_project(), compute_span_hash(), sync_fts_*()
edges.rs — upsert_edge(), replace_semantic_edges_in_tx(), get_outgoing_edges(), get_incoming_edges()
spans.rs — insert_span(), replace_spans(), reassign_primary_span()
deletion.rs — delete_file_transactional(), hard_delete_node(), journal_node()
identity.rs — reconcile_csharp_symbol_key(), upgrade_ts_export_scope(), reconcile_rust_symbol_key()
api_endpoints.rs — upsert_api_endpoint(), search_api_endpoints(), ApiEndpoint, ApiStyle
ranking.rs — compute_pagerank(), get_pagerank_summary(), PageRankOptions
vectors.rs — colbert_search(), centroid_prefilter(), maxsim_score(), insert_or_replace_embedding()
ipc/
mod.rs — re-exports IpcManager
protocol.rs — RpcRequest/Response, HandshakeParams/Result, AnalyzeFileParams/Result, SemanticNode/Edge
process.rs — LanguageServiceProcess (spawn, watchdog, codec, bounded semaphore)
manager.rs — IpcManager (routes to C#/TS/Rust process, respawns on crash, safe_mode enforcement, is_csharp_solution_loaded())
pool_sizing.rs — IPC concurrency pool sizing heuristics
ingest/
batch.rs — ChangeBatch, FileChange (creates-before-deletes ordering)
invalidation.rs — InvalidationPlanner, ChangeType, InvalidationAction (+ RecomputeSemanticEdges)
project_detection.rs — detect_projects(), ensure_projects_in_db(), resolve_project_id()
semantic.rs — enrich_file(), prefetch_enrichment_lookups(), apply_enrichment_batch_prefetched(), SemanticContextRecomputer::recompute_project()
extraction.rs — FileExtraction, IdentityKey, IdentityMap, write_file_extraction(), write_file_extraction_initial()
fingerprint.rs — compute_fingerprint(), jaccard_similarity(), fingerprint_to/from_bytes()
rename.rs — RenameDetector::detect() (3-tier: git → fingerprint → symbol)
watcher.rs — start_watcher(), run_coalescer() (debounce + burst recovery)
pipeline.rs — IngestPipeline::process_batch() (7-step async pipeline), process_parallel(), load_minimal_csharp_solution(), SolutionSubTimings, PipelineTimings
adapters/
mod.rs — LanguageAdapter trait, TREESITTER_EXTRACTOR_VERSION
csharp.rs — CSharpAdapter (tree-sitter; handles namespace/class/interface/method/…)
typescript.rs — TypeScriptAdapter (.ts and .tsx; React component detection)
rust.rs — RustAdapter (tree-sitter; module path from file position, impl block handling)
embedding/
onnx.rs — EmbeddingModel, DocumentEmbedding, QueryEmbedding, build_embedding_text()
provider.rs — EmbeddingProvider trait, HashEmbeddingProvider (deterministic test provider)
query/
mod.rs — get_node(), get_neighbors(), get_source(), get_outline(), filter_nodes()
dead_code.rs — find_dead_code(), count_dead_code(), DeadCodeOptions, DeadCodeEntry
retrieval/
mod.rs — retrieve() (hybrid search entry point)
types.rs — QueryIntent, RetrievalQuery, ScoredNode, RetrievalResult, AssembledContext
channels.rs — vector_search(), bm25_search(), qualified_name_search()
reranker.rs — merge_and_rerank() (RRF fusion)
context.rs — assemble_context() (token-budget packing)
intent.rs — classify_intent() (query → intent classification)
eval/
mod.rs — run_eval(), EvalResult, ArchetypeMetrics
metrics.rs — ndcg_at_k(), mrr(), precision_at_k(), recall_at_k()
dataset.rs — load_eval_dataset(), QueryArchetype, EvalQuery
test_support.rs — shared test helpers
codeagent-cli/ — codeagent binary (init, hooks, debug inspection)
src/
main.rs — CLI entry point (clap)
init.rs — `codeagent init` (config, .gitignore, Claude Code hooks)
hooks.rs — hook handlers (pre-compact, post-tool-use, subagent-start, task-completed)
codeagent-mcp/ — MCP server binary (codeagent-mcp)
src/
main.rs — MCP server entry point (stdio transport)
state.rs — ServerState (DB handles, config, repo root)
sandbox.rs — path sandboxing to repo root
serialization.rs — MCP JSON serialization helpers
error.rs — MCP error types
test_helpers.rs — MCP test helpers
tools/
mod.rs — tool registration and dispatch
filesystem.rs — list_directory, read_file, get_directory_tree
search.rs — search_symbols, lookup_symbol, find_similar
navigation.rs — get_symbol, get_source_spans, get_file_outline, get_callers, etc.
management.rs — index_files, get_status
extractors/
csharp/ — .NET 8 Roslyn extractor (CodeAgentExtractor.csproj)
src/Protocol.cs — JSON-RPC 2.0 types
src/RoslynExtractor.cs — RoslynProjectContext (lazy MSBuildWorkspace), RoslynExtractor (solution batch analysis, shared semantic cache)
src/Program.cs — stdin/stdout JSON-RPC main loop
typescript/ — Node.js TS Language Service extractor
src/protocol.ts — JSON-RPC 2.0 types
src/extractor.ts — ProjectContext (lazy tsconfig load, plugin stripping), TsExtractor
src/index.ts — stdin readline JSON-RPC main loop
rust/ — Rust semantic extractor (LSP adapter wrapping rust-analyzer)
Cargo.toml — standalone binary crate (lsp-types 0.97, lsp-server 0.7)
src/main.rs — stdin/stdout JSON-RPC main loop (handshake, analyze_file, analyze_files_batch)
src/protocol.rs — JSON-RPC 2.0 types (mirrors codeagent-core/src/ipc/protocol.rs)
src/lsp_client.rs — LspClient (spawns rust-analyzer --stdio, Content-Length framing, request/response matching)
src/analyzer.rs — RustAnalyzer (LSP adapter: documentSymbol → SemanticNode/SemanticEdge, impl block parsing)