memory scan is a repository bootstrap command.
Its job is to inspect an existing codebase, ask an LLM for durable project knowledge, validate the result, and then write that knowledge into Memory Layer through the normal capture and curate pipeline.
This is not a generic "summarize my repo" command. It is specifically trying to extract project memory that should still be useful later.
scan now sits on top of a local repository index. You can build that index explicitly with memory repo index and inspect it with memory repo status.
- What It Does
- What It Reads
- Repository Index
- What It Reads From Git
- What It Sends To The LLM
- What It Accepts From The LLM
- How It Writes Memory
- Idempotency
- Dry Run Mode
- Scan Reports
- Configuration Requirements
- Current Limits And Defaults
- What scan Is Good At
- What scan Is Not Good At
- Practical Workflow
- Troubleshooting
At a high level, scan does this:
- load the current repo and project context
- read a curated subset of repository files
- read a bounded amount of recent git history
- build a structured dossier from that material
- send the dossier to an OpenAI-compatible chat model
- require strict JSON back
- validate and deduplicate the returned candidates
- write them as a normal Memory Layer capture
- run curation so the resulting memories become searchable
So scan is really:
- local repository indexing
- repository sampling
- LLM extraction
- strict validation
- normal Memory Layer ingestion
It does not bypass the existing backend or write directly to PostgreSQL tables.
Before scan asks the LLM anything, it now works from a local repository index stored under:
`.mem/runtime/index/`
You can manage that index directly:
```shell
memory repo index --project my-project
memory repo status --project my-project
```

The current index includes:
- the selected repository files used for scan
- the selected git commits used for scan
- the current `HEAD`
- a simple language-coverage summary
- parser-backed analyzer summaries for enabled languages
- extracted symbol, import, reference, call, and test-link facts
- evidence-bundle counts for debugging and future scan quality work
Analyzer enablement comes from `.agents/memory-layer.toml`:

```toml
[analysis]
analyzers = ["rust", "typescript", "python"]
```

When you run `memory scan`, it reuses the existing index if it still matches the current repo `HEAD` and the same `--since` window. Otherwise it rebuilds the index first.
If you want to force a fresh index before scanning, use:

```shell
memory scan --project my-project --rebuild-index
```

scan does not read every file in the repository.
It chooses a bounded set of high-value files using a scoring heuristic.
The current implementation prefers:
- `README*`
- files under `docs/`
- top-level manifests such as `Cargo.toml`, `package.json`, `pyproject.toml`, `go.mod`
- main Rust entrypoints like `crates/*/src/main.rs` and `crates/*/src/lib.rs`
- files under `src/`
- files under `scripts/`
- files under `packaging/`
- files under `.agents/skills/`
- common config and service files such as `.toml`, `.md`, `.yaml`, `.yml`, `.json`, `.sh`, `.service`
It skips obvious low-value or noisy paths such as:
- `.git/`
- `target/`
- `.mem/`
- `node_modules/`
Implementation limits:
- up to `18` repository files
- up to `8_000` bytes per file after normalization
- a file content budget of roughly `70%` of the configured LLM input budget
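A scoring heuristic of this shape can be sketched as follows. The concrete score values, helper names, and tie-breaking order are illustrative assumptions, not the CLI's actual implementation; only the path preferences and skip list come from the description above:

```python
import fnmatch

# Illustrative constants mirroring the documented limits and skip list.
SKIP_PREFIXES = (".git/", "target/", ".mem/", "node_modules/")
MAX_FILES = 18

def score(path):
    """Assign a hypothetical priority score to a repo-relative path."""
    name = path.rsplit("/", 1)[-1]
    if fnmatch.fnmatch(name, "README*"):
        return 100
    if name in ("Cargo.toml", "package.json", "pyproject.toml", "go.mod"):
        return 90
    if path.startswith("docs/"):
        return 80
    if fnmatch.fnmatch(path, "crates/*/src/main.rs") or \
       fnmatch.fnmatch(path, "crates/*/src/lib.rs"):
        return 70
    if path.startswith(("src/", "scripts/", "packaging/", ".agents/skills/")):
        return 60
    if path.endswith((".toml", ".md", ".yaml", ".yml", ".json", ".sh", ".service")):
        return 40
    return 0  # unscored paths are never selected

def select_files(paths):
    """Pick a bounded set of high-value files, highest score first."""
    candidates = [p for p in paths
                  if not p.startswith(SKIP_PREFIXES) and score(p) > 0]
    candidates.sort(key=lambda p: (-score(p), p))
    return candidates[:MAX_FILES]
```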
scan also reads recent git history because important architecture and workflow knowledge often lives in commit history rather than only in current files.
The current implementation:
- reads up to `20` non-merge commits
- captures commit hash
- captures commit timestamp
- captures subject
- captures trimmed body
- captures up to `12` changed paths per commit
You can bound this with:
```shell
memory scan --since "2 weeks ago"
```

or:

```shell
memory scan --since "2026-03-01"
```

The CLI builds a structured dossier with:
- project slug
- canonical repo root
- current `HEAD` commit if available
- selected file contents
- selected git commits
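Concretely, a dossier of this shape might look like the fragment below. Every key name and value here is illustrative; the exact wire format is internal to the CLI:

```json
{
  "project": "my-project",
  "repo_root": "/home/me/my-project",
  "head": "<current HEAD commit>",
  "files": [
    { "path": "README.md", "content": "<normalized file content>" }
  ],
  "commits": [
    {
      "hash": "<commit hash>",
      "timestamp": "<commit timestamp>",
      "subject": "<commit subject>",
      "body": "<trimmed body>",
      "paths": ["<up to 12 changed paths>"]
    }
  ]
}
```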
It then sends that dossier to the configured OpenAI-compatible chat endpoint with a system prompt that tells the model to:
- extract durable repository memory
- return strict JSON
- keep candidates concise and repo-specific
- avoid speculative claims
- avoid transient task notes
- attach provenance through files and/or commits
The requested JSON shape is:
- `summary`
- `candidates[]`
Each candidate is expected to include:
- `canonical_text`
- `summary`
- `memory_type`
- `confidence`
- `importance`
- `tags`
- `provenance_files`
- `provenance_commits`
- `rationale`
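Put together, a well-formed response might look like this. Only the field names come from the contract above; the values, including the `memory_type` label, are invented for illustration:

```json
{
  "summary": "Rust workspace CLI with a local repository index and an LLM-backed scan pipeline.",
  "candidates": [
    {
      "canonical_text": "The scan command validates LLM output before writing memory.",
      "summary": "Scan validates candidates before ingestion.",
      "memory_type": "architecture",
      "confidence": 0.9,
      "importance": 0.8,
      "tags": ["scan", "validation"],
      "provenance_files": ["docs/scan.md"],
      "provenance_commits": [],
      "rationale": "Described in the scan documentation."
    }
  ]
}
```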
The LLM output is not trusted blindly.
The current validation step rejects candidates when:
- `canonical_text` is empty
- `summary` is empty
- both file provenance and commit provenance are missing
- the candidate is a duplicate of an earlier candidate in the same scan
It also normalizes:
- candidate text
- tags
- confidence
- importance
The current hard cap is:
- at most `12` accepted candidates per scan
If validation produces zero acceptable candidates, scan fails instead of writing low-quality memory.
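The rejection rules above can be sketched as a single filtering pass. This is a hypothetical illustration, not the CLI's Rust implementation; in particular, the case-insensitive duplicate check is an assumed detail:

```python
# Hypothetical sketch of the candidate validation rules described above.
MAX_CANDIDATES = 12

def validate_candidates(raw_candidates):
    accepted = []
    seen = set()
    for cand in raw_candidates:
        text = (cand.get("canonical_text") or "").strip()
        summary = (cand.get("summary") or "").strip()
        files = cand.get("provenance_files") or []
        commits = cand.get("provenance_commits") or []
        # Reject empty canonical text or empty summary.
        if not text or not summary:
            continue
        # Reject candidates with neither file nor commit provenance.
        if not files and not commits:
            continue
        # Reject duplicates of earlier candidates in the same scan
        # (case-insensitive match is an assumption for this sketch).
        key = text.lower()
        if key in seen:
            continue
        seen.add(key)
        accepted.append(cand)
        if len(accepted) >= MAX_CANDIDATES:
            break
    return accepted
```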
Accepted candidates are converted into a normal CaptureTaskRequest.
That request contains:
- `task_title = "Repository scan for <project>"`
- a scan-specific `user_prompt`
- the LLM-generated summary as `agent_summary`
- the selected repo files as `files_changed`
- a condensed git summary in `git_diff_summary`
- the validated candidates as `structured_candidates`
Then scan does exactly what a normal high-level write should do:
- call `memory capture task`
- call `curate`
That means scan output goes through the same:
- backend validation
- provenance rules
- curation rules
- search chunk generation
- activity streaming
scan generates an idempotency key so rerunning it on the same repo state does not create uncontrolled duplicate raw captures.
The key is currently based on:
- prompt version
- project slug
- current `HEAD`
- selected file paths and contents
- selected commit hashes
This means:
- rerunning an unchanged scan tends to collapse to the same raw capture
- changing important files or commits produces a new scan capture
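A key derived from those inputs could be built by hashing them in a stable order, roughly like this. The function name, argument shapes, and use of SHA-256 are illustrative assumptions, not the CLI's actual scheme:

```python
import hashlib

def idempotency_key(prompt_version, project, head, files, commit_hashes):
    """Hypothetical sketch: derive a stable key from the scan inputs.

    `files` maps selected file paths to their contents; sorting keeps the
    key independent of iteration order.
    """
    h = hashlib.sha256()
    for part in (prompt_version, project, head):
        h.update(part.encode())
        h.update(b"\0")  # separator so adjacent fields cannot collide
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(files[path].encode())
        h.update(b"\0")
    for commit in commit_hashes:
        h.update(commit.encode())
        h.update(b"\0")
    return h.hexdigest()
```

Because the key is a pure function of these inputs, an unchanged repo state reproduces the same key, while any change to the selected files or commits yields a new one.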
Use this first if you want to inspect what scan is going to do:
```shell
memory scan --project my-project --dry-run
```

In dry-run mode, scan still:
- builds or reuses the repository index
- calls the LLM
- validates candidates
- prints a concise preview of the accepted candidate memories
But it does not:
- write a scan report file
- emit a scan activity event
- create a capture
- run curation
- write project memory
Non-dry-run scans write a local report under:
`.mem/runtime/scan/`
The report includes:
- prompt version
- project
- whether it was a dry run
- whether the repository index was reused or rebuilt
- the repository index path
- summary
- how many files were considered
- how many commits were considered
- language coverage
- the dossier that was sent
- the accepted candidates
This is useful for debugging why a scan produced the memory it did.
scan requires working LLM configuration.
Today that means:
- `[llm].provider = "openai_compatible"`
- `[llm].base_url`
- `[llm].model`
- `[llm].api_key_env`
- the API key available in one of:
  - the process environment
  - `.mem/memory-layer.env`
  - the shared `memory-layer.env`
If these are not present, scan fails before doing any repository work.
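A minimal `[llm]` section might look like the fragment below. The base URL, model name, and environment variable name are placeholders, not recommendations; only the key names come from the requirements above:

```toml
[llm]
provider = "openai_compatible"
base_url = "https://api.example.com/v1"  # placeholder endpoint
model = "example-model"                  # placeholder model name
api_key_env = "MEMORY_LAYER_API_KEY"     # placeholder env var holding the key
max_output_tokens = 4096                 # bounds max_completion_tokens
```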
Important implementation details:
- only `openai_compatible` providers are supported today
- the request goes to `POST /chat/completions`
- `response_format` is forced to JSON object
- `temperature` is sent first, then omitted on retry if the model rejects it
- `max_completion_tokens` comes from `[llm].max_output_tokens`
Current fixed limits:
- `MAX_FILES = 18`
- `MAX_COMMITS = 20`
- `MAX_FILE_BYTES = 8_000`
- `MAX_CANDIDATES = 12`
These are implementation limits, not user-facing flags.
scan works best for extracting:
- architecture facts
- major functionality
- durable conventions
- setup and environment facts
- repo-specific workflow knowledge
It is especially useful when onboarding Memory Layer to an existing project that already has a lot of knowledge spread across README files, docs, and git history.
scan is not currently a full semantic repository model.
It does not:
- read every file
- execute code
- run tests
- infer runtime behavior from actual execution
- inspect issue trackers or external systems
- build the code graph yet
- guarantee that every accepted candidate is correct just because the model produced it
It is only as good as:
- the selected file set
- the selected git history
- the configured model
- the validation and curation pipeline
Recommended usage:
- initialize the repo with `memory wizard` or `memory init`
- make sure `[llm]` is configured
- run a dry run first
- inspect the generated report
- run the real scan
- open the TUI and inspect the resulting memories
Example:
```shell
memory scan --project my-project --dry-run
memory scan --project my-project
memory tui --project my-project
```

Common failure cases:
- missing `[llm].model`
- missing API key
- wrong model name
- unsupported model parameters
- no valid durable candidates returned
Useful checks:
```shell
memory doctor
memory scan --project my-project --dry-run
```

If you are debugging a specific scan result, the most useful artifact is the JSON report in `.mem/runtime/scan/`.