Spectral Cortex is a compact Rust implementation of a Spectral Memory Graph (SMG) designed to be used as a short-term and long-term memory store for AI agents that reason over a project's git history. It converts commit messages and other short text chunks into embeddings, builds a spectral graph of semantic relationships, clusters related content, and exposes a retrieval API tuned for agent workflows.
This README is targeted at developers who want a local, explainable memory backing for AI agents that need to answer questions, recall past decisions, or link present context to repository history.
Highlights
- Purpose-built for agent memory over git history (commits, PR messages, notes).
- Small, dependency-light Rust codebase with no heavy ML runtime at inference time.
- Default-enabled temporal re-ranking to prefer recent, relevant items (opt-out available).
- CLI workflows for ingesting repositories, persisting SMGs, and querying with JSON output for programmatic agents.
Contents
- Quick start
- MCP server (markdown-first tools)
- Agent-oriented workflows & examples
- CLI reference (important flags)
- Temporal re-ranking behavior (defaults & control)
- Library API & data model
- Persistence format
- Extensibility notes (hooks for agents)
- Testing & development
- Contributing and license
Clone and build:
# Clone and build
git clone https://github.com/mrorigo/spectral-cortex.git
cd spectral-cortex
cargo build --release
Install from this repository (single binary with CLI + MCP subcommand):
cargo install --path crates/spectral-cortex-cli --force
On macOS, build/install copies the Torch runtime dylibs to a sibling libtorch/ directory and embeds an rpath to @executable_path/libtorch.
- cargo install ... installs:
  - ~/.cargo/bin/spectral-cortex
  - ~/.cargo/bin/libtorch/*.dylib
- local cargo build places dylibs under:
  - target/<profile>/libtorch/*.dylib
Run with:
spectral-cortex --help
If you move or copy the binary manually on macOS, keep libtorch/ beside it (same directory level) so dylib loading continues to work.
Ingest a repository and build the SMG (recommended CLI flow):
spectral-cortex ingest --repo /path/to/repo --out smg.json
Query the saved SMG programmatically (JSON output suitable for agents):
spectral-cortex query --query "why did we add X" --smg smg.json --json --top-k 10
A dedicated MCP subcommand is available for agent workflows that need compact, markdown-first responses instead of verbose JSON.
Run it over stdio:
spectral-cortex mcp --smg smg.json
Available tools:
- graph_summary: compact graph metadata for an SMG file
- query_graph: semantic query with markdown tables and compact related-note summaries
- inspect_note: inspect one note and related notes with spectral similarity
- long_range_links: list top long-range links in markdown table format
MCP client wiring example (recommended):
{
"mcpServers": {
"spectral-cortex": {
"command": "spectral-cortex",
"args": ["mcp", "--smg", "/path/to/smg.json"]
}
}
}
If the binary is not on PATH, use an absolute path:
{
"mcpServers": {
"spectral-cortex": {
"command": "/absolute/path/to/spectral-cortex",
"args": ["mcp", "--smg", "/absolute/path/to/smg.json"]
}
}
}
Development fallback (build and run from source on each launch):
{
"mcpServers": {
"spectral-cortex": {
"command": "cargo",
"args": ["run", "-p", "spectral-cortex", "--release", "--", "mcp", "--smg", "smg.json"],
"cwd": "/Users/origo/src/spectral-cortex"
}
}
}
Tool input examples:
graph_summary
{}
query_graph
{
"query": "mcp protocol",
"top_k": 5,
"links_k": 3
}
inspect_note
{
"note_id": 5071,
"links_k": 10
}
long_range_links
{
"top_k": 20
}
All MCP tool responses are markdown-first and intentionally compact to reduce token usage.
The typical flow for an agent using the SMG as memory:
- Periodic ingestion: run the ingest job (cron / CI hook) and persist smg.json.
- At runtime, load smg.json once per agent process or cache it in memory.
- For a user or agent query:
  - Get top-K relevant turn IDs and associated note metadata via the CLI or library API (JSON).
  - Retrieve the source commit ids, timestamps, and content snippets for context.
  - Use the returned snippets + candidate commit ids as evidence to feed into your agent's prompt or grounding layer.
- Optionally: store agent feedback (relevance labels) externally for tuning ranking weights in future enhancements.
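The "use the returned snippets as evidence" step can be sketched roughly as follows. This is a minimal illustration and not part of the crate: the `Hit` struct, its `snippet` field, and `build_evidence` are hypothetical names, and the sketch assumes the query JSON has already been parsed into plain Rust values.

```rust
/// One retrieval hit. `score`, `turn_id`, and `commit_id` mirror per-result
/// fields named in this README; `snippet` is a hypothetical name for the
/// returned content text.
#[derive(Debug, Clone)]
struct Hit {
    score: f32,
    turn_id: u64,
    commit_id: Option<String>,
    snippet: String,
}

/// Render hits as a compact evidence section, highest score first,
/// stopping before the byte budget would be exceeded.
fn build_evidence(hits: &[Hit], max_bytes: usize) -> String {
    let mut sorted: Vec<&Hit> = hits.iter().collect();
    sorted.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut out = String::from("Evidence from repository history:\n");
    for hit in sorted {
        // Hits without a commit id are still listed, but marked as such.
        let commit = hit.commit_id.as_deref().unwrap_or("unknown");
        let line = format!(
            "- [commit {} | turn {} | score {:.2}] {}\n",
            commit, hit.turn_id, hit.score, hit.snippet
        );
        if out.len() + line.len() > max_bytes {
            break;
        }
        out.push_str(&line);
    }
    out
}

fn main() {
    let hits = vec![
        Hit { score: 0.62, turn_id: 7, commit_id: Some("def456".into()), snippet: "Bump deps".into() },
        Hit { score: 0.91, turn_id: 3, commit_id: Some("abc123".into()), snippet: "Fix parser panic".into() },
    ];
    print!("{}", build_evidence(&hits, 2_000));
}
```

Capping the evidence by a byte budget keeps the prompt size predictable; an agent could swap in a token-based budget if it has a tokenizer available.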
Why this is suited to agents
- Small and self-contained: you can run entirely on a developer machine or container.
- Deterministic local embedder available for tests; real MiniLM used by default for realistic retrieval.
- Outputs structured JSON that an agent can parse to build prompts or context windows.
- Temporal re-ranking biases results toward recent, likely more actionable history — useful for agents that should prefer recent fixes or regression-causing commits.
The spectral-cortex binary exposes: ingest, update, query, note, and mcp.
Ingest (collect commits -> SMG):
cargo run -p spectral-cortex --release -- ingest --repo /path/to/repo --out smg.json
Update (incremental append ingest; only new commits are embedded):
cargo run -p spectral-cortex --release -- \
update --repo /path/to/repo --out smg.json --git-filter-preset git-noise
Query (default, temporal enabled):
cargo run -p spectral-cortex --release -- \
query --query "refactor" --smg smg.json --json --top-k 10
Inspect one note:
cargo run -p spectral-cortex --release -- \
note --smg smg.json --note-id 42 --json
Run MCP server with preloaded SMG:
cargo run -p spectral-cortex --release -- \
mcp --smg smg.json
mcp also accepts --smd as an alias for --smg.
Key query flags (agent-friendly):
- --top-k <n>: how many final results to return (default 5).
- --candidate-k <n>: how many candidates to retrieve from vector search before filtering (defaults to top_k * 5).
- --min-score <float>: inclusive threshold applied to the combined final_score (default 0.7).
- --no-temporal: disable temporal re-ranking for this query (temporal is enabled by default).
- --temporal-weight <0..1>: control recency influence (default 0.20).
- --temporal-half-life-days <float>: half-life for exponential decay (default 14.0).
- --json: emit machine-readable JSON (recommended for agents).
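How --top-k, --candidate-k, and --min-score interact can be pictured with a short sketch. It is an illustration under stated assumptions, not the CLI's actual code: `default_candidate_k` and `select_results` are hypothetical helpers, and candidates are assumed to be `(turn_id, final_score)` pairs.

```rust
/// Candidate pool size when --candidate-k is not given (top_k * 5, per the flag docs).
fn default_candidate_k(top_k: usize) -> usize {
    top_k * 5
}

/// Keep candidates whose final score is >= min_score (inclusive),
/// order them best first, and return at most top_k of them.
fn select_results(
    mut candidates: Vec<(u64, f32)>,
    top_k: usize,
    min_score: f32,
) -> Vec<(u64, f32)> {
    candidates.retain(|&(_, score)| score >= min_score);
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1));
    candidates.truncate(top_k);
    candidates
}

fn main() {
    let top_k = 5;
    println!("candidate pool: {}", default_candidate_k(top_k));

    let candidates = vec![(1, 0.65), (2, 0.70), (3, 0.90)];
    for (turn_id, score) in select_results(candidates, top_k, 0.7) {
        println!("turn {turn_id}: {score:.2}");
    }
}
```

Because the threshold is applied before truncation, a query can return fewer than top_k results when few candidates clear --min-score.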
Key ingest/update filtering flags:
- --git-filter-preset git-noise: drop common metadata lines (e.g. Co-authored-by, Signed-off-by).
- --git-filter-drop <regex>: repeatable custom line-drop regex.
- --git-filter-case-insensitive: case-insensitive regex matching.
- --git-commit-split-mode <off|auto|strict>: split multi-change commit messages into multiple notes.
- --git-commit-split-max-segments <n>: cap segments per commit.
- --git-commit-split-min-confidence <0..1>: confidence threshold for auto.
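The idea behind line-drop filtering can be sketched as below. To stay dependency-free, this sketch swaps regex matching for simple prefix matching, so it is a simplified stand-in rather than the tool's filter; `drop_noise_lines` and the prefix list are hypothetical.

```rust
/// Remove lines of a commit message that start with any of the given prefixes.
/// NOTE: prefix matching is a stand-in for the regex matching the CLI flags describe.
fn drop_noise_lines(message: &str, drop_prefixes: &[&str], case_insensitive: bool) -> String {
    message
        .lines()
        .filter(|line| {
            let candidate = line.trim_start();
            // Keep the line only if no prefix matches it.
            !drop_prefixes.iter().any(|prefix| {
                if case_insensitive {
                    candidate.to_lowercase().starts_with(&prefix.to_lowercase())
                } else {
                    candidate.starts_with(prefix)
                }
            })
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let message = "Fix parser panic\n\nCo-authored-by: A <a@example.com>\nsigned-off-by: B <b@example.com>";
    let prefixes = ["Co-authored-by:", "Signed-off-by:"];
    println!("{:?}", drop_noise_lines(message, &prefixes, true));
}
```

Dropping these trailer lines before embedding keeps author metadata from dominating the similarity between otherwise unrelated commits.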
For local agent memory that stays fresh automatically, wire the update command into a git post-commit hook.
Example .git/hooks/post-commit:
#!/usr/bin/env bash
set -euo pipefail
spectral-cortex update \
--repo . \
--out smg.json \
--git-filter-preset git-noise
Make it executable:
chmod +x .git/hooks/post-commit
Temporal re-ranking is enabled by default because agents typically benefit from fresher context when interpreting repository state. The default strategy is:
- Mode: exponential decay
- Weight: 0.20 (20% recency influence)
- Half-life: 14 days
Combination formula (final score): final = (1 - weight) * semantic_score + weight * temporal_score
Notes:
- Missing timestamps are treated as very old (temporal_score = 0).
- --no-temporal disables temporal scoring when you need canonical, time-agnostic retrieval.
- --min-score is applied to final_score, so agent clients can filter noisy candidates consistently.
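The combination formula and defaults above can be worked through in a small sketch. The exact decay expression is an assumption (a standard half-life form, 0.5 raised to age divided by half-life), and modelling a missing timestamp as `None` is also an assumption; `temporal_score` and `final_score` are hypothetical helper names.

```rust
const SECONDS_PER_DAY: f64 = 86_400.0;

/// Recency score in [0, 1]: 1.0 for an item created now, 0.5 after one half-life.
/// A missing timestamp is treated as very old and scores 0.0.
fn temporal_score(timestamp: Option<u64>, now: u64, half_life_days: f64) -> f64 {
    match timestamp {
        None => 0.0,
        Some(ts) => {
            // saturating_sub guards against timestamps slightly in the future.
            let age_days = now.saturating_sub(ts) as f64 / SECONDS_PER_DAY;
            0.5_f64.powf(age_days / half_life_days)
        }
    }
}

/// final = (1 - weight) * semantic_score + weight * temporal_score
fn final_score(semantic: f64, temporal: f64, weight: f64) -> f64 {
    (1.0 - weight) * semantic + weight * temporal
}

fn main() {
    let now: u64 = 1_700_000_000;
    let two_weeks_ago = now - 14 * 86_400;

    // Defaults from this README: weight 0.20, half-life 14 days.
    let recency = temporal_score(Some(two_weeks_ago), now, 14.0);
    println!("temporal = {recency:.2}");
    println!("final    = {:.2}", final_score(0.9, recency, 0.20));
}
```

With the default weight, a 14-day-old commit with semantic score 0.9 ends up at 0.8 * 0.9 + 0.2 * 0.5 = 0.82, while the same commit with no timestamp would score 0.72.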
ingest and update always rebuild spectral structures after ingesting turns.
Use the library if you embed the SMG directly inside an agent process.
Primary types:
- SpectralMemoryGraph
  - new() -> Result<Self>: initializes the embedder and internal structures.
  - ingest_turn(&mut self, turn: &ConversationTurn) -> Result<()>: adds a turn.
  - build_spectral_structure(&mut self) -> Result<()>: computes spectral embeddings and clusters.
  - retrieve_with_scores(&self, query: &str, candidate_k: usize) -> Result<Vec<(u64, f32)>>: returns per-turn final scores (semantic + temporal + cluster boosts). Callers may re-rank with a custom TemporalConfig if they prefer different defaults.
- ConversationTurn
  pub struct ConversationTurn {
      pub turn_id: u64,
      pub speaker: String,
      pub content: String,
      pub topic: String,
      pub entities: Vec<String>,
      pub commit_id: Option<String>,
      pub timestamp: u64, // unix epoch seconds
  }
- SMGNote
  - Internal note stored per embedded turn; includes:
    - raw_content, context
    - embedding: Vec<f32>
    - source_turn_ids: Vec<u64>
    - source_commit_ids: Vec<Option<String>>
    - source_timestamps: Vec<u64>
    - related_note_links: Vec<(u32, f32)>
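As a rough illustration of the data model, a git commit could be mapped onto a ConversationTurn as sketched below. The struct is copied from the definition above so the snippet stands alone; `turn_from_commit` and its field mapping (author as speaker, subject line as topic, empty entities) are assumptions for illustration, not the crate's ingestion logic.

```rust
#[derive(Debug, Clone)]
pub struct ConversationTurn {
    pub turn_id: u64,
    pub speaker: String,
    pub content: String,
    pub topic: String,
    pub entities: Vec<String>,
    pub commit_id: Option<String>,
    pub timestamp: u64, // unix epoch seconds
}

/// Hypothetical mapping from commit metadata to a turn.
fn turn_from_commit(
    turn_id: u64,
    sha: &str,
    author: &str,
    message: &str,
    timestamp: u64,
) -> ConversationTurn {
    // Use the first line of the commit message (the subject) as the topic.
    let subject = message.lines().next().unwrap_or("").trim().to_string();
    ConversationTurn {
        turn_id,
        speaker: author.to_string(),
        content: message.trim().to_string(),
        topic: subject,
        entities: Vec::new(), // entity extraction is out of scope for this sketch
        commit_id: Some(sha.to_string()),
        timestamp,
    }
}

fn main() {
    let turn = turn_from_commit(
        1,
        "abc123",
        "dev@example.com",
        "Fix parser panic\n\nHandle empty input before tokenizing.",
        1_700_000_000,
    );
    println!("{turn:#?}");
}
```

A turn built this way would then be passed to ingest_turn, followed by build_spectral_structure once a batch has been added.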
The JSON format is strict and versioned (metadata.format_version = "spectral-cortex-v1").
SMG persistence uses a compact JSON representation (see src/lib.rs helpers):
// Save
save_smg_json(&smg, Path::new("smg.json"))?;
// Load
let smg = load_smg_json(Path::new("smg.json"))?;
The persisted structure stores notes in stable sorted order, optional cluster labels, and centroids. Spectral matrices are not persisted (they are recomputable via build_spectral_structure()).
- Retrieval diagnostics: query JSON includes per-result score, turn_id, note_id, related_notes, and where available commit_id and cluster_label. Top-level JSON includes the temporal settings used for the query.
- Re-ranking: you can override the default re-ranker by calling re_rank_with_temporal with a custom TemporalConfig (weight, half-life, mode).
- Incremental ingestion: ingest_turn appends turns, so you can build an ingestion pipeline that streams new commits into a long-running agent process.
- Feedback loop: collect agent judgments (useful/not useful) in a separate store and use those signals to adjust temporal_weight or to implement a learned ranker later.
- Run unit tests:
  cargo test -p spectral-cortex
- Use the deterministic fake embedder for tests (the project auto-selects a deterministic fake embedder under cfg(test) so CI is reproducible).
- Linting & formatting:
  cargo fmt
  cargo clippy -- -D warnings
- The embedder bundles MiniLM assets via a companion rust_embed repo; no network fetch is required at runtime.
- Default settings assume agents should prefer recent context; tune via CLI or library TemporalConfig for domain needs (e.g., security audits vs. active feature work).
- If you plan to serve the SMG from a shared service, snapshot smg.json and load it into worker processes to avoid repeated rebuilds.
If you improve retrieval, temporal defaults, or add learning-to-rank, please:
- Fork and create a feature branch.
- Add unit tests and integration tests for retrieval ordering and temporal logic.
- Open a PR describing the change and expected agent behavior.
MIT. See LICENSE for details.