84 changes: 38 additions & 46 deletions .claude/rules/mnemonic-usage.md
@@ -1,67 +1,59 @@
# Mnemonic MCP Tool Usage — Mandatory
# Mnemonic MCP Tool Usage

## Available Tools (7)

| Tool | Purpose |
|------|---------|
| `remember` | Store decisions, errors, insights, learnings |
| `recall` | Semantic search with spread activation |
| `recall_project` | Project context + recent activity (use at session start) |
| `batch_recall` | Multiple recall queries in one round-trip |
| `feedback` | Rate recall quality (drives Hebbian learning) |
| `status` | System health check |
| `amend` | Update a stale memory in place |

## Session Start

For tasks involving code changes, decisions, or multi-step work:
1. Call `recall_project` to load project context
2. Call `recall` with keywords relevant to the user's first request
3. If either call returns useful context, use it to inform your work
4. If a call fails (FTS error, timeout), note it and move on — don't block the session
2. Call `recall` with keywords relevant to the user's request
3. If useful context is found, use it. If not, move on.

Alternative: Use `batch_recall` to combine multiple queries into one round-trip.
Alternative: `batch_recall` to combine project context + task-specific queries.

For trivial tasks (typo fix, single-line change, quick question): skip recall and just do the work.
For trivial tasks: skip recall, just do the work.
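
The session-start steps above can be sketched as two MCP `tools/call` requests. This is a minimal illustration, not the project's client code: the JSON-RPC envelope follows the MCP convention, but the `query` argument name and the helper names (`newToolCall`, `sessionStart`) are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall is the JSON-RPC 2.0 envelope MCP uses for the "tools/call" method.
type toolCall struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// callParams names the tool and carries its arguments.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newToolCall builds a tools/call request for the named tool (hypothetical helper).
func newToolCall(id int, tool string, args map[string]any) toolCall {
	if args == nil {
		args = map[string]any{} // serialize as {} rather than null
	}
	return toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: tool, Arguments: args},
	}
}

// sessionStart returns the two calls in order: project context first,
// then a keyword recall for the user's request.
func sessionStart(keywords string) []toolCall {
	return []toolCall{
		newToolCall(1, "recall_project", nil),
		// "query" is an assumed argument name, not confirmed by the docs above.
		newToolCall(2, "recall", map[string]any{"query": keywords}),
	}
}

func main() {
	for _, c := range sessionStart("SQLite FTS5 migration") {
		b, err := json.Marshal(c)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(b))
	}
}
```

If either call fails or returns nothing useful, the caller would simply continue with the task.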

## During Work (MUST)
## During Work

### Remember
### Remember (be selective)

- **Decisions**: Architectural/design choices — `type: "decision"`
- **Errors**: Bugs encountered and resolved — `type: "error"`
- **Insights**: Non-obvious discoveries about the codebase — `type: "insight"`
- **Learnings**: Library, API, or framework behavior — `type: "learning"`
- **Experiment results**: HP sweep findings, benchmark baselines, training outcomes — `type: "insight"` or `type: "decision"` depending on whether it's an observation or a choice made from it
Only store things a future session would need:
- **Decisions**: "chose X because Y" — `type: "decision"`
- **Errors**: bugs found and how they were fixed — `type: "error"`
- **Insights**: non-obvious discoveries — `type: "insight"`
- **Learnings**: API/framework behavior — `type: "learning"`

Use judgment — remember things a future session would need. Don't remember trivial actions, file paths, or things derivable from git history.
Do NOT remember: file paths, trivial changes, things derivable from git history or code.
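
As a rough sketch of the selectivity rule above, a caller could validate the `type` field before sending a `remember` call. The `content` argument name and the `rememberArgs` helper are hypothetical, and rejecting any type outside the four listed is an illustrative choice rather than documented server behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// validTypes lists the four memory types named in the rules above.
var validTypes = map[string]bool{
	"decision": true,
	"error":    true,
	"insight":  true,
	"learning": true,
}

// rememberArgs builds the argument map for a remember call (hypothetical helper).
// It refuses unknown types and empty content so that nothing vague gets stored.
func rememberArgs(memType, content string) (map[string]string, error) {
	if !validTypes[memType] {
		return nil, fmt.Errorf("unknown memory type %q", memType)
	}
	if content == "" {
		return nil, errors.New("content must not be empty")
	}
	// "content" is an assumed field name; "type" comes from the rules above.
	return map[string]string{"type": memType, "content": content}, nil
}

func main() {
	args, err := rememberArgs("decision", "chose SQLite over Postgres for simplicity")
	if err != nil {
		panic(err)
	}
	fmt.Println(args["type"], "|", args["content"])
}
```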

### Recall mid-session

Don't only recall at session start. When entering new territory (new subsystem, unfamiliar pattern, making claims about prior work), call `recall` with specific keywords first. Example: before suggesting HP ranges, recall prior training findings. Before claiming something works a certain way, check if there's a stored decision or learning about it.
When entering unfamiliar territory, recall before assuming. Check if there's a prior decision or known issue.

### Amend stale memories

If a recall returns a memory that's outdated or partially wrong, use `amend` to update it in place rather than creating a new memory. This preserves associations and history.

## After Recalls (MUST)

- After using `recall` and acting on the results, call `feedback`:
- `helpful` — memories were relevant and informed your work
- `partial` — some relevant, some noise
- `irrelevant` — memories didn't help
- If recall returned 0 results, no feedback needed — but consider whether your query was too broad or too specific
- This trains the retrieval system — skipping it degrades future recall quality

## Between Phases / Major Tasks (MUST)

When working through multi-phase plans (epics, milestones, sequential issues):
- `remember` key decisions, strategy changes, or gotchas from the completed phase before starting the next
- `recall` relevant context before entering a new phase — prior phase decisions may affect the current one
- This ensures continuity across long sessions and prevents rediscovering the same issues

## Reducing Noise
If recall returns outdated info, use `amend` to fix it in place. This preserves associations.

- Use `include_patterns: false` and `include_abstractions: false` on `recall` when you only need memories, not patterns/principles
- Use `types: ["decision", "error"]` to filter recall to actionable memory types
- Use `dismiss_pattern` and `dismiss_abstraction` to archive noise that keeps surfacing
## After Recalls

## Before Committing (SHOULD)
Call `feedback` after acting on recall results:
- `helpful` — memories informed your work
- `partial` — some useful, some noise
- `irrelevant` — didn't help

- Review the session's work and `remember` any decisions or insights that haven't been stored yet
- Call `session_summary` if the session involved significant work
This trains retrieval. Skipping it degrades future quality.

## General
## What NOT to Do

- Prefer specific `recall` queries over broad ones — "SQLite FTS5 migration" not "database stuff"
- Set the `type` field on every `remember` call — never use the default "general" when a specific type fits
- When a recall returns irrelevant noise, say so via `feedback` — this is how the system improves
- Don't remember things that belong in experiment docs: training results go in `training/docs/`, not just in mnemonic memory. Memory is for cross-session context, not a substitute for proper documentation
- Don't use `include_patterns` or `include_abstractions` — these produce noise
- Don't store experiment results in memory — those go in `training/docs/`
- Don't remember things that belong in code comments or commit messages
- Don't create memories about file structure or architecture; read the code instead
189 changes: 67 additions & 122 deletions CLAUDE.md
@@ -1,6 +1,6 @@
# Mnemonic — Development Guide

Mnemonic is a local-first, air-gapped semantic memory system built in Go. It uses 8 cognitive agents + orchestrator + reactor, SQLite with FTS5 + vector search, and LLMs (LM Studio locally or cloud APIs like Gemini) for semantic understanding.
Mnemonic is a local-first, air-gapped semantic memory daemon for AI agents. Built in Go, it provides persistent long-term memory via SQLite with FTS5 + vector search, heuristic encoding, and spread activation retrieval. No LLM required.

## Build & Test

@@ -15,159 +15,104 @@ golangci-lint run # Lint (uses .golangci.yml config)

**Version** is injected via ldflags from `Makefile` (managed by release-please). The binary var is in `cmd/mnemonic/main.go`.

## Architecture

### Embedding Pipeline (no LLM)

All encoding uses heuristic Go code — no generative LLM calls anywhere:

```
MCP remember → raw memory → heuristic encoding (RAKE concepts + salience) → hugot embedding (384-dim MiniLM) → SQLite + FTS5
MCP recall → FTS5 + embedding search → spread activation → rank → return
```
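
The "spread activation → rank" step in the recall path can be illustrated with a small sketch. It assumes search hits arrive as seed activations and that associations form a weighted directed graph. The `edge` type, the decay factor, and the hop count are all assumptions for illustration, not the retrieval agent's actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// edge is a weighted association from one memory to another (hypothetical shape).
type edge struct {
	to     string
	weight float64
}

// spread propagates activation from the seed memories across associations
// for a fixed number of hops. Each hop attenuates by edge weight and decay.
func spread(graph map[string][]edge, seeds map[string]float64, hops int, decay float64) map[string]float64 {
	activation := make(map[string]float64, len(seeds))
	frontier := make(map[string]float64, len(seeds))
	for id, a := range seeds {
		activation[id] = a
		frontier[id] = a
	}
	for h := 0; h < hops; h++ {
		next := make(map[string]float64)
		for id, a := range frontier {
			for _, e := range graph[id] {
				next[e.to] += a * e.weight * decay
			}
		}
		for id, a := range next {
			activation[id] += a
		}
		frontier = next
	}
	return activation
}

// rank returns memory IDs by descending activation, breaking ties by ID.
func rank(activation map[string]float64) []string {
	ids := make([]string, 0, len(activation))
	for id := range activation {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if activation[ids[i]] != activation[ids[j]] {
			return activation[ids[i]] > activation[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}

func main() {
	graph := map[string][]edge{
		"m1": {{to: "m2", weight: 0.8}},
		"m2": {{to: "m3", weight: 0.5}},
	}
	seeds := map[string]float64{"m1": 1.0} // e.g. a direct FTS5 or embedding hit
	fmt.Println(rank(spread(graph, seeds, 2, 0.5))) // prints [m1 m2 m3]
}
```

In this sketch `m3` is never matched by the search itself. It surfaces only because it is associated with `m2`, which is the point of spreading activation before ranking.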

Three embedding providers available via `config.yaml`:
- `bow` — 128-dim bag-of-words (instant, zero dependencies)
- `hugot` — 384-dim MiniLM-L6-v2 via pure Go (no CGo, no shared library)
- `api` — OpenAI-compatible endpoint (for cloud embeddings)

### Cognitive Agents

Agents communicate via event bus, never direct calls. Their value is in **side effects** (association strengthening, salience decay, clustering), not text output:

- **Encoding** — Raw events → memories with concepts + embeddings
- **Retrieval** — FTS5 + vector search + spread activation
- **Consolidation** — Decay salience, merge related memories, prune dead associations
- **Dreaming** — Replay memories, strengthen associations, cross-pollinate
- **Orchestrator** — Schedule agent cycles, health monitoring
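
A minimal in-memory pub/sub sketch shows what "communicate via event bus, never direct calls" can look like. The `Bus` and `Event` types and the `memory.encoded` topic are hypothetical names chosen for the example. The project's own bus lives in `internal/events/` and may differ in API and delivery semantics.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a message published on the bus (hypothetical shape).
type Event struct {
	Topic   string
	Payload string
}

// Bus is a synchronous in-memory pub/sub hub keyed by topic.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]func(Event)
}

// NewBus returns an empty bus ready for subscriptions.
func NewBus() *Bus {
	return &Bus{subs: make(map[string][]func(Event))}
}

// Subscribe registers a handler for one topic.
func (b *Bus) Subscribe(topic string, h func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[topic] = append(b.subs[topic], h)
}

// Publish delivers the event to every handler for its topic.
// Handlers are copied first so they run without holding the lock.
func (b *Bus) Publish(e Event) {
	b.mu.RLock()
	handlers := append([]func(Event){}, b.subs[e.Topic]...)
	b.mu.RUnlock()
	for _, h := range handlers {
		h(e)
	}
}

func main() {
	bus := NewBus()
	// A consolidation-style agent reacts to encoding output without
	// the encoder knowing that it exists.
	bus.Subscribe("memory.encoded", func(e Event) {
		fmt.Println("consolidation saw:", e.Payload)
	})
	bus.Publish(Event{Topic: "memory.encoded", Payload: "m1"}) // prints "consolidation saw: m1"
}
```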

Perception watchers (filesystem, git, terminal, clipboard) are **disabled by default** — agents have direct codebase access and watcher-sourced memories create retrieval noise.

## Project Layout

```
cmd/mnemonic/ CLI + daemon entry point
cmd/benchmark/ End-to-end benchmark
cmd/benchmark-quality/ Memory quality IR benchmark
cmd/lifecycle-test/ Full lifecycle simulation (install → 3 months)
cmd/lifecycle-test/ Full lifecycle simulation
internal/
agent/ 8 cognitive agents + orchestrator + reactor + forum
perception/ Watch filesystem/terminal/clipboard, heuristic filter
encoding/ LLM compression, concept extraction, association linking
episoding/ Temporal episode clustering
consolidation/ Decay, merge, prune (sleep cycle)
retrieval/ Spread activation + LLM synthesis with tool-use
metacognition/ Self-reflection, feedback processing, audit
dreaming/ Memory replay, cross-pollination, insight generation
abstraction/ Patterns → principles → axioms
orchestrator/ Autonomous scheduler, health monitoring
reactor/ Event-driven rule engine
forum/ Agent personality system for forum communication
agent/ Cognitive agents + orchestrator + reactor
api/ REST API server + routes
web/ Embedded dashboard (forum-style, modular ES modules + CSS)
mcp/ MCP server (24 tools for Claude Code)
web/ Embedded dashboard
mcp/ MCP server (7 core tools)
embedding/ Embedding providers (bow, hugot, api) + RAKE + TurboQuant
store/ Store interface + SQLite implementation
llm/ LLM provider interface + implementations (LM Studio, Gemini/cloud API)
llamacpp/ Optional embedded llama.cpp backend (CGo, build-tagged)
ingest/ Project ingestion engine
watcher/ Filesystem (FSEvents/fsnotify), terminal, clipboard
daemon/ Service management (macOS launchd, Linux systemd, Windows Services)
updater/ Self-update via GitHub Releases
llm/ Legacy LLM provider interface (kept for MCP server compat)
watcher/ Filesystem, terminal, clipboard watchers (disabled by default)
daemon/ Service management (launchd, systemd, Windows Services)
events/ Event bus (in-memory pub/sub)
config/ Config loading (config.yaml)
logger/ Structured logging (slog)
concepts/ Shared concept extraction (paths, commands, event types)
backup/ Export/import
testutil/ Shared test infrastructure (stub LLM provider)
sdk/ Python agent SDK (self-evolving assistant)
agent/evolution/ Agent evolution data (created at runtime, gitignored)
agent/evolution/examples/ Example evolution data for reference
training/ Mnemonic-LM training infrastructure
scripts/ Training, sweep, bisection, data download scripts
configs/ Data mix config (pretrain_mix.yaml)
docs/ Experiment registry, analysis docs
data/ Tokenized pretraining shards (gitignored)
sweep_results.tsv HP sweep results log
probe_results.tsv Short probe results from LR bisection
third_party/ llama.cpp submodule (for embedded LLM builds)
sdk/ Python agent SDK
training/ Training infrastructure (historical, not active)
migrations/ SQLite schema migrations
scripts/ Utility scripts
```

## Conventions

- **Event bus architecture:** Agents communicate via events, never direct calls. To add behavior, subscribe to events in the bus.
- **Store interface:** All data access goes through `store.Store` interface. The SQLite implementation is in `internal/store/sqlite/`.
- **Event bus architecture:** Agents communicate via events, never direct calls.
- **Store interface:** All data access goes through `store.Store` interface.
- **Error handling:** Wrap errors with context: `fmt.Errorf("encoding memory %s: %w", id, err)`
- **Platform-specific code:** Use Go build tags (`//go:build darwin`, `//go:build !darwin`). See `internal/watcher/filesystem/` for examples.
- **Config:** All tunables live in `config.yaml`. Add new fields to `internal/config/config.go` struct.

## Adding Things

- **New agent:** Implement `agent.Agent` interface, register in `cmd/mnemonic/main.go` serve pipeline.
- **New CLI command:** Add case to the command switch in `cmd/mnemonic/main.go`.
- **New API route:** Add handler in `internal/api/routes/`, register in `internal/api/server.go`. Existing routes include `/api/v1/activity` (watcher concept tracker for MCP sync).
- **New MCP tool:** Add to `internal/mcp/server.go` tool registration.
- **Platform-specific code:** Use Go build tags (`//go:build darwin`, `//go:build !darwin`).
- **Config:** All tunables live in `config.yaml`. Add new fields to `internal/config/config.go`.

## Platform Support

| Platform | Status |
|----------|--------|
| macOS ARM | Full support (primary dev platform) |
| Linux x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` all work via systemd |
| Windows x86_64 | Supported — `serve`, `install`, `start`, `stop`, `uninstall` work via Windows Services |

## Training (Mnemonic-LM)

Training scripts live in `training/scripts/` and require the **Felix-LM venv**:

```bash
source ~/Projects/felixlm/.venv/bin/activate
```

Key scripts:

- `train_mnemonic_lm.py` — Main training script (imports Felix-LM v3 from `~/Projects/felixlm`)
- `run_sweep.sh` — Run HP sweep configs sequentially with auto-logging to TSV
- `bisect_lr.sh` — Binary search for optimal LR using short probes + full confirmation
- `validate.py` — Quality gate pipeline for fine-tuning data

All experiments must be pre-registered in `training/docs/experiment_registry.md` before running. See `.claude/rules/scientific-method.md` and `.claude/rules/experiment-logging.md`.

## Known Issues

See [GitHub Issues](https://github.com/appsprout-dev/mnemonic/issues) for tracked bugs.

---

## MCP Tools Available

You have 24 tools via the `mnemonic` MCP server:

| Tool | When to Use |
|------|-------------|
| `remember` | Store decisions, errors, insights, learnings (returns raw ID + salience) |
| `recall` | Semantic search with spread activation (`explain`, `include_associations`, `format`, `type`, `types`, `include_patterns`, `include_abstractions`, `synthesize` params) |
| `batch_recall` | Run multiple recall queries in parallel — ideal for session start |
| `get_context` | Proactive suggestions based on recent daemon activity — call at natural breakpoints |
| `forget` | Archive irrelevant memories |
| `amend` | Update a memory's content in place (preserves associations, history, salience) |
| `check_memory` | Inspect a memory's encoding status, concepts, and associations |
| `status` | System health, encoding pipeline status, source distribution |
| `recall_project` | Get project-specific context and patterns |
| `recall_timeline` | See what happened in a time range |
| `recall_session` | Retrieve all memories from a specific MCP session |
| `list_sessions` | List recent sessions with time range and memory count |
| `session_summary` | Summarize current/recent session |
| `get_patterns` | View discovered recurring patterns (returns IDs for dismissal, supports `min_strength`) |
| `get_insights` | View metacognition observations and abstractions (returns IDs for dismissal) |
| `feedback` | Report recall quality (drives ranking, can auto-suppress noisy memories) |
| `audit_encodings` | Review recent encoding quality and suggest improvements |
| `coach_local_llm` | Write coaching guidance to improve local LLM prompts |
| `ingest_project` | Bulk-ingest a project directory into memory |
| `exclude_path` | Add a watcher exclusion pattern at runtime |
| `list_exclusions` | List all runtime watcher exclusion patterns |
| `dismiss_pattern` | Archive a stale or irrelevant pattern to stop it surfacing in recall |
| `dismiss_abstraction` | Archive a stale or irrelevant principle/axiom to stop it surfacing in recall |
| `create_handoff` | Store structured session handoff notes (high salience, surfaced by recall_project) |
| macOS ARM | Full support |
| Linux x86_64 | Full support (systemd) |
| Windows x86_64 | Full support (Windows Services) |

## MCP Tools (7)

| Tool | Purpose |
|------|---------|
| `remember` | Store decisions, errors, insights, learnings |
| `recall` | Semantic search with spread activation |
| `recall_project` | Project context + recent activity |
| `batch_recall` | Multiple recall queries in parallel |
| `feedback` | Rate recall quality (drives Hebbian learning) |
| `status` | System health |
| `amend` | Update a memory in place |

### At Session Start

- Use `recall_project` to load context for the current project
- Use `recall` with relevant keywords to find prior decisions
- `recall_project` — project context
- `recall` or `batch_recall` — task-specific context

### During Work

- `remember` decisions with `type: "decision"` — e.g., "chose SQLite over Postgres for simplicity"
- `remember` errors with `type: "error"` — e.g., "nil pointer in auth middleware, fixed with guard clause"
- `remember` insights with `type: "insight"` — e.g., "spread activation works best with 3 hops max"
- `remember` learnings with `type: "learning"` — e.g., "Go's sql.NullString needed for nullable columns"
- `remember` decisions, errors, insights, learnings
- `recall` before entering unfamiliar territory
- `amend` stale memories instead of creating new ones

### After Recalls

- Use `feedback` to rate recall quality — this helps the system improve
- `helpful` = memories were relevant and useful
- `partial` = some relevant, some not
- `irrelevant` = memories didn't help
- `feedback` — rate quality (helpful/partial/irrelevant)

### Memory Types
## Known Issues

When using `remember`, set the `type` field:
See [GitHub Issues](https://github.com/appsprout-dev/mnemonic/issues) for tracked bugs.

- `decision` — architectural choices, tradeoffs, "we chose X because Y"
- `error` — bugs found, error patterns, debugging insights
- `insight` — realizations about code, architecture, or process
- `learning` — new knowledge, API behaviors, framework quirks
- `general` — everything else (default)
**Active branch:** `feat/heuristic-pipeline` (PR #374) — major refactor removing all LLM dependency.