feat(distill): distill environment-level memories into a shared catalog#3

Merged
rockfordlhotka merged 10 commits into main from feat/distill-environment-memories
Jun 16, 2026

@rockfordlhotka

Summary

Adds a distill capability to claude-utils: it lifts transferable lessons (shell/OS quirks, CLI gotchas, toolchain, user identity) out of a single project's Claude Code memories and makes them reusable across projects — so Claude doesn't have to re-learn the same environment lessons in every repo.

The design splits judgment from mechanics:

/distill  (skill, Claude)         claude-memsync distill (Go)     /distill-apply (skill, Claude)
  classify + generalize     ──►   rebuild DISTILLED.md index ──►    copy entries into a project's
  write catalog entries           prune stale, report worklist      memory/ + MEMORY.md
  tag originals scope:env               │
        └────────── ~/.claudesync/distilled/  (synced like everything else) ──────┘

What's included

  • internal/distill/ (mechanical, Go): BuildIndex regenerates the DISTILLED.md catalog index from per-lesson <slug>.md entry files; Reconcile (--prune) drops entries whose source memory lost the marker or vanished; Preview backs --dry-run; analyzeSources surfaces the worklist of marked-but-not-yet-distilled memories and cross-project conflicts. Includes a tolerant frontmatter parser (handles both the flat and the nested-metadata schemas) with no new dependency. Conservative by design: never prunes when the projects tree is invisible; ignores MEMORY.md and the *.tmp.* litter.
  • claude-memsync distill CLI: --prune and --dry-run; loads config with a pre-init defaults fallback.
  • Daemon hook: rebuilds the index locally after every sync (startup, debounced, and pull flushes). DISTILLED.md is a derived artifact — git-ignored and regenerated per-PC — so the generated table never causes merge conflicts, while the entry files sync via the existing git add -A.
  • Skills (skills/distill, skills/distill-apply): /distill is the classifier of record (classify → generalize → write entries → tag originals scope: environment); /distill-apply seeds chosen entries into the current project's memory.
  • init + README: init prints a one-time permission allow-rule for ~/.claudesync/distilled/ (it does not edit the user's global settings.json); README documents the flow, skill install, and permissions.
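
To illustrate the "tolerant frontmatter parser" mentioned above, here is a minimal dependency-free sketch. The two layouts shown (a top-level `scope:` key versus the same key indented under a `metadata:` parent) are assumptions about what "flat" and "nested" mean here, and `parseFrontmatter` is a hypothetical name, not the function in internal/distill.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseFrontmatter reads a leading "---" delimited block and returns its
// key/value pairs. Keys indented under a parent such as "metadata:" are
// flattened into the same map, so both assumed layouts give the same result.
func parseFrontmatter(doc string) map[string]string {
	fields := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(doc))
	if !sc.Scan() || strings.TrimSpace(sc.Text()) != "---" {
		return fields // no frontmatter at all: tolerate it and return nothing
	}
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "---" {
			break // closing delimiter
		}
		key, val, ok := strings.Cut(strings.TrimSpace(line), ":")
		if !ok {
			continue // not a key/value line
		}
		key = strings.TrimSpace(key)
		val = strings.Trim(strings.TrimSpace(val), `"'`)
		if val == "" {
			continue // a parent key like "metadata:" carries no value itself
		}
		fields[key] = val
	}
	return fields
}

func main() {
	flat := "---\nname: git-bash-quirks\nscope: environment\n---\nbody\n"
	nested := "---\nname: git-bash-quirks\nmetadata:\n  scope: environment\n---\nbody\n"
	fmt.Println(parseFrontmatter(flat)["scope"])
	fmt.Println(parseFrontmatter(nested)["scope"])
}
```

Flattening nested keys loses the distinction between the two layouts, which is acceptable when the caller only needs to ask whether a marker is present.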

Design notes

  • Classifier of record is the skill, not a frontmatter convention. Claude Code's built-in memory-writer follows a fixed schema and won't emit a scope: marker on its own; rather than depend on that, /distill makes and records the classification, and the marker is the cached result of its judgment. The Go daemon only mechanically aggregates/indexes what the skill produced.
  • DISTILLED.md is derived, not synced. Entry files are the synced source of truth; the index is regenerated locally to avoid merge conflicts on a generated table.
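
A rough sketch of why a derived index can be regenerated on every machine without conflicts: if the output depends only on the set of entry files and is emitted in sorted order, every PC produces the same bytes. The single-column table layout and the `renderIndex` name are illustrative assumptions; the real DISTILLED.md format is not shown in this PR description.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const indexName = "DISTILLED.md"

// renderIndex builds index text from the file names found in the catalog
// directory. Only "<slug>.md" entry files are listed; the index itself and
// any non-markdown files are skipped.
func renderIndex(fileNames []string) string {
	var slugs []string
	for _, name := range fileNames {
		if name == indexName || !strings.HasSuffix(name, ".md") {
			continue
		}
		slugs = append(slugs, strings.TrimSuffix(name, ".md"))
	}
	sort.Strings(slugs) // deterministic order regardless of directory listing order

	var b strings.Builder
	b.WriteString("| Entry |\n|---|\n")
	for _, s := range slugs {
		fmt.Fprintf(&b, "| [%s](%s.md) |\n", s, s)
	}
	return b.String()
}

func main() {
	fmt.Print(renderIndex([]string{
		"windows-path-escaping.md",
		"DISTILLED.md",
		"git-bash-quirks.md",
	}))
}
```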

Testing

  • go build ./..., go vet ./... clean.
  • internal/distill has 7 tests (frontmatter parsing both schemas, sorted index, catalog + source conflicts, pending worklist, pruning, never-prune-blind guard) — all passing.
  • End-to-end smoke-tested the claude-memsync distill binary against a temp catalog: indexes entries, detects pending, and generates a clean DISTILLED.md.

Follow-ups (not in this PR)

  • Optionally let init write the permission allow-rule automatically (behind an opt-in flag, with a careful JSON merge) instead of only printing it.
  • The *.tmp.* files in memory dirs look like leftover merge-driver temp files — worth a separate cleanup fix in the sync path.

🤖 Generated with Claude Code

rockfordlhotka and others added 10 commits June 16, 2026 10:02
Add a "distill" capability that lifts transferable lessons (shell/OS quirks,
CLI gotchas, toolchain, user identity) out of a single project's Claude Code
memories and makes them reusable across projects.

The work is split between judgment and mechanics:

- internal/distill (Go, mechanical): regenerates the DISTILLED.md catalog
  index from per-lesson <slug>.md entry files, prunes stale entries whose
  source lost the marker or vanished, and reports a worklist of marked-but-
  not-yet-distilled memories plus cross-project conflicts. Ships a tolerant
  frontmatter parser (handles both flat and nested metadata schemas) with no
  new dependency. Conservative: never prunes when the projects tree is
  invisible; ignores MEMORY.md and *.tmp.* litter.
- claude-memsync distill (CLI): --prune and --dry-run; loads config with a
  pre-init defaults fallback.
- daemon: rebuilds the index locally after every sync. DISTILLED.md is a
  derived artifact (git-ignored, regenerated per-PC) so the generated table
  never causes merge conflicts; entry files sync via the existing git add -A.
- skills/distill + skills/distill-apply: /distill is the classifier of record
  (classify, generalize, write entries, tag originals scope:environment);
  /distill-apply seeds chosen entries into the current project's memory.
- init prints a one-time permission allow-rule for ~/.claudesync/distilled/
  (non-destructive — it does not edit the user's global settings.json).

Catalog lives at ~/.claudesync/distilled/, inside the sync work-tree, so
entries propagate across workstations with no extra transport.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
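
The pruning rule described in this commit (drop entries whose source lost the marker or vanished, but never prune when the projects tree is invisible) could be sketched as below. The `origin` and `entry` types and the `reconcile` signature are hypothetical; they only mirror the behaviour described, not the actual Reconcile implementation.

```go
package main

import "fmt"

// origin identifies the source memory a catalog entry was distilled from.
type origin struct{ Project, File string }

type entry struct {
	Slug   string
	Origin origin
}

// reconcile splits catalog entries into those to keep and those to prune.
// marked holds the origins of source memories that still exist and still
// carry the scope marker. If the projects tree could not be seen at all,
// nothing is pruned: absence of evidence is not treated as deletion.
func reconcile(entries []entry, marked map[origin]bool, projectsVisible bool) (keep, prune []entry) {
	if !projectsVisible {
		return entries, nil
	}
	for _, e := range entries {
		if marked[e.Origin] {
			keep = append(keep, e)
		} else {
			prune = append(prune, e)
		}
	}
	return keep, prune
}

func main() {
	entries := []entry{
		{"git-bash-quirks", origin{"proj-a", "git_bash.md"}},
		{"old-lesson", origin{"proj-b", "gone.md"}},
	}
	marked := map[origin]bool{{"proj-a", "git_bash.md"}: true}

	_, prune := reconcile(entries, marked, true)
	fmt.Println("visible tree, pruned:", len(prune))

	_, prune = reconcile(entries, marked, false)
	fmt.Println("invisible tree, pruned:", len(prune))
}
```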
Step-by-step walkthrough: setup, distilling out of a project, applying into
another, keeping the catalog fresh, troubleshooting, and the mental model.
Linked from the README.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the full claude-memsync manual (prerequisites, setup, lifecycle, how it
works, deletes, on-disk layout, auth, limitations) into docs/claude-memsync.md,
matching the style of the distilling guide. Trim the README to a capabilities
overview, a quick start, and links to both docs; keep Project layout, Releasing,
and License. Repoint the distilling guide's setup cross-reference at the new
claude-memsync doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deterministic LF in the repo on every platform (Go/shell/markdown tolerate LF
on Windows), with CRLF reserved for Windows launchers and a binary guardrail.
Stops the LF→CRLF checkout warnings on Windows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…talog dir

The skills assumed claude-memsync was on PATH and that ~/.claudesync/distilled/
already existed; when neither held, the agent probed $HOME to orient and tripped
needless permission prompts. Now both skills:

- work only with the two known paths (project memory dir + distilled catalog)
  and explicitly do not probe $HOME or the .claudesync parent
- create the catalog dir directly instead of test-and-search
- treat `claude-memsync distill` as best-effort — skip with a note if the binary
  isn't on PATH (the daemon or a later manual run rebuilds the index)
- (distill-apply) read the entry files directly as the source of truth rather
  than depending on DISTILLED.md existing

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
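
The skills themselves are instructions for the agent rather than Go code, but the "best-effort" rule from this commit can be expressed as a small Go sketch for comparison: look the binary up on PATH, and treat its absence as a skip rather than a failure. `rebuildIndexBestEffort` is a hypothetical helper, not part of claude-memsync.

```go
package main

import (
	"fmt"
	"os/exec"
)

// rebuildIndexBestEffort runs "<binary> distill" if the binary is on PATH.
// It reports whether the command was run. A missing binary is not an error,
// because the daemon or a later manual run can rebuild the index.
func rebuildIndexBestEffort(binary string) (ran bool, err error) {
	path, lookErr := exec.LookPath(binary)
	if lookErr != nil {
		return false, nil // skip with a note, do not go searching for it
	}
	return true, exec.Command(path, "distill").Run()
}

func main() {
	ran, err := rebuildIndexBestEffort("claude-memsync")
	switch {
	case !ran:
		fmt.Println("claude-memsync not on PATH; skipping index rebuild")
	case err != nil:
		fmt.Println("index rebuild failed:", err)
	default:
		fmt.Println("index rebuilt")
	}
}
```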
Explain that /distill calls `claude-memsync distill` to rebuild the index and
degrades gracefully if the binary isn't found, and show the forwarding-shim
trick for running out of a dev checkout without a stale duplicate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The /distill skill normalizes catalog slugs (kebab-case), so a catalog entry's
name deliberately differs from its source memory's human-readable name. The
pending-worklist check matched on name, which mis-reported every renamed entry
as "pending". Match on provenance (originProject/originFile) instead — the same
key Reconcile already uses. Add a regression test where the source name and
catalog slug differ.

Also harden the skill against the two issues this surfaced:
- require `name` to be a kebab-case slug equal to the filename (don't carry over
  the source's sentence-style name)
- add a "cross-stack test" so narrow library/framework/version references stay
  project-scoped rather than polluting unrelated projects

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
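
A small sketch of the fix described in this commit: a source memory counts as pending only if no catalog entry points back at it by provenance, so a renamed slug no longer matters. The struct shapes and the `pending` function are illustrative assumptions; only the originProject/originFile key comes from the commit message.

```go
package main

import "fmt"

// source is a project memory already tagged for distillation.
type source struct {
	Project, File, Name string
}

// catalogEntry is a distilled lesson; its name is a kebab-case slug that may
// differ from the source memory's human-readable name.
type catalogEntry struct {
	Slug, OriginProject, OriginFile string
}

type provenance struct{ project, file string }

// pending returns the sources that no catalog entry was distilled from,
// matching on (originProject, originFile) rather than on name.
func pending(sources []source, catalog []catalogEntry) []source {
	seen := make(map[provenance]bool, len(catalog))
	for _, e := range catalog {
		seen[provenance{e.OriginProject, e.OriginFile}] = true
	}
	var out []source
	for _, s := range sources {
		if !seen[provenance{s.Project, s.File}] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	sources := []source{
		{"proj-a", "git_bash.md", "Git Bash mangles Windows paths"},
		{"proj-a", "gh_cli.md", "gh needs an explicit repo flag"},
	}
	catalog := []catalogEntry{
		{"git-bash-path-mangling", "proj-a", "git_bash.md"},
	}
	for _, s := range pending(sources, catalog) {
		fmt.Println("pending:", s.File)
	}
}
```

Matching on name would have reported both sources as pending here, since the slug `git-bash-path-mangling` never equals the sentence-style source name.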
git invokes the merge driver through its bundled sh, which treats backslashes
as escape characters. A Windows driver path like
S:\src\rdl\claude-utils\bin\claude-memmerge.exe was mangled to
"S:srcrdlclaude-utilsbinclaude-memmerge.exe: command not found", silently
disabling the MEMORY.md union merge — so every concurrent MEMORY.md edit
conflicted, breaking the daemon's rebases and stranding it on a detached HEAD.
filepath.ToSlash keeps the path intact through sh (quoted or not).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
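
The path fix can be illustrated as follows. `filepath.ToSlash` replaces the OS path separator with `/`, so it only rewrites backslashes when running on Windows; to keep this sketch behaving the same on any platform it uses `strings.ReplaceAll` instead, which gives the same result that `filepath.ToSlash` gives on Windows. `shellSafePath` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// shellSafePath rewrites backslashes as forward slashes so that sh does not
// interpret them as escape characters. Windows accepts forward slashes in
// paths, so the result still names the same file.
func shellSafePath(p string) string {
	return strings.ReplaceAll(p, `\`, "/")
}

func main() {
	driver := `S:\src\rdl\claude-utils\bin\claude-memmerge.exe`
	fmt.Println(shellSafePath(driver))
	// prints "S:/src/rdl/claude-utils/bin/claude-memmerge.exe"
}
```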
The daemon's filter only skipped names ending in ".tmp", but interrupted memory
writes leave "<name>.md.tmp.<pid>.<hash>" files that don't match that suffix —
so 40+ of them synced into the repo as litter. Ignore any name containing
".tmp." and add the pattern to the init .gitignore template. Adds a unit test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
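
The widened filter from this commit might look like the sketch below: keep the old suffix check and additionally ignore any name containing ".tmp.". `isTempLitter` is a hypothetical name for illustration, and the sample file names are made up to match the "<name>.md.tmp.<pid>.<hash>" shape quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// isTempLitter reports whether a file name looks like a temporary file that
// should not be synced: either the plain ".tmp" suffix the old filter caught,
// or the "<name>.md.tmp.<pid>.<hash>" form left by interrupted writes.
func isTempLitter(name string) bool {
	return strings.HasSuffix(name, ".tmp") || strings.Contains(name, ".tmp.")
}

func main() {
	for _, name := range []string{
		"shell_quirks.md",
		"shell_quirks.md.tmp",
		"shell_quirks.md.tmp.4312.9fa0c1",
	} {
		fmt.Printf("%-34s ignore=%v\n", name, isTempLitter(name))
	}
}
```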
@rockfordlhotka rockfordlhotka merged commit 30081a7 into main Jun 16, 2026
3 checks passed
@rockfordlhotka rockfordlhotka deleted the feat/distill-environment-memories branch June 16, 2026 21:42