diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000..4cd54ec --- /dev/null +++ b/.gitattributes @@ -0,0 +1,17 @@ +# Keep the repository normalized to LF on every platform. Go, shell, and +# markdown all tolerate LF on Windows, and a synced-across-workstations project +# benefits from deterministic line endings regardless of each PC's autocrlf. +* text=auto eol=lf + +# Windows launchers must be CRLF if ever committed (the daemon generates the +# .vbs launcher at runtime, so none are tracked today — this is a guardrail). +*.bat text eol=crlf +*.cmd text eol=crlf +*.vbs text eol=crlf +*.ps1 text eol=crlf + +# Binary assets: never normalize. +*.png binary +*.ico binary +*.gz binary +*.zip binary diff --git a/README.md b/README.md index 44de130..7e07879 100644 --- a/README.md +++ b/README.md @@ -4,252 +4,41 @@ Cross-workstation tooling for Claude Code. -## claude-memsync +Two capabilities, both built from a single pair of Go binaries +(`claude-memsync` + `claude-memmerge`) that run on Windows, Linux, and macOS: -Background daemon that keeps `~/.claude/projects//memory/` -in sync across multiple workstations using a private git repository as -transport. A custom git merge driver (`claude-memmerge`) unions -`MEMORY.md` section blocks instead of producing line-level conflicts. -Single Go binary; runs on Windows, Linux, and macOS. +- **[claude-memsync](docs/claude-memsync.md)** — a background daemon that keeps + your Claude Code memories (`~/.claude/projects//memory/`) in sync across + multiple workstations, using a private git repository as transport. A custom + merge driver unions `MEMORY.md` section blocks instead of producing line-level + conflicts. 
+- **[Distilling environment memories](docs/distilling-memories.md)** — lift the + *transferable* lessons (shell/OS quirks, CLI gotchas, toolchain, your standing + preferences) out of one project and reuse them in any other, existing or new, + via a shared catalog and two Claude skills (`/distill`, `/distill-apply`). -## Prerequisites - -- Go 1.23+ to build -- `git` 2.x on PATH at runtime (the daemon shells out) -- A private git remote you can `git push` to from your terminal — the - daemon inherits your normal git credentials (Git Credential Manager on - Windows, SSH agent or `~/.gitconfig` credential helpers on Linux/macOS). - Confirm `git push` works against the remote before installing. -- The same project paths on each PC (we use Claude's project-hash - directory names, which are derived from absolute paths). If your repos - live at the same drive letter / mount point on every PC, you're set. - -## Setup - -### 1. Create the private remote (once, from any PC) - -Any empty private git repo works. With the GitHub CLI: - -```sh -gh repo create /claude-sync --private --description "Private Claude Code memory sync" -``` - -### 2. Build (each PC) +## Quick start ```sh +# Build both binaries (they must end up in the same directory) git clone https://github.com/MarimerLLC/claude-utils.git cd claude-utils go build -o bin/ ./cmd/... -``` -For a release build that stamps the version into the binary: - -```sh -VERSION=$(git describe --tags --always --dirty) -go build -ldflags "-X github.com/MarimerLLC/claude-utils/internal/version.Override=$VERSION" -o bin/ ./cmd/... -``` - -A plain `go build` (no ldflags) still produces a working binary — -`claude-memsync version` falls back to the VCS revision Go embeds -automatically (e.g. `dev+a52d68609b6e`). - -Both binaries (`claude-memsync` and `claude-memmerge`) end up in `bin/`. -**They must live in the same directory** — `claude-memsync` finds the -merge driver as a sibling. 
Move both to a stable location if you don't -want them tied to the source checkout, e.g.: - -- Windows: `C:\Program Files\claude-memsync\` (or anywhere on PATH) -- Linux/macOS: `~/.local/bin/` - -### 3. Initialize the local sync repo (each PC) - -```sh +# Bootstrap and run the sync daemon (per PC) claude-memsync init --remote https://github.com//claude-sync.git -``` - -This: - -- Clones the remote into `~/.claudesync/` -- Configures the `MEMORY.md` merge driver in the local git config -- Mirrors any existing `~/.claude/projects//memory/` content into - the work-tree -- Writes `~/.claudesync/config.json` (per-PC; never synced) -- Writes `~/.claudesync/.state/manifest.json` (per-PC; never synced) -- Pushes the seed commit if this is the first PC - -On a second or third PC where the remote already has content from -another workstation, init handles collisions: - -- `MEMORY.md` present on both sides → semantic merge (sections from - both PCs are unioned) -- Other memory files differing on both sides → mirror copy is preserved - as `.from-remote-` for manual review; the local version - is taken - -### 4. Install and start the daemon (each PC) - -```sh claude-memsync install claude-memsync start -claude-memsync status ``` -`install` registers the daemon to auto-start when you log in: - -- **Windows**: drops `claude-memsync.vbs` in - `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\`. The script - launches the daemon hidden and detached via `WScript.Shell.Run`, so no - console window is left behind at logon. No admin required. (Task - Scheduler was investigated; it requires admin even for per-user logon - tasks, so the Startup folder is the simpler path.) -- **Linux**: writes a systemd user unit at - `~/.config/systemd/user/claude-memsync.service` and enables it. -- **macOS**: writes a launchd plist at - `~/Library/LaunchAgents/claude-memsync.plist`. +Full instructions: -All three run as your logged-in user, with full access to your -credential vault and SSH keys. 
None require root. - -## Lifecycle commands - -| Command | What it does | -|---|---| -| `claude-memsync init --remote ` | One-time bootstrap. | -| `claude-memsync install` | Register for auto-start at next logon. | -| `claude-memsync uninstall` | Remove the auto-start hook. | -| `claude-memsync start` | Start the daemon now. | -| `claude-memsync stop` | Stop the running daemon. | -| `claude-memsync status` | Print `running (pid N)` or `stopped`. | -| `claude-memsync run` | Run in the foreground (for debugging). | -| `claude-memsync version` | Print version. | - -## How it works - -``` -~/.claude/projects//memory/ (Claude reads + writes here — authoritative) - │ - │ fsnotify watcher, 3s debounce - ▼ -~/.claudesync/projects//memory/ (mirror — git work-tree) - │ - │ git add -A, commit, pull --rebase, push - ▼ - (private GitHub repo) - │ - │ ls-remote check on 1h timer (or after local push); - │ full pull only when remote SHA actually moved - ▼ - propagate pull-driven changes back to ~/.claude/projects/... -``` - -Idle ticks cost a single `git ls-remote` round-trip — no pull, no push, -no merge driver invocation. The full pull/push cycle runs only when -origin's branch SHA differs from the local `refs/remotes/origin/` -or when there are unpushed commits. - -- The daemon never edits files in `~/.claude/projects/...` while a write - is in progress — it copies into the mirror first, then any inbound - changes from a `git pull` are written back atomically. -- The custom merge driver is registered in the local repo config and - invoked by git via `.gitattributes` (`MEMORY.md merge=claude-memory-index`). -- Conflicts in non-`MEMORY.md` files surface as standard git conflict - markers in the affected file. Rare in practice because each memory - file has a unique filename per topic. - -## What gets synced (and what doesn't) - -**Synced:** every file directly under -`~/.claude/projects//memory/` on any PC. 
- -**Not synced:** - -- `~/.claude/CLAUDE.md` (your global instructions) -- `~/.claude/agents/`, `~/.claude/commands/`, `~/.claude/skills/` -- `~/.claude/sessions/`, `~/.claude/todos/`, `~/.claude/cache/`, - `~/.claude/history.jsonl`, `~/.claude/settings.json`, etc. -- `~/.claudesync/config.json` — per-PC (paths embed your machine layout) -- `~/.claudesync/.state/manifest.json` — per-PC (delete-detection state) -- `~/.claudesync/daemon.pid` — runtime state - -If you want any of the additional `~/.claude/` content synced, that's a -future enhancement. - -## Deletes - -Deletes propagate across PCs. The daemon keeps a per-PC manifest -(`~/.claudesync/.state/manifest.json`) listing which Claude-side memory -files were present at the last successful sync. On each reconcile pass, -a file that's in the mirror, missing from Claude, **and** was in the -manifest is treated as a user delete and removed from the mirror; the -next push propagates it. A file that's in the mirror, missing from -Claude, but **not** in the manifest is treated as an inbound new file -from another PC and copied into Claude. - -This means: - -- Delete a memory while the daemon is running → propagates immediately - (watcher catches it). -- Delete a memory while the daemon is stopped → propagates on next - startup (manifest-driven reconcile). -- First-ever run on a PC has no manifest, so deletes can't be inferred - from prior state. The daemon takes the safe path: never infer deletes, - bring everything together. Subsequent runs work normally. - -## On-disk layout - -``` -~/.claude/projects//memory/ # what Claude reads + writes - ├── MEMORY.md - └── .md ... - -~/.claudesync/ # owned by the daemon - ├── config.json # per-PC (gitignored) - ├── daemon.pid # per-PC (gitignored) - ├── .git/ # the sync repo - ├── .gitattributes # MEMORY.md merge=claude-memory-index - ├── .gitignore # excludes config.json, .state/, etc. - ├── .state/manifest.json # per-PC delete-detection state - └── projects//memory/... 
# git work-tree mirror -``` - -## Auth notes - -The daemon shells out to the system `git` binary, so it uses whatever -auth is configured in your environment: - -- **HTTPS with Git Credential Manager** (`gh auth login` on Windows - populates this): zero extra setup. -- **SSH**: ensure `ssh-agent` is running for your user session and your - key is loaded. On Linux, `systemctl --enable-linger ` keeps the - user instance running across logoffs if you want sync activity while - not logged in. -- **PAT in URL** (`https://@github.com/...`): works but the token - ends up in `~/.claudesync/.git/config`. Not recommended. - -Test before installing the daemon: -```sh -git -C ~/.claudesync push -``` -If that works without prompting, the daemon will too. - -## Limitations / known issues - -- **Path consistency required**: Claude derives the per-project memory - directory name by escaping the project's absolute path. PCs that - open the same repo at different paths (e.g. `C:\src\foo` vs. - `D:\dev\foo`) will see them as different projects. -- **No live conflict UI**: when the merge driver emits actual conflict - markers (rare; only on overlapping line edits within the same - `MEMORY.md` section body), the file is committed and pushed with - markers. The next time you edit on that PC you'll see them; resolve - by hand and save. -- **Auto-start runs while logged in**: the daemon stops when the user - logs off. Acceptable since memories don't change while you're away. - For 24/7 sync, register a system-wide service manually or enable - systemd lingering. -- **Stop is a hard kill**: by design — git operations are atomic per - command, so we don't risk corruption. If a `.git/index.lock` is left - behind by an unrelated git crash, remove it manually. +- **Syncing memories across machines** → [docs/claude-memsync.md](docs/claude-memsync.md) + (prerequisites, setup, lifecycle commands, how it works, deletes, auth, + limitations). 
+- **Distilling lessons across projects** → [docs/distilling-memories.md](docs/distilling-memories.md) + (installing the skills, the `/distill` and `/distill-apply` workflows, + permissions, troubleshooting). ## Project layout @@ -259,10 +48,12 @@ If that works without prompting, the daemon will too. | `cmd/claude-memmerge` | Git custom merge driver for `MEMORY.md` | | `internal/sync` | Mirror, reconcile, watcher loop, manifest | | `internal/merge` | Section-block parser + 3-way semantic merge | +| `internal/distill` | Distilled-memory catalog index + reconcile | | `internal/gitwt` | Wrapper around the `git` CLI scoped to the sync work-tree | | `internal/config` | Config load/save (`~/.claudesync/config.json`) | | `internal/proc` | Cross-platform helper that hides child-process console windows on Windows | | `internal/version` | Version resolution from `-ldflags` override or embedded VCS info | +| `skills/` | `/distill` and `/distill-apply` Claude Code skills | ## Releasing diff --git a/cmd/claude-memsync/distill.go b/cmd/claude-memsync/distill.go new file mode 100644 index 0000000..30b4eca --- /dev/null +++ b/cmd/claude-memsync/distill.go @@ -0,0 +1,96 @@ +package main + +import ( + "errors" + "flag" + "fmt" + "io/fs" + "os" + + "github.com/MarimerLLC/claude-utils/internal/config" + "github.com/MarimerLLC/claude-utils/internal/distill" +) + +// runDistill implements `claude-memsync distill`. It regenerates the +// DISTILLED.md index from the catalog entry files the /distill skill produced, +// optionally prunes stale entries, and reports the worklist of marked-but-not- +// yet-distilled memories. It performs no classification — that is the skill's +// job; this is the mechanical half. 
+func runDistill(args []string) int { + flags := flag.NewFlagSet("distill", flag.ContinueOnError) + cfgPath := flags.String("config", "", "path to config.json (default: ~/.claudesync/config.json)") + prune := flags.Bool("prune", false, "remove catalog entries whose source memory lost the marker or vanished") + dryRun := flags.Bool("dry-run", false, "report what would change without writing the index or pruning") + if err := flags.Parse(args); err != nil { + return 2 + } + + cfg, err := loadDistillConfig(*cfgPath) + if err != nil { + fmt.Fprintln(os.Stderr, "distill:", err) + return 1 + } + opts := distill.Options{ + ProjectsDir: cfg.ClaudeProjectsDir, + DistilledDir: cfg.DistilledPath(), + } + + var res distill.Result + if *dryRun { + res, err = distill.Preview(opts) + } else { + res, err = distill.Run(opts, *prune) + } + if err != nil { + fmt.Fprintln(os.Stderr, "distill:", err) + return 1 + } + + report(res, *dryRun) + return 0 +} + +// loadDistillConfig loads config.json, falling back to platform defaults when +// no config exists yet (the catalog can be indexed before `init` has run). 
+func loadDistillConfig(cfgPath string) (config.Config, error) { + if cfgPath == "" { + cfgPath = defaultConfigPath() + } + cfg, err := config.Load(cfgPath) + if errors.Is(err, fs.ErrNotExist) { + return config.Defaults(), nil + } + return cfg, err +} + +func report(res distill.Result, dryRun bool) { + verb := "indexed" + if dryRun { + verb = "would index" + } + fmt.Printf("%s %d distilled %s\n", verb, res.Indexed, plural(res.Indexed, "entry", "entries")) + if res.Pruned > 0 { + fmt.Printf("pruned %d stale %s\n", res.Pruned, plural(res.Pruned, "entry", "entries")) + } + if len(res.Pending) > 0 { + fmt.Printf("\n%d marked %s awaiting distillation (run /distill to generalize):\n", + len(res.Pending), plural(len(res.Pending), "memory", "memories")) + for _, o := range res.Pending { + fmt.Printf(" - %s (%s/%s)\n", o.Name, o.Project, o.File) + } + } + if len(res.Conflicts) > 0 { + fmt.Printf("\n%d %s — same name, divergent content (resolve in /distill):\n", + len(res.Conflicts), plural(len(res.Conflicts), "conflict", "conflicts")) + for _, c := range res.Conflicts { + fmt.Printf(" - %s: %v\n", c.Name, c.Sources) + } + } +} + +func plural(n int, one, many string) string { + if n == 1 { + return one + } + return many +} diff --git a/cmd/claude-memsync/init.go b/cmd/claude-memsync/init.go index 5c29df5..a761488 100644 --- a/cmd/claude-memsync/init.go +++ b/cmd/claude-memsync/init.go @@ -102,7 +102,13 @@ func bootstrap(cfg config.Config, force bool) error { if err := repo.ConfigSet(fmt.Sprintf("merge.%s.name", mergeDriverName), "claude memory index merge"); err != nil { return err } - driverCmd := fmt.Sprintf("%s %%O %%A %%B %%L %%P", quoteIfSpaces(cfg.MergeDriverPath)) + // Use forward slashes in the driver path: git invokes the driver command + // through its bundled sh, which treats backslashes as escapes — a Windows + // path like S:\src\...\claude-memmerge.exe would be mangled to + // S:srcclaude-memmerge.exe ("command not found"), silently disabling the + // MEMORY.md 
union merge. Forward slashes work for Windows exe invocation and + // survive sh unquoted or quoted. + driverCmd := fmt.Sprintf("%s %%O %%A %%B %%L %%P", quoteIfSpaces(filepath.ToSlash(cfg.MergeDriverPath))) if err := repo.ConfigSet(fmt.Sprintf("merge.%s.driver", mergeDriverName), driverCmd); err != nil { return err } @@ -164,11 +170,27 @@ func bootstrap(cfg config.Config, force bool) error { fmt.Fprintln(os.Stderr, "init OK") fmt.Fprintln(os.Stderr, " sync dir: ", cfg.SyncDir) fmt.Fprintln(os.Stderr, " claude projects: ", cfg.ClaudeProjectsDir) + fmt.Fprintln(os.Stderr, " distilled: ", cfg.DistilledPath()) fmt.Fprintln(os.Stderr, " remote: ", cfg.RemoteURL) fmt.Fprintln(os.Stderr, " merge driver: ", cfg.MergeDriverPath) + printDistillPermissionHint(cfg) return nil } +// printDistillPermissionHint suggests the one-time Claude Code permission rule +// that lets the /distill and /distill-apply skills read and write the shared +// catalog without prompting under default (non-bypass) permissions. We only +// advise — editing the user's global settings.json is left to them (or the +// update-config skill), since it is theirs to own. +func printDistillPermissionHint(cfg config.Config) { + dir := filepath.ToSlash(cfg.DistilledPath()) + fmt.Fprintln(os.Stderr) + fmt.Fprintln(os.Stderr, "To let the /distill skills use the catalog without permission prompts,") + fmt.Fprintln(os.Stderr, "add this to ~/.claude/settings.json (permissions.allow):") + fmt.Fprintf(os.Stderr, " \"Read(%s/**)\",\n", dir) + fmt.Fprintf(os.Stderr, " \"Write(%s/**)\"\n", dir) +} + // tryClone attempts `git clone `. Returns (true, nil) on success, // (false, err) on failure. An empty-repo error is one common failure. 
func tryClone(repo *gitwt.Repo, url string) (bool, error) { @@ -196,7 +218,11 @@ config.json daemon.pid .state/ *.tmp +*.tmp.* *.from-remote-* +# Distilled catalog index is a derived artifact; each PC regenerates it locally +# from the synced entry files (avoids merge conflicts on the generated table). +distilled/DISTILLED.md ` return os.WriteFile(path, []byte(content), 0600) } diff --git a/cmd/claude-memsync/main.go b/cmd/claude-memsync/main.go index 87b92ec..e170f7e 100644 --- a/cmd/claude-memsync/main.go +++ b/cmd/claude-memsync/main.go @@ -22,6 +22,8 @@ func main() { os.Exit(runInit(os.Args[2:])) case "run": os.Exit(runRun(os.Args[2:])) + case "distill": + os.Exit(runDistill(os.Args[2:])) case "install": os.Exit(runInstall(os.Args[2:])) case "uninstall": @@ -48,6 +50,7 @@ Usage: Subcommands: init Bootstrap the local sync repo against a remote run Run the sync daemon in the foreground + distill Rebuild the distilled-memory catalog index (see --prune, --dry-run) install Install as a system service (Windows Service / systemd unit / launchd plist) uninstall Remove the system service start Start the system service diff --git a/docs/claude-memsync.md b/docs/claude-memsync.md new file mode 100644 index 0000000..22b5523 --- /dev/null +++ b/docs/claude-memsync.md @@ -0,0 +1,262 @@ +# claude-memsync — syncing Claude Code memories across workstations + +A guide to setting up and running the memory-sync daemon. + +## What this is for + +Claude Code stores per-project memories under +`~/.claude/projects//memory/`. If you work across more than one +machine, those memories live only on whichever PC learned them. `claude-memsync` +is a small background daemon that keeps that memory tree in sync across all your +workstations, using a **private git repository as transport**. A custom git +merge driver (`claude-memmerge`) unions `MEMORY.md` section blocks instead of +producing line-level conflicts. 
+ +It's a single Go binary, runs on Windows, Linux, and macOS, and needs no server +of its own — just a git remote you can push to. + +> Want to reuse environment-level lessons across *different* projects (not just +> sync the same project across machines)? That's a separate capability — see +> [distilling-memories.md](distilling-memories.md). + +## Prerequisites + +- Go 1.23+ to build. +- `git` 2.x on PATH at runtime (the daemon shells out to it). +- A private git remote you can `git push` to from your terminal. The daemon + inherits your normal git credentials (Git Credential Manager on Windows, SSH + agent or `~/.gitconfig` credential helpers on Linux/macOS). Confirm `git push` + works against the remote before installing. +- The same project paths on each PC. Claude's project-hash directory names are + derived from absolute paths, so if your repos live at the same drive letter / + mount point on every PC, you're set. (See [Limitations](#limitations--known-issues).) + +## Setup + +### 1. Create the private remote (once, from any PC) + +Any empty private git repo works. With the GitHub CLI: + +```sh +gh repo create /claude-sync --private --description "Private Claude Code memory sync" +``` + +### 2. Build (each PC) + +```sh +git clone https://github.com/MarimerLLC/claude-utils.git +cd claude-utils +go build -o bin/ ./cmd/... +``` + +For a release build that stamps the version into the binary: + +```sh +VERSION=$(git describe --tags --always --dirty) +go build -ldflags "-X github.com/MarimerLLC/claude-utils/internal/version.Override=$VERSION" -o bin/ ./cmd/... +``` + +A plain `go build` (no ldflags) still produces a working binary — +`claude-memsync version` falls back to the VCS revision Go embeds automatically +(e.g. `dev+a52d68609b6e`). See the README's *Releasing* section for details. + +Both binaries (`claude-memsync` and `claude-memmerge`) end up in `bin/`. +**They must live in the same directory** — `claude-memsync` finds the merge +driver as a sibling. 
Move both to a stable location if you don't want them tied +to the source checkout, e.g.: + +- Windows: `C:\Program Files\claude-memsync\` (or anywhere on PATH) +- Linux/macOS: `~/.local/bin/` + +### 3. Initialize the local sync repo (each PC) + +```sh +claude-memsync init --remote https://github.com//claude-sync.git +``` + +This: + +- Clones the remote into `~/.claudesync/` +- Configures the `MEMORY.md` merge driver in the local git config +- Mirrors any existing `~/.claude/projects//memory/` content into the + work-tree +- Writes `~/.claudesync/config.json` (per-PC; never synced) +- Writes `~/.claudesync/.state/manifest.json` (per-PC; never synced) +- Pushes the seed commit if this is the first PC + +On a second or third PC where the remote already has content from another +workstation, init handles collisions: + +- `MEMORY.md` present on both sides → semantic merge (sections from both PCs are + unioned) +- Other memory files differing on both sides → mirror copy is preserved as + `.from-remote-` for manual review; the local version is taken + +### 4. Install and start the daemon (each PC) + +```sh +claude-memsync install +claude-memsync start +claude-memsync status +``` + +`install` registers the daemon to auto-start when you log in: + +- **Windows**: drops `claude-memsync.vbs` in + `%APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\`. The script launches + the daemon hidden and detached via `WScript.Shell.Run`, so no console window is + left behind at logon. No admin required. (Task Scheduler was investigated; it + requires admin even for per-user logon tasks, so the Startup folder is the + simpler path.) +- **Linux**: writes a systemd user unit at + `~/.config/systemd/user/claude-memsync.service` and enables it. +- **macOS**: writes a launchd plist at + `~/Library/LaunchAgents/claude-memsync.plist`. + +All three run as your logged-in user, with full access to your credential vault +and SSH keys. None require root. 
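Step 2 requires both binaries to sit in the same directory, and a missing sibling is easy to overlook after moving files around. The check below is a minimal sketch for a POSIX shell, not something the project ships: `binaries_are_siblings` is a hypothetical helper name, and accepting a `.exe` suffix is an assumption made for Git Bash on Windows.

```sh
# Hypothetical helper (not part of claude-memsync): succeed only when both
# binaries exist and are executable in the given directory.
binaries_are_siblings() {
  dir=$1
  for name in claude-memsync claude-memmerge; do
    # Accept the plain name or the Windows .exe variant (assumption).
    if [ ! -x "$dir/$name" ] && [ ! -x "$dir/$name.exe" ]; then
      echo "missing: $dir/$name" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `binaries_are_siblings ~/.local/bin && claude-memsync install` would skip registering the daemon when the merge driver is not next to it.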
+ +## Lifecycle commands + +| Command | What it does | +|---|---| +| `claude-memsync init --remote ` | One-time bootstrap. | +| `claude-memsync install` | Register for auto-start at next logon. | +| `claude-memsync uninstall` | Remove the auto-start hook. | +| `claude-memsync start` | Start the daemon now. | +| `claude-memsync stop` | Stop the running daemon. | +| `claude-memsync status` | Print `running (pid N)` or `stopped`. | +| `claude-memsync run` | Run in the foreground (for debugging). | +| `claude-memsync distill` | Rebuild the distilled-memory catalog index (`--prune`, `--dry-run`). See [distilling-memories.md](distilling-memories.md). | +| `claude-memsync version` | Print version. | + +## How it works + +``` +~/.claude/projects//memory/ (Claude reads + writes here — authoritative) + │ + │ fsnotify watcher, 3s debounce + ▼ +~/.claudesync/projects//memory/ (mirror — git work-tree) + │ + │ git add -A, commit, pull --rebase, push + ▼ + (private GitHub repo) + │ + │ ls-remote check on 1h timer (or after local push); + │ full pull only when remote SHA actually moved + ▼ + propagate pull-driven changes back to ~/.claude/projects/... +``` + +Idle ticks cost a single `git ls-remote` round-trip — no pull, no push, no merge +driver invocation. The full pull/push cycle runs only when origin's branch SHA +differs from the local `refs/remotes/origin/` or when there are unpushed +commits. + +- The daemon never edits files in `~/.claude/projects/...` while a write is in + progress — it copies into the mirror first, then any inbound changes from a + `git pull` are written back atomically. +- The custom merge driver is registered in the local repo config and invoked by + git via `.gitattributes` (`MEMORY.md merge=claude-memory-index`). +- Conflicts in non-`MEMORY.md` files surface as standard git conflict markers in + the affected file. Rare in practice because each memory file has a unique + filename per topic. 
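The idle-tick rule described above (run the full cycle only when origin's SHA moved or there are unpushed commits) can be sketched as a small shell predicate. This illustrates the decision only and is not the daemon's actual code; `needs_full_sync` and its three arguments are hypothetical names.

```sh
# Hypothetical helper: decide whether a full pull/push cycle is needed.
#   $1 = SHA that `git ls-remote` reports for the branch on origin
#   $2 = SHA of the local remote-tracking ref for that branch
#   $3 = number of local commits not yet pushed
needs_full_sync() {
  remote_sha=$1
  tracking_sha=$2
  unpushed=$3
  if [ "$remote_sha" != "$tracking_sha" ]; then
    return 0   # origin moved since the last fetch: pull is needed
  fi
  if [ "$unpushed" -gt 0 ]; then
    return 0   # local commits are waiting to be pushed
  fi
  return 1     # idle tick: nothing to do
}
```

Assuming a branch name in `$branch`, the inputs could be gathered with `git -C ~/.claudesync ls-remote origin "refs/heads/$branch" | cut -f1`, `git -C ~/.claudesync rev-parse "refs/remotes/origin/$branch"`, and `git -C ~/.claudesync rev-list --count "origin/$branch..HEAD"`.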
+ +## What gets synced (and what doesn't) + +**Synced:** every file directly under +`~/.claude/projects//memory/` on any PC. + +**Not synced:** + +- `~/.claude/CLAUDE.md` (your global instructions) +- `~/.claude/agents/`, `~/.claude/commands/`, `~/.claude/skills/` +- `~/.claude/sessions/`, `~/.claude/todos/`, `~/.claude/cache/`, + `~/.claude/history.jsonl`, `~/.claude/settings.json`, etc. +- `~/.claudesync/config.json` — per-PC (paths embed your machine layout) +- `~/.claudesync/.state/manifest.json` — per-PC (delete-detection state) +- `~/.claudesync/daemon.pid` — runtime state +- `~/.claudesync/distilled/DISTILLED.md` — derived index, regenerated per-PC + (the distilled `.md` entry files themselves **are** synced) + +If you want any of the additional `~/.claude/` content synced, that's a future +enhancement. + +## Deletes + +Deletes propagate across PCs. The daemon keeps a per-PC manifest +(`~/.claudesync/.state/manifest.json`) listing which Claude-side memory files +were present at the last successful sync. On each reconcile pass, a file that's +in the mirror, missing from Claude, **and** was in the manifest is treated as a +user delete and removed from the mirror; the next push propagates it. A file +that's in the mirror, missing from Claude, but **not** in the manifest is treated +as an inbound new file from another PC and copied into Claude. + +This means: + +- Delete a memory while the daemon is running → propagates immediately (watcher + catches it). +- Delete a memory while the daemon is stopped → propagates on next startup + (manifest-driven reconcile). +- First-ever run on a PC has no manifest, so deletes can't be inferred from prior + state. The daemon takes the safe path: never infer deletes, bring everything + together. Subsequent runs work normally. + +## On-disk layout + +``` +~/.claude/projects//memory/ # what Claude reads + writes + ├── MEMORY.md + └── .md ... 
+ +~/.claudesync/ # owned by the daemon + ├── config.json # per-PC (gitignored) + ├── daemon.pid # per-PC (gitignored) + ├── .git/ # the sync repo + ├── .gitattributes # MEMORY.md merge=claude-memory-index + ├── .gitignore # excludes config.json, .state/, etc. + ├── .state/manifest.json # per-PC delete-detection state + ├── distilled/ # shared distilled-memory catalog + │ ├── DISTILLED.md # derived index (gitignored) + │ └── .md ... # distilled entries (synced) + └── projects//memory/... # git work-tree mirror +``` + +## Auth notes + +The daemon shells out to the system `git` binary, so it uses whatever auth is +configured in your environment: + +- **HTTPS with Git Credential Manager** (`gh auth login` on Windows populates + this): zero extra setup. +- **SSH**: ensure `ssh-agent` is running for your user session and your key is + loaded. On Linux, `loginctl enable-linger` keeps the user instance + running across logoffs if you want sync activity while not logged in. +- **PAT in URL** (`https://@github.com/...`): works but the token ends up + in `~/.claudesync/.git/config`. Not recommended. + +Test before installing the daemon: + +```sh +git -C ~/.claudesync push +``` + +If that works without prompting, the daemon will too. + +## Limitations / known issues + +- **Path consistency required**: Claude derives the per-project memory directory + name by escaping the project's absolute path. PCs that open the same repo at + different paths (e.g. `C:\src\foo` vs. `D:\dev\foo`) will see them as different + projects. +- **No live conflict UI**: when the merge driver emits actual conflict markers + (rare; only on overlapping line edits within the same `MEMORY.md` section + body), the file is committed and pushed with markers. The next time you edit on + that PC you'll see them; resolve by hand and save. +- **Auto-start runs while logged in**: the daemon stops when the user logs off. + Acceptable since memories don't change while you're away.
For 24/7 sync, + register a system-wide service manually or enable systemd lingering. +- **Stop is a hard kill**: by design — git operations are atomic per command, so + we don't risk corruption. If a `.git/index.lock` is left behind by an unrelated + git crash, remove it manually. diff --git a/docs/distilling-memories.md b/docs/distilling-memories.md new file mode 100644 index 0000000..7745c97 --- /dev/null +++ b/docs/distilling-memories.md @@ -0,0 +1,199 @@ +# Distilling environment memories across projects + +A practical guide to extracting the lessons Claude has learned in one project +and reusing them everywhere. + +## What this is for + +As you work with Claude Code in a project, it accumulates memories — small notes +about how to do things in your environment. Over months, some of those become +genuinely valuable and **transferable**: how your shell behaves, CLI flags that +don't exist, toolchain quirks, your standing preferences. But they're trapped in +one project's memory. Start a new repo and Claude re-learns them from scratch. + +Distilling fixes that. You **distill** the transferable lessons out of a project +into a shared catalog, then **apply** them into any other project — existing or +brand new — so Claude starts out already knowing them. + +What belongs in the catalog (environment-level): + +- "On MINGW64, `kubectl cp` is broken; pipe with `cat … | kubectl exec -i`." +- "`gh issue assign --self` doesn't exist; use `gh issue edit --add-assignee`." +- Your mission, role, or standing preferences. + +What does **not** (project-specific — stays where it is): + +- "This repo requires squash merges." +- A service's deploy steps, routing logic, or anything naming this repo's + components. 
+ +## How it works (the short version) + +Three pieces, splitting judgment from mechanics: + +| Piece | Kind | Job | +|-------|------|-----| +| `/distill` | Claude skill | Decide which memories are environment-level, rewrite them to be project-neutral, write catalog entries, tag the originals. | +| `claude-memsync distill` | Go CLI / daemon | Rebuild the catalog index, prune stale entries, report the worklist. No judgment — pure mechanics. | +| `/distill-apply` | Claude skill | Copy chosen catalog entries into the current project's memory. | + +The catalog lives at `~/.claudesync/distilled/` — one `.md` file per +lesson, plus a generated `DISTILLED.md` index. Because it sits inside the +`claude-memsync` work-tree, entries sync across all your workstations +automatically. + +## One-time setup + +You only do this once per machine. + +1. **Build and install the binaries** (see + [claude-memsync.md](claude-memsync.md) for the full sync setup), and make sure + `claude-memsync` is **on your PATH**, with a `claude-memsync init` already run. + + The PATH part matters: `/distill` calls `claude-memsync distill` to rebuild + the catalog index. If the binary isn't found, the skill degrades gracefully + (it still writes entries and lets the daemon rebuild the index later) — but to + get the index refreshed immediately, `claude-memsync` must resolve. Move the + binary somewhere on PATH (`~/.local/bin`, `C:\Program Files\claude-memsync\`, + …), or if you're running it straight out of a dev checkout, drop a small + forwarding shim on PATH, e.g. `~/.local/bin/claude-memsync`: + + ```sh + #!/usr/bin/env bash + exec "/path/to/checkout/bin/claude-memsync.exe" "$@" + ``` + + (`chmod +x` it. A shim avoids a stale duplicate — it always runs your current + build.) + +2. **Install the skills** to your user scope so they work in every project: + + ```sh + cp -r skills/distill skills/distill-apply ~/.claude/skills/ + ``` + +3. 
**Allow the skills to use the catalog without permission prompts.** + `claude-memsync init` prints the rule; add it to `~/.claude/settings.json`: + + ```jsonc + { "permissions": { "allow": [ + "Read(~/.claudesync/distilled/**)", + "Write(~/.claudesync/distilled/**)" + ] } } + ``` + + Without this, the skills still work but Claude will ask permission each time + it reads or writes the catalog (expected in default, non-bypass mode). + +## Workflow 1 — distill lessons out of a project + +Do this in a project where Claude has learned things worth keeping (e.g. the one +you've worked in for months). + +1. Open Claude Code in that project. +2. Run the skill: + + ``` + /distill + ``` + +3. Claude reviews the project's memories, decides which are environment-level, + rewrites them to drop project specifics, writes them into the catalog, and + tags the originals `scope: environment` (so it won't re-process them next + time). It then runs `claude-memsync distill` to refresh the index. +4. Claude reports what it promoted, what it left as project-specific and why, + and anything still pending. Review its choices — if it promoted something too + project-specific, tell it to revert that one. + +You can re-run `/distill` any time; it skips already-classified memories, so it +only looks at what's new. + +## Workflow 2 — apply lessons to another project + +Do this **from inside the project you want to bring up to speed** — including a +brand-new one. + +1. Open Claude Code in the target project. +2. Run: + + ``` + /distill-apply + ``` + +3. Claude refreshes the catalog index, lists the entries not already present in + this project, and lets you pick (or apply all). It copies the chosen entries + into this project's memory and adds pointers to its `MEMORY.md`. +4. From now on, Claude in this project starts out knowing those lessons. + +To enrich several projects, run `/distill-apply` in each. 
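Step 3 of Workflow 2 ("lists the entries not already present in this project") amounts to a set difference on entry names. The following is a minimal Go sketch of that comparison, assuming the names have already been read from each file's frontmatter. `missingEntries` is a hypothetical helper written for illustration; it is not part of `claude-memsync`, and the actual selection is carried out by the `/distill-apply` skill rather than by Go code.

```go
package main

import "sort"

// missingEntries returns the catalog names that are absent from the project's
// memory, sorted so the list is presented in a stable order.
// Hypothetical helper for illustration only; not part of claude-memsync.
func missingEntries(catalog, project []string) []string {
	// Index the project's existing memory names for constant-time lookup.
	present := make(map[string]bool, len(project))
	for _, name := range project {
		present[name] = true
	}
	// Keep every catalog name the project does not already carry.
	var missing []string
	for _, name := range catalog {
		if !present[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}
```

Called with the catalog's entry names and the current project's memory names, it yields the candidates that would be offered for seeding. Matching on name follows the skill's own instruction to "match by `name`"; a stricter variant could compare body hashes as well, but that goes beyond what the guide describes.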
+ +## Keeping the catalog fresh + +If the `claude-memsync` daemon is running, it rebuilds the catalog index +automatically after every sync — including entries that arrive from your other +workstations. You rarely need to think about it. + +If you're not running the daemon, or want to refresh by hand: + +```sh +claude-memsync distill # rebuild DISTILLED.md, report worklist +claude-memsync distill --dry-run # show what would change, write nothing +claude-memsync distill --prune # also remove entries whose source is gone or untagged +``` + +The CLI prints: + +- how many entries are in the catalog, +- **pending** memories — tagged `scope: environment` but not yet written to the + catalog (run `/distill` to generalize them), +- **conflicts** — the same lesson distilled differently in two places (resolve + in `/distill`). + +## What a catalog entry looks like + +```markdown +--- +name: mingw-kubectl-file-transfer +description: On MINGW64 pipe files into kubectl exec with cat and -i +metadata: + type: feedback + scope: environment + originProject: S--src-rdl-rockbot + originFile: feedback_mingw_kubectl.md +--- + +On MINGW64 (Git Bash on Windows), `kubectl cp` and `< file` stdin redirects are +both broken. Pipe instead: `cat file | kubectl exec -i -- sh -c 'cat > /path'`. +``` + +`scope: environment` is what marks a memory for the catalog. You normally never +write it by hand — `/distill` decides and writes it. The daemon and CLI use it +to know what belongs. + +## Troubleshooting + +- **Claude keeps asking permission to read/write the catalog.** The allow-rule + in setup step 3 isn't in place. Add it to `~/.claude/settings.json`. +- **`/distill` promoted something too project-specific.** Tell Claude to remove + that entry from `~/.claudesync/distilled/` and un-tag the original; or delete + the `.md` and run `claude-memsync distill --prune`. +- **An entry shows up as a conflict.** The same lesson was distilled differently + on two machines or from two projects.
Run `/distill` and ask Claude to merge + them into one generalized entry. +- **A distilled lesson no longer applies.** Delete its `.md` from the + catalog (or remove the `scope: environment` tag from the source and run + `claude-memsync distill --prune`). +- **`DISTILLED.md` looks stale.** It's a derived file, regenerated locally; run + `claude-memsync distill` to rebuild it. It is intentionally not synced (each PC + regenerates its own to avoid merge conflicts on the generated table). + +## Mental model + +- **Entry files are the source of truth** and sync across your machines. +- **`DISTILLED.md` is derived** — regenerated locally, never synced. +- **`/distill` is the brain** — the only place classification and rewriting + happen. +- **`claude-memsync distill` is the muscle** — it indexes and prunes, never + judges. +- **`/distill-apply` is the courier** — it moves lessons into a project, one + project at a time. diff --git a/internal/config/config.go b/internal/config/config.go index ce16e99..1791f62 100644 --- a/internal/config/config.go +++ b/internal/config/config.go @@ -30,6 +30,11 @@ type Config struct { // Defaults to a sibling of the running daemon. MergeDriverPath string `json:"mergeDriverPath"` + // DistilledDir holds the shared catalog of environment-level memories. + // Empty means /distilled (see DistilledPath). It lives inside the + // work-tree so the daemon's git sync carries it across workstations. + DistilledDir string `json:"distilledDir,omitempty"` + // DebounceMs is how long to wait after a file change before committing. DebounceMs int `json:"debounceMs"` @@ -73,6 +78,15 @@ func (c Config) Path() string { return filepath.Join(c.SyncDir, "config.json") } +// DistilledPath returns the distilled-memory catalog directory, defaulting to +// /distilled when DistilledDir is unset. 
+func (c Config) DistilledPath() string { + if c.DistilledDir != "" { + return c.DistilledDir + } + return filepath.Join(c.SyncDir, "distilled") +} + // Load reads the config from path. The path is typically Config.Path() but // callers can supply their own (e.g. for tests). func Load(path string) (Config, error) { diff --git a/internal/distill/distill.go b/internal/distill/distill.go new file mode 100644 index 0000000..ceeb205 --- /dev/null +++ b/internal/distill/distill.go @@ -0,0 +1,479 @@ +// Package distill maintains the shared catalog of environment-level Claude Code +// memories. +// +// Individual memory files live per-project under +// ~/.claude/projects//memory/. Some lessons are transferable across +// projects (shell/OS quirks, CLI gotchas, user identity) rather than bound to +// one repo. The /distill skill classifies and generalizes those, writing one +// catalog entry per lesson into the distilled directory (default +// ~/.claudesync/distilled/) and tagging the originating memory with a marker +// (default: scope: environment). +// +// This package is the mechanical half: it cannot classify (no model in the +// loop), it only indexes and reconciles what the skill produced. BuildIndex +// regenerates the human-readable DISTILLED.md index from the entry files; +// Reconcile prunes entries whose source memory lost the marker or vanished; +// analyzeSources surfaces a worklist of tagged-but-not-yet-distilled memories. +// +// The distilled directory sits inside the claude-memsync work-tree, so the +// daemon's existing `git add -A` carries the catalog across workstations with +// no extra transport. +package distill + +import ( + "crypto/sha256" + "encoding/hex" + "errors" + "fmt" + "io/fs" + "os" + "path/filepath" + "sort" + "strings" +) + +// IndexFileName is the generated catalog index written into the distilled dir. +const IndexFileName = "DISTILLED.md" + +// Marker is the frontmatter key/value that gates a memory into the catalog. 
+type Marker struct { + Key string + Value string +} + +// DefaultMarker is the marker the /distill skill writes onto promoted memories. +var DefaultMarker = Marker{Key: "scope", Value: "environment"} + +// Options configures a distill run. +type Options struct { + // ProjectsDir is the root of Claude's per-project memory tree + // (default: ~/.claude/projects). Used to analyze and reconcile sources. + ProjectsDir string + // DistilledDir holds the catalog entry files and the generated index + // (default: ~/.claudesync/distilled). + DistilledDir string + // Marker gates which source memories belong in the catalog. Zero value + // means DefaultMarker. + Marker Marker +} + +func (o *Options) applyDefaults() { + if o.Marker.Key == "" { + o.Marker = DefaultMarker + } +} + +// Entry is a parsed catalog entry (one distilled lesson). +type Entry struct { + Name string + Description string + Type string + Scope string + OriginProject string + OriginFile string + BodyHash string // short hash of the trimmed body, for dedupe/conflict checks + Path string // absolute path to the entry file +} + +// Origin identifies a source memory found while scanning the projects tree. +type Origin struct { + Project string // directory name + File string // file name within memory/ + Path string // absolute path + Name string // frontmatter name (falls back to the file stem) +} + +// Conflict records a memory name carried by multiple sources with divergent +// content. The mechanical layer never merges prose; the /distill skill resolves +// these semantically. +type Conflict struct { + Name string + Sources []string // "project/file" labels +} + +// Result summarizes a distill run. 
+type Result struct { + Indexed int // catalog entries written to the index + Pruned int // stale catalog entries removed (Reconcile) + Pending []Origin // marked source memories with no catalog entry yet + Conflicts []Conflict // same name, divergent content across sources +} + +// Run reconciles (when prune is set) and then rebuilds the index. This is the +// entry point used by `claude-memsync distill`. +func Run(opts Options, prune bool) (Result, error) { + opts.applyDefaults() + var pruned int + if prune { + r, err := Reconcile(opts) + if err != nil { + return Result{}, err + } + pruned = r.Pruned + } + res, err := BuildIndex(opts) + if err != nil { + return res, err + } + res.Pruned = pruned + return res, nil +} + +// Preview reports what BuildIndex would record (entry count, worklist, +// conflicts) without writing the index or touching any files. Used by +// `claude-memsync distill --dry-run`. +func Preview(opts Options) (Result, error) { + opts.applyDefaults() + entries, err := scanCatalog(opts.DistilledDir) + if err != nil { + return Result{}, fmt.Errorf("scan catalog: %w", err) + } + return analyze(opts, entries), nil +} + +// BuildIndex parses every entry file in DistilledDir, regenerates the +// DISTILLED.md index, and reports the source worklist and any conflicts. +func BuildIndex(opts Options) (Result, error) { + opts.applyDefaults() + + entries, err := scanCatalog(opts.DistilledDir) + if err != nil { + return Result{}, fmt.Errorf("scan catalog: %w", err) + } + if err := writeIndex(opts.DistilledDir, entries); err != nil { + return Result{}, fmt.Errorf("write index: %w", err) + } + return analyze(opts, entries), nil +} + +// analyze derives the Result (counts, conflicts, worklist) from already-scanned +// catalog entries. Read-only. +func analyze(opts Options, entries []Entry) Result { + res := Result{Indexed: len(entries)} + // Catalog-level conflict: two entry files declaring the same name. 
+ res.Conflicts = append(res.Conflicts, catalogConflicts(entries)...) + // Source-level worklist + conflicts (best-effort; skipped if the projects + // tree is unreadable, e.g. on a consume-only workstation). + pending, conflicts := analyzeSources(opts, entries) + res.Pending = pending + res.Conflicts = append(res.Conflicts, conflicts...) + return res +} + +// Reconcile removes catalog entries whose originating memory no longer carries +// the marker or no longer exists. It is conservative: if the projects tree is +// not visible at all, it prunes nothing (avoids wiping the catalog on a machine +// that only consumes it). +func Reconcile(opts Options) (Result, error) { + opts.applyDefaults() + var res Result + + if _, err := os.Stat(opts.ProjectsDir); err != nil { + return res, nil // sources not visible here; never prune blind + } + + entries, err := scanCatalog(opts.DistilledDir) + if err != nil { + return res, fmt.Errorf("scan catalog: %w", err) + } + for _, e := range entries { + if e.OriginProject == "" || e.OriginFile == "" { + continue // hand-authored entry with no source; leave it alone + } + src := filepath.Join(opts.ProjectsDir, e.OriginProject, "memory", e.OriginFile) + content, err := os.ReadFile(src) + stale := false + switch { + case errors.Is(err, fs.ErrNotExist): + stale = true + case err != nil: + continue // transient read error; don't prune on uncertainty + default: + meta, _, _ := parseFrontmatter(string(content)) + stale = meta[opts.Marker.Key] != opts.Marker.Value + } + if stale { + if err := os.Remove(e.Path); err != nil { + return res, fmt.Errorf("prune %s: %w", e.Path, err) + } + res.Pruned++ + } + } + return res, nil +} + +// scanCatalog loads every entry file in dir, sorted by name. A missing dir is +// not an error (returns no entries). 
+func scanCatalog(dir string) ([]Entry, error) { + files, err := os.ReadDir(dir) + if errors.Is(err, fs.ErrNotExist) { + return nil, nil + } + if err != nil { + return nil, err + } + var entries []Entry + for _, f := range files { + if !isEntryFile(f) { + continue + } + e, err := loadEntry(filepath.Join(dir, f.Name())) + if err != nil { + return nil, err + } + entries = append(entries, e) + } + sort.Slice(entries, func(i, j int) bool { return entries[i].Name < entries[j].Name }) + return entries, nil +} + +func loadEntry(path string) (Entry, error) { + b, err := os.ReadFile(path) + if err != nil { + return Entry{}, err + } + meta, body, _ := parseFrontmatter(string(b)) + name := meta["name"] + if name == "" { + name = strings.TrimSuffix(filepath.Base(path), ".md") + } + return Entry{ + Name: name, + Description: meta["description"], + Type: meta["type"], + Scope: meta["scope"], + OriginProject: meta["originProject"], + OriginFile: meta["originFile"], + BodyHash: bodyHash(body), + Path: path, + }, nil +} + +// analyzeSources walks the projects tree for memories carrying the marker. It +// returns the ones with no catalog entry yet (pending) and any name carried by +// multiple sources with divergent bodies (conflicts). Unreadable trees yield +// empty results rather than errors. +func analyzeSources(opts Options, catalog []Entry) ([]Origin, []Conflict) { + // A catalog entry claims its source by provenance (project/file), not by + // name: the catalog slug is normalized and deliberately differs from the + // source memory's human-readable name, so matching on name would mis-report + // every renamed entry as pending. 
+ claimed := make(map[string]bool, len(catalog)) + for _, e := range catalog { + if e.OriginProject != "" && e.OriginFile != "" { + claimed[e.OriginProject+"/"+e.OriginFile] = true + } + } + + dirs, err := os.ReadDir(opts.ProjectsDir) + if err != nil { + return nil, nil + } + + type src struct { + origin Origin + hash string + } + var all []src + byName := map[string][]src{} + for _, d := range dirs { + if !d.IsDir() { + continue + } + memDir := filepath.Join(opts.ProjectsDir, d.Name(), "memory") + files, err := os.ReadDir(memDir) + if err != nil { + continue + } + for _, f := range files { + if !isEntryFile(f) { + continue + } + path := filepath.Join(memDir, f.Name()) + content, err := os.ReadFile(path) + if err != nil { + continue + } + meta, body, _ := parseFrontmatter(string(content)) + if meta[opts.Marker.Key] != opts.Marker.Value { + continue + } + name := meta["name"] + if name == "" { + name = strings.TrimSuffix(f.Name(), ".md") + } + s := src{ + origin: Origin{Project: d.Name(), File: f.Name(), Path: path, Name: name}, + hash: bodyHash(body), + } + all = append(all, s) + byName[name] = append(byName[name], s) + } + } + + // Pending: a tagged source whose provenance no catalog entry claims yet. + var pending []Origin + for _, s := range all { + if !claimed[s.origin.Project+"/"+s.origin.File] { + pending = append(pending, s.origin) + } + } + sort.Slice(pending, func(i, j int) bool { + if pending[i].Project != pending[j].Project { + return pending[i].Project < pending[j].Project + } + return pending[i].File < pending[j].File + }) + + // Conflicts: the same name carried by multiple sources with divergent bodies. 
+ names := make([]string, 0, len(byName)) + for n := range byName { + names = append(names, n) + } + sort.Strings(names) + var conflicts []Conflict + for _, n := range names { + ss := byName[n] + if len(ss) < 2 { + continue + } + hashes := map[string]bool{} + var labels []string + for _, s := range ss { + hashes[s.hash] = true + labels = append(labels, s.origin.Project+"/"+s.origin.File) + } + if len(hashes) > 1 { + conflicts = append(conflicts, Conflict{Name: n, Sources: labels}) + } + } + return pending, conflicts +} + +func catalogConflicts(entries []Entry) []Conflict { + byName := map[string][]Entry{} + for _, e := range entries { + byName[e.Name] = append(byName[e.Name], e) + } + var out []Conflict + for name, es := range byName { + if len(es) < 2 { + continue + } + hashes := map[string]bool{} + var labels []string + for _, e := range es { + hashes[e.BodyHash] = true + labels = append(labels, filepath.Base(e.Path)) + } + if len(hashes) > 1 { + out = append(out, Conflict{Name: name, Sources: labels}) + } + } + sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name }) + return out +} + +func writeIndex(dir string, entries []Entry) error { + var b strings.Builder + b.WriteString("# Distilled environment memories\n\n") + b.WriteString("\n\n") + if len(entries) == 0 { + b.WriteString("_No distilled memories yet._\n") + } else { + b.WriteString("| Name | Description | Type | Origin |\n") + b.WriteString("|------|-------------|------|--------|\n") + for _, e := range entries { + b.WriteString(fmt.Sprintf("| [%s](%s) | %s | %s | %s |\n", + e.Name, + filepath.Base(e.Path), + cellEscape(e.Description), + dash(e.Type), + dash(e.OriginProject), + )) + } + } + if err := os.MkdirAll(dir, 0700); err != nil { + return err + } + return os.WriteFile(filepath.Join(dir, IndexFileName), []byte(b.String()), 0600) +} + +// parseFrontmatter extracts a flat key/value map from leading YAML frontmatter +// and returns the remaining body. 
It is deliberately tolerant: it accepts both +// the flat `type: feedback` form (older memories) and keys nested one level +// under `metadata:` (current schema), flattening both into the same map. Inline +// objects and deeper nesting are ignored. Content without frontmatter is +// returned verbatim as the body with an empty map. +func parseFrontmatter(content string) (map[string]string, string, error) { + meta := map[string]string{} + if !strings.HasPrefix(content, "---") { + return meta, content, nil + } + lines := strings.Split(content, "\n") + if strings.TrimRight(lines[0], "\r") != "---" { + return meta, content, nil + } + end := -1 + for i := 1; i < len(lines); i++ { + if strings.TrimRight(lines[i], "\r") == "---" { + end = i + break + } + } + if end < 0 { + return meta, content, nil // unterminated; treat as no frontmatter + } + for i := 1; i < end; i++ { + line := strings.TrimSpace(strings.TrimRight(lines[i], "\r")) + if line == "" || strings.HasPrefix(line, "#") { + continue + } + colon := strings.Index(line, ":") + if colon < 0 { + continue + } + key := strings.TrimSpace(line[:colon]) + val := strings.TrimSpace(line[colon+1:]) + if key == "metadata" && val == "" { + continue // container line; its children are flattened in + } + val = strings.Trim(val, `"'`) + if val != "" { + meta[key] = val + } + } + body := strings.TrimLeft(strings.Join(lines[end+1:], "\n"), "\n") + return meta, body, nil +} + +func isEntryFile(f fs.DirEntry) bool { + if f.IsDir() { + return false + } + n := f.Name() + if strings.HasPrefix(n, ".") || !strings.HasSuffix(n, ".md") { + return false + } + if strings.Contains(n, ".tmp.") { + return false // leftover merge-driver temp files + } + return n != "MEMORY.md" && n != IndexFileName +} + +func bodyHash(body string) string { + sum := sha256.Sum256([]byte(strings.TrimSpace(body))) + return hex.EncodeToString(sum[:8]) +} + +func cellEscape(s string) string { + s = strings.ReplaceAll(s, "|", "\\|") + return strings.ReplaceAll(s, "\n", 
" ") +} + +func dash(s string) string { + if s == "" { + return "—" + } + return s +} diff --git a/internal/distill/distill_test.go b/internal/distill/distill_test.go new file mode 100644 index 0000000..92eada4 --- /dev/null +++ b/internal/distill/distill_test.go @@ -0,0 +1,221 @@ +package distill + +import ( + "os" + "path/filepath" + "strings" + "testing" +) + +func writeFile(t *testing.T, path, content string) { + t.Helper() + if err := os.MkdirAll(filepath.Dir(path), 0700); err != nil { + t.Fatal(err) + } + if err := os.WriteFile(path, []byte(content), 0600); err != nil { + t.Fatal(err) + } +} + +func TestParseFrontmatter(t *testing.T) { + cases := []struct { + name string + content string + wantMeta map[string]string + wantBody string + checkBody bool + }{ + { + name: "flat keys (older schema)", + content: "---\nname: foo\ntype: feedback\nscope: environment\n---\nbody line\n", + wantMeta: map[string]string{"name": "foo", "type": "feedback", "scope": "environment"}, + wantBody: "body line\n", checkBody: true, + }, + { + name: "nested metadata (current schema)", + content: "---\nname: bar\ndescription: a desc\nmetadata:\n type: user\n scope: environment\n---\n\nthe body\n", + wantMeta: map[string]string{"name": "bar", "description": "a desc", "type": "user", "scope": "environment"}, + wantBody: "the body\n", checkBody: true, + }, + { + name: "no frontmatter", + content: "just a body\n", + wantMeta: map[string]string{}, + wantBody: "just a body\n", checkBody: true, + }, + { + name: "unterminated frontmatter is treated as body", + content: "---\nname: oops\nno closing fence\n", + wantMeta: map[string]string{}, + }, + { + name: "quoted values stripped", + content: "---\nname: \"quoted\"\n---\nx\n", + wantMeta: map[string]string{"name": "quoted"}, + }, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + meta, body, err := parseFrontmatter(tc.content) + if err != nil { + t.Fatal(err) + } + for k, v := range tc.wantMeta { + if meta[k] != v { + 
t.Errorf("meta[%q] = %q, want %q", k, meta[k], v) + } + } + if len(meta) != len(tc.wantMeta) { + t.Errorf("meta = %v, want exactly %v", meta, tc.wantMeta) + } + if tc.checkBody && body != tc.wantBody { + t.Errorf("body = %q, want %q", body, tc.wantBody) + } + }) + } +} + +// catalogEntry builds an entry file body with nested-metadata frontmatter. +func catalogEntry(name, desc, typ, originProject, originFile, body string) string { + return strings.Join([]string{ + "---", + "name: " + name, + "description: " + desc, + "metadata:", + " type: " + typ, + " scope: environment", + " originProject: " + originProject, + " originFile: " + originFile, + "---", + "", + body, + "", + }, "\n") +} + +func TestBuildIndexWritesSortedIndex(t *testing.T) { + dir := t.TempDir() + distilled := filepath.Join(dir, "distilled") + writeFile(t, filepath.Join(distilled, "zeta.md"), catalogEntry("zeta", "Z lesson", "feedback", "P1", "z.md", "z body")) + writeFile(t, filepath.Join(distilled, "alpha.md"), catalogEntry("alpha", "A lesson", "user", "P2", "a.md", "a body")) + // noise that must be ignored + writeFile(t, filepath.Join(distilled, "alpha.md.tmp.123.abc"), "junk") + writeFile(t, filepath.Join(distilled, ".hidden.md"), "junk") + + res, err := BuildIndex(Options{DistilledDir: distilled, ProjectsDir: filepath.Join(dir, "nope")}) + if err != nil { + t.Fatal(err) + } + if res.Indexed != 2 { + t.Fatalf("Indexed = %d, want 2", res.Indexed) + } + idx, err := os.ReadFile(filepath.Join(distilled, IndexFileName)) + if err != nil { + t.Fatal(err) + } + s := string(idx) + ai := strings.Index(s, "alpha") + zi := strings.Index(s, "zeta") + if ai < 0 || zi < 0 || ai > zi { + t.Errorf("index not sorted alpha-before-zeta:\n%s", s) + } + if !strings.Contains(s, "[alpha](alpha.md)") { + t.Errorf("index missing entry link:\n%s", s) + } +} + +func TestBuildIndexCatalogConflict(t *testing.T) { + dir := t.TempDir() + distilled := filepath.Join(dir, "distilled") + writeFile(t, filepath.Join(distilled, 
"dup-a.md"), catalogEntry("dup", "first", "feedback", "P1", "x.md", "body one")) + writeFile(t, filepath.Join(distilled, "dup-b.md"), catalogEntry("dup", "second", "feedback", "P2", "y.md", "body TWO differs")) + + res, err := BuildIndex(Options{DistilledDir: distilled}) + if err != nil { + t.Fatal(err) + } + if len(res.Conflicts) != 1 || res.Conflicts[0].Name != "dup" { + t.Fatalf("Conflicts = %+v, want one for 'dup'", res.Conflicts) + } +} + +// sourceMemory builds a project-side memory file carrying the marker. +func sourceMemory(name string, marked bool, body string) string { + scope := "" + if marked { + scope = " scope: environment\n" + } + return "---\nname: " + name + "\ndescription: d\nmetadata:\n type: feedback\n" + scope + "---\n\n" + body + "\n" +} + +func TestBuildIndexPendingWorklist(t *testing.T) { + dir := t.TempDir() + projects := filepath.Join(dir, "projects") + distilled := filepath.Join(dir, "distilled") + + // A marked memory already in the catalog. The source keeps a human-readable + // name while the catalog entry uses a normalized slug — they must still match + // by provenance (project/file), NOT by name, or the entry mis-reports pending. + writeFile(t, filepath.Join(projects, "P1", "memory", "in-catalog.md"), sourceMemory("In Catalog, Human Name", true, "x")) + writeFile(t, filepath.Join(distilled, "in-catalog.md"), catalogEntry("in-catalog-slug", "d", "feedback", "P1", "in-catalog.md", "x")) + // A marked memory NOT yet in the catalog -> pending. + writeFile(t, filepath.Join(projects, "P1", "memory", "pending.md"), sourceMemory("pending-lesson", true, "y")) + // An unmarked memory -> ignored. 
+ writeFile(t, filepath.Join(projects, "P2", "memory", "project-only.md"), sourceMemory("project-only", false, "z")) + + res, err := BuildIndex(Options{DistilledDir: distilled, ProjectsDir: projects}) + if err != nil { + t.Fatal(err) + } + if len(res.Pending) != 1 || res.Pending[0].Name != "pending-lesson" { + t.Fatalf("Pending = %+v, want one for 'pending-lesson'", res.Pending) + } +} + +func TestReconcilePrunesStaleEntries(t *testing.T) { + dir := t.TempDir() + projects := filepath.Join(dir, "projects") + distilled := filepath.Join(dir, "distilled") + + // Entry whose source still carries the marker -> kept. + writeFile(t, filepath.Join(projects, "P1", "memory", "live.md"), sourceMemory("live", true, "x")) + writeFile(t, filepath.Join(distilled, "live.md"), catalogEntry("live", "d", "feedback", "P1", "live.md", "x")) + // Entry whose source lost the marker -> pruned. + writeFile(t, filepath.Join(projects, "P1", "memory", "untagged.md"), sourceMemory("untagged", false, "y")) + writeFile(t, filepath.Join(distilled, "untagged.md"), catalogEntry("untagged", "d", "feedback", "P1", "untagged.md", "y")) + // Entry whose source file is gone -> pruned. 
+ writeFile(t, filepath.Join(distilled, "gone.md"), catalogEntry("gone", "d", "feedback", "P1", "missing.md", "z")) + + res, err := Reconcile(Options{DistilledDir: distilled, ProjectsDir: projects}) + if err != nil { + t.Fatal(err) + } + if res.Pruned != 2 { + t.Fatalf("Pruned = %d, want 2", res.Pruned) + } + if _, err := os.Stat(filepath.Join(distilled, "live.md")); err != nil { + t.Errorf("live.md should have survived: %v", err) + } + for _, gone := range []string{"untagged.md", "gone.md"} { + if _, err := os.Stat(filepath.Join(distilled, gone)); !os.IsNotExist(err) { + t.Errorf("%s should have been pruned", gone) + } + } +} + +func TestReconcileNeverPrunesBlindWhenSourcesInvisible(t *testing.T) { + dir := t.TempDir() + distilled := filepath.Join(dir, "distilled") + writeFile(t, filepath.Join(distilled, "live.md"), catalogEntry("live", "d", "feedback", "P1", "live.md", "x")) + + res, err := Reconcile(Options{DistilledDir: distilled, ProjectsDir: filepath.Join(dir, "does-not-exist")}) + if err != nil { + t.Fatal(err) + } + if res.Pruned != 0 { + t.Fatalf("Pruned = %d, want 0 when projects tree is missing", res.Pruned) + } + if _, err := os.Stat(filepath.Join(distilled, "live.md")); err != nil { + t.Errorf("live.md should not be pruned blind: %v", err) + } +} diff --git a/internal/sync/ignore_test.go b/internal/sync/ignore_test.go new file mode 100644 index 0000000..c57100f --- /dev/null +++ b/internal/sync/ignore_test.go @@ -0,0 +1,32 @@ +package sync + +import "testing" + +func TestShouldIgnoreFile(t *testing.T) { + ignore := []string{ + "", // empty + ".hidden", // dot-prefixed + ".gitignore", // dot-prefixed + "MEMORY.md.tmp", // trailing .tmp + "MEMORY.md.tmp.7368.beeb3e905e2a", // .tmp.. 
+ "feedback_mingw_kubectl.md.tmp.27244.fac4861e524", // same, real-world shape + "release_process.md.tmp.47284.4cad.from-remote-1", // temp + conflict-backup suffix + } + for _, n := range ignore { + if !shouldIgnoreFile(n) { + t.Errorf("shouldIgnoreFile(%q) = false, want true", n) + } + } + + keep := []string{ + "MEMORY.md", + "feedback_mingw_kubectl.md", + "reference-meai-cached-tokens.md", + "project_routing_log_missing_tokens.md", + } + for _, n := range keep { + if shouldIgnoreFile(n) { + t.Errorf("shouldIgnoreFile(%q) = true, want false", n) + } + } +} diff --git a/internal/sync/loop.go b/internal/sync/loop.go index c20c8eb..ae74a46 100644 --- a/internal/sync/loop.go +++ b/internal/sync/loop.go @@ -14,6 +14,7 @@ import ( "github.com/fsnotify/fsnotify" "github.com/MarimerLLC/claude-utils/internal/config" + "github.com/MarimerLLC/claude-utils/internal/distill" "github.com/MarimerLLC/claude-utils/internal/gitwt" ) @@ -91,6 +92,7 @@ func (l *Loop) Run(ctx context.Context) error { l.OnFlush(true, nil) } l.refreshManifest(roots) + l.rebuildDistilledIndex() for { select { @@ -124,6 +126,7 @@ func (l *Loop) Run(ctx context.Context) error { log.Println("flush (local):", err) } l.refreshManifest(roots) + l.rebuildDistilledIndex() if l.OnFlush != nil { l.OnFlush(true, err) } @@ -151,6 +154,7 @@ func (l *Loop) Run(ctx context.Context) error { log.Println("flush (pull):", err) } l.refreshManifest(roots) + l.rebuildDistilledIndex() if l.OnFlush != nil { l.OnFlush(false, err) } @@ -361,6 +365,24 @@ func (l *Loop) refreshManifest(roots Roots) { } } +// rebuildDistilledIndex regenerates the local DISTILLED.md catalog index from +// the synced entry files. The index is a derived, git-ignored artifact, so this +// never affects the sync repo; it just keeps the local index fresh after entries +// arrive (from the /distill skill locally or via a pull). Best-effort: a missing +// catalog dir is skipped and errors are logged, never fatal. 
+func (l *Loop) rebuildDistilledIndex() { + dir := l.Cfg.DistilledPath() + if _, err := os.Stat(dir); err != nil { + return // no catalog on this PC yet + } + if _, err := distill.BuildIndex(distill.Options{ + ProjectsDir: l.Cfg.ClaudeProjectsDir, + DistilledDir: dir, + }); err != nil { + log.Println("rebuild distilled index:", err) + } +} + // shouldIgnoreFile returns true for filenames the daemon should not sync. func shouldIgnoreFile(name string) bool { if name == "" { @@ -369,7 +391,11 @@ func shouldIgnoreFile(name string) bool { if strings.HasPrefix(name, ".") { return true } - if strings.HasSuffix(name, ".tmp") { + // Atomic-write temp files: a trailing ".tmp", or the + // ".tmp.." form left behind by interrupted memory writes + // (the pid-suffixed variant does not end in ".tmp", so the suffix check + // alone misses it). Neither is a real memory; they must never sync. + if strings.HasSuffix(name, ".tmp") || strings.Contains(name, ".tmp.") { return true } return false diff --git a/skills/distill-apply/SKILL.md b/skills/distill-apply/SKILL.md new file mode 100644 index 0000000..a52604e --- /dev/null +++ b/skills/distill-apply/SKILL.md @@ -0,0 +1,60 @@ +--- +name: distill-apply +description: Seed environment-level lessons from the shared distilled catalog + (~/.claudesync/distilled/) into the CURRENT project's Claude Code memory. Run + from within whatever project — new or existing — you want to bring up to speed. + Use when the user wants another project to benefit from lessons already learned + elsewhere. +--- + +# distill-apply — seed distilled lessons into this project + +The companion to `/distill`. Where `/distill` *promotes* transferable lessons +into the shared catalog, this *applies* chosen catalog entries into the current +project's memory so Claude here starts out knowing them. + +Run this **from within the project you want to enrich** — its memory directory is +the one the harness lets you write to without prompting. 
+ +## Paths & environment — go straight there, don't go exploring + +Work only with two known locations: `~/.claudesync/distilled/` and the current +project's memory directory. **Do not probe `$HOME` or `~/.claudesync/` (the +parent) to "orient"** — it triggers needless permission prompts. The entry files +are the source of truth; the `DISTILLED.md` index is just a convenience, and +`claude-memsync` may not be on PATH — don't hunt for the binary. + +## Steps + +1. **Read the catalog.** List the `*.md` entry files directly in + `~/.claudesync/distilled/` (ignore `DISTILLED.md` itself) — that's the + authoritative set, so you don't depend on the index existing. Optionally run + `claude-memsync distill` first to refresh the index, but treat it as + best-effort: if the binary isn't found, skip it and read the entry files + anyway. If `~/.claudesync/distilled/` doesn't exist or is empty, tell the user + there's nothing to apply yet (run `/distill` in a project first) and stop. + +2. **Diff against this project.** Read the current project's memory directory + (`~/.claude/projects//memory/`, the path in the memory system-reminder). + Match by `name`: skip catalog entries already present here. Present the + remaining entries to the user — name + description — and let them pick, or + apply all not-yet-present entries if they ask for everything. + +3. **Seed each chosen entry.** Copy its body into a new file in *this* project's + memory directory, named `.md`. Keep the frontmatter, but you may drop + `originProject` / `originFile` (provenance is optional once seeded) and keep + `scope: environment` so this project's `/distill` won't try to re-promote it. + +4. **Update the index.** Add a one-line pointer to this project's `MEMORY.md` + under an appropriate heading (e.g. a "Shared environment" section): + `- [](<slug>.md) — <hook>`. + +5. **Report** what you seeded and what you skipped as already-present. + +## Notes + +- This only seeds into the **current** project. 
To enrich a different project, + run the skill from inside that project. +- Entries are environment-level by construction, so they apply broadly — but if + one clearly doesn't fit this project's stack, say so and skip it rather than + seeding noise. diff --git a/skills/distill/SKILL.md b/skills/distill/SKILL.md new file mode 100644 index 0000000..848a2eb --- /dev/null +++ b/skills/distill/SKILL.md @@ -0,0 +1,140 @@ +--- +name: distill +description: Review this project's Claude Code memories, promote environment-level + lessons (shell/OS quirks, CLI gotchas, toolchain, user identity) into the shared + distilled catalog at ~/.claudesync/distilled/, and tag the originals. Run on + demand. Use when the user wants to extract transferable lessons from a project + so other projects can reuse them. +--- + +# distill — promote environment-level memories into the shared catalog + +You are the *classifier of record* for the distilled-memory system. The +`claude-memsync` daemon can only mechanically aggregate and index what you +produce; the judgment lives here. + +## Concept + +Per-project memories live at `~/.claude/projects/<hash>/memory/*.md`. Some +lessons are **transferable** across every project (how the shell behaves, CLI +gotchas, the user's identity and standing preferences); most are **bound to one +repo** (its code, services, deploy, branch rules). This skill finds the +transferable ones, rewrites them to be project-neutral, and writes them into the +shared catalog so `/distill-apply` can seed them into any other project. + +The catalog lives at **`~/.claudesync/distilled/`** (one `<slug>.md` file per +lesson). It is inside the `claude-memsync` work-tree, so entries sync across the +user's workstations automatically. + +## Paths & environment — go straight there, don't go exploring + +Work only with two known locations: the current project's memory directory and +`~/.claudesync/distilled/`. 
**Do not probe `$HOME`, `~/.claudesync/` (the +parent), or other directories to "orient"** — that triggers needless permission +prompts. In particular: + +- The catalog directory may not exist yet. Don't test-and-search for it — + create it directly with `mkdir -p ~/.claudesync/distilled` before writing, or + just write the file (the write also creates the dir). +- `claude-memsync` may not be on PATH. Don't hunt for the binary. The index + rebuild (step 8) is **best-effort**: if `claude-memsync` isn't found, skip it + with a one-line note — the catalog entries you wrote are what matter, and the + daemon (or a later manual run) regenerates the index. +- Writing under `~/.claudesync/distilled/` should not prompt if `claude-memsync + init` installed the allow-rule. A prompt there just means the rule isn't in + settings yet — proceed if the user approves; don't go looking elsewhere. + +## Steps + +1. **Scope.** Default to the *current* project's memory directory (the path + shown in the memory system-reminder, e.g. + `~/.claude/projects/<hash>/memory/`). Only widen to other projects if the + user explicitly asks — cross-project reads may require permission. + +2. **List candidates.** Read every `*.md` in that directory. Skip `MEMORY.md`, + any `*.tmp.*` files, and any file that already has `scope: environment` in + its frontmatter (already classified — leave it). + +3. **Classify** each remaining memory against this rubric: + + | Verdict | Means | Examples | + |---------|-------|----------| + | **environment** | true in any repo on this machine/account | MINGW64 `kubectl cp` / `< redirect` are broken → use `cat \| kubectl exec -i`; `gh issue assign --self` doesn't exist → use `gh issue edit --add-assignee`; the user's mission/standing preferences | + | **project** | tied to *this* repo | "this repo requires squash merges"; a service's deploy steps; routing/business logic; anything naming this project's components | + + When unsure, choose **project**. 
A false promotion pollutes every other + project's context; a missed one just stays put until next time. + + **The cross-stack test (use it especially for `type: reference`):** ask + *"would this still help in a project on a completely different tech stack?"* + Shell/OS/CLI gotchas and your standing preferences pass — they're true + regardless of language or framework. A fact about a specific library, + framework, or pinned version (e.g. "in Microsoft.Extensions.AI 10.3.0, cached + tokens moved to property X") **fails** — it's noise in a project that doesn't + use that dependency, so it stays **project**-scoped even though it's + technically "transferable" to other repos using the same library. Distill the + environment, not the tech stack. + +4. **Generalize** each environment memory before promoting it. Strip residual + project specifics — replace a `rockbot`-specific pod selector or path with a + generic placeholder or example, drop framing like "in this repo" — while + preserving the transferable rule and a usable example. The catalog entry must + read as advice for *any* project. + +5. **Write the catalog entry** to `~/.claudesync/distilled/<slug>.md` (run + `mkdir -p ~/.claudesync/distilled` first if you're shelling out, or just write + the file). Pick `<slug>` as a short **kebab-case** identifier for the lesson + (e.g. `mingw64-kubectl-file-transfer`, `container-image-tag-hygiene`). Use + exactly this frontmatter shape (the daemon's parser reads `metadata.*`): + + ```markdown + --- + name: <slug> + description: <one-line summary> + metadata: + type: feedback # or user / reference — carry over from the original + scope: environment + originProject: <hash dir name of the source project> + originFile: <source file name> + --- + + <generalized body> + ``` + + **`name` must equal the file's `<slug>`** — a kebab-case identifier, *not* the + source memory's human-readable `name` (which is often a full sentence). The + filename and the `name` field must match. 
The source keeps its own name; the + link back to the source is `originProject`/`originFile`, which is also how the + tooling matches a catalog entry to its origin — so the two names are allowed to + differ, but the catalog side must be a clean slug. + +6. **Tag the original** in place: add `scope: environment` under its `metadata:` + block (or as a top-level key if it uses the older flat frontmatter). Leave the + original body unchanged — the project keeps its richer, specific version. This + tag is a breadcrumb: it marks the memory as promoted and lets future `/distill` + runs skip it, and it lets `claude-memsync` prune the catalog entry if the + original is later deleted. + +7. **Resolve conflicts.** If `claude-memsync distill` (next step) reports the + same `name` carried by multiple sources with divergent content, read both and + write one merged, generalized entry. + +8. **Rebuild the index and report (best-effort).** Try: + + ```sh + claude-memsync distill + ``` + + If it runs, it regenerates `~/.claudesync/distilled/DISTILLED.md` and prints + the entry count plus any pending (tagged-but-not-yet-distilled) memories and + conflicts. **If `claude-memsync` isn't on PATH, don't search for it** — note + that the index will be regenerated by the daemon (or a later manual + `claude-memsync distill`) and move on. Either way, summarize for the user what + you promoted, what you left as project-specific and why, and anything pending. + +## Notes + +- Do **not** invent a `scope` marker on the everyday memory-writer's behalf + elsewhere; this skill is where that decision is made and recorded. +- Never merge prose mechanically — that's why this is a skill and not part of the + Go daemon.
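
Reviewer sketch (not part of the commit): the temp-file filter changed in the `internal/sync/loop.go` hunk can be tried in isolation. The function body below copies the branches visible in the diff. The `return true` for an empty name is an assumption, because the hunk elides that line, and `notes.tmp.md` is a hypothetical filename added to show a side effect of the substring check.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldIgnoreFile mirrors the filter shown in the loop.go hunk.
// Assumption: the empty-name branch returns true; the diff does not show it.
func shouldIgnoreFile(name string) bool {
	if name == "" {
		return true // assumed, elided in the hunk
	}
	if strings.HasPrefix(name, ".") {
		return true // dotfiles are never synced
	}
	// A trailing ".tmp", or ".tmp." anywhere in the name, marks an
	// atomic-write temp file left behind by an interrupted write.
	if strings.HasSuffix(name, ".tmp") || strings.Contains(name, ".tmp.") {
		return true
	}
	return false
}

func main() {
	names := []string{
		"feedback_mingw_kubectl.md.tmp.27244.fac4861e524", // from the test in this diff
		"MEMORY.md",    // from the test in this diff
		"notes.tmp.md", // hypothetical: a real memory that happens to contain ".tmp."
	}
	for _, n := range names {
		fmt.Printf("%-50s ignored=%v\n", n, shouldIgnoreFile(n))
	}
}
```

The substring check is deliberately broad, so a memory file whose name contains `.tmp.` (such as the hypothetical `notes.tmp.md`) would also be skipped. If that ever matters, a stricter match on the pid-suffixed shape would be one option.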