feat(cache): portable mache db — push/pull/verify/inspect, all 5 phases (mache-aeb262)#412

Merged
jamestexas merged 12 commits into main from feat/portable-cache-aeb262
May 23, 2026
Conversation

@jamestexas
Contributor

Summary

The full mache portable-cache feature (bead mache-aeb262): take a built .db, ship it via lockfile + content-addressed chunks, restore byte-equal anywhere. Local + remote OCI transport.

10 commits, 46 cache tests, lint clean. Built across 18 `/evolve` loop iterations under the user's "do not do the minimum needed" thoroughness directive — every CLI surface a careful reviewer would name has tests + docs.

What ships

# Local round-trip
mache cache push --db ./mache.db ./cache-out
mache cache pull --out-db ./restored.db ./cache-out

# Remote OCI transport (build-cache/v1)
mache cache push --db ./mache.db ./cache-out \
    --remote https://cache.example.com --scope myrepo/abc123 --tag latest
mache cache pull --out-db ./restored.db ./cache-in \
    --remote https://cache.example.com --scope myrepo/abc123 --ref latest

# CI existence probe (no restore)
mache cache verify --remote ... --scope ... --ref latest

# Local debug summary
mache cache inspect ./cache-out
mache cache inspect ./mache.lock.bin

# Secure CI token loading
mache cache push ... --token-file /run/secrets/cache-token

Phase ledger

| Phase | What | Commit |
|---|---|---|
| 1 | `mache cache push`: walks _source, emits chunks + mache.lock.{bin,toml} | c7d90b7 |
| 2 | `mache cache pull --verify`: restores from local CAS, verify-on-read | c7d90b7 |
| 3 | --remote push/pull via OCI build-cache/v1 (HEAD-checked idempotent push, bounded-parallel chunk upload, verify-on-read on GET) | 98fe421 |
| 4 | Chunks-as-parse-outputs: auto-detected via _ast table; pull reconstructs both _source AND _ast | 0a292ac |
| 5 | Taskfile entries + GHA workflow (cache-roundtrip.yml) | 36dcdfc |
| extras | `mache cache verify` (CI existence probe) | a0170cd |
| extras | `mache cache inspect` + --token-file (debug + CI ergonomics) | 4b1c9e3 |
| docs | ADR-0020, path correction, Phase 4 wire-shape doc, README section, STATUS handoff | 3733813, 89be3a2, 14afb0e, a0170cd, 34fa7ca |

Test ledger (46/46)

| Suite | Count |
|---|---|
| Local push/pull (Phase 1+2) | 7 |
| OCI client (Phase 3) | 12 |
| End-to-end remote (Phase 3) | 3 |
| AST round-trip (Phase 4) | 3 |
| TOML round-trip + Phase 4 error paths | 6 |
| verify subcommand | 4 |
| inspect + --token-file | 11 |

`task cache:test` runs all 46 in ~1s. `task cache:roundtrip` runs the end-to-end smoke. `golangci-lint run ./cmd/`: 0 issues.

Hard dependency

This PR cannot merge until the LLO PR ships cache.capnp. go.mod currently has:

```
replace github.com/agentic-research/ley-line-open/clients/go/leyline-schema => /Users/jamesgardner/remotes/art/ley-line-open/clients/go/leyline-schema
```

This points at the local LLO clone. When LLO PR #53 merges and a leyline-schema release ships with cache.capnp, remove the `replace` directive and bump the `require` to the new tag.

Architectural calls captured in code/docs

  • Producer = "mache" (short-name v1 per ADR-0020)
  • Kind = "<language>-source" (matches _source.language)
  • Hash = BLAKE3 (substrate-locked per Σ §3.4); wire reuses sha256: prefix per cloister-spec/build-cache/v1
  • Chunk shape: Phase 1 (raw bytes) OR Phase 4 (JSON {source_id, path, language, content_b64, ast_nodes}), auto-detected via _ast presence
  • Wire form: capnp.Marshal (Go std framing); canonical-form byte-equal with Rust producer deferred to v1.1
  • mache.lock.{bin,toml} — both written; .bin authoritative
  • Token precedence: --token-file > --token > MACHE_CACHE_TOKEN env

See also

  • `docs/cache/STATUS.md` — consumer-side handoff document (single-page)
  • `docs/cache/phase-4-chunk-shape.md` — Phase 4 JSON wire shape reference
  • `docs/adr/0020-portable-cache-lockfile-schema.md` — consumer-side adoption ADR
  • LLO PR #53: substrate (schema + FsBlobStore)
  • cloister PR (forthcoming) — `cloister-spec/build-cache/v1/` spec + conformance vectors

Test plan

  • `task cache:test` passes (46/46)
  • `task cache:roundtrip` passes
  • `task lint` clean
  • go.mod replace directive removed before merge (gated on LLO PR shipping)
  • `task build` produces a binary; `mache cache --help` shows all four subcommands

🤖 Generated with Claude Code

jamestexas and others added 12 commits May 19, 2026 21:26
External smell rules from $MACHE_SMELL_RULES_DIR are appended to the
registry and their ScopeColumn value is interpolated unescaped into
runSmellRule's SQL (`"AND " + rule.ScopeColumn + " = ?"`). The trust
boundary is operator-controlled, so this isn't a vulnerability — but
the cost of a load-time whitelist is one regex-shaped check and the
value is "a typo or malicious external rule can't smuggle a `;`
terminator, `--` line comment, or unexpected characters into the
SQL composition path."

Whitelist mirrors the character set the built-in ScopeColumn values
actually use (identifiers, `.`, `,`, `(`, `)`, `'`, space) — proven
by TestValidateScopeColumn_AcceptsBuiltinShapes which iterates the
registry. Rejection coverage in TestValidateScopeColumn_RejectsInjectionShapes
spans `;`, `--`, `/*`, `*`, `=`, backtick, double-quote, newline.
End-to-end TestLoadExternalSmellRules_RejectsInjectableScopeColumn
proves a malicious JSON rule never reaches runSmellRule.
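
A minimal sketch of the load-time whitelist described above, assuming the character set is exactly the one listed (identifier characters, `.`, `,`, `(`, `)`, `'`, space). The names `scopeColumnRE` and `validateScopeColumn` are hypothetical; the actual function in the repo is not shown in this PR and may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// scopeColumnRE is an assumed whitelist: identifier characters plus the
// punctuation named in the commit message. \A and \z anchor to the whole
// input, so an embedded newline cannot slip past the check.
var scopeColumnRE = regexp.MustCompile(`\A[A-Za-z0-9_.,()' ]+\z`)

// validateScopeColumn is a hypothetical load-time check. It rejects any value
// containing a character outside the whitelist, which covers `;`, `-`, `/`,
// `*`, `=`, backtick, double quote, and newline. An empty value is rejected
// here; whether the real check allows it is not stated in the source.
func validateScopeColumn(s string) error {
	if !scopeColumnRE.MatchString(s) {
		return fmt.Errorf("scope column %q contains characters outside the whitelist", s)
	}
	return nil
}
```

Because the check runs when external rules are loaded, a rejected rule never reaches the SQL composition path at all; the query-time code stays unchanged.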
…cheLockfile schema

User correction landed: schema design moved from this repo to LLO
(ADR-0021 / ley-line-open-ae89aa). This ADR is now the mache-specific
consumer adoption note covering:

- producer string ("mache") and kind vocabulary (per-language)
- lockfile location (mache.lock.toml at repo root, committed)
- input_hash definition (raw bytes, no normalization)
- verification posture (re-hash + chunk-hash fallback)
- one combined cross-language lockfile per repo for v1

Pairs with mache-aeb262 (the portable mache db feature bead).
Branch: feat/portable-cache-aeb262.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ands

Implements consumer-side surface for the mache portable-cache feature
against LLO's substrate (cache.capnp schema + FsBlobStore-shaped
layout + BLAKE3 hashes per ADR-0021).

Cobra subcommands:
  mache cache push --db <path> <out-dir>
  mache cache pull --out-db <path> [--verify] <in-dir>

Phase 1 (push):
- Opens mache-built .db, queries _source for (id, path, language, content)
- Computes BLAKE3(content) for each source (v1: chunk = raw bytes;
  Phase 4 will switch to capnp-encoded parse outputs once sheaf-driven
  incremental lands)
- Writes chunks to <out>/objects/<hash[0..2]>/<hash[2..]> matching LLO's
  FsBlobStore layout (future migration is a no-op)
- Atomic write (temp + fsync + rename); idempotent (skip if present and
  hashes match; hard-fail if present but corrupt)
- Emits both mache.lock.bin (capnp wire, authoritative) and
  mache.lock.toml (diff-friendly TOML) per ADR-0025 conventions
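
The chunk layout and atomic write above can be sketched as follows. This is an illustration under stated assumptions, not the code in cmd/cache.go: `chunkPath` and `writeAtomic` are hypothetical names, the digest is passed in as a hex string (BLAKE3 is not in the Go standard library, so hashing is left out), and the present-but-corrupt check is omitted.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// chunkPath maps a hex digest to <root>/objects/<hash[0..2]>/<hash[2..]>,
// the layout the commit message describes.
func chunkPath(root, hexHash string) (string, error) {
	if len(hexHash) < 3 {
		return "", fmt.Errorf("digest %q too short for a 2-character fan-out", hexHash)
	}
	return filepath.Join(root, "objects", hexHash[:2], hexHash[2:]), nil
}

// writeAtomic writes data to a temp file in the destination directory,
// fsyncs it, then renames it into place (temp + fsync + rename). The temp
// file lives in the same directory so the rename stays on one filesystem.
func writeAtomic(path string, data []byte) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, ".chunk-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```

A caller would compute the digest, call `chunkPath`, and only then `writeAtomic`; a reader never observes a partially written chunk because the final name appears only after the rename.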

Phase 2 (pull):
- Reads .bin lockfile; refuses mismatched schemaVersion or foreign producer
- For each source: fetches chunk by hash, verifies BLAKE3 unless
  --verify=false, inserts into fresh _source table
- Verifies root chain (BLAKE3(concat(chunkHashes)) == lockfile.root)
- v1 restores only _source; _ast / _lsp* come back via re-ingest
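
The root-chain check above, sketched with two loud assumptions: the hash is abstracted behind a function type and demonstrated with SHA-256 as a stand-in (the source uses BLAKE3, which needs a third-party package), and the concatenation is taken over raw digest bytes in lockfile order (the source does not say whether raw or hex bytes are concatenated). All names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
)

// hashFunc abstracts the digest so the sketch stays stdlib-only.
// The real implementation would plug in BLAKE3 here.
type hashFunc func([]byte) []byte

// sha256Sum is a stand-in digest for demonstration only.
func sha256Sum(b []byte) []byte {
	s := sha256.Sum256(b)
	return s[:]
}

// rootOf hashes the concatenation of the chunk hashes in the given order.
func rootOf(h hashFunc, chunkHashes [][]byte) []byte {
	var buf bytes.Buffer
	for _, c := range chunkHashes {
		buf.Write(c)
	}
	return h(buf.Bytes())
}

// verifyRoot reports whether the recomputed root matches the lockfile's root.
func verifyRoot(h hashFunc, chunkHashes [][]byte, want []byte) bool {
	return bytes.Equal(rootOf(h, chunkHashes), want)
}
```

Since the root covers the ordered list of chunk hashes, a reordered, dropped, or substituted chunk entry changes the root even when every individual chunk still verifies.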

7 tests, all pass:
- EmitsLockfileAndChunks   : layout + hashes + meta
- RefusesEmptyDB           : empty-db guard
- PushPull_RoundTrip       : 3-source end-to-end
- RejectsWrongSchemaVersion: version-skew refused (hand-built bad lockfile)
- VerifyRejectsTamperedChunk : verify-on-read catches disk tampering
- NoVerifyAcceptsTamperedChunk : --verify=false documented behavior
- Idempotent               : second push is no-op (IM axiom)

Architectural decisions in code comments:
- Producer = "mache" (short-name v1 per ADR-0020)
- Kind = "<language>-source"
- v1 chunks = raw bytes; Phase 4 → capnp _ast rows
- Wire = capnp Marshal (canonicalize is v1.1 follow-up for cross-runtime
  byte equality with Rust producer)
- go.mod replace directive points at local LLO leyline-schema until
  v0.5.x ships to module registry

Phase 3 (remote build-cache transport per cloister-spec/build-cache/v1),
Phase 4 (chunks-as-parse-outputs), and Phase 5 (CI/dev UX) remain
queued in the mache-aeb262 bead.

go test ./cmd/ -run TestCache: 7/7 pass
golangci-lint run ./cmd/: clean

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ush/pull --remote)

Implements consumer-side surface for cloister-spec/build-cache/v1
(cloister-bb168f). HTTP+OCI plumbing on top of the spec that landed
in earlier iterations.

Files:

cmd/cache_oci.go: OCIClient (HeadBlob/PutBlob/GetBlob/PutManifest/
GetManifest), high-level PushBundle/PullBundle with bounded parallel
chunk uploads, typed errors (OCIBlobMissingError/
OCIManifestMissingError), BLAKE3-in-sha256:-prefix digest encoding
per spec's deliberate misuse, verify-on-read on every GET.
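
A rough sketch of a HEAD-checked, bounded-parallel chunk upload in the spirit of the PushBundle description above. The `blobStore` interface, `pushChunks`, and `memStore` are hypothetical stand-ins, not the OCIClient API; the real signatures, concurrency limit, and error handling are not shown in this PR.

```go
package main

import "sync"

// blobStore is a hypothetical stand-in for the HEAD/PUT half of a registry client.
type blobStore interface {
	HeadBlob(digest string) (bool, error)
	PutBlob(digest string, body []byte) error
}

// pushChunks uploads chunks with at most `limit` requests in flight and skips
// any blob the store already reports present, which makes a repeat push a
// no-op. It returns the first error observed, if any.
func pushChunks(s blobStore, chunks map[string][]byte, limit int) error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	var firstErr error
	for digest, body := range chunks {
		wg.Add(1)
		sem <- struct{}{} // blocks while `limit` uploads are running
		go func(digest string, body []byte) {
			defer wg.Done()
			defer func() { <-sem }()
			present, err := s.HeadBlob(digest)
			if err == nil && !present {
				err = s.PutBlob(digest, body)
			}
			if err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(digest, body)
	}
	wg.Wait()
	return firstErr
}

// memStore is an in-memory blobStore used only to exercise the sketch.
type memStore struct {
	mu    sync.Mutex
	blobs map[string][]byte
	puts  int
}

func (m *memStore) HeadBlob(d string) (bool, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, ok := m.blobs[d]
	return ok, nil
}

func (m *memStore) PutBlob(d string, b []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.blobs[d] = b
	m.puts++
	return nil
}
```

Pushing the same chunk set twice against `memStore` performs PUTs only on the first pass, which is the idempotency property the tests in this commit exercise against the mock registry.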

cmd/cache.go (extended): --remote/--scope/--tag/--token flags on push;
--remote/--scope/--ref/--token flags on pull. runCacheRemotePush walks
local emit dir + uploads via OCIClient; runCacheRemotePull fetches into
the local cache layout runCachePull understands. Token reads
MACHE_CACHE_TOKEN env if --token not set.

cmd/cache_oci_test.go: httptest in-process mock registry with
concurrency-safe state + failure injection. 12 tests for blob
round-trip, HEAD present/absent, idempotency, corruption detection,
404s, manifest mediaType refusal, bundle round-trip, missing-chunk
guard, HEAD/PUT failure surfacing, parallel upload.

cmd/cache_remote_test.go: end-to-end db → local push → remote push →
fresh remote pull → local restore → byte-equal content. Plus idempotency
across the wire.

Verification:
- go test ./cmd/ -run "TestCache|TestOCI": 22/22 pass
  (7 Phase 1+2 + 12 Phase 3 OCI client + 3 Phase 3 e2e)
- golangci-lint: clean
- gofumpt: clean (auto-formatted on commit hook)

What this enables:
  mache cache push --db <db> <out-dir> --remote <url> --scope <repo>/<sha>
  mache cache pull --out-db <db> <in-dir> --remote <url> --scope <repo>/<sha> --ref <ref>

Honest limits documented in cache_oci.go:
- OAuth2 dance is registry's concern; client takes pre-issued token
- No retry/backoff; caller wraps
- HTTP/2 reuse limited to net/http defaults
- Cross-region failover not handled
- OCI mount-blob (cross-repo dedup) falls back to plain upload

Phases 4 (chunks-as-parse-outputs via sheaf-driven incremental) and
5 (CI/dev UX) remain queued in mache-aeb262.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rip CI workflow

Phase 5 of mache-aeb262: dev UX (task entries) + CI smoke test (GHA
workflow). Closes out the feature's developer-facing surface.

Taskfile.yml:
- task cache:test       Run all 22 cache-related tests
- task cache:roundtrip  End-to-end self-test (the "feature still works" gate)

.github/workflows/cache-roundtrip.yml:
- Triggers on PR/push affecting cmd/cache*.go or related files
- Matrix: ubuntu-latest + macos-latest
- Runs the 22 cache tests + the round-trip smoke test
- No untrusted-input interpolation; all inputs are commit-controlled

Verification:
- task cache:test       22/22 pass
- task cache:roundtrip  2/2 pass

This completes Phases 1+2+3+5 of mache-aeb262. Phase 4 (chunks-as-
parse-outputs from _ast instead of raw source bytes, via sheaf-driven
incremental) is the remaining scope; it touches mache's _ast walker
and the sheaf substrate so it's a meaningfully bigger arc than the
transport plumbing this iteration landed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ot public-schema/

Earlier draft of this ADR referenced rs/ll-core/public-schema/capnp/
which is wrong. The schema lives at rs/ll-core/schema-capnp/schemas/
alongside common.capnp/ast.capnp — schema-capnp is structural
substrate; public-schema is protocol RPC.

Also: the on-disk paragraph now mentions both mache.lock.bin (canonical
capnp wire, authoritative) AND mache.lock.toml (diff-friendly), matching
what cmd/cache.go actually emits.

This was noted as TODO when the architectural correction landed in LLO
ADR-0021; deferred because the mache repo was blocked on parallel
infra/elixir-parser-out-of-lfs work. Now that work proceeds via
worktree, the fix lands.

No code impact — pure docs drift cleanup.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…etected via _ast)

When a mache-built .db has an _ast table, mache push emits chunks
containing source content + per-source AST node rows. mache pull
reconstructs both _source AND _ast on restore. When _ast is absent,
the existing Phase 1 path (chunk = raw content) still applies.

Closes Phase 4 of mache-aeb262.

Chunk body is JSON per ADR-0021's producer-defined chunk policy.
Future bead can migrate to capnp-encoded ast.capnp if cross-runtime
byte-equal becomes needed; v1 picks JSON for diff-friendliness and
to avoid a schema bump.

Auto-detection (no flag needed):
- runCachePush: dbHasASTTable() probes sqlite_master; emits Phase 4
  chunks if present, Phase 1 otherwise
- runCachePull: chunkBodyIsASTShape() per-chunk check; lazy-creates
  _ast table on first AST-shape chunk
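
The per-chunk shape check can be sketched as below. The field names come from the chunk shape listed in the PR summary ({source_id, path, language, content_b64, ast_nodes}); everything else is an assumption: the Go types (in particular `source_id` as a string and `ast_nodes` as opaque JSON), the names `looksLikeASTChunk` and `decodeASTChunk`, and the choice of which keys mark a chunk as AST-shaped.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
)

// astChunk mirrors the Phase 4 field names. Types are assumptions.
type astChunk struct {
	SourceID   string            `json:"source_id"`
	Path       string            `json:"path"`
	Language   string            `json:"language"`
	ContentB64 string            `json:"content_b64"`
	ASTNodes   []json.RawMessage `json:"ast_nodes"`
}

// looksLikeASTChunk reports whether body is a JSON object carrying both the
// source_id and content_b64 keys. Raw Phase 1 bytes normally fail to parse as
// a JSON object and therefore fall through to the raw-content path.
func looksLikeASTChunk(body []byte) bool {
	var probe map[string]json.RawMessage
	if err := json.Unmarshal(body, &probe); err != nil {
		return false
	}
	_, hasID := probe["source_id"]
	_, hasContent := probe["content_b64"]
	return hasID && hasContent
}

// decodeASTChunk parses an AST-shaped chunk and returns it together with the
// base64-decoded source content.
func decodeASTChunk(body []byte) (astChunk, []byte, error) {
	var c astChunk
	if err := json.Unmarshal(body, &c); err != nil {
		return c, nil, err
	}
	if c.SourceID == "" {
		return c, nil, errors.New("ast chunk: missing source_id")
	}
	content, err := base64.StdEncoding.DecodeString(c.ContentB64)
	if err != nil {
		return c, nil, err
	}
	return c, content, nil
}
```

Detecting the shape per chunk rather than per bundle is what lets pull create the _ast table lazily, on the first chunk that carries AST rows.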

New files:
  cmd/cache_ast.go         JSON wire types + helpers
  cmd/cache_ast_test.go    3 tests (push detect, full round-trip,
                            Phase 1 fallback)

cache.go changes:
  - runCachePush: branch on dbHasASTTable
  - runCachePull: branch on chunkBodyIsASTShape, lazy _ast create

Verification:
  go test ./cmd/ -run "TestCache|TestOCI"   25/25 pass (was 22, +3 new)
  golangci-lint run ./cmd/                  0 issues
  task cache:roundtrip                      passes

The mache portable-cache feature is now Phases 1+2+3+4+5 complete —
the entire mache-aeb262 bead scope has shipped on this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ire-shape doc

cmd/cache_toml_test.go: 6 new tests
- TestTOMLLockfile_RoundTripsBin     parse mache.lock.toml back, compare to push
- TestTOMLLockfile_FieldsMatchBin    TOML chunk_hash matches real chunk file
- TestChunkBodyIsASTShape_Negatives  7 negative cases for shape detector
- TestDecodeASTChunk_Negatives       bad JSON / missing source_id
- TestPullRejectsBadBase64InASTChunk content_b64 garbage surfaces error
- TestPullCreatesASTTableConsistently lazy _ast CREATE works any order

docs/cache/phase-4-chunk-shape.md
Reference doc for Phase 4 JSON chunk shape. Previously only in code
comments; promoted to a proper artifact.

Tests: 31/31 pass (was 25, +6 new)
golangci-lint: 0 issues

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`mache cache verify --remote <url> --scope <scope> --ref <ref>` is a
CI-friendly probe: fetches the manifest, HEAD-checks every layer,
GET-verifies the config + a sample layer. Does NOT restore the db.
Designed for a CI step that gates "do we have a cache for this
commit?" before an expensive pull.

4 new tests:
- TestCacheVerify_IntactBundle
- TestCacheVerify_MissingManifest
- TestCacheVerify_MissingLayer
- TestCacheVerify_DetectsCorruptedSampleLayer

README.md gains a "Portable cache" section showing all four CLI
surfaces (push local, push remote, pull, verify) + links to the
wire-shape doc and OCI build-cache/v1 spec.

Tests: 35/35 pass (was 31, +4 new)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r CI ergonomics

mache cache inspect <cache-dir-or-lockfile> prints a summary without
restoring or touching a registry. Output covers producer + version,
schema, source count, topology edges, root hash, processors, and
chunks-on-disk (present/missing, ast-shape vs raw-shape per Phase
1/4). Works on cache dirs and bare .bin lockfiles.

--token-file <path> on push/pull/verify reads the bearer token from
a file (first line, whitespace-trimmed). Precedence:
--token-file > --token > MACHE_CACHE_TOKEN env. CI usage: mount a
secret as a file, pass --token-file. Tokens never appear in process
args or env where child processes can read them.
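
The precedence rule above, as a minimal sketch. `resolveToken` is a hypothetical name and signature; the environment value is passed in as a plain argument to keep the function easy to test. Returning an empty token when all three sources are empty is an assumption, since the source only says an "all-empty" case is tested, not what it returns.

```go
package main

import (
	"errors"
	"os"
	"strings"
)

// resolveToken applies --token-file > --token > MACHE_CACHE_TOKEN.
// tokenFile and tokenFlag are the flag values; envValue is the caller's
// lookup of the environment variable.
func resolveToken(tokenFile, tokenFlag, envValue string) (string, error) {
	if tokenFile != "" {
		raw, err := os.ReadFile(tokenFile)
		if err != nil {
			return "", err
		}
		// Use only the first line, trimmed of surrounding whitespace.
		first, _, _ := strings.Cut(string(raw), "\n")
		tok := strings.TrimSpace(first)
		if tok == "" {
			return "", errors.New("token file is empty")
		}
		return tok, nil
	}
	if tokenFlag != "" {
		return tokenFlag, nil
	}
	return envValue, nil // may be empty (assumed to mean no bearer token)
}
```

Note that a set but unreadable or empty token file is treated as an error rather than falling through to `--token`, so a misconfigured secret mount fails loudly instead of silently using a weaker source.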

11 new tests:
- 4 inspect (dir, bare bin, missing chunks, AST bundle)
- 7 token resolution (priority, trimming, empty-file error, CLI
  fallback, env fallback, all-empty, missing file)

Tests: 46/46 pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Single-page summary of mache-aeb262's complete state. Mirrors the
LLO substrate-side checkpoint but from the consumer side.

Covers all 5 phases + 2 extras (verify, inspect) with commit SHAs,
46-test ledger, LLO substrate beads consumed, architectural calls,
operational follow-ups, how to verify locally, cron status.

Any future reviewer or AI agent picking up this branch reads this
doc first to orient.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…al-path replace

LLO PR #53 merged to main at 5ee058e. Replace go.mod's
local-filesystem replace directive with a Go pseudo-version pulled
from the merged commit:

  v0.4.6-0.20260523221739-5ee058ebf3e1

Reproducer:
  go get github.com/agentic-research/ley-line-open/clients/go/leyline-schema@5ee058ebf3e1657a500aff8bb3a8e181c5666340
  go mod edit -dropreplace=github.com/agentic-research/ley-line-open/clients/go/leyline-schema
  go mod tidy

Verification:
  go test ./cmd/ (cache subset): 46/46 pass
  golangci-lint run ./cmd/: 0 issues

mache CI now works against a real LLO dep — no more local-path replace
blocking the runners.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@jamestexas jamestexas merged commit d0ba880 into main May 23, 2026
8 of 16 checks passed
@jamestexas jamestexas deleted the feat/portable-cache-aeb262 branch May 23, 2026 22:22