## Problem
Today, artifact reuse only works when two pipelines share the same `build_dir` and `manifest.json`. The cache key is semantic (`artifact_id` + input hashes + prompt + model config), but the lookup is scoped to a single manifest file. If you run the same transform with the same inputs in a different build directory, it rebuilds from scratch.
## Vision
Fully content-addressed build cache. If we can determine that a unit of work is identical — same inputs, same prompt, same model config — we pull the result from cache regardless of which pipeline or build directory produced it.
Think: `ccache` for LLM transforms. The artifact store becomes a global (or per-user) content-addressed cache keyed on the hash of `(input_hashes, prompt_id, model_config, transform_cache_key)`. Any pipeline that requests the same computation gets a cache hit.
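A minimal sketch of how such a key could be derived. The field names and the JSON-based canonicalization here are assumptions for illustration, not the actual synix implementation:

```python
import hashlib
import json

def cache_key(input_hashes, prompt_id, model_config, transform_cache_key):
    """Content-addressed key for one unit of transform work (sketch).

    Input hashes are sorted so the key is order-independent, and the
    model config is serialized with sorted keys as a stand-in for a
    canonical encoding.
    """
    payload = json.dumps(
        {
            "input_hashes": sorted(input_hashes),
            "prompt_id": prompt_id,
            "model_config": model_config,
            "transform_cache_key": transform_cache_key,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any two pipelines that arrive at the same payload get the same key, which is what makes the cache shareable across build directories.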
## What This Enables
- Cross-project reuse: Two unrelated pipelines processing the same source files share cached transcripts/episodes without needing to point at the same `build_dir`
- CI caching: Warm cache from prior runs, even if build dirs differ per job
- Deduplication: Same conversation exported from both ChatGPT backup and a shared folder — parsed once, reused everywhere
- Pipeline experimentation: Try different rollup strategies without re-running expensive lower layers, even from a fresh directory
## Rough Design Notes
- Global cache dir (e.g. `~/.cache/synix/` or configurable via `SYNIX_CACHE_DIR`)
- Cache key: `SHA256(sorted(input_hashes) + prompt_id + canonical(model_config) + transform_cache_key)`
- Build-local manifest still exists for fast lookups; global cache is the fallback
- `synix gc` to prune old/unused cache entries
- Opt-in initially (`pipeline.cache_dir = "~/.cache/synix"` or CLI flag)
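The manifest-first, global-fallback lookup could look roughly like this. The on-disk layout (git-style two-character sharding) and all names here are illustrative assumptions, not the real synix layout:

```python
import json
from pathlib import Path

def lookup(key, manifest, cache_dir):
    """Two-level cache lookup (sketch).

    Fast path: the build-local manifest dict (loaded from manifest.json).
    Fallback: the global content-addressed cache directory, sharded by
    the first two hex characters of the key. On a global hit, the entry
    is backfilled into the local manifest so later lookups stay fast.
    """
    if key in manifest:
        return manifest[key]
    entry = Path(cache_dir) / key[:2] / key  # assumed sharded layout
    if entry.exists():
        artifact = json.loads(entry.read_text())
        manifest[key] = artifact
        return artifact
    return None  # miss: caller rebuilds and writes both levels
```

On a full miss the transform runs as today; the result would then be written to both the local manifest and the global cache so other pipelines can hit it.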
## Not Now
This is a future enhancement. Current per-build-dir caching works fine for the primary use case (swapping pipelines in the same project). Tracked here for when cross-project reuse becomes a real need.