From 51ee665ab20116d82f6ed03f95f20536e53b6c50 Mon Sep 17 00:00:00 2001
From: Alec Thomas
Date: Sat, 21 Mar 2026 09:09:17 +1100
Subject: [PATCH] docs: rewrite README with all strategies and config examples

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 README.md | 299 ++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 233 insertions(+), 66 deletions(-)

diff --git a/README.md b/README.md
index a9688fb..d9c5a20 100644
--- a/README.md
+++ b/README.md
@@ -1,117 +1,284 @@
-# Cachew (pronounced cashew) is a super-fast pass-through cache
+# Cachew

-Cachew is a server and tooling for incredibly efficient, protocol-aware caching. It is
-designed to be used at scale, with minimal impact on upstream systems. By "protocol-aware", we mean that the proxy isn't
-just a naive HTTP proxy, it is aware of the higher level protocol being proxied (Git, Docker, etc.) and can make more efficient decisions.
+Cachew (pronounced "cashew") is a tiered, protocol-aware, caching HTTP proxy for software engineering infrastructure. It understands higher-level protocols (Git, Docker, Go modules, etc.) and makes smarter caching decisions than a naive HTTP proxy.

-## Git
+## Strategies

-Git causes a number of problems for us, but the most obvious are:
+### Git

-1. Rate limiting by service providers.
-2. `git clone` is very slow, even discounting network overhead
+Caches Git repositories with two complementary techniques:

-To solve this we apply two different strategies on the server:
+1. **Snapshots** — periodic `.tar.zst` archives that restore 4–5x faster than `git clone`.
+2. **Pack caching** — passthrough caching of packs from `git-upload-pack` for incremental pulls.

-1. Periodic full `.tar.zst` snapshots of the repository. These snapshots restore 4-5x faster than `git clone`.
-2. Passthrough caching of the packs returned by `POST /repo.git/git-upload-pack` to support incremental pulls.
-
-On the client we redirect git to the proxy:
+Redirect Git traffic through cachew:

 ```ini
-[url "https://cachew.local/github/"]
+[url "https://cachew.example.com/git/github.com/"]
   insteadOf = https://github.com/
 ```

-As Git itself isn't aware of the snapshots, Git-specific code in the Cachew CLI can be used to reconstruct a repository.
+Restore a repository from a snapshot (with automatic delta bundle to reach HEAD):
+
+```sh
+cachew git restore https://github.com/org/repo ./repo
+```
+
+```hcl
+git {
+  snapshot-interval = "1h"
+  repack-interval = "1h"
+}
+```
+
+### GitHub Releases
+
+Caches public and private GitHub release assets. Private orgs use a token or GitHub App for authentication.
+
+**URL pattern:** `/github-releases/{owner}/{repo}/{tag}/{asset}`
+
+```hcl
+github-releases {
+  token = "${GITHUB_TOKEN}"
+  private-orgs = ["myorg"]
+}
+```
+
+### Go Modules
+
+Go module proxy (`GOPROXY`-compatible). Private modules are fetched via git clone.
+
+**URL pattern:** `/gomod/...`
+
+```sh
+export GOPROXY=http://cachew.example.com/gomod,direct
+```
+
+```hcl
+gomod {
+  proxy = "https://proxy.golang.org"
+  private-paths = ["github.com/myorg/*"]
+}
+```
+
+### Hermit
+
+Caches [Hermit](https://cashapp.github.io/hermit/) package downloads. GitHub release URLs are automatically routed through the `github-releases` strategy.
+
+**URL pattern:** `/hermit/{host}/{path...}`
+
+```hcl
+hermit {}
+```
+
+### Artifactory
+
+Caches artifacts from JFrog Artifactory with host-based or path-based routing.
+
+```hcl
+artifactory "example.jfrog.io" {
+  target = "https://example.jfrog.io"
+}
+```
+
+### Host
+
+Generic reverse-proxy caching for arbitrary HTTP hosts, with optional custom headers.
+
+```hcl
+host "https://ghcr.io" {
+  headers = {
+    "Authorization": "Bearer QQ=="
+  }
+}
+
+host "https://w3.org" {}
+```
+
+### HTTP Proxy
+
+Caching proxy for clients that use absolute-form HTTP requests (e.g. Android `sdkmanager --proxy_host`).
+
+```hcl
+proxy {}
+```
+
+## Cache Backends
+
+Multiple backends can be configured simultaneously — they are automatically combined into a tiered cache. Reads check each tier in order and backfill lower tiers on a hit. Writes go to all tiers in parallel.
+
+### Memory
+
+In-memory LRU cache.
+
+```hcl
+memory {
+  limit-mb = 1024 # default
+  max-ttl = "1h" # default
+}
+```
+
+### Disk
+
+On-disk LRU cache with TTL-based eviction.
+
+```hcl
+disk {
+  limit-mb = 250000
+  max-ttl = "8h"
+}
+```
+
+### S3
+
+S3-compatible object storage (AWS S3, MinIO, etc.).
+
+```hcl
+s3 {
+  bucket = "my-cache-bucket"
+  endpoint = "s3.amazonaws.com"
+  region = "us-east-1"
+}
+```

 ## Authorization (OPA)

-Cachew uses [Open Policy Agent](https://www.openpolicyagent.org/) (OPA) for request authorization. A default policy is
-always active even without any configuration, allowing any request from 127.0.0.1 and `GET` and `HEAD` requests from
-elsewhere.
+Cachew uses [Open Policy Agent](https://www.openpolicyagent.org/) for request authorization. The default policy allows all methods from `127.0.0.1` and `GET`/`HEAD` from elsewhere.

-To customise the policy, add an `opa` block to your configuration with either an inline policy or a path to a `.rego` file:
+Policies must be in `package cachew.authz` and define a `deny` rule set. If the set is empty, the request is allowed; otherwise the reasons are returned to the client.

 ```hcl
-# Inline policy
 opa {
   policy = < [-o file]
+cachew put [file] [--ttl 1h]
+cachew stat
+cachew delete
+cachew namespaces
+
+# Directory snapshots
+cachew snapshot [--ttl 1h] [--exclude pattern]
+cachew restore
+
+# Git
+cachew git restore [--no-bundle]
+```
+
+**Global flags:** `--url` (`CACHEW_URL`), `--authorization` (`CACHEW_AUTHORIZATION`), `--platform` (prefix keys with `os-arch`), `--daily`/`--hourly` (prefix keys with date).
+
+## Observability

 ```hcl
-# Inline JSON data
-opa {
-  policy-file = "./policy.rego"
-  data = <