Commit b6567eb

alecthomas and claude authored
docs: rewrite README with all strategies and config examples (#210)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent f554aa3 commit b6567eb

1 file changed

Lines changed: 233 additions & 66 deletions

README.md

````diff
@@ -1,117 +1,284 @@
-# Cachew (pronounced cashew) is a super-fast pass-through cache
+# Cachew
 
-Cachew is a server and tooling for incredibly efficient, protocol-aware caching. It is
-designed to be used at scale, with minimal impact on upstream systems. By "protocol-aware", we mean that the proxy isn't
-just a naive HTTP proxy, it is aware of the higher level protocol being proxied (Git, Docker, etc.) and can make more efficient decisions.
+Cachew (pronounced "cashew") is a tiered, protocol-aware, caching HTTP proxy for software engineering infrastructure. It understands higher-level protocols (Git, Docker, Go modules, etc.) and makes smarter caching decisions than a naive HTTP proxy.
 
-## Git
+## Strategies
 
-Git causes a number of problems for us, but the most obvious are:
+### Git
 
-1. Rate limiting by service providers.
-2. `git clone` is very slow, even discounting network overhead
+Caches Git repositories with two complementary techniques:
 
-To solve this we apply two different strategies on the server:
+1. **Snapshots** — periodic `.tar.zst` archives that restore 4–5x faster than `git clone`.
+2. **Pack caching** — passthrough caching of packs from `git-upload-pack` for incremental pulls.
 
-1. Periodic full `.tar.zst` snapshots of the repository. These snapshots restore 4-5x faster than `git clone`.
-2. Passthrough caching of the packs returned by `POST /repo.git/git-upload-pack` to support incremental pulls.
-
-On the client we redirect git to the proxy:
+Redirect Git traffic through cachew:
 
 ```ini
-[url "https://cachew.local/github/"]
+[url "https://cachew.example.com/git/github.com/"]
   insteadOf = https://github.com/
 ```
````
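The `insteadOf` rule added in this commit works by prefix substitution. Before connecting, Git replaces a matching URL prefix with the section's URL, and when several `insteadOf` prefixes match, the longest one wins. The Go sketch below models only that rewrite, to show which URL actually reaches cachew. `rewriteURL` is a hypothetical helper, not part of Git or cachew.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteURL applies Git-style url.<base>.insteadOf rules. rules maps each
// insteadOf prefix to the base URL that replaces it; if several prefixes
// match, the longest wins, as in Git.
func rewriteURL(rawURL string, rules map[string]string) string {
	best := ""
	for prefix := range rules {
		if strings.HasPrefix(rawURL, prefix) && len(prefix) > len(best) {
			best = prefix
		}
	}
	if best == "" {
		return rawURL // no rule matched: the URL is used unchanged
	}
	return rules[best] + strings.TrimPrefix(rawURL, best)
}

func main() {
	// The rule from the README's .gitconfig snippet.
	rules := map[string]string{
		"https://github.com/": "https://cachew.example.com/git/github.com/",
	}
	fmt.Println(rewriteURL("https://github.com/org/repo", rules))
	// https://cachew.example.com/git/github.com/org/repo
	fmt.Println(rewriteURL("https://gitlab.com/org/repo", rules))
	// https://gitlab.com/org/repo
}
```

With the README's rule, cloning `https://github.com/org/repo` therefore fetches from `https://cachew.example.com/git/github.com/org/repo`. URLs on other hosts pass through unchanged.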
````diff
 
-As Git itself isn't aware of the snapshots, Git-specific code in the Cachew CLI can be used to reconstruct a repository.
+Restore a repository from a snapshot (with automatic delta bundle to reach HEAD):
+
+```sh
+cachew git restore https://github.com/org/repo ./repo
+```
+
+```hcl
+git {
+  snapshot-interval = "1h"
+  repack-interval = "1h"
+}
+```
+
+### GitHub Releases
+
+Caches public and private GitHub release assets. Private orgs use a token or GitHub App for authentication.
+
+**URL pattern:** `/github-releases/{owner}/{repo}/{tag}/{asset}`
+
+```hcl
+github-releases {
+  token = "${GITHUB_TOKEN}"
+  private-orgs = ["myorg"]
+}
+```
````
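The GitHub Releases URL pattern maps one-to-one onto GitHub's public asset download URL, `https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}`. Below is a minimal Go sketch of that mapping. It assumes no path segment, tags included, contains a `/`. The names `releaseAsset`, `parseReleasePath` and `publicURL` are hypothetical. The sketch covers public repositories only and does not show how cachew fetches private assets with a token or GitHub App.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// releaseAsset identifies one downloadable asset of a GitHub release.
type releaseAsset struct {
	Owner, Repo, Tag, Asset string
}

// parseReleasePath parses /github-releases/{owner}/{repo}/{tag}/{asset}.
// Assumes no segment (including the tag) contains a "/".
func parseReleasePath(path string) (releaseAsset, error) {
	rest, ok := strings.CutPrefix(path, "/github-releases/")
	if !ok {
		return releaseAsset{}, errors.New("not a /github-releases/ path")
	}
	parts := strings.Split(rest, "/")
	if len(parts) != 4 {
		return releaseAsset{}, fmt.Errorf("want 4 path segments, got %d", len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return releaseAsset{}, errors.New("empty path segment")
		}
	}
	return releaseAsset{Owner: parts[0], Repo: parts[1], Tag: parts[2], Asset: parts[3]}, nil
}

// publicURL is GitHub's download URL for a public release asset.
func (a releaseAsset) publicURL() string {
	return fmt.Sprintf("https://github.com/%s/%s/releases/download/%s/%s",
		a.Owner, a.Repo, a.Tag, a.Asset)
}

func main() {
	a, err := parseReleasePath("/github-releases/org/repo/v1.2.3/tool-linux-amd64.tar.gz")
	if err != nil {
		panic(err)
	}
	fmt.Println(a.publicURL())
	// https://github.com/org/repo/releases/download/v1.2.3/tool-linux-amd64.tar.gz
}
```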
````diff
+
+### Go Modules
+
+Go module proxy (`GOPROXY`-compatible). Private modules are fetched via git clone.
+
+**URL pattern:** `/gomod/...`
+
+```sh
+export GOPROXY=http://cachew.example.com/gomod,direct
+```
+
+```hcl
+gomod {
+  proxy = "https://proxy.golang.org"
+  private-paths = ["github.com/myorg/*"]
+}
+```
````
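`GOPROXY`-compatible means cachew answers the Go module proxy protocol under `/gomod/`. For a module version, that protocol defines `<module>/@v/<version>.info`, `.mod` and `.zip`, plus `@v/list` and `@latest`. Uppercase letters in these paths are case-encoded as `!` followed by the lowercase letter. The `,direct` suffix in the README's `GOPROXY` tells the go command to fetch from the origin repository when the proxy answers 404 or 410. The Go sketch below builds the per-version URLs a client would request. `escape` and `versionURLs` are hypothetical helpers, and the module and version are only examples.

```go
package main

import (
	"fmt"
	"strings"
)

// escape applies the module proxy case-encoding: each uppercase ASCII letter
// becomes '!' followed by its lowercase form, so paths that differ only in
// case stay distinct on case-insensitive file systems.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// versionURLs returns the .info, .mod and .zip URLs for one module version
// under a GOPROXY base URL.
func versionURLs(proxy, module, version string) []string {
	base := strings.TrimSuffix(proxy, "/") + "/" + escape(module) + "/@v/" + escape(version)
	return []string{base + ".info", base + ".mod", base + ".zip"}
}

func main() {
	for _, u := range versionURLs("http://cachew.example.com/gomod", "github.com/BurntSushi/toml", "v1.3.2") {
		fmt.Println(u)
	}
}
```

The first URL printed is `http://cachew.example.com/gomod/github.com/!burnt!sushi/toml/@v/v1.3.2.info`.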
````diff
+
+### Hermit
+
+Caches [Hermit](https://cashapp.github.io/hermit/) package downloads. GitHub release URLs are automatically routed through the `github-releases` strategy.
+
+**URL pattern:** `/hermit/{host}/{path...}`
+
+```hcl
+hermit {}
+```
````
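The Hermit pattern `/hermit/{host}/{path...}` carries the upstream host and path inside the request path. According to the README, GitHub release URLs are handed to the `github-releases` strategy. Below is one way to express that routing as a Go sketch. It assumes upstreams are fetched over HTTPS and that release downloads have the form `github.com/{owner}/{repo}/releases/download/{tag}/{asset}`. `routeHermit` is hypothetical, and cachew's actual router may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// routeHermit maps /hermit/{host}/{path...} either to an internal
// /github-releases/... path (for GitHub release downloads) or to an upstream
// HTTPS URL. ok is false when the path does not match the Hermit pattern.
func routeHermit(reqPath string) (target string, ok bool) {
	rest, found := strings.CutPrefix(reqPath, "/hermit/")
	if !found {
		return "", false
	}
	host, path, found := strings.Cut(rest, "/")
	if !found || host == "" || path == "" {
		return "", false
	}
	if host == "github.com" {
		// Release downloads look like {owner}/{repo}/releases/download/{tag}/{asset}.
		p := strings.Split(path, "/")
		if len(p) == 6 && p[2] == "releases" && p[3] == "download" {
			return "/github-releases/" + strings.Join([]string{p[0], p[1], p[4], p[5]}, "/"), true
		}
	}
	return "https://" + host + "/" + path, true
}

func main() {
	for _, p := range []string{
		"/hermit/golang.org/dl/go1.21.0.tar.gz",
		"/hermit/github.com/org/repo/releases/download/v1.0.0/tool.tar.gz",
	} {
		target, _ := routeHermit(p)
		fmt.Println(p, "->", target)
	}
}
```

Under these assumptions, `/hermit/golang.org/dl/go1.21.0.tar.gz` (the example from the previous README) maps to `https://golang.org/dl/go1.21.0.tar.gz`.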
````diff
+
+### Artifactory
+
+Caches artifacts from JFrog Artifactory with host-based or path-based routing.
+
+```hcl
+artifactory "example.jfrog.io" {
+  target = "https://example.jfrog.io"
+}
+```
+
+### Host
+
+Generic reverse-proxy caching for arbitrary HTTP hosts, with optional custom headers.
+
+```hcl
+host "https://ghcr.io" {
+  headers = {
+    "Authorization": "Bearer QQ=="
+  }
+}
+
+host "https://w3.org" {}
+```
+
+### HTTP Proxy
+
+Caching proxy for clients that use absolute-form HTTP requests (e.g. Android `sdkmanager --proxy_host`).
+
+```hcl
+proxy {}
+```
````
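The HTTP Proxy strategy exists because clients configured to use a proxy send absolute-form request targets (`GET http://host/path HTTP/1.1`) instead of the usual origin-form (`GET /path HTTP/1.1`). In Go's `net/http`, such a request arrives with an absolute `r.URL`, so a handler can tell the two forms apart with `URL.IsAbs`. The sketch below parses both forms with `http.ReadRequest`, the same parser the `net/http` server uses. `isAbsoluteForm` is a hypothetical helper. HTTPS tunnelled through `CONNECT` is outside its scope.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// isAbsoluteForm reports whether the request target was absolute-form
// ("GET http://host/path HTTP/1.1"), as sent by clients configured with an
// HTTP proxy, rather than origin-form ("GET /path HTTP/1.1").
func isAbsoluteForm(r *http.Request) bool {
	return r.URL.IsAbs()
}

// parse reads a raw HTTP/1.1 request from a string.
func parse(raw string) *http.Request {
	r, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		panic(err)
	}
	return r
}

func main() {
	proxied := parse("GET http://dl.example.com/pkg.zip HTTP/1.1\r\nHost: dl.example.com\r\n\r\n")
	direct := parse("GET /pkg.zip HTTP/1.1\r\nHost: cachew.example.com\r\n\r\n")
	fmt.Println(isAbsoluteForm(proxied), proxied.URL.Host) // true dl.example.com
	fmt.Println(isAbsoluteForm(direct), direct.Host)       // false cachew.example.com
}
```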
````diff
+
+## Cache Backends
+
+Multiple backends can be configured simultaneously — they are automatically combined into a tiered cache. Reads check each tier in order and backfill lower tiers on a hit. Writes go to all tiers in parallel.
````
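The tiering rule in this paragraph can be sketched in Go. The sketch assumes that "backfill lower tiers" means copying a hit into every tier that was checked before it and missed, for example into memory after a disk hit. It uses a hypothetical two-method `Tier` interface rather than cachew's real backend API.

```go
package main

import (
	"fmt"
	"sync"
)

// Tier is a hypothetical minimal cache interface.
type Tier interface {
	Get(key string) ([]byte, bool)
	Put(key string, value []byte)
}

// mapTier is a toy in-memory tier, safe for concurrent use.
type mapTier struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newMapTier() *mapTier { return &mapTier{data: map[string][]byte{}} }

func (m *mapTier) Get(key string) ([]byte, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	return v, ok
}

func (m *mapTier) Put(key string, value []byte) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
}

// Tiered checks tiers in order, e.g. memory, then disk, then S3.
type Tiered []Tier

// Get returns the first hit and copies it into every tier checked before it,
// so the next read of the same key is served by a faster tier.
func (t Tiered) Get(key string) ([]byte, bool) {
	for i, tier := range t {
		if v, ok := tier.Get(key); ok {
			for _, missed := range t[:i] {
				missed.Put(key, v)
			}
			return v, true
		}
	}
	return nil, false
}

// Put writes to all tiers in parallel and waits for every write to finish.
func (t Tiered) Put(key string, value []byte) {
	var wg sync.WaitGroup
	for _, tier := range t {
		wg.Add(1)
		go func(tier Tier) {
			defer wg.Done()
			tier.Put(key, value)
		}(tier)
	}
	wg.Wait()
}

func main() {
	memory, disk := newMapTier(), newMapTier()
	cache := Tiered{memory, disk}
	disk.Put("k", []byte("v")) // present only in the slower tier
	v, ok := cache.Get("k")
	_, inMemory := memory.Get("k")
	fmt.Println(string(v), ok, inMemory) // v true true
}
```

Writing to every tier at once keeps the slow tiers populated without adding their latencies together. Backfill on read then repopulates the fast tiers, for example memory after a restart.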
````diff
+
+### Memory
+
+In-memory LRU cache.
+
+```hcl
+memory {
+  limit-mb = 1024 # default
+  max-ttl = "1h" # default
+}
+```
````
````diff
+
+### Disk
+
+On-disk LRU cache with TTL-based eviction.
+
+```hcl
+disk {
+  limit-mb = 250000
+  max-ttl = "8h"
+}
+```
+
+### S3
+
+S3-compatible object storage (AWS S3, MinIO, etc.).
+
+```hcl
+s3 {
+  bucket = "my-cache-bucket"
+  endpoint = "s3.amazonaws.com"
+  region = "us-east-1"
+}
+```
 
 ## Authorization (OPA)
 
-Cachew uses [Open Policy Agent](https://www.openpolicyagent.org/) (OPA) for request authorization. A default policy is
-always active even without any configuration, allowing any request from 127.0.0.1 and `GET` and `HEAD` requests from
-elsewhere.
+Cachew uses [Open Policy Agent](https://www.openpolicyagent.org/) for request authorization. The default policy allows all methods from `127.0.0.1` and `GET`/`HEAD` from elsewhere.
 
-To customise the policy, add an `opa` block to your configuration with either an inline policy or a path to a `.rego` file:
+Policies must be in `package cachew.authz` and define a `deny` rule set. If the set is empty, the request is allowed; otherwise the reasons are returned to the client.
 
 ```hcl
-# Inline policy
 opa {
   policy = <<EOF
 package cachew.authz
-default allow := false
-allow if input.method == "GET"
-allow if input.method == "HEAD"
-allow if { input.method == "POST"; input.path[0] == "api" }
+deny contains "unauthenticated" if not input.headers["authorization"]
+deny contains "writes not allowed" if input.method == "PUT"
 EOF
 }
+```
````
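Cachew evaluates policies with OPA, but the deny-set contract is simple enough to model in plain Go. Every matching rule adds a reason string to a set, and the request is allowed only when the set is empty. The sketch hand-translates the two rules from the inline policy above. `Input`, `rule` and `evaluate` are hypothetical names, not cachew or OPA APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// Input mirrors the fields the README lists for input.*.
type Input struct {
	Method     string
	Path       []string
	Headers    map[string]string // lowercased keys
	RemoteAddr string
}

// rule returns a denial reason and true when it matches.
type rule func(Input) (string, bool)

// The two deny rules from the README's inline policy, written in Go.
var rules = []rule{
	func(in Input) (string, bool) {
		_, ok := in.Headers["authorization"]
		return "unauthenticated", !ok
	},
	func(in Input) (string, bool) {
		return "writes not allowed", in.Method == "PUT"
	},
}

// evaluate collects the deny set; the request is allowed only if it is empty.
func evaluate(in Input) (allowed bool, reasons []string) {
	set := map[string]bool{}
	for _, r := range rules {
		if reason, deny := r(in); deny {
			set[reason] = true
		}
	}
	for reason := range set {
		reasons = append(reasons, reason)
	}
	sort.Strings(reasons) // Rego sets are unordered; sort for stable output
	return len(reasons) == 0, reasons
}

func main() {
	ok, why := evaluate(Input{Method: "PUT", Headers: map[string]string{}})
	fmt.Println(ok, why) // false [unauthenticated writes not allowed]
	ok, why = evaluate(Input{Method: "GET", Headers: map[string]string{"authorization": "Bearer x"}})
	fmt.Println(ok, why) // true []
}
```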
````diff
 
-# Or reference an external file
+Or reference an external file with optional data:
+
+```hcl
 opa {
   policy-file = "./policy.rego"
+  data-file = "./opa-data.json"
 }
 ```
 
-Policies must be written under `package cachew.authz` and define a `deny` rule that collects human-readable reason strings. If the deny set is empty the request is allowed; otherwise it is rejected and the reasons are included in the response body and server logs. The input document available to policies contains:
+**Input fields:** `input.method`, `input.path` (string array), `input.headers`, `input.remote_addr` (includes port — use `startswith` to match by IP).
````
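Because `input.remote_addr` has the form `ip:port`, a prefix check must include the trailing colon. Without it, `127.0.0.1` would also match `127.0.0.10:5000`. The Go sketch below shows the pitfall and, for comparison, a check that splits off the port before comparing IPs. Policies themselves are written in Rego. These Go helpers (`hasLocalhostPrefix`, `hostIs`) are hypothetical illustrations of the string semantics only.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hasLocalhostPrefix mirrors the Rego check
// startswith(input.remote_addr, "127.0.0.1:"). The trailing ":" matters:
// the bare prefix "127.0.0.1" would also match "127.0.0.10:5000".
func hasLocalhostPrefix(remoteAddr string) bool {
	return strings.HasPrefix(remoteAddr, "127.0.0.1:")
}

// hostIs splits off the port and compares IPs, which also copes with
// bracketed IPv6 addresses such as "[::1]:5000".
func hostIs(remoteAddr, want string) bool {
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false
	}
	ip, wantIP := net.ParseIP(host), net.ParseIP(want)
	return ip != nil && wantIP != nil && ip.Equal(wantIP)
}

func main() {
	for _, addr := range []string{"127.0.0.1:5000", "127.0.0.10:5000"} {
		fmt.Println(addr,
			strings.HasPrefix(addr, "127.0.0.1"), // bare prefix: true for both
			hasLocalhostPrefix(addr),             // true, then false
			hostIs(addr, "127.0.0.1"))            // true, then false
	}
}
```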
````diff
 
-| Field | Type | Description |
-|---|---|---|
-| `input.method` | string | HTTP method (GET, POST, etc.) |
-| `input.path` | []string | URL path split by `/` (e.g. `["api", "v1", "object"]`) |
-| `input.headers` | map[string]string | Request headers (lowercased keys) |
-| `input.remote_addr` | string | Client address (ip:port) |
+## GitHub App Authentication
 
-Since `remote_addr` includes the port, use `startswith` to match by IP:
+For private Git repositories and GitHub release assets, configure a GitHub App:
 
-```rego
-deny contains "remote address not allowed" if not startswith(input.remote_addr, "127.0.0.1:")
+```hcl
+github-app {
+  app-id = "12345"
+  private-key-path = "./github-app.pem"
+  installations = { "myorg": "67890" }
+}
 ```
 
-Example policy that requires authentication and blocks writes:
+Installations can also be discovered dynamically via the GitHub API.
 
-```rego
-package cachew.authz
-deny contains "unauthenticated" if not input.headers["authorization"]
-deny contains "writes are not allowed" if input.method == "PUT"
-deny contains "deletes are not allowed" if input.method == "DELETE"
+## CLI
+
+### Server (`cachewd`)
+
+```sh
+cachewd --config cachew.hcl
+cachewd --schema # print config schema
 ```
 
-Policies can reference external data that becomes available as `data.*` in Rego. Provide it inline via `data` or from a file via `data-file`:
+### Client (`cachew`)
+
+```sh
+# Object operations
+cachew get <namespace> <key> [-o file]
+cachew put <namespace> <key> [file] [--ttl 1h]
+cachew stat <namespace> <key>
+cachew delete <namespace> <key>
+cachew namespaces
+
+# Directory snapshots
+cachew snapshot <namespace> <key> <directory> [--ttl 1h] [--exclude pattern]
+cachew restore <namespace> <key> <directory>
+
+# Git
+cachew git restore <repo-url> <directory> [--no-bundle]
+```
+
+**Global flags:** `--url` (`CACHEW_URL`), `--authorization` (`CACHEW_AUTHORIZATION`), `--platform` (prefix keys with `os-arch`), `--daily`/`--hourly` (prefix keys with date).
````
````diff
+
+## Observability
 
 ```hcl
-# Inline JSON data
-opa {
-  policy-file = "./policy.rego"
-  data = <<EOF
-{"allowed_cidrs": ["10.0.0.0/8"], "jwks": {"keys": [...]}}
-EOF
+log {
+  level = "info" # debug, info, warn, error
 }
 
-# Or from a file
-opa {
-  policy-file = "./policy.rego"
-  data-file = "./opa-data.json"
+metrics {
+  service-name = "cachew"
 }
 ```
 
-```json
-{"allowed_cidrs": ["10.0.0.0/8"], "jwks": {"keys": [...]}}
-```
+Admin endpoints: `/_liveness`, `/_readiness`, `PUT /admin/log/level`, `/admin/pprof/`.
 
-```rego
-package cachew.authz
-deny contains "address not in allowed CIDR" if not net.cidr_contains(data.allowed_cidrs[_], input.remote_addr)
-```
+## Full Configuration Example
 
-If `data-file` is not set, `data.*` is empty but policies can still use `http.send` to fetch data at evaluation time.
+```hcl
+state = "./state"
+bind = "0.0.0.0:8080"
+url = "http://cachew.example.com:8080/"
 
-## Docker
+log {
+  level = "info"
+}
 
-## Hermit
+opa {
+  policy = <<EOF
+package cachew.authz
+deny contains "not localhost" if not startswith(input.remote_addr, "127.0.0.1:")
+EOF
+}
 
-Caches Hermit package downloads from all sources (golang.org, npm, GitHub releases, etc.).
+metrics {}
 
-**URL pattern:** `/hermit/{host}/{path...}`
+github-app {
+  app-id = "12345"
+  private-key-path = "./github-app.pem"
+}
+
+git-clone {}
+
+git {
+  snapshot-interval = "1h"
+  repack-interval = "1h"
+}
 
-Example: `GET /hermit/golang.org/dl/go1.21.0.tar.gz`
+github-releases {
+  token = "${GITHUB_TOKEN}"
+  private-orgs = ["myorg"]
+}
+
+gomod {
+  proxy = "https://proxy.golang.org"
+  private-paths = ["github.com/myorg/*"]
+}
 
-GitHub releases are automatically redirected to the `github-releases` strategy.
+hermit {}
+
+host "https://ghcr.io" {
+  headers = {
+    "Authorization": "Bearer ${GHCR_TOKEN}"
+  }
+}
+
+disk {
+  limit-mb = 250000
+  max-ttl = "8h"
+}
+
+proxy {}
+```
````
