|
| 1 | +## Context |
| 2 | + |
| 3 | +pkgproxy routes requests by taking the first URL path segment as the repository name and proxying the remainder of the path to the configured upstream mirrors. Cache candidacy is currently decided solely by file suffix (`IsCacheCandidate` in `cache.go`). Gentoo distfiles are content-addressed, permanent blobs with heterogeneous file extensions, so the suffix model alone cannot express "cache everything except a few metadata files". |
| 4 | + |
| 5 | +## Goals / Non-Goals |
| 6 | + |
| 7 | +**Goals:** |
| 8 | +- Cache all Gentoo distfiles by default with a minimal exclude list for mirror-specific metadata. |
| 9 | +- Introduce an `exclude` field that works independently of `"*"`, so operators can also exclude oversized individual files from any repo (e.g. `verylarge.rpm`). |
| 10 | +- No changes to the proxy routing or transport layers — Gentoo fits the existing first-segment routing convention. |
| 11 | + |
| 12 | +**Non-Goals:** |
| 13 | +- Computing or validating the BLAKE2B path prefix — pkgproxy is a transparent proxy; path correctness is portage's responsibility. |
| 14 | +- Caching `layout.conf` — excluded by default in the Gentoo config entry; no special-case code needed. |
| 15 | +- Supporting `mirror://gentoo/` pseudo-URI scheme in ebuilds — handled transparently when portage resolves it to a real URL. |
| 16 | + |
| 17 | +## Decisions |
| 18 | + |
| 19 | +### 1. `"*"` wildcard in `suffixes` means "cache all" |
| 20 | + |
| 21 | +**Decision:** A literal `"*"` entry in the `suffixes` list makes every proxied file a cache candidate, subject to `exclude` filtering. |
| 22 | + |
| 23 | +**Alternatives considered:** |
| 24 | +- `cache_all: true` boolean flag — adds a new top-level field and duplicates semantics already expressible via `suffixes`. |
| 25 | +- Empty `suffixes` list means cache all — inverts current behavior (empty = cache nothing) and is surprising. |
| 26 | +- Chosen instead: `suffixes: ["*"]` is explicit and additive, and needs no validator change beyond the redundancy warning described under the edge case below. |
| 27 | + |
| 28 | +**Edge case:** If `suffixes` contains both `"*"` and explicit entries (e.g. `["*", ".rpm"]`), the explicit entries are redundant. The config is accepted but `validateConfig` logs a warning naming the repository and the redundant suffixes. `IsCacheCandidate` treats this identically to `["*"]` alone. |
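The redundancy check described in this edge case could look roughly like the following. This is a minimal sketch, not the actual `validateConfig` code: the helper name `redundantSuffixes`, the repository name `example`, and the warning wording are all assumptions made for illustration.

```go
package main

import "fmt"

// redundantSuffixes is a hypothetical helper. It returns the explicit
// entries that a "*" entry makes redundant, or nil when "*" is absent.
func redundantSuffixes(suffixes []string) []string {
	hasWildcard := false
	var explicit []string
	for _, s := range suffixes {
		if s == "*" {
			hasWildcard = true
			continue
		}
		explicit = append(explicit, s)
	}
	if !hasWildcard {
		return nil
	}
	return explicit
}

func main() {
	// "example" is a placeholder repository name.
	if r := redundantSuffixes([]string{"*", ".rpm"}); len(r) > 0 {
		fmt.Printf("warning: repository %q: suffixes %v are redundant alongside \"*\"\n", "example", r)
	}
}
```

The config is still accepted in this sketch; the helper only supplies the list of suffixes that the warning names.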
| 29 | + |
| 30 | +### 2. `exclude` matches both exact filenames and suffixes |
| 31 | + |
| 32 | +**Decision:** Each entry in `exclude` is tested against the filename as an exact match first, then as a suffix. This covers: |
| 33 | +- Exact files: `layout.conf`, `timestamp.mirmon`, `timestamp.dev-local` |
| 34 | +- Suffix-based: `.sig` or `.asc`, should an operator want to exclude signature files |
| 35 | + |
| 36 | +**Alternatives considered:** |
| 37 | +- Separate `exclude_names` and `exclude_suffixes` fields — more explicit but adds config verbosity for a simple feature. |
| 38 | +- Glob/regex patterns — more powerful but over-engineered for current needs; can be added later. |
| 39 | + |
| 40 | +### 3. `exclude` is valid without `"*"` in suffixes |
| 41 | + |
| 42 | +**Decision:** The `exclude` field is always applied, regardless of whether `"*"` is present. When no `"*"` is present, it acts as an override on top of suffix matching — useful for excluding a specific large file from an otherwise suffix-matched repo. |
| 43 | + |
| 44 | +**Implementation:** `IsCacheCandidate` runs the exclude check before the suffix check; if any exclude entry matches, it returns `false` immediately. |
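The ordering of decisions 1 to 3 can be sketched as follows. This is an illustrative approximation, not the real `IsCacheCandidate` from `cache.go`: the `repoConfig` type, the lower-case function names, and the sample paths are assumptions, and empty entries are skipped here purely as a defensive choice of the sketch.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// repoConfig is a hypothetical stand-in for pkgproxy's per-repository settings.
type repoConfig struct {
	Suffixes []string
	Exclude  []string
}

// isExcluded reports whether name matches any exclude entry,
// first as an exact filename, then as a suffix.
func isExcluded(name string, exclude []string) bool {
	for _, e := range exclude {
		if e == "" {
			continue // an empty entry would otherwise match every file
		}
		if name == e || strings.HasSuffix(name, e) {
			return true
		}
	}
	return false
}

// isCacheCandidate applies the exclude list first, then "*" or suffix matching.
func isCacheCandidate(cfg repoConfig, urlPath string) bool {
	name := path.Base(urlPath)
	if isExcluded(name, cfg.Exclude) {
		return false
	}
	for _, s := range cfg.Suffixes {
		if s == "" {
			continue
		}
		if s == "*" || strings.HasSuffix(name, s) {
			return true
		}
	}
	return false
}

func main() {
	gentoo := repoConfig{
		Suffixes: []string{"*"},
		Exclude:  []string{"layout.conf", "timestamp.mirmon", "timestamp.dev-local"},
	}
	rpms := repoConfig{Suffixes: []string{".rpm"}, Exclude: []string{"verylarge.rpm"}}

	// The paths below are made-up examples.
	fmt.Println(isCacheCandidate(gentoo, "distfiles/layout.conf"))   // false
	fmt.Println(isCacheCandidate(gentoo, "distfiles/4a/tree.tar.gz")) // true
	fmt.Println(isCacheCandidate(rpms, "os/verylarge.rpm"))          // false
	fmt.Println(isCacheCandidate(rpms, "os/small.rpm"))              // true
	fmt.Println(isCacheCandidate(rpms, "repodata/repomd.xml"))       // false
}
```

Because an exact filename is also a valid suffix of itself, an entry such as `layout.conf` would also match a file named `old-layout.conf` in this sketch; whether that is acceptable depends on how narrowly operators expect exclude entries to apply.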
| 45 | + |
| 46 | +### 4. Gentoo config uses init7 + Adfinis as primary Swiss mirrors |
| 47 | + |
| 48 | +**Decision:** `mirror.init7.net` first, `pkg.adfinis-on-exoscale.ch` second, `distfiles.gentoo.org` as authoritative fallback. |
| 49 | + |
| 50 | +### 5. E2e test bootstraps portage snapshot and uses emerge --fetchonly |
| 51 | + |
| 52 | +**Decision:** Use `gentoo/stage3:latest`. The test script downloads `portage-latest.tar.xz` directly from `distfiles.gentoo.org` (bypassing the proxy — bootstrap only), unpacks it into `/var/db/repos/gentoo`, sets `GENTOO_MIRRORS` to pkgproxy, then runs `emerge --fetchonly app-text/tree`. This exercises the real portage fetch path including BLAKE2B path resolution. |
| 53 | + |
| 54 | +**Alternatives considered:** |
| 55 | +- Raw `wget` of a known distfile URL — simpler and faster, but doesn't validate that portage's mirror resolution works end-to-end through pkgproxy. |
| 56 | + |
| 57 | +The test verifies: |
| 58 | +1. `emerge --fetchonly app-text/tree` exits successfully with `GENTOO_MIRRORS` pointing at pkgproxy. |
| 59 | +2. The tree source archive is cached on disk under `gentoo/distfiles/`. |
| 60 | +3. `wget` of `distfiles/layout.conf` through the proxy succeeds but the file is NOT written to cache. |
| 61 | + |
| 62 | +## Risks / Trade-offs |
| 63 | + |
| 64 | +- **`"*"` caches everything including unexpected content** → Mitigated by the `exclude` list; operators can tune it. |
| 65 | +- **Gentoo distfiles are large** → Cache disk usage is unbounded; this is an existing property of pkgproxy (no eviction). No change needed. |
| 66 | +- **`portage-latest.tar.xz` snapshot download adds ~300 MB to each e2e test run** → Acceptable; Gentoo e2e tests are run manually on request, not in automated CI. |
| 67 | +- **Mirror availability** → `distfiles.gentoo.org` as authoritative fallback ensures correctness. |
| 68 | + |
| 69 | +## Open Questions |
| 70 | + |
| 71 | +None — design is fully resolved by this document. |