
Commit d700aee

Merge pull request #125 from ganto/feature/gentoo-proxy
Add Gentoo distfiles proxy support with cache exclude feature
2 parents fbadbcd + c5f5fdb commit d700aee


23 files changed (+587 / -9 lines)


.github/workflows/e2e.yaml

Lines changed: 6 additions & 2 deletions
@@ -28,7 +28,7 @@ jobs:
 e2e:
 if: github.event_name == 'workflow_dispatch'
 runs-on: ubuntu-latest
-timeout-minutes: 5
+timeout-minutes: ${{ matrix.timeout || 5 }}

 strategy:
 fail-fast: false
@@ -55,6 +55,10 @@ jobs:
 - name: Arch Linux latest
 test: TestArch
 release: latest
+- name: Gentoo latest
+test: TestGentoo
+release: latest
+timeout: 10

 name: ${{ matrix.name }}

@@ -71,7 +75,7 @@ jobs:
 env:
 E2E_RELEASE: ${{ matrix.release }}
 CONTAINER_RUNTIME: docker
-run: go test -tags e2e -v -race -timeout 5m -run ${{ matrix.test }} ./test/e2e/
+run: go test -tags e2e -v -race -timeout ${{ matrix.timeout || 5 }}m -run ${{ matrix.test }} ./test/e2e/

 report-status:
 if: always() && github.event_name == 'workflow_dispatch'

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -6,6 +6,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

 ## [Unreleased](https://github.com/ganto/pkgproxy/commits/HEAD/)

+### Added
+
+- Per-repository `exclude` config field to prevent specific filenames or suffixes from being cached
+- Support caching Gentoo distfiles with `suffixes: ["*"]` wildcard and `exclude` list
+- Gentoo e2e test using `emerge --fetchonly` in a `gentoo/stage3` container
+
 ## [v0.1.2](https://github.com/ganto/pkgproxy/releases/tag/v0.1.2) - 2026-03-28

 ### Fixed

Makefile

Lines changed: 2 additions & 1 deletion
@@ -107,7 +107,8 @@ $(if $(filter rockylinux,$(1)),TestRockyLinux,\
 $(if $(filter debian,$(1)),TestDebian,\
 $(if $(filter ubuntu,$(1)),TestUbuntu,\
 $(if $(filter archlinux,$(1)),TestArch,\
-$(error Unknown DISTRO: $(1). Use one of: fedora centos-stream almalinux rockylinux debian ubuntu archlinux)))))))))
+$(if $(filter gentoo,$(1)),TestGentoo,\
+$(error Unknown DISTRO: $(1). Use one of: fedora centos-stream almalinux rockylinux debian ubuntu archlinux gentoo))))))))))
 endef

 .PHONY: e2e

README.md

Lines changed: 8 additions & 1 deletion
@@ -135,6 +135,13 @@ Can be used for any type of RPM-based enterprise distribution. E.g. `/etc/yum.re
 baseurl=http://<pkgproxy>:8080/epel/$releasever/Everything/$basearch/
 ```

+### Gentoo
+
+`/etc/portage/make.conf`:
+```
+GENTOO_MIRRORS="http://<pkgproxy>:8080/gentoo"
+```
+
 ### Fedora

 `/etc/yum.repos.d/fedora.repo` (adjust other repositories accordingly):
@@ -193,7 +200,7 @@ Run tests for a specific distribution and release:
 make e2e DISTRO=fedora RELEASE=42
 ```

-Supported `DISTRO` values: `fedora`, `centos-stream`, `almalinux`, `rockylinux`, `debian`, `ubuntu`, `archlinux`.
+Supported `DISTRO` values: `fedora`, `centos-stream`, `almalinux`, `rockylinux`, `debian`, `ubuntu`, `archlinux`, `gentoo`.

 When adding support for a new Linux distribution, corresponding e2e tests should be added as well.

configs/pkgproxy.yaml

Lines changed: 11 additions & 0 deletions
@@ -56,6 +56,17 @@ repositories:
 mirrors:
 - https://mirror.init7.net/fedora/epel/
 - https://dl.fedoraproject.org/pub/epel/
+gentoo:
+suffixes:
+- "*"
+exclude:
+- layout.conf
+- timestamp.mirmon
+- timestamp.dev-local
+mirrors:
+- https://mirror.init7.net/gentoo/
+- https://pkg.adfinis-on-exoscale.ch/gentoo/
+- https://distfiles.gentoo.org/
 fedora:
 suffixes:
 - .drpm
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-04-06
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
## Context

pkgproxy routes requests by stripping the first URL path segment as the repository name, then proxying the remainder to the configured upstream mirrors. Cache candidacy is currently decided solely by file suffix (`IsCacheCandidate` in `cache.go`). Gentoo distfiles are permanent blobs with heterogeneous file extensions, stored in directories derived from a BLAKE2B hash of the filename (not of the content). The suffix model alone cannot express "cache everything except a few metadata files".
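The first-segment routing convention can be illustrated with a small Go sketch. The helper names `splitRepoPath` and `upstreamURL`, and the example path with its `3f` hash directory and `foo-1.0.tar.gz` filename, are hypothetical placeholders rather than pkgproxy's actual code; the mirror URL is the first Gentoo mirror from `configs/pkgproxy.yaml`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitRepoPath splits "/gentoo/distfiles/3f/foo-1.0.tar.gz" into the
// repository name ("gentoo") and the remainder forwarded upstream
// ("distfiles/3f/foo-1.0.tar.gz"). Hypothetical helper for illustration.
func splitRepoPath(p string) (string, string, error) {
	repo, rest, found := strings.Cut(strings.TrimPrefix(p, "/"), "/")
	if !found || repo == "" || rest == "" {
		return "", "", errors.New("path must look like /<repository>/<path>")
	}
	return repo, rest, nil
}

// upstreamURL appends the forwarded remainder to a mirror base URL,
// tolerating a trailing slash on the mirror.
func upstreamURL(mirror, rest string) string {
	return strings.TrimSuffix(mirror, "/") + "/" + rest
}

func main() {
	repo, rest, err := splitRepoPath("/gentoo/distfiles/3f/foo-1.0.tar.gz")
	if err != nil {
		panic(err)
	}
	fmt.Println(repo)                                                  // gentoo
	fmt.Println(upstreamURL("https://mirror.init7.net/gentoo/", rest)) // https://mirror.init7.net/gentoo/distfiles/3f/foo-1.0.tar.gz
}
```

Because the remainder is forwarded verbatim, the hash directory that portage computes passes through untouched, which is why no Gentoo-specific routing code is needed.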
## Goals / Non-Goals

**Goals:**
- Cache all Gentoo distfiles by default, with a minimal exclude list for mirror-specific metadata.
- Introduce an `exclude` field that works independently of `"*"`, so operators can also keep oversized individual files out of the cache in any repository (e.g. `verylarge.rpm`).
- No changes to the proxy routing or transport layers: Gentoo fits the existing first-segment routing convention.

**Non-Goals:**
- Computing or validating the BLAKE2B path prefix: pkgproxy is a transparent proxy, so path correctness is portage's responsibility.
- Caching `layout.conf`: it is excluded in the Gentoo config entry, so no special-case code is needed.
- Supporting the `mirror://gentoo/` pseudo-URI scheme in ebuilds: this is handled transparently, because portage resolves it to a real URL before fetching.

## Decisions

### 1. `"*"` wildcard in `suffixes` means "cache all"

**Decision:** A literal `"*"` entry in the `suffixes` list makes every proxied file a cache candidate, subject to `exclude` filtering.

**Alternatives considered:**
- `cache_all: true` boolean flag: adds a new top-level field and duplicates semantics already expressible via `suffixes`.
- Empty `suffixes` list means "cache all": inverts the current behavior (empty = cache nothing) and is surprising.

`suffixes: ["*"]` was chosen because it is explicit and additive and needs no new config field; the only validator change is the warning described in the edge case below.

**Edge case:** If `suffixes` contains both `"*"` and explicit entries (e.g. `["*", ".rpm"]`), the explicit entries are redundant. The config is accepted, but `validateConfig` logs a warning naming the repository and the redundant suffixes. `IsCacheCandidate` treats this identically to `["*"]` alone.
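A minimal Go sketch of the redundancy check, assuming the warning is emitted through the standard library's `log/slog` (Go 1.21+). `redundantSuffixes` and `warnRedundantSuffixes` are hypothetical helpers, and the message text and attribute keys are illustrative, not pkgproxy's actual log output.

```go
package main

import "log/slog"

// redundantSuffixes returns the explicit entries made redundant by a "*"
// wildcard in the same list, or nil when there is no wildcard.
func redundantSuffixes(suffixes []string) []string {
	hasWildcard := false
	var explicit []string
	for _, s := range suffixes {
		if s == "*" {
			hasWildcard = true
			continue
		}
		explicit = append(explicit, s)
	}
	if !hasWildcard {
		return nil
	}
	return explicit
}

// warnRedundantSuffixes logs a warning for mixed lists but never rejects them.
func warnRedundantSuffixes(repo string, suffixes []string) {
	if r := redundantSuffixes(suffixes); len(r) > 0 {
		slog.Warn("suffixes contain \"*\"; explicit entries are redundant",
			"repository", repo, "redundant", r)
	}
}

func main() {
	warnRedundantSuffixes("gentoo", []string{"*", ".rpm"})     // logs a warning
	warnRedundantSuffixes("fedora", []string{".rpm", ".drpm"}) // logs nothing
}
```

Keeping the detection separate from the logging makes it easy to unit-test, while the configuration itself is still accepted as the edge case requires.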
### 2. `exclude` matches both exact filenames and suffixes

**Decision:** Each entry in `exclude` is tested against the filename, first as an exact match and then as a suffix. This covers:
- Exact files: `layout.conf`, `timestamp.mirmon`, `timestamp.dev-local`
- Suffixes: `.sig` or `.asc`, e.g. if an operator wants to exclude signatures

**Alternatives considered:**
- Separate `exclude_names` and `exclude_suffixes` fields: more explicit, but adds config verbosity for a simple feature.
- Glob/regex patterns: more powerful, but over-engineered for current needs; they can be added later.

### 3. `exclude` is valid without `"*"` in suffixes

**Decision:** The `exclude` field is always applied, regardless of whether `"*"` is present. Without `"*"`, it acts as an override on top of suffix matching, which is useful for excluding a specific large file from an otherwise suffix-matched repository.

**Implementation:** `IsCacheCandidate` runs the exclude check before the suffix check. If any exclude entry matches, it returns false immediately.
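The decision order from Decisions 1 to 3 (exclude first, then the `"*"` wildcard, then ordinary suffixes) might look like this in Go. It is a sketch under assumptions, not the real `cache.go`: `CacheConfig` is reduced to two fields, and the lowercase `isCacheCandidate` takes a request path and matches against its base name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// CacheConfig is a simplified, hypothetical stand-in for pkgproxy's cache settings.
type CacheConfig struct {
	Suffixes []string // e.g. {".rpm"} or {"*"}
	Exclude  []string // exact filenames or suffixes that are never cached
}

// isExcluded tests each entry as an exact filename, then as a suffix.
func isExcluded(name string, exclude []string) bool {
	for _, e := range exclude {
		if e == "" {
			continue // an empty entry would otherwise match every file
		}
		if name == e || strings.HasSuffix(name, e) {
			return true
		}
	}
	return false
}

// isCacheCandidate checks exclude first, then "*", then ordinary suffixes.
func isCacheCandidate(cfg CacheConfig, requestPath string) bool {
	name := path.Base(requestPath)
	if isExcluded(name, cfg.Exclude) {
		return false // exclude wins, with or without "*"
	}
	for _, s := range cfg.Suffixes {
		if s == "*" || strings.HasSuffix(name, s) {
			return true
		}
	}
	return false
}

func main() {
	gentoo := CacheConfig{
		Suffixes: []string{"*"},
		Exclude:  []string{"layout.conf", "timestamp.mirmon", "timestamp.dev-local"},
	}
	fmt.Println(isCacheCandidate(gentoo, "distfiles/3f/foo-1.0.crate")) // true
	fmt.Println(isCacheCandidate(gentoo, "distfiles/layout.conf"))      // false

	rpm := CacheConfig{Suffixes: []string{".rpm"}, Exclude: []string{"verylarge.rpm"}}
	fmt.Println(isCacheCandidate(rpm, "Packages/v/verylarge.rpm")) // false
	fmt.Println(isCacheCandidate(rpm, "Packages/s/small.rpm"))     // true
}
```

One consequence of the matching rule: a string always ends with itself, so the exact-name test is already covered by the suffix test. The flip side is that an entry such as `layout.conf` would also exclude a file named, say, `mylayout.conf`.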
### 4. Gentoo config uses init7 + Adfinis as primary Swiss mirrors

**Decision:** `mirror.init7.net` first, `pkg.adfinis-on-exoscale.ch` second, `distfiles.gentoo.org` as authoritative fallback.
### 5. E2E test bootstraps a portage snapshot and uses `emerge --fetchonly`

**Decision:** Use `gentoo/stage3:latest`. The test script downloads `portage-latest.tar.xz` directly from `distfiles.gentoo.org` (bypassing the proxy, since this is only bootstrap), unpacks it into `/var/db/repos/gentoo`, sets `GENTOO_MIRRORS` to pkgproxy, and then runs `emerge --fetchonly app-text/tree`. This exercises the real portage fetch path, including BLAKE2B path resolution.

**Alternatives considered:**
- Raw `wget` of a known distfile URL: simpler and faster, but does not validate that portage's mirror resolution works end-to-end through pkgproxy.

The test verifies:
1. `emerge --fetchonly app-text/tree` exits successfully with `GENTOO_MIRRORS` pointing at pkgproxy.
2. The tree source archive is cached on disk under `gentoo/distfiles/`.
3. `wget` of `distfiles/layout.conf` through the proxy succeeds, but the file is NOT written to the cache.
## Risks / Trade-offs

- **`"*"` caches everything, including unexpected content** → Mitigated by the `exclude` list, which operators can tune.
- **Gentoo distfiles are large** → Cache disk usage is unbounded; this is an existing property of pkgproxy (no eviction). No change needed.
- **`portage-latest.tar.xz` snapshot download adds ~300 MB to each e2e test run** → Acceptable; Gentoo e2e tests are run manually on request, not in automated CI.
- **Mirror availability** → `distfiles.gentoo.org` as the authoritative last entry in the mirror list covers files that the Swiss mirrors lack or fail to serve.

## Open Questions

None: the design is fully resolved by this document.
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
## Why

pkgproxy supports caching for RPM-, DEB-, and Arch-based distributions, but not for Gentoo. Gentoo users who build many packages fetch large source tarballs (distfiles) repeatedly across machines; a local caching proxy reduces bandwidth and improves build times.

## What Changes

- Add an `exclude` field to the `Repository` config type: a list of filenames or suffixes that are **never** cached, even when `suffixes` contains `"*"`.
- Add `"*"` wildcard support to the existing `suffixes` field: when present, all proxied files are cache candidates except those matching `exclude` entries.
- Add a `gentoo` repository entry to `configs/pkgproxy.yaml` using Swiss mirrors (init7, Adfinis/Exoscale) with `suffixes: ["*"]` and an `exclude` list covering mirror-specific metadata files.
- Add a Gentoo e2e test (`TestGentoo`) that fetches a distfile via the proxy from a `gentoo/stage3` container and asserts that it is cached.

## Capabilities

### New Capabilities

- `gentoo-distfiles`: Proxy and cache Gentoo distfiles from configurable upstream mirrors, honoring the BLAKE2B hash-based directory layout (`distfiles/<xx>/<filename>`).
- `cache-exclude`: Per-repository `exclude` list that prevents specific filenames or suffixes from being cached, complementing the existing `suffixes` include list and enabling the `"*"` wildcard use case.

### Modified Capabilities

- `e2e-multi-distro`: Gentoo is added as a supported distro with a corresponding e2e test.

## Impact

- `pkg/pkgproxy/repository.go`: Add an `Exclude []string` field to the `Repository` struct. `validateConfig` needs no required-field check for it (the field is optional), but logs a warning when `suffixes` mixes `"*"` with explicit entries.
- `pkg/cache/cache.go`: Update `CacheConfig` to carry the exclude list; update `IsCacheCandidate` to handle the `"*"` wildcard and exclude matching.
- `configs/pkgproxy.yaml`: Add the `gentoo` repository entry.
- `test/e2e/e2e_test.go`: Add `TestGentoo`.
- `README.md` and landing page: Add the Gentoo `make.conf` snippet.
- `CHANGELOG.md`: Document the new features.
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
## ADDED Requirements

### Requirement: Wildcard suffix caches all files
When the `suffixes` list for a repository contains `"*"`, the cache SHALL treat every proxied file as a cache candidate, subject to the `exclude` list.

#### Scenario: File with uncommon extension is cached under wildcard repo
- **WHEN** a request is made for a file with an extension not in any explicit suffix list (e.g. `.crate`) under a repo with `suffixes: ["*"]`
- **THEN** `IsCacheCandidate` returns true

#### Scenario: Wildcard does not affect repos without it
- **WHEN** a request is made for a file under a repo whose `suffixes` list does not contain `"*"`
- **THEN** `IsCacheCandidate` applies the existing suffix-match logic unchanged

### Requirement: Exclude list prevents specific files from being cached
A repository MAY define an `exclude` list. Each entry is matched against the request filename as an exact name first, then as a suffix. If any entry matches, the file SHALL NOT be cached regardless of `suffixes`.

#### Scenario: Exact filename match prevents caching
- **WHEN** a request is made for a file whose name exactly matches an `exclude` entry (e.g. `layout.conf`)
- **THEN** `IsCacheCandidate` returns false

#### Scenario: Suffix match prevents caching
- **WHEN** a request is made for a file whose name ends with an `exclude` entry (e.g. `.sig`)
- **THEN** `IsCacheCandidate` returns false

#### Scenario: Non-matching file is not excluded
- **WHEN** a request is made for a file that does not match any `exclude` entry
- **THEN** the `exclude` list has no effect on the cache candidacy decision

#### Scenario: Exclude applies without wildcard suffix
- **WHEN** a repository has explicit suffixes (no `"*"`) and an `exclude` list, and a request is made for a file that matches both a suffix and an exclude entry
- **THEN** `IsCacheCandidate` returns false (exclude takes precedence)

### Requirement: Explicit suffixes alongside wildcard are redundant but valid
When the `suffixes` list contains both `"*"` and explicit suffix entries, the configuration SHALL be accepted. pkgproxy SHALL log a warning identifying the repository and the redundant entries. Cache behavior is identical to having only `"*"`.

#### Scenario: Mixed wildcard and explicit suffixes triggers a warning
- **WHEN** pkgproxy loads a repository config whose `suffixes` list contains `"*"` and at least one other entry
- **THEN** the repository is accepted without error, a warning is logged naming the repository and the redundant suffixes, and `IsCacheCandidate` behaves as if only `"*"` were present

### Requirement: Exclude field is optional
The `exclude` field in a repository config SHALL be optional. Repositories without it SHALL behave exactly as they do today.

#### Scenario: Repository without exclude field
- **WHEN** pkgproxy loads a repository config with no `exclude` key
- **THEN** the repository is accepted without error and cache behavior is unchanged
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
## ADDED Requirements

### Requirement: Gentoo e2e test
The test suite SHALL include a Gentoo test function `TestGentoo` using a `docker.io/gentoo/stage3:latest` container. The test script SHALL:
1. Download the latest portage ebuild snapshot directly from `https://distfiles.gentoo.org/snapshots/portage-latest.tar.xz` (bypassing the proxy; this is bootstrap, not a distfile fetch).
2. Unpack the snapshot into `/var/db/repos/gentoo` inside the container.
3. Configure `GENTOO_MIRRORS` in `/etc/portage/make.conf` to point at the pkgproxy `gentoo` repository.
4. Run `emerge --fetchonly app-text/tree` to fetch the `tree` package sources through the proxy.
5. Fetch `http://<proxy>/gentoo/distfiles/layout.conf` via `wget` to exercise the negative cache path.
#### Scenario: emerge --fetchonly proxies and caches tree distfiles
- **WHEN** the Gentoo container runs `emerge --fetchonly app-text/tree` with `GENTOO_MIRRORS` pointing at pkgproxy
- **THEN** the command exits successfully and the tree source archive exists in the pkgproxy cache under the `gentoo/` subdirectory

#### Scenario: layout.conf is proxied but not cached
- **WHEN** the Gentoo container fetches `http://<proxy>/gentoo/distfiles/layout.conf` via `wget`
- **THEN** the request returns HTTP 200 and `layout.conf` does NOT exist in the pkgproxy cache under the `gentoo/` subdirectory
