8 changes: 6 additions & 2 deletions .github/workflows/e2e.yaml
@@ -28,7 +28,7 @@ jobs:
e2e:
if: github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
- timeout-minutes: 5
+ timeout-minutes: ${{ matrix.timeout || 5 }}

strategy:
fail-fast: false
@@ -55,6 +55,10 @@ jobs:
- name: Arch Linux latest
test: TestArch
release: latest
- name: Gentoo latest
test: TestGentoo
release: latest
timeout: 10

name: ${{ matrix.name }}

@@ -71,7 +75,7 @@ jobs:
env:
E2E_RELEASE: ${{ matrix.release }}
CONTAINER_RUNTIME: docker
- run: go test -tags e2e -v -race -timeout 5m -run ${{ matrix.test }} ./test/e2e/
+ run: go test -tags e2e -v -race -timeout ${{ matrix.timeout || 5 }}m -run ${{ matrix.test }} ./test/e2e/

report-status:
if: always() && github.event_name == 'workflow_dispatch'
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased](https://github.com/ganto/pkgproxy/commits/HEAD/)

### Added

- Per-repository `exclude` config field to prevent specific filenames or suffixes from being cached
- Support caching Gentoo distfiles with `suffixes: ["*"]` wildcard and `exclude` list
- Gentoo e2e test using `emerge --fetchonly` in a `gentoo/stage3` container

## [v0.1.2](https://github.com/ganto/pkgproxy/releases/tag/v0.1.2) - 2026-03-28

### Fixed
3 changes: 2 additions & 1 deletion Makefile
@@ -107,7 +107,8 @@ $(if $(filter rockylinux,$(1)),TestRockyLinux,\
$(if $(filter debian,$(1)),TestDebian,\
$(if $(filter ubuntu,$(1)),TestUbuntu,\
$(if $(filter archlinux,$(1)),TestArch,\
- $(error Unknown DISTRO: $(1). Use one of: fedora centos-stream almalinux rockylinux debian ubuntu archlinux)))))))))
+ $(if $(filter gentoo,$(1)),TestGentoo,\
+ $(error Unknown DISTRO: $(1). Use one of: fedora centos-stream almalinux rockylinux debian ubuntu archlinux gentoo))))))))))
endef

.PHONY: e2e
9 changes: 8 additions & 1 deletion README.md
@@ -135,6 +135,13 @@ Can be used for any type of RPM-based enterprise distribution. E.g. `/etc/yum.re
baseurl=http://<pkgproxy>:8080/epel/$releasever/Everything/$basearch/
```

### Gentoo

`/etc/portage/make.conf`:
```
GENTOO_MIRRORS="http://<pkgproxy>:8080/gentoo"
```

### Fedora

`/etc/yum.repos.d/fedora.repo` (adjust other repositories accordingly):
@@ -193,7 +200,7 @@ Run tests for a specific distribution and release:
make e2e DISTRO=fedora RELEASE=42
```

- Supported `DISTRO` values: `fedora`, `centos-stream`, `almalinux`, `rockylinux`, `debian`, `ubuntu`, `archlinux`.
+ Supported `DISTRO` values: `fedora`, `centos-stream`, `almalinux`, `rockylinux`, `debian`, `ubuntu`, `archlinux`, `gentoo`.

When adding support for a new Linux distribution, corresponding e2e tests should be added as well.

11 changes: 11 additions & 0 deletions configs/pkgproxy.yaml
@@ -56,6 +56,17 @@ repositories:
mirrors:
- https://mirror.init7.net/fedora/epel/
- https://dl.fedoraproject.org/pub/epel/
gentoo:
suffixes:
- "*"
exclude:
- layout.conf
- timestamp.mirmon
- timestamp.dev-local
mirrors:
- https://mirror.init7.net/gentoo/
- https://pkg.adfinis-on-exoscale.ch/gentoo/
- https://distfiles.gentoo.org/
fedora:
suffixes:
- .drpm
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-04-06
@@ -0,0 +1,71 @@
## Context

pkgproxy routes requests by stripping the first URL path segment as the repository name, then proxying the remainder to configured upstream mirrors. Cache candidacy is currently decided solely by file suffix (`IsCacheCandidate` in `cache.go`). Gentoo distfiles are content-addressed, permanent blobs with heterogeneous file extensions — the suffix model alone cannot represent "cache everything except a few metadata files".
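
The first-segment routing convention described above can be illustrated with a minimal sketch. The helper name `splitRepoPath` and the example distfile path are hypothetical and are not taken from the pkgproxy source; the sketch only shows how a Gentoo request fits the existing convention.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRepoPath separates a request path into the repository name (the first
// path segment) and the remainder that would be forwarded to an upstream
// mirror. Hypothetical helper for illustration only.
func splitRepoPath(urlPath string) (repo, rest string) {
	trimmed := strings.TrimPrefix(urlPath, "/")
	repo, rest, _ = strings.Cut(trimmed, "/")
	return repo, "/" + rest
}

func main() {
	// The file name below is made up for the example.
	repo, rest := splitRepoPath("/gentoo/distfiles/ab/example-1.0.tar.gz")
	fmt.Println(repo, rest)
}
```

With `GENTOO_MIRRORS` set to `http://<pkgproxy>:8080/gentoo`, the segment `gentoo` selects the repository and everything after it is passed through unchanged.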

## Goals / Non-Goals

**Goals:**
- Cache all Gentoo distfiles by default with a minimal exclude list for mirror-specific metadata.
- Introduce an `exclude` field that works independently of `"*"`, so operators can also exclude oversized individual files from any repo (e.g. `verylarge.rpm`).
- No changes to the proxy routing or transport layers — Gentoo fits the existing first-segment routing convention.

**Non-Goals:**
- Computing or validating the BLAKE2B path prefix — pkgproxy is a transparent proxy; path correctness is portage's responsibility.
- Caching `layout.conf` — excluded by default in the Gentoo config entry; no special-case code needed.
- Supporting `mirror://gentoo/` pseudo-URI scheme in ebuilds — handled transparently when portage resolves it to a real URL.

## Decisions

### 1. `"*"` wildcard in `suffixes` means "cache all"

**Decision:** A literal `"*"` entry in the `suffixes` list makes every proxied file a cache candidate, subject to `exclude` filtering.

**Alternatives considered:**
- `cache_all: true` boolean flag — adds a new top-level field and duplicates semantics already expressible via `suffixes`.
- Empty `suffixes` list means cache all — inverts current behavior (empty = cache nothing) and is surprising.
- Chosen: `suffixes: ["*"]` is explicit, additive, and requires no validator changes.

**Edge case:** If `suffixes` contains both `"*"` and explicit entries (e.g. `["*", ".rpm"]`), the explicit entries are redundant. The config is accepted but `validateConfig` logs a warning naming the repository and the redundant suffixes. `IsCacheCandidate` treats this identically to `["*"]` alone.
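
One way the edge case could be detected is sketched below. The function names `redundantSuffixes` and `warnRedundantSuffixes` and the log message are assumptions made for illustration; only the use of `slog.Warn` and the "warn but accept" behavior come from this design.

```go
package main

import (
	"log/slog"
)

const wildcard = "*"

// redundantSuffixes returns the explicit entries made redundant by a "*"
// entry in the same list. It returns nil when no wildcard is present.
// Hypothetical helper, not the actual validateConfig code.
func redundantSuffixes(suffixes []string) []string {
	hasWildcard := false
	var explicit []string
	for _, s := range suffixes {
		if s == wildcard {
			hasWildcard = true
			continue
		}
		explicit = append(explicit, s)
	}
	if !hasWildcard {
		return nil
	}
	return explicit
}

// warnRedundantSuffixes logs a warning naming the repository and the
// redundant suffixes. It never rejects the configuration.
func warnRedundantSuffixes(repo string, suffixes []string) {
	if redundant := redundantSuffixes(suffixes); len(redundant) > 0 {
		slog.Warn("explicit suffixes are redundant alongside wildcard",
			"repository", repo, "suffixes", redundant)
	}
}

func main() {
	warnRedundantSuffixes("gentoo", []string{"*", ".rpm"})     // logs a warning
	warnRedundantSuffixes("fedora", []string{".rpm", ".drpm"}) // logs nothing
}
```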

### 2. `exclude` matches both exact filenames and suffixes

**Decision:** Each entry in `exclude` is tested against the filename as an exact match first, then as a suffix. This covers:
- Exact files: `layout.conf`, `timestamp.mirmon`, `timestamp.dev-local`
- Suffix-based: `.sig`, `.asc` if an operator wanted to exclude signatures

**Alternatives considered:**
- Separate `exclude_names` and `exclude_suffixes` fields — more explicit but adds config verbosity for a simple feature.
- Glob/regex patterns — more powerful but over-engineered for current needs; can be added later.
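
The matching rule in this decision can be sketched as follows. The helper name `matchesExclude`, the use of `path.Base` to obtain the filename, and the skipping of empty entries are assumptions for illustration rather than the actual pkgproxy code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesExclude reports whether the final element of urlPath matches any
// exclude entry, testing exact equality first and a suffix match second.
// Hypothetical helper for illustration only.
func matchesExclude(urlPath string, exclude []string) bool {
	name := path.Base(urlPath)
	for _, entry := range exclude {
		if entry == "" {
			// Assumption: an empty entry is ignored, because as a suffix
			// it would match every filename.
			continue
		}
		if name == entry || strings.HasSuffix(name, entry) {
			return true
		}
	}
	return false
}

func main() {
	exclude := []string{"layout.conf", ".sig"}
	for _, p := range []string{
		"/distfiles/layout.conf",
		"/distfiles/ab/example-1.0.tar.gz.sig",
		"/distfiles/ab/example-1.0.tar.gz",
	} {
		fmt.Printf("%-40s excluded=%v\n", p, matchesExclude(p, exclude))
	}
}
```

Because every string is a suffix of itself, the exact comparison is logically covered by the suffix test. It is kept here only to mirror the order described above. A side effect of suffix semantics is that an entry such as `layout.conf` would also match a file named `mylayout.conf`.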

### 3. `exclude` is valid without `"*"` in suffixes

**Decision:** The `exclude` field is always applied, regardless of whether `"*"` is present. When no `"*"` is present, it acts as an override on top of suffix matching — useful for excluding a specific large file from an otherwise suffix-matched repo.

**Implementation:** `IsCacheCandidate` runs exclude check before suffix check. If any exclude entry matches, return false immediately.
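
The evaluation order (exclude, then wildcard, then explicit suffixes) might look like the sketch below. The `CacheConfig` shape and the `IsCacheCandidate` signature shown here are simplified assumptions; the real definitions in `pkg/cache/cache.go` may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// CacheConfig is a simplified stand-in for the real type in pkg/cache.
type CacheConfig struct {
	Suffixes []string
	Exclude  []string
}

// hasAnySuffix reports whether name ends with any non-empty entry.
// An exact filename match is a special case of a suffix match.
func hasAnySuffix(name string, entries []string) bool {
	for _, e := range entries {
		if e != "" && strings.HasSuffix(name, e) {
			return true
		}
	}
	return false
}

// IsCacheCandidate sketches the decision order from the design:
// exclude first, then the "*" wildcard, then explicit suffixes.
func IsCacheCandidate(cfg CacheConfig, urlPath string) bool {
	name := path.Base(urlPath)
	if hasAnySuffix(name, cfg.Exclude) {
		return false // exclude always wins
	}
	for _, s := range cfg.Suffixes {
		if s == "*" {
			return true // wildcard: everything not excluded is cached
		}
	}
	return hasAnySuffix(name, cfg.Suffixes)
}

func main() {
	gentoo := CacheConfig{Suffixes: []string{"*"}, Exclude: []string{"layout.conf"}}
	rpm := CacheConfig{Suffixes: []string{".rpm"}, Exclude: []string{"verylarge.rpm"}}

	fmt.Println(IsCacheCandidate(gentoo, "/distfiles/layout.conf"))      // false
	fmt.Println(IsCacheCandidate(gentoo, "/distfiles/ab/example.crate")) // true
	fmt.Println(IsCacheCandidate(rpm, "/os/Packages/verylarge.rpm"))     // false
	fmt.Println(IsCacheCandidate(rpm, "/os/Packages/small.rpm"))         // true
}
```

Running the exclude check first means a single early return covers both the wildcard and the non-wildcard case, so the two features stay independent.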

### 4. Gentoo config uses init7 + Adfinis as primary Swiss mirrors

**Decision:** `mirror.init7.net` first, `pkg.adfinis-on-exoscale.ch` second, `distfiles.gentoo.org` as authoritative fallback.

### 5. E2e test bootstraps portage snapshot and uses emerge --fetchonly

**Decision:** Use `gentoo/stage3:latest`. The test script downloads `portage-latest.tar.xz` directly from `distfiles.gentoo.org` (bypassing the proxy — bootstrap only), unpacks it into `/var/db/repos/gentoo`, sets `GENTOO_MIRRORS` to pkgproxy, then runs `emerge --fetchonly app-text/tree`. This exercises the real portage fetch path including BLAKE2B path resolution.

**Alternatives considered:**
- Raw `wget` of a known distfile URL — simpler and faster, but doesn't validate that portage's mirror resolution works end-to-end through pkgproxy.

The test verifies:
1. `emerge --fetchonly app-text/tree` exits successfully with `GENTOO_MIRRORS` pointing at pkgproxy.
2. The tree source archive is cached on disk under `gentoo/distfiles/`.
3. `wget` of `distfiles/layout.conf` through the proxy succeeds but the file is NOT written to cache.

## Risks / Trade-offs

- **`"*"` caches everything including unexpected content** → Mitigated by the `exclude` list; operators can tune it.
- **Gentoo distfiles are large** → Cache disk usage is unbounded; this is an existing property of pkgproxy (no eviction). No change needed.
- **`portage-latest.tar.xz` snapshot download adds ~300 MB to each e2e test run** → Acceptable; Gentoo e2e tests are run manually on request, not in automated CI.
- **Mirror availability** → If the Swiss mirrors are unreachable, `distfiles.gentoo.org` remains configured as the authoritative fallback.

## Open Questions

None — design is fully resolved by this document.
@@ -0,0 +1,30 @@
## Why

pkgproxy supports caching for RPM, DEB, and Arch-based distros but not Gentoo. Gentoo users who build many packages fetch large source tarballs (distfiles) repeatedly across machines; a local caching proxy reduces bandwidth and improves build times.

## What Changes

- Add `exclude` field to the `Repository` config type: a list of filenames or suffixes that are **never** cached, even when `suffixes` contains `"*"`.
- Add `"*"` wildcard support to the existing `suffixes` field: when present, all proxied files are cache candidates except those matching `exclude` entries.
- Add a `gentoo` repository entry to `configs/pkgproxy.yaml` using Swiss mirrors (init7, Adfinis/Exoscale) with `suffixes: ["*"]` and `exclude` covering mirror-specific metadata files.
- Add a Gentoo e2e test (`TestGentoo`) that fetches a distfile via the proxy from a `gentoo/stage3` container and asserts it is cached.

## Capabilities

### New Capabilities

- `gentoo-distfiles`: Proxy and cache Gentoo distfiles from configurable upstream mirrors, honoring the BLAKE2B hash-based directory layout (`distfiles/<xx>/<filename>`).
- `cache-exclude`: Per-repository `exclude` list that prevents specific filenames or suffixes from being cached, complementing the existing `suffixes` include list and enabling the `"*"` wildcard use case.

### Modified Capabilities

- `e2e-multi-distro`: Gentoo is added as a supported distro with a corresponding e2e test.

## Impact

- `pkg/pkgproxy/repository.go`: Add `Exclude []string` field to `Repository` struct; update `validateConfig` (no required validation, field is optional).
- `pkg/cache/cache.go`: Update `CacheConfig` to carry the exclude list; update `IsCacheCandidate` to handle `"*"` wildcard and exclude matching.
- `configs/pkgproxy.yaml`: Add `gentoo` repository entry.
- `test/e2e/e2e_test.go`: Add `TestGentoo`.
- `README.md` and landing page: Add Gentoo `make.conf` snippet.
- `CHANGELOG.md`: Document new features.
@@ -0,0 +1,45 @@
## ADDED Requirements

### Requirement: Wildcard suffix caches all files
When the `suffixes` list for a repository contains `"*"`, the cache SHALL treat every proxied file as a cache candidate, subject to the `exclude` list.

#### Scenario: File with uncommon extension is cached under wildcard repo
- **WHEN** a request is made for a file with an extension not in any explicit suffix list (e.g. `.crate`) under a repo with `suffixes: ["*"]`
- **THEN** `IsCacheCandidate` returns true

#### Scenario: Wildcard does not affect repos without it
- **WHEN** a request is made for a file under a repo whose `suffixes` list does not contain `"*"`
- **THEN** `IsCacheCandidate` applies the existing suffix-match logic unchanged

### Requirement: Exclude list prevents specific files from being cached
A repository MAY define an `exclude` list. Each entry is matched against the request filename as an exact name first, then as a suffix. If any entry matches, the file SHALL NOT be cached regardless of `suffixes`.

#### Scenario: Exact filename match prevents caching
- **WHEN** a request is made for a file whose name exactly matches an `exclude` entry (e.g. `layout.conf`)
- **THEN** `IsCacheCandidate` returns false

#### Scenario: Suffix match prevents caching
- **WHEN** a request is made for a file whose name ends with an `exclude` entry (e.g. `.sig`)
- **THEN** `IsCacheCandidate` returns false

#### Scenario: Non-matching file is not excluded
- **WHEN** a request is made for a file that does not match any `exclude` entry
- **THEN** the `exclude` list has no effect on the cache candidacy decision

#### Scenario: Exclude applies without wildcard suffix
- **WHEN** a repository has explicit suffixes (no `"*"`) and an `exclude` list, and a request is made for a file that matches both a suffix and an exclude entry
- **THEN** `IsCacheCandidate` returns false (exclude takes precedence)

### Requirement: Explicit suffixes alongside wildcard are redundant but valid
When the `suffixes` list contains both `"*"` and explicit suffix entries, the configuration SHALL be accepted. pkgproxy SHALL log a warning identifying the repository and the redundant entries. Cache behavior is identical to having only `"*"`.

#### Scenario: Mixed wildcard and explicit suffixes triggers a warning
- **WHEN** pkgproxy loads a repository config whose `suffixes` list contains `"*"` and at least one other entry
- **THEN** the repository is accepted without error, a warning is logged naming the repository and the redundant suffixes, and `IsCacheCandidate` behaves as if only `"*"` were present

### Requirement: Exclude field is optional
The `exclude` field in a repository config SHALL be optional. Repositories without it SHALL behave identically to the current behavior.

#### Scenario: Repository without exclude field
- **WHEN** pkgproxy loads a repository config with no `exclude` key
- **THEN** the repository is accepted without error and cache behavior is unchanged
@@ -0,0 +1,17 @@
## ADDED Requirements

### Requirement: Gentoo e2e test
The test suite SHALL include a Gentoo test function `TestGentoo` using a `docker.io/gentoo/stage3:latest` container. The test script SHALL:
1. Download the latest portage ebuild snapshot directly from `https://distfiles.gentoo.org/snapshots/portage-latest.tar.xz` (bypassing the proxy — this is bootstrap, not a distfile fetch).
2. Unpack the snapshot into `/var/db/repos/gentoo` inside the container.
3. Configure `GENTOO_MIRRORS` in `/etc/portage/make.conf` to point at the pkgproxy `gentoo` repository.
4. Run `emerge --fetchonly app-text/tree` to fetch the `tree` package sources through the proxy.
5. Fetch `http://<proxy>/gentoo/distfiles/layout.conf` via `wget` to exercise the negative cache path.

#### Scenario: emerge --fetchonly proxies and caches tree distfiles
- **WHEN** the Gentoo container runs `emerge --fetchonly app-text/tree` with `GENTOO_MIRRORS` pointing at pkgproxy
- **THEN** the command exits successfully and the tree source archive exists in the pkgproxy cache under the `gentoo/` subdirectory

#### Scenario: layout.conf is proxied but not cached
- **WHEN** the Gentoo container fetches `http://<proxy>/gentoo/distfiles/layout.conf` via `wget`
- **THEN** the request returns HTTP 200 and `layout.conf` does NOT exist in the pkgproxy cache under the `gentoo/` subdirectory
@@ -0,0 +1,27 @@
## ADDED Requirements

### Requirement: Gentoo distfiles repository config entry
The `configs/pkgproxy.yaml` SHALL include a `gentoo` repository configured with `suffixes: ["*"]`, an `exclude` list covering mirror-specific metadata files (`layout.conf`, `timestamp.mirmon`, `timestamp.dev-local`), and at least two Swiss HTTPS mirrors plus `distfiles.gentoo.org` as authoritative fallback.

#### Scenario: Gentoo distfiles repository is configured
- **WHEN** pkgproxy loads its configuration
- **THEN** the `gentoo` repository is available with at least one upstream mirror

#### Scenario: layout.conf is not cached
- **WHEN** a client fetches `<proxy>/gentoo/distfiles/layout.conf`
- **THEN** pkgproxy proxies the file upstream but does not write it to the local cache

#### Scenario: Distfile fetched via emerge --fetchonly is proxied and cached
- **WHEN** portage runs `emerge --fetchonly app-text/tree` with `GENTOO_MIRRORS` pointing at pkgproxy
- **THEN** pkgproxy proxies the distfile from the upstream mirror and saves it to the local cache under `gentoo/distfiles/<xx>/<filename>`

#### Scenario: Cached distfile is served from disk on subsequent request
- **WHEN** portage fetches the same distfile a second time
- **THEN** pkgproxy serves the file from the local cache without contacting the upstream mirror

### Requirement: make.conf snippet in README and landing page
The README.md and HTTP landing page SHALL include a Gentoo `make.conf` snippet showing how to configure `GENTOO_MIRRORS` to point at the proxy.

#### Scenario: Gentoo configuration snippet is present
- **WHEN** a user views the README or the pkgproxy landing page
- **THEN** a `make.conf` snippet with `GENTOO_MIRRORS="http://<proxy>/gentoo"` is visible
@@ -0,0 +1,28 @@
## 1. Cache exclude feature

- [x] 1.1 Add `Exclude []string` field to `Repository` struct in `pkg/pkgproxy/repository.go`; in `validateConfig`, if a repository's `suffixes` list contains `"*"` alongside other entries, log a `slog.Warn` naming the repository and the redundant suffixes
- [x] 1.2 Add `Exclude []string` field to `CacheConfig` in `pkg/cache/cache.go`
- [x] 1.3 Pass `Exclude` from `Repository` into `CacheConfig` when constructing upstreams in `proxy.go`
- [x] 1.4 Update `IsCacheCandidate` in `cache.go` to: run exclude check first (exact name + suffix), then handle `"*"` wildcard, then existing suffix logic
- [x] 1.5 Add unit tests for `IsCacheCandidate` covering: wildcard match, exclude exact name, exclude suffix, exclude overrides wildcard, exclude overrides explicit suffix, no exclude field
- [x] 1.6 Add unit test for `validateConfig` covering: wildcard with redundant explicit suffixes emits a warning and returns no error
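
Tasks 1.1 to 1.3 amount to carrying the new field from the parsed config into the cache layer. A rough sketch follows; only the `Exclude []string` field name comes from the task list, while the YAML tags, the other struct fields, and the `newCacheConfig` helper are assumptions for illustration.

```go
package main

import "fmt"

// Repository is a simplified stand-in for the config type in
// pkg/pkgproxy/repository.go. The YAML tags are assumed.
type Repository struct {
	Suffixes []string `yaml:"suffixes"`
	Exclude  []string `yaml:"exclude,omitempty"` // optional; nil when absent
	Mirrors  []string `yaml:"mirrors"`
}

// CacheConfig is a simplified stand-in for the type in pkg/cache/cache.go.
type CacheConfig struct {
	Suffixes []string
	Exclude  []string
}

// newCacheConfig copies the cache-relevant fields so that later changes to
// the Repository value do not affect the cache. Hypothetical helper.
func newCacheConfig(repo Repository) CacheConfig {
	return CacheConfig{
		Suffixes: append([]string(nil), repo.Suffixes...),
		Exclude:  append([]string(nil), repo.Exclude...),
	}
}

func main() {
	gentoo := Repository{
		Suffixes: []string{"*"},
		Exclude:  []string{"layout.conf", "timestamp.mirmon", "timestamp.dev-local"},
		Mirrors:  []string{"https://distfiles.gentoo.org/"},
	}
	fmt.Printf("%+v\n", newCacheConfig(gentoo))
}
```

A repository without an `exclude` key yields an empty list here, which leaves cache behavior unchanged.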

## 2. Gentoo repository config

- [x] 2.1 Add `gentoo` entry to `configs/pkgproxy.yaml` with `suffixes: ["*"]`, `exclude: [layout.conf, timestamp.mirmon, timestamp.dev-local]`, and mirrors: `mirror.init7.net`, `pkg.adfinis-on-exoscale.ch`, `distfiles.gentoo.org`

## 3. E2e test

- [x] 3.1 Add `assertNotCached` helper to `test/e2e/e2e_test.go` that asserts no file matching a given name exists anywhere under a cache subdirectory
- [x] 3.2 Write `test/e2e/test-gentoo.sh` shell script that: downloads `portage-latest.tar.xz` directly from `distfiles.gentoo.org`, unpacks it to `/var/db/repos/gentoo`, sets `GENTOO_MIRRORS` in `make.conf` to point at pkgproxy, runs `emerge --fetchonly app-text/tree`, then fetches `distfiles/layout.conf` via `wget` through the proxy
- [x] 3.3 Add `TestGentoo` to `test/e2e/e2e_test.go` using `docker.io/gentoo/stage3:latest`, mounting the script, asserting tree source archive is cached under `gentoo/distfiles/`, and asserting `layout.conf` is NOT cached using `assertNotCached`
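
The core of an `assertNotCached`-style check (task 3.1) could be a recursive search by filename, sketched below. The function `findByName`, the demo directory layout, and the example file name are assumptions for illustration; the real helper takes part in a Go test and would report through `*testing.T` instead.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// findByName walks root and returns the paths of all non-directory entries
// whose base name equals name. Hypothetical helper for illustration.
func findByName(root, name string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && d.Name() == name {
			found = append(found, p)
		}
		return nil
	})
	return found, err
}

// runDemo builds a throwaway cache directory containing one fake distfile
// and checks that layout.conf is absent while the distfile is present.
func runDemo() error {
	root, err := os.MkdirTemp("", "pkgproxy-cache-")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	dir := filepath.Join(root, "gentoo", "distfiles", "ab")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	distfile := "example-1.0.tar.gz" // made-up name for the demo
	if err := os.WriteFile(filepath.Join(dir, distfile), []byte("x"), 0o644); err != nil {
		return err
	}

	repoDir := filepath.Join(root, "gentoo")
	found, err := findByName(repoDir, "layout.conf")
	if err != nil {
		return err
	}
	if len(found) != 0 {
		return fmt.Errorf("layout.conf unexpectedly cached: %v", found)
	}
	found, err = findByName(repoDir, distfile)
	if err != nil {
		return err
	}
	if len(found) != 1 {
		return fmt.Errorf("expected exactly one cached distfile, got %v", found)
	}
	return nil
}

func main() {
	if err := runDemo(); err != nil {
		fmt.Println("FAIL:", err)
		os.Exit(1)
	}
	fmt.Println("ok")
}
```

Searching the whole subtree rather than a fixed path keeps the check independent of the hashed `<xx>` directory a file would land in. In this sketch a missing root directory surfaces as an error; a real helper might instead treat that case as "not cached".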

## 3b. Makefile

- [x] 3b.1 Add `gentoo → TestGentoo` mapping to the `distroToTest` macro in `Makefile` so `make e2e DISTRO=gentoo` works; add `gentoo` to the error message's list of valid values

## 4. Documentation

- [x] 4.1 Add Gentoo `make.conf` snippet to `README.md`
- [x] 4.2 Add Gentoo `make.conf` snippet to the HTTP landing page (`pkg/pkgproxy/landing.go` or template)
- [x] 4.3 Update `CHANGELOG.md` `[Unreleased]` section with new features