From 04b1e8b3551dfa777abaddfa870a176ce7007f64 Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Thu, 14 May 2026 16:31:48 +0200 Subject: [PATCH 01/10] feat(scripts): add docker-only e2e command for example-libpng Adds scripts/e2e.sh, `make e2e`, and a .claude/commands/e2e.md slash command that bring the Buttercup stack up via dev/docker-compose (no Kubernetes), submit the example-libpng task, and monitor the scheduler / seed-gen / patcher logs through the milestones tracked by .github/workflows/system-integration.yml (fuzzer build, POV submit / pass, seed-gen, patch generate / approve / pass, bundle submit, and optionally SARIF). Defaults LITELLM_MAX_BUDGET to $3 so accidental runs are cheap; tears the stack down on exit unless --keep-up is set. Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/commands/e2e.md | 89 +++++++++ Makefile | 8 +- scripts/e2e.sh | 424 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 520 insertions(+), 1 deletion(-) create mode 100644 .claude/commands/e2e.md create mode 100755 scripts/e2e.sh diff --git a/.claude/commands/e2e.md b/.claude/commands/e2e.md new file mode 100644 index 00000000..a364c01b --- /dev/null +++ b/.claude/commands/e2e.md @@ -0,0 +1,89 @@ +--- +description: Run a Docker-only end-to-end smoke test of Buttercup against example-libpng with a low LLM budget, and monitor the pipeline. +argument-hint: "[--budget N] [--task-duration SEC] [--keep-up] [--no-build] [--skip-wait] [--sarif]" +allowed-tools: Bash(./scripts/e2e.sh:*), Bash(make e2e*), Bash(docker compose:*), Bash(cd dev/docker-compose && docker compose:*), Read +--- + +# /e2e — Docker-only end-to-end Buttercup run (example-libpng) + +This command exercises the full Buttercup pipeline on the [example-libpng](https://github.com/tob-challenges/example-libpng) challenge **using Docker only — no Kubernetes/minikube**. It uses the `dev/docker-compose/` stack and a low LiteLLM budget (default **$3**), so an accidental run is cheap.
+ +> **Host requirement:** x86_64. The fuzzer / patcher / seed-gen images build on `gcr.io/oss-fuzz-base/base-runner`, which is amd64-only. On aarch64 the build will fail with `exec format error` unless you install `qemu-user-static` + `binfmt` and set `DOCKER_DEFAULT_PLATFORM=linux/amd64` (and even then everything runs ~10× slower under emulation). + +Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails `docker compose logs` instead of `kubectl logs`. + +## What it does + +1. Checks for `docker`, `docker compose`, `curl`, and at least one LLM provider key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`) in your env. +2. Writes `dev/docker-compose/.env` with the provider keys and `LITELLM_MAX_BUDGET=$BUDGET` (default `3`). +3. Builds and starts every service in `dev/docker-compose/compose.yaml` (redis, dind, litellm, task-server, task-downloader, scheduler, program-model, build-bot, fuzzer-bot, coverage-bot, tracer-bot, seed-gen, patcher, buttercup-ui). +4. POSTs the canned libpng `trigger_task` payload to `http://localhost:31323/webhook/trigger_task`. +5. Waits, in order, for these scheduler/seed-gen log markers (timeout configurable per phase): + - `Processing build output for type FUZZER` — fuzzer build done + - `POV submission response: pov_id=` — vulnerability found and POV submitted + - `Updated POV status. New status PASSED` — POV accepted by competition API + - `Copied N files to corpus` — seed-gen produced seeds + - `Appending patch for task` — patch generated + - approves the patch via `POST /v1/task//patch//approve` + - `Patch passed` — patch accepted + - `Bundle submission response: bundle_id=` — bundle submitted +6. With `--sarif`, also sends a SARIF broadcast and waits for `Matching SARIF submission response`. +7. Prints a colored summary and tears the stack down with `docker compose down -v` (unless `--keep-up`). + +## Run it + +The driver is `scripts/e2e.sh`. The `Makefile` exposes `make e2e`. 
+ +```bash +# Plain run with the $3 budget default +make e2e + +# Pass flags through the Makefile +make e2e E2E_ARGS="--budget 5 --keep-up" + +# Or call the script directly +./scripts/e2e.sh --budget 3 --task-duration 1800 +./scripts/e2e.sh --skip-wait --keep-up # just bring the stack up + submit task +./scripts/e2e.sh --sarif # also exercise the SARIF flow +``` + +The script writes/overwrites `dev/docker-compose/.env` on each run. + +## Monitoring while it's running + +The script already streams milestone progress to its own stdout. For finer-grained visibility while it runs: + +```bash +# All services, follow +cd dev/docker-compose && docker compose logs -f + +# Just the scheduler (most milestones live here) +cd dev/docker-compose && docker compose logs -f scheduler + +# Patcher, seed-gen, fuzzer-bot, program-model +cd dev/docker-compose && docker compose logs -f patcher seed-gen fuzzer-bot program-model + +# LiteLLM spend tracking +cd dev/docker-compose && docker compose logs -f litellm | grep -i 'spend\|budget' +``` + +The web UI is at `http://localhost:31323` (no port-forward needed — it's published on the host). + +## Tearing down + +```bash +cd dev/docker-compose && docker compose down -v --remove-orphans +``` + +`scripts/e2e.sh` does this automatically on exit unless you pass `--keep-up`. + +## When you invoke /e2e + +When the user runs `/e2e`, default behavior: + +1. Run `./scripts/e2e.sh $ARGUMENTS` (forwarding any flags the user passed). +2. While it runs, surface key transitions to the user. The script's own output already prints `[e2e] Reached: …` for each milestone — relay those as they arrive. +3. If the run fails on a milestone, fetch the last ~50 lines of the relevant service: + - `cd dev/docker-compose && docker compose logs --tail=50 ` +4. If the user asks to keep digging, expand the watch with `docker compose logs -f ` until the user is satisfied. +5. 
On success, summarize the milestones reached and remind the user the stack is already torn down (or still up, if `--keep-up`). diff --git a/Makefile b/Makefile index fbbd49e6..ca083f9c 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ # Makefile for Trail of Bits AIxCC Finals CRS -.PHONY: help setup-local setup-azure validate deploy test undeploy install-cscope lint lint-component clean-local wait-crs check-crs crs-instance-id status send-integration-task +.PHONY: help setup-local setup-azure validate deploy test undeploy install-cscope lint lint-component clean-local wait-crs check-crs crs-instance-id status send-integration-task e2e # Default target help: @@ -23,6 +23,7 @@ help: @echo "Testing:" @echo " send-integration-task - Run integration-test task" @echo " send-libpng-task - Run libpng task" + @echo " e2e - Docker-only end-to-end smoke test against example-libpng (low LLM budget)" @echo "" @echo "Development:" @echo " install-cscope - Install cscope tool" @@ -150,6 +151,11 @@ send-libpng-task: ./orchestrator/scripts/task_crs.sh; \ kill $$PORT_FORWARD_PID 2>/dev/null || true +# Docker-only end-to-end run against example-libpng. No Kubernetes required. +# Pass extra flags via E2E_ARGS, e.g.: make e2e E2E_ARGS="--keep-up --budget 5" +e2e: + @./scripts/e2e.sh $(E2E_ARGS) + # Development targets lint: @echo "Linting all Python code..." diff --git a/scripts/e2e.sh b/scripts/e2e.sh new file mode 100755 index 00000000..2f8cce99 --- /dev/null +++ b/scripts/e2e.sh @@ -0,0 +1,424 @@ +#!/usr/bin/env bash +# scripts/e2e.sh — Run the full Buttercup pipeline against example-libpng using +# the dev docker-compose stack (no Kubernetes required). +# +# This mirrors the milestones checked by .github/workflows/system-integration.yml +# but reads docker-compose logs instead of `kubectl logs`. 
+ +set -u +set -o pipefail + +############################################################################### +# Config & defaults +############################################################################### + +# Resolve repo root from this script's location. +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)" +COMPOSE_DIR="${REPO_ROOT}/dev/docker-compose" +ENV_FILE="${COMPOSE_DIR}/.env" + +# Defaults — overridable via flags or environment. +BUDGET="${LITELLM_MAX_BUDGET:-3}" +TASK_DURATION="${E2E_TASK_DURATION:-1800}" +BUILD_TIMEOUT="${E2E_BUILD_TIMEOUT:-1800}" # seconds (fuzzer build) +VULN_TIMEOUT="${E2E_VULN_TIMEOUT:-1800}" +PATCH_TIMEOUT="${E2E_PATCH_TIMEOUT:-1800}" +BUNDLE_TIMEOUT="${E2E_BUNDLE_TIMEOUT:-300}" +SEED_GEN_TIMEOUT="${E2E_SEED_GEN_TIMEOUT:-1800}" + +DO_BUILD=1 +DO_TEARDOWN=1 +SKIP_WAIT=0 +TASK_JSON="" # if set, used instead of the canned libpng payload +SARIF_RUN=0 + +############################################################################### +# Logging +############################################################################### + +if [[ -t 1 ]]; then + C_RST=$'\033[0m'; C_RED=$'\033[1;31m'; C_GRN=$'\033[1;32m' + C_YLW=$'\033[1;33m'; C_BLU=$'\033[1;36m'; C_DIM=$'\033[2m' +else + C_RST=""; C_RED=""; C_GRN=""; C_YLW=""; C_BLU=""; C_DIM="" +fi + +log() { printf '%s[e2e]%s %s\n' "$C_BLU" "$C_RST" "$*"; } +ok() { printf '%s[e2e]%s %s\n' "$C_GRN" "$C_RST" "$*"; } +warn() { printf '%s[e2e]%s %s\n' "$C_YLW" "$C_RST" "$*" >&2; } +err() { printf '%s[e2e]%s %s\n' "$C_RED" "$C_RST" "$*" >&2; } +dim() { printf '%s[e2e]%s %s%s%s\n' "$C_BLU" "$C_RST" "$C_DIM" "$*" "$C_RST"; } + +############################################################################### +# Usage +############################################################################### + +usage() { + cat </dev/null 2>&1; then + err "docker is required but not installed." + exit 1 +fi +if ! 
docker compose version >/dev/null 2>&1; then + err "'docker compose' v2 is required (not 'docker-compose')." + exit 1 +fi +if ! command -v curl >/dev/null 2>&1; then + err "curl is required but not installed." + exit 1 +fi + +provider_keys_set=0 +for v in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY; do + val="${!v:-}" + if [[ -n "$val" && "$val" != "" ]]; then + provider_keys_set=1 + fi +done +if [[ "$provider_keys_set" -eq 0 ]]; then + err "No LLM provider key found in env. Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY." + err "Tip: 'export ANTHROPIC_API_KEY=...; scripts/e2e.sh' or add to ${ENV_FILE} first." + exit 1 +fi + +# If keys are missing, leave them at the placeholder so litellm still loads the +# config (some models will fail at request time, others will succeed). +: "${ANTHROPIC_API_KEY:=}" +: "${OPENAI_API_KEY:=}" +: "${GEMINI_API_KEY:=}" +: "${AZURE_API_BASE:=}" +: "${AZURE_API_KEY:=}" +: "${BUTTERCUP_LITELLM_KEY:=sk-1234}" +: "${LANGFUSE_HOST:=}" +: "${LANGFUSE_PUBLIC_KEY:=}" +: "${LANGFUSE_SECRET_KEY:=}" + +############################################################################### +# .env generation +############################################################################### + +log "Writing ${ENV_FILE} (LITELLM_MAX_BUDGET=\$${BUDGET})" +{ + echo "# Generated by scripts/e2e.sh on $(date -Is)" + echo "BUTTERCUP_LITELLM_KEY=${BUTTERCUP_LITELLM_KEY}" + echo "ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}" + echo "OPENAI_API_KEY=${OPENAI_API_KEY}" + echo "GEMINI_API_KEY=${GEMINI_API_KEY}" + echo "AZURE_API_BASE=${AZURE_API_BASE}" + echo "AZURE_API_KEY=${AZURE_API_KEY}" + echo "LITELLM_MAX_BUDGET=${BUDGET}" + echo "LANGFUSE_HOST=${LANGFUSE_HOST}" + echo "LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY}" + echo "LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY}" +} > "$ENV_FILE" + +############################################################################### +# docker compose helpers 
+############################################################################### + +# Always run compose from the compose dir so relative includes resolve. +dc() { + (cd "$COMPOSE_DIR" && docker compose "$@") +} + +teardown() { + if [[ "$DO_TEARDOWN" -eq 1 ]]; then + log "Tearing the stack down (docker compose down -v)" + dc down -v --remove-orphans || true + else + warn "Leaving the stack up (--keep-up). Tear down with: cd ${COMPOSE_DIR} && docker compose down -v" + fi +} + +on_exit() { + rc=$? + teardown + if [[ $rc -ne 0 ]]; then + err "e2e run finished with exit code $rc" + fi + exit $rc +} +trap on_exit EXIT INT TERM + +############################################################################### +# Bring the stack up +############################################################################### + +if [[ "$DO_BUILD" -eq 1 ]]; then + log "Building docker compose images (this can take a while the first time)" + if ! dc build; then + err "docker compose build failed. On non-x86_64 hosts this usually means an" + err "image (e.g. fuzzer/Dockerfile -> gcr.io/oss-fuzz-base/base-runner) is amd64-only." + err "Inspect the build output above; retry on an x86_64 host, or install" + err "qemu-user-static + binfmt and re-run with DOCKER_DEFAULT_PLATFORM=linux/amd64." + exit 1 + fi +fi + +log "Starting services" +if ! dc up -d; then + err "docker compose up failed. Check 'docker compose ps' / logs." + exit 1 +fi + +# Wait for the buttercup-ui task webhook to be reachable. +log "Waiting for buttercup-ui to accept connections on http://localhost:31323" +ui_up=0 +for _ in $(seq 1 120); do + if curl -sf -o /dev/null -m 2 http://localhost:31323/v1/ping/ 2>/dev/null \ + || curl -sf -o /dev/null -m 2 http://localhost:31323/ 2>/dev/null; then + ui_up=1; break + fi + sleep 2 +done +if [[ "$ui_up" -ne 1 ]]; then + err "buttercup-ui did not come up on port 31323. Check 'docker compose logs buttercup-ui'." + exit 1 +fi +ok "buttercup-ui is up." 
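A note on the exit handling in this hunk: `on_exit` is registered with `trap on_exit EXIT INT TERM` and finishes with `exit $rc`. In bash, calling `exit` from inside an INT or TERM handler fires the EXIT trap as well, so an interrupted run executes `on_exit`, and therefore `docker compose down -v`, twice. `teardown` survives that because of its `|| true`, but the duplicate output is confusing. Also, `rc=$?` inside a signal handler is the status of the last command that completed, so a terminated run can end with status 0. The standalone sketch below is not a drop-in patch, and `double_trap` / `single_trap` are made-up names. It counts handler runs for that pattern and for a common alternative that cleans up on EXIT only and turns signals into the conventional 128+N exit codes. It sends SIGTERM rather than SIGINT because background jobs of a non-interactive shell start with SIGINT ignored, and bash cannot trap a signal that was ignored on entry.

```bash
#!/usr/bin/env bash
# Demo (hypothetical helpers): how often does the cleanup handler run
# when the shell receives SIGTERM?

# Pattern used by e2e.sh: one handler for EXIT, INT and TERM that calls exit.
double_trap() {
  bash -c '
    on_exit() { rc=$?; echo cleanup; exit "$rc"; }
    trap on_exit EXIT INT TERM
    kill -TERM $$        # stand-in for the driver being terminated
    echo unreachable
  '
}

# Alternative: clean up only on EXIT. The signal traps just exit with
# 128+signal, which fires the EXIT trap exactly once.
single_trap() {
  bash -c '
    on_exit() { rc=$?; echo cleanup; exit "$rc"; }
    trap on_exit EXIT
    trap "exit 130" INT
    trap "exit 143" TERM
    kill -TERM $$
    echo unreachable
  '
}

echo "double_trap: $(double_trap | grep -c cleanup) cleanup run(s)"
echo "single_trap: $(single_trap | grep -c cleanup) cleanup run(s)"
```

If adopted, the alternative would replace the single `trap on_exit EXIT INT TERM` line with the three `trap` lines from `single_trap`.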
+ +############################################################################### +# Submit the task +############################################################################### + +if [[ -z "$TASK_JSON" ]]; then + TASK_JSON=$(cat <` until a line matching PATTERN appears +# or TIMEOUT_SEC elapses. Returns 0 on success, non-zero on timeout. +wait_for() { + local service="$1" pattern="$2" timeout="$3" label="$4" + local deadline=$(( $(date +%s) + timeout )) + log "Waiting for milestone: ${label} ${C_DIM}(service=${service}, timeout=${timeout}s)${C_RST}" + + while [[ $(date +%s) -lt $deadline ]]; do + # --no-color so the grep matches plain text; --tail=all replays history. + if dc logs --no-color --no-log-prefix --tail=all "$service" 2>/dev/null \ + | grep -m1 -E "$pattern" >/dev/null; then + ok "Reached: ${label}" + return 0 + fi + sleep 15 + done + + err "Timed out after ${timeout}s waiting for: ${label}" + err "Recent logs from ${service}:" + dc logs --no-color --tail=50 "$service" >&2 || true + return 1 +} + +# Capture a single matching log line (returns it on stdout, empty on miss). +capture_line() { + local service="$1" pattern="$2" + dc logs --no-color --no-log-prefix --tail=all "$service" 2>/dev/null \ + | grep -E "$pattern" | head -n1 || true +} + +############################################################################### +# Walk through the pipeline +############################################################################### + +declare -a SUMMARY=() +record() { SUMMARY+=("$1"); } + +wait_for scheduler \ + "Processing build output for type FUZZER" \ + "$BUILD_TIMEOUT" "fuzzer build processed" \ + && record "fuzzer-build: ok" || record "fuzzer-build: TIMEOUT" + +wait_for scheduler \ + "POV submission response: pov_id=" \ + "$VULN_TIMEOUT" "vulnerability (POV) submitted" \ + && record "pov-submit: ok" || record "pov-submit: TIMEOUT" + +wait_for scheduler \ + "Updated POV status. 
New status PASSED" \ + "$VULN_TIMEOUT" "POV accepted by competition API" \ + && record "pov-passed: ok" || record "pov-passed: TIMEOUT" + +wait_for seed-gen \ + "Copied [1-9][0-9]* files to corpus" \ + "$SEED_GEN_TIMEOUT" "seed-gen produced seeds" \ + && record "seed-gen: ok" || record "seed-gen: TIMEOUT" + +wait_for scheduler \ + "Appending patch for task" \ + "$PATCH_TIMEOUT" "patch generated" \ + && record "patch-generated: ok" || record "patch-generated: TIMEOUT" + +# Approve the patch (the local UI requires explicit approval, unlike scored +# rounds where it is automatic). +PATCH_LINE="$(capture_line scheduler 'competition_patch_id=')" +if [[ -n "$PATCH_LINE" ]]; then + PATCH_ID=$(printf '%s' "$PATCH_LINE" | sed -n 's/.*competition_patch_id=\([^ ]*\).*/\1/p') + # Task id is inside the first [...] block, after the last ':'. + TASK_ID=$(printf '%s' "$PATCH_LINE" | sed -n 's/.*\[\([^]]*\)\].*/\1/p' | sed 's/^[^:]*://') + if [[ -n "$PATCH_ID" && -n "$TASK_ID" ]]; then + log "Approving patch ${C_DIM}task=${TASK_ID} patch=${PATCH_ID}${C_RST}" + curl -fsS -X POST \ + "http://127.0.0.1:31323/v1/task/${TASK_ID}/patch/${PATCH_ID}/approve" \ + >/dev/null && record "patch-approve: ok" || record "patch-approve: HTTP fail" + else + warn "Could not extract patch/task ids from: $PATCH_LINE" + record "patch-approve: skipped (parse fail)" + fi +else + warn "No competition_patch_id= line seen; skipping approval" + record "patch-approve: skipped (no patch line)" +fi + +wait_for scheduler \ + "Patch passed" \ + "$PATCH_TIMEOUT" "patch accepted by competition API" \ + && record "patch-passed: ok" || record "patch-passed: TIMEOUT" + +wait_for scheduler \ + "Bundle submission response: bundle_id=" \ + "$BUNDLE_TIMEOUT" "bundle submitted" \ + && record "bundle-submit: ok" || record "bundle-submit: TIMEOUT" + +if [[ "$SARIF_RUN" -eq 1 ]]; then + SARIF_TASK_ID="${TASK_ID:-}" + if [[ -z "$SARIF_TASK_ID" ]]; then + SARIF_TASK_ID=$(dc logs --no-color --no-log-prefix --tail=all scheduler \ 
+ | grep "Submitting bundle for harness" | head -n1 \ + | grep -o "\[[^]]*\]" | head -n1 \ + | tr -d '[]' | awk -F: '{print $NF}') + fi + if [[ -n "$SARIF_TASK_ID" ]]; then + log "Sending SARIF broadcast for task ${SARIF_TASK_ID}" + if "${REPO_ROOT}/orchestrator/scripts/send_sarif.sh" "$SARIF_TASK_ID" >/dev/null 2>&1; then + record "sarif-send: ok" + else + record "sarif-send: HTTP fail" + fi + wait_for scheduler \ + "Matching SARIF submission response" \ + "$BUNDLE_TIMEOUT" "SARIF accepted" \ + && record "sarif-passed: ok" || record "sarif-passed: TIMEOUT" + else + record "sarif: skipped (no task id)" + fi +fi + +############################################################################### +# Summary +############################################################################### + +printf '\n%s===================== e2e summary =====================%s\n' "$C_BLU" "$C_RST" +for line in "${SUMMARY[@]}"; do + if [[ "$line" == *": ok" ]]; then + printf ' %s✓%s %s\n' "$C_GRN" "$C_RST" "$line" + elif [[ "$line" == *": TIMEOUT" || "$line" == *"fail"* ]]; then + printf ' %s✗%s %s\n' "$C_RED" "$C_RST" "$line" + else + printf ' %s•%s %s\n' "$C_YLW" "$C_RST" "$line" + fi +done +printf '%s=======================================================%s\n' "$C_BLU" "$C_RST" + +# Exit non-zero if any milestone failed. +for line in "${SUMMARY[@]}"; do + if [[ "$line" == *": TIMEOUT" || "$line" == *"fail"* ]]; then + exit 1 + fi +done From f1ae7073e84708c2f315e382d8140670ea56b4ad Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Fri, 15 May 2026 09:40:06 +0000 Subject: [PATCH 02/10] feat(scripts): run e2e via prebuilt GHCR images instead of local build The e2e driver now brings the stack up through the compose.prebuilt.yaml overlay and `docker compose pull` (tag configurable via --image-tag / BUTTERCUP_IMAGE_TAG, default "main") instead of `docker compose build`, so a run no longer depends on a working local image build (e.g. 
the cscope submodule / oss-fuzz base-runner build chain). - dc() applies `-f compose.yaml -f compose.prebuilt.yaml` and exports BUTTERCUP_IMAGE_TAG for every compose subcommand (pull/up/logs/down). - --no-build kept as a deprecated alias for the new --no-pull. - Teardown hint and e2e.md updated for the overlay. Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/commands/e2e.md | 10 ++++++---- scripts/e2e.sh | 40 ++++++++++++++++++++++++++++------------ 2 files changed, 34 insertions(+), 16 deletions(-) diff --git a/.claude/commands/e2e.md b/.claude/commands/e2e.md index a364c01b..1e94f492 100644 --- a/.claude/commands/e2e.md +++ b/.claude/commands/e2e.md @@ -1,14 +1,16 @@ --- description: Run a Docker-only end-to-end smoke test of Buttercup against example-libpng with a low LLM budget, and monitor the pipeline. -argument-hint: "[--budget N] [--task-duration SEC] [--keep-up] [--no-build] [--skip-wait] [--sarif]" +argument-hint: "[--budget N] [--task-duration SEC] [--image-tag TAG] [--keep-up] [--no-pull] [--skip-wait] [--sarif]" allowed-tools: Bash(./scripts/e2e.sh:*), Bash(make e2e*), Bash(docker compose:*), Bash(cd dev/docker-compose && docker compose:*), Read --- # /e2e — Docker-only end-to-end Buttercup run (example-libpng) -This command exercises the full Buttercup pipeline on the [example-libpng](https://github.com/tob-challenges/example-libpng) challenge **using Docker only — no Kubernetes/minikube**. It uses the `dev/docker-compose/` stack and a low LiteLLM budget (default **$3**), so an accidental run is cheap. +This command exercises the full Buttercup pipeline on the [example-libpng](https://github.com/tob-challenges/example-libpng) challenge **using Docker only — no Kubernetes/minikube**. It uses the `dev/docker-compose/` stack with the **`compose.prebuilt.yaml` overlay** — every component runs from its prebuilt GHCR image (`ghcr.io/trailofbits/buttercup/*`, tag `main` by default), so **nothing is built locally**. 
A low LiteLLM budget (default **$3**) keeps an accidental run cheap. -> **Host requirement:** x86_64. The fuzzer / patcher / seed-gen images build on `gcr.io/oss-fuzz-base/base-runner`, which is amd64-only. On aarch64 the build will fail with `exec format error` unless you install `qemu-user-static` + `binfmt` and set `DOCKER_DEFAULT_PLATFORM=linux/amd64` (and even then everything runs ~10× slower under emulation). +> **Image tag:** defaults to `main`. Override with `--image-tag ` or `BUTTERCUP_IMAGE_TAG=...` to test a specific build. Private images require `docker login ghcr.io` first. +> +> **Host requirement:** x86_64. The prebuilt fuzzer / patcher / seed-gen images are based on `gcr.io/oss-fuzz-base/base-runner`, which is amd64-only. On aarch64 they only run under `qemu-user-static` + `binfmt` with `DOCKER_DEFAULT_PLATFORM=linux/amd64` (and ~10× slower). Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails `docker compose logs` instead of `kubectl logs`. @@ -16,7 +18,7 @@ Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails 1. Checks for `docker`, `docker compose`, `curl`, and at least one LLM provider key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`) in your env. 2. Writes `dev/docker-compose/.env` with the provider keys and `LITELLM_MAX_BUDGET=$BUDGET` (default `3`). -3. Builds and starts every service in `dev/docker-compose/compose.yaml` (redis, dind, litellm, task-server, task-downloader, scheduler, program-model, build-bot, fuzzer-bot, coverage-bot, tracer-bot, seed-gen, patcher, buttercup-ui). +3. Pulls the prebuilt component images (`docker compose -f compose.yaml -f compose.prebuilt.yaml pull`, skippable with `--no-pull`) and starts every service (redis, dind, litellm, task-server, task-downloader, scheduler, program-model, build-bot, fuzzer-bot, coverage-bot, tracer-bot, seed-gen, patcher, buttercup-ui). No local image build. 4. 
POSTs the canned libpng `trigger_task` payload to `http://localhost:31323/webhook/trigger_task`. 5. Waits, in order, for these scheduler/seed-gen log markers (timeout configurable per phase): - `Processing build output for type FUZZER` — fuzzer build done diff --git a/scripts/e2e.sh b/scripts/e2e.sh index 2f8cce99..7fa0ed0e 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -2,6 +2,10 @@ # scripts/e2e.sh — Run the full Buttercup pipeline against example-libpng using # the dev docker-compose stack (no Kubernetes required). # +# Uses the prebuilt component images published to GHCR (via the +# compose.prebuilt.yaml overlay) instead of building them locally, so a run +# does not depend on a working local image build. +# # This mirrors the milestones checked by .github/workflows/system-integration.yml # but reads docker-compose logs instead of `kubectl logs`. @@ -27,7 +31,10 @@ PATCH_TIMEOUT="${E2E_PATCH_TIMEOUT:-1800}" BUNDLE_TIMEOUT="${E2E_BUNDLE_TIMEOUT:-300}" SEED_GEN_TIMEOUT="${E2E_SEED_GEN_TIMEOUT:-1800}" -DO_BUILD=1 +# Prebuilt GHCR images instead of local builds (compose.prebuilt.yaml overlay). 
+IMAGE_TAG="${BUTTERCUP_IMAGE_TAG:-main}" + +DO_PULL=1 DO_TEARDOWN=1 SKIP_WAIT=0 TASK_JSON="" # if set, used instead of the canned libpng payload @@ -66,7 +73,9 @@ Options: --budget DOLLARS LiteLLM per-user max budget (default: $BUDGET) --task-duration SECONDS How long the CRS should fuzz (default: $TASK_DURATION) --task-json FILE Custom trigger_task payload (default: example-libpng) - --no-build Skip 'docker compose build' (use existing images) + --image-tag TAG Prebuilt GHCR image tag to run (default: $IMAGE_TAG) + --no-pull Skip 'docker compose pull' (use already-pulled images) + --no-build Deprecated alias for --no-pull (no local build happens) --keep-up Don't tear the stack down on exit (for debugging) --skip-wait Bring the stack up and submit the task, but don't block waiting on milestones (returns immediately) @@ -84,6 +93,7 @@ Required environment (at least one provider key, plus litellm master key): BUTTERCUP_LITELLM_KEY (optional, defaults to sk-1234 for local runs) Optional: + BUTTERCUP_IMAGE_TAG Prebuilt GHCR image tag (default: main; same as --image-tag) LANGFUSE_HOST, LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY The script writes ${ENV_FILE} from the values above each run. @@ -99,7 +109,9 @@ while [[ $# -gt 0 ]]; do --budget) BUDGET="$2"; shift 2 ;; --task-duration) TASK_DURATION="$2"; shift 2 ;; --task-json) TASK_JSON="$(cat "$2")"; shift 2 ;; - --no-build) DO_BUILD=0; shift ;; + --image-tag) IMAGE_TAG="$2"; shift 2 ;; + --no-pull) DO_PULL=0; shift ;; + --no-build) DO_PULL=0; shift ;; # deprecated alias --keep-up) DO_TEARDOWN=0; shift ;; --skip-wait) SKIP_WAIT=1; shift ;; --sarif) SARIF_RUN=1; shift ;; @@ -179,8 +191,12 @@ log "Writing ${ENV_FILE} (LITELLM_MAX_BUDGET=\$${BUDGET})" ############################################################################### # Always run compose from the compose dir so relative includes resolve. 
+# The compose.prebuilt.yaml overlay swaps every locally-built service for its +# prebuilt GHCR image, so nothing is built locally. dc() { - (cd "$COMPOSE_DIR" && docker compose "$@") + (cd "$COMPOSE_DIR" \ + && BUTTERCUP_IMAGE_TAG="$IMAGE_TAG" \ + docker compose -f compose.yaml -f compose.prebuilt.yaml "$@") } teardown() { @@ -188,7 +204,7 @@ teardown() { log "Tearing the stack down (docker compose down -v)" dc down -v --remove-orphans || true else - warn "Leaving the stack up (--keep-up). Tear down with: cd ${COMPOSE_DIR} && docker compose down -v" + warn "Leaving the stack up (--keep-up). Tear down with: cd ${COMPOSE_DIR} && docker compose -f compose.yaml -f compose.prebuilt.yaml down -v" fi } @@ -206,13 +222,13 @@ trap on_exit EXIT INT TERM # Bring the stack up ############################################################################### -if [[ "$DO_BUILD" -eq 1 ]]; then - log "Building docker compose images (this can take a while the first time)" - if ! dc build; then - err "docker compose build failed. On non-x86_64 hosts this usually means an" - err "image (e.g. fuzzer/Dockerfile -> gcr.io/oss-fuzz-base/base-runner) is amd64-only." - err "Inspect the build output above; retry on an x86_64 host, or install" - err "qemu-user-static + binfmt and re-run with DOCKER_DEFAULT_PLATFORM=linux/amd64." +if [[ "$DO_PULL" -eq 1 ]]; then + log "Pulling prebuilt component images from GHCR (tag: ${IMAGE_TAG})" + if ! dc pull; then + err "docker compose pull failed for tag '${IMAGE_TAG}'." + err "Check that the tag exists at ghcr.io/trailofbits/buttercup/* and that" + err "you can reach GHCR (private images need 'docker login ghcr.io')." + err "Override with --image-tag or BUTTERCUP_IMAGE_TAG=..." 
exit 1 fi fi From a25a525ad45b8e55e068ef1867ec0c4b5c51792e Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Fri, 15 May 2026 12:31:40 +0000 Subject: [PATCH 03/10] fix(scripts): preserve existing .env values in e2e.sh e2e.sh regenerates dev/docker-compose/.env from scratch every run, sourcing values only from environment variables. Variables not exported (notably LANGFUSE_HOST/PUBLIC_KEY/SECRET_KEY) were defaulted to empty and written back, clobbering values a user had set directly in .env. Add prev_env() and a 3-tier resolution: environment > existing .env > placeholder. Manually-set .env values (Langfuse creds, provider keys, litellm key) now survive subsequent runs. Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/e2e.sh | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/scripts/e2e.sh b/scripts/e2e.sh index 7fa0ed0e..44dc6dbe 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -155,8 +155,29 @@ if [[ "$provider_keys_set" -eq 0 ]]; then exit 1 fi -# If keys are missing, leave them at the placeholder so litellm still loads the -# config (some models will fail at request time, others will succeed). +# Read a value already present in the existing .env. Used so that variables +# not provided via the environment (e.g. LANGFUSE_*) are preserved across runs +# instead of being clobbered with empty/placeholder values, since this script +# regenerates .env from scratch on every run. +prev_env() { + [[ -f "$ENV_FILE" ]] || return 0 + sed -n "s/^$1=//p" "$ENV_FILE" | head -n1 +} + +# 1) Prefer the environment; 2) fall back to whatever is already in .env. 
+: "${ANTHROPIC_API_KEY:=$(prev_env ANTHROPIC_API_KEY)}" +: "${OPENAI_API_KEY:=$(prev_env OPENAI_API_KEY)}" +: "${GEMINI_API_KEY:=$(prev_env GEMINI_API_KEY)}" +: "${AZURE_API_BASE:=$(prev_env AZURE_API_BASE)}" +: "${AZURE_API_KEY:=$(prev_env AZURE_API_KEY)}" +: "${BUTTERCUP_LITELLM_KEY:=$(prev_env BUTTERCUP_LITELLM_KEY)}" +: "${LANGFUSE_HOST:=$(prev_env LANGFUSE_HOST)}" +: "${LANGFUSE_PUBLIC_KEY:=$(prev_env LANGFUSE_PUBLIC_KEY)}" +: "${LANGFUSE_SECRET_KEY:=$(prev_env LANGFUSE_SECRET_KEY)}" + +# 3) Final placeholders if still unset after both env and .env. Keys left at +# the placeholder so litellm still loads its config (some models will fail at +# request time, others will succeed). LANGFUSE_* stay empty (telemetry off). : "${ANTHROPIC_API_KEY:=}" : "${OPENAI_API_KEY:=}" : "${GEMINI_API_KEY:=}" From 7616b37953c9c1a0476be7888e2d9b03a08ef0fc Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Fri, 15 May 2026 14:45:42 +0000 Subject: [PATCH 04/10] fix(scripts): use explicit if-then-else in e2e.sh to satisfy shellcheck Replace the `wait_for ... && record ok || record TIMEOUT` and `curl ... && record ok || record fail` constructs with explicit if-then-else blocks. shellcheck flagged these as SC2015 (A && B || C is not if-then-else), causing the "Lint shell scripts" step in the Static Checks workflow to fail. Behavior is unchanged. 
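For readers unfamiliar with SC2015: `A && B || C` runs `C` whenever the `A && B` part fails, including when `A` succeeded but `B` failed. In `e2e.sh` the `B` is always `record`, which only appends to the `SUMMARY` array and cannot fail, which is why the rewrite leaves behavior unchanged. A minimal demo with a deliberately failing `B`; the helper names are made up for illustration:

```bash
#!/usr/bin/env bash
# SC2015 demo: `A && B || C` is not if-then-else.
# Hypothetical stand-ins: a check that succeeds and a record step that fails
# (in e2e.sh, record() only appends to an array and never fails).
check_ok()     { return 0; }
record_fails() { echo "$1"; return 1; }
record()       { echo "$1"; }

# A && B || C: C also runs because B returned non-zero.
and_or() { check_ok && record_fails "ok" || record "TIMEOUT"; }

# if/then/else: exactly one branch runs; B's failure becomes the if's status.
if_else() {
  if check_ok; then
    record_fails "ok"
  else
    record "TIMEOUT"
  fi
}

and_or            # prints "ok", then "TIMEOUT"
if_else || true   # prints only "ok" (if_else itself returns 1)
```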
Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/e2e.sh | 80 ++++++++++++++++++++++++++++++++++---------------- 1 file changed, 54 insertions(+), 26 deletions(-) diff --git a/scripts/e2e.sh b/scripts/e2e.sh index 44dc6dbe..8816a01b 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -357,30 +357,45 @@ capture_line() { declare -a SUMMARY=() record() { SUMMARY+=("$1"); } -wait_for scheduler \ +if wait_for scheduler \ "Processing build output for type FUZZER" \ - "$BUILD_TIMEOUT" "fuzzer build processed" \ - && record "fuzzer-build: ok" || record "fuzzer-build: TIMEOUT" + "$BUILD_TIMEOUT" "fuzzer build processed"; then + record "fuzzer-build: ok" +else + record "fuzzer-build: TIMEOUT" +fi -wait_for scheduler \ +if wait_for scheduler \ "POV submission response: pov_id=" \ - "$VULN_TIMEOUT" "vulnerability (POV) submitted" \ - && record "pov-submit: ok" || record "pov-submit: TIMEOUT" + "$VULN_TIMEOUT" "vulnerability (POV) submitted"; then + record "pov-submit: ok" +else + record "pov-submit: TIMEOUT" +fi -wait_for scheduler \ +if wait_for scheduler \ "Updated POV status. 
New status PASSED" \ - "$VULN_TIMEOUT" "POV accepted by competition API" \ - && record "pov-passed: ok" || record "pov-passed: TIMEOUT" + "$VULN_TIMEOUT" "POV accepted by competition API"; then + record "pov-passed: ok" +else + record "pov-passed: TIMEOUT" +fi -wait_for seed-gen \ +if wait_for seed-gen \ "Copied [1-9][0-9]* files to corpus" \ - "$SEED_GEN_TIMEOUT" "seed-gen produced seeds" \ - && record "seed-gen: ok" || record "seed-gen: TIMEOUT" + "$SEED_GEN_TIMEOUT" "seed-gen produced seeds"; then + record "seed-gen: ok" +else + record "seed-gen: TIMEOUT" +fi -wait_for scheduler \ +if wait_for scheduler \ "Appending patch for task" \ - "$PATCH_TIMEOUT" "patch generated" \ - && record "patch-generated: ok" || record "patch-generated: TIMEOUT" + "$PATCH_TIMEOUT" "patch generated"; then + record "patch-generated: ok" +else + record "patch-generated: TIMEOUT" +fi # Approve the patch (the local UI requires explicit approval, unlike scored # rounds where it is automatic). @@ -391,9 +406,13 @@ if [[ -n "$PATCH_LINE" ]]; then TASK_ID=$(printf '%s' "$PATCH_LINE" | sed -n 's/.*\[\([^]]*\)\].*/\1/p' | sed 's/^[^:]*://') if [[ -n "$PATCH_ID" && -n "$TASK_ID" ]]; then log "Approving patch ${C_DIM}task=${TASK_ID} patch=${PATCH_ID}${C_RST}" - curl -fsS -X POST \ + if curl -fsS -X POST \ "http://127.0.0.1:31323/v1/task/${TASK_ID}/patch/${PATCH_ID}/approve" \ - >/dev/null && record "patch-approve: ok" || record "patch-approve: HTTP fail" + >/dev/null; then + record "patch-approve: ok" + else + record "patch-approve: HTTP fail" + fi else warn "Could not extract patch/task ids from: $PATCH_LINE" record "patch-approve: skipped (parse fail)" @@ -403,15 +422,21 @@ else record "patch-approve: skipped (no patch line)" fi -wait_for scheduler \ +if wait_for scheduler \ "Patch passed" \ - "$PATCH_TIMEOUT" "patch accepted by competition API" \ - && record "patch-passed: ok" || record "patch-passed: TIMEOUT" + "$PATCH_TIMEOUT" "patch accepted by competition API"; then + record 
"patch-passed: ok" +else + record "patch-passed: TIMEOUT" +fi -wait_for scheduler \ +if wait_for scheduler \ "Bundle submission response: bundle_id=" \ - "$BUNDLE_TIMEOUT" "bundle submitted" \ - && record "bundle-submit: ok" || record "bundle-submit: TIMEOUT" + "$BUNDLE_TIMEOUT" "bundle submitted"; then + record "bundle-submit: ok" +else + record "bundle-submit: TIMEOUT" +fi if [[ "$SARIF_RUN" -eq 1 ]]; then SARIF_TASK_ID="${TASK_ID:-}" @@ -428,10 +453,13 @@ if [[ "$SARIF_RUN" -eq 1 ]]; then else record "sarif-send: HTTP fail" fi - wait_for scheduler \ + if wait_for scheduler \ "Matching SARIF submission response" \ - "$BUNDLE_TIMEOUT" "SARIF accepted" \ - && record "sarif-passed: ok" || record "sarif-passed: TIMEOUT" + "$BUNDLE_TIMEOUT" "SARIF accepted"; then + record "sarif-passed: ok" + else + record "sarif-passed: TIMEOUT" + fi else record "sarif: skipped (no task id)" fi From ba140d902c52a677899e6176b6d1b02c7aef6af6 Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Mon, 18 May 2026 10:29:15 +0000 Subject: [PATCH 05/10] fix(scripts): make e2e.sh wait_for robust to pipefail+SIGPIPE With `set -o pipefail`, `dc logs ... | grep -m1` makes the upstream `docker compose logs` die with SIGPIPE (rc 141) once grep matches the first line; pipefail then fails the whole pipeline, so milestones whose log line appears early in a high-volume stream (e.g. seed-gen's 'Copied N files to corpus') are never registered and wait_for spins until timeout even though the milestone occurred. Capture grep output with '|| true' and test for non-empty instead. Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/e2e.sh | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/scripts/e2e.sh b/scripts/e2e.sh index 8816a01b..9b2e3ecf 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -329,8 +329,15 @@ wait_for() { while [[ $(date +%s) -lt $deadline ]]; do # --no-color so the grep matches plain text; --tail=all replays history. 
- if dc logs --no-color --no-log-prefix --tail=all "$service" 2>/dev/null \ - | grep -m1 -E "$pattern" >/dev/null; then + # NOTE: capture into a var with `|| true` instead of `if cmd | grep`. + # Under `set -o pipefail`, `grep -m1` exits on the first match and the + # upstream `docker compose logs` then dies with SIGPIPE (rc 141), which + # would make the whole pipeline "fail" and the milestone never register + # for high-volume services whose match is early in the stream. + local match + match="$(dc logs --no-color --no-log-prefix --tail=all "$service" 2>/dev/null \ + | grep -m1 -E "$pattern" || true)" + if [[ -n "$match" ]]; then ok "Reached: ${label}" return 0 fi From f39763178054674a30681a04d8129a2b2f424426 Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Tue, 19 May 2026 07:29:17 +0000 Subject: [PATCH 06/10] refactor(scripts): simplify e2e.sh to budget/duration/tag/no-pull Drop --no-build, --keep-up, --skip-wait, --sarif, --task-json and the per-phase --*-timeout flags. The stack now always tears down on exit; milestone timeouts are internal constants. 
Addresses PR #552 review: - provider-key check moved below the .env fallback so keys saved to .env on a prior run are accepted (tip is now accurate) - --task-json removed (was silently falling back to the libpng default) - trigger_task response uses mktemp + on_exit cleanup instead of a predictable /tmp/e2e_task_resp.$$ leaked on SIGINT/SIGTERM - --no-build phantom "deprecated alias" removed Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/commands/e2e.md | 18 +++--- Makefile | 2 +- scripts/e2e.sh | 135 +++++++++++----------------------------- 3 files changed, 45 insertions(+), 110 deletions(-) diff --git a/.claude/commands/e2e.md b/.claude/commands/e2e.md index 1e94f492..d757fc7b 100644 --- a/.claude/commands/e2e.md +++ b/.claude/commands/e2e.md @@ -1,6 +1,6 @@ --- description: Run a Docker-only end-to-end smoke test of Buttercup against example-libpng with a low LLM budget, and monitor the pipeline. -argument-hint: "[--budget N] [--task-duration SEC] [--image-tag TAG] [--keep-up] [--no-pull] [--skip-wait] [--sarif]" +argument-hint: "[--budget N] [--task-duration SEC] [--image-tag TAG] [--no-pull]" allowed-tools: Bash(./scripts/e2e.sh:*), Bash(make e2e*), Bash(docker compose:*), Bash(cd dev/docker-compose && docker compose:*), Read --- @@ -16,11 +16,11 @@ Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails ## What it does -1. Checks for `docker`, `docker compose`, `curl`, and at least one LLM provider key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`) in your env. +1. Checks for `docker`, `docker compose`, `curl`, and at least one LLM provider key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`) in your env (or already saved in `dev/docker-compose/.env`). 2. Writes `dev/docker-compose/.env` with the provider keys and `LITELLM_MAX_BUDGET=$BUDGET` (default `3`). 3. 
Pulls the prebuilt component images (`docker compose -f compose.yaml -f compose.prebuilt.yaml pull`, skippable with `--no-pull`) and starts every service (redis, dind, litellm, task-server, task-downloader, scheduler, program-model, build-bot, fuzzer-bot, coverage-bot, tracer-bot, seed-gen, patcher, buttercup-ui). No local image build. 4. POSTs the canned libpng `trigger_task` payload to `http://localhost:31323/webhook/trigger_task`. -5. Waits, in order, for these scheduler/seed-gen log markers (timeout configurable per phase): +5. Waits, in order, for these scheduler/seed-gen log markers: - `Processing build output for type FUZZER` — fuzzer build done - `POV submission response: pov_id=` — vulnerability found and POV submitted - `Updated POV status. New status PASSED` — POV accepted by competition API @@ -29,8 +29,7 @@ Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails - approves the patch via `POST /v1/task//patch//approve` - `Patch passed` — patch accepted - `Bundle submission response: bundle_id=` — bundle submitted -6. With `--sarif`, also sends a SARIF broadcast and waits for `Matching SARIF submission response`. -7. Prints a colored summary and tears the stack down with `docker compose down -v` (unless `--keep-up`). +6. Prints a colored summary and tears the stack down with `docker compose down -v`. ## Run it @@ -41,12 +40,11 @@ The driver is `scripts/e2e.sh`. The `Makefile` exposes `make e2e`. make e2e # Pass flags through the Makefile -make e2e E2E_ARGS="--budget 5 --keep-up" +make e2e E2E_ARGS="--budget 5 --no-pull" # Or call the script directly ./scripts/e2e.sh --budget 3 --task-duration 1800 -./scripts/e2e.sh --skip-wait --keep-up # just bring the stack up + submit task -./scripts/e2e.sh --sarif # also exercise the SARIF flow +./scripts/e2e.sh --image-tag my-branch --no-pull # run already-present images ``` The script writes/overwrites `dev/docker-compose/.env` on each run. 
@@ -77,7 +75,7 @@ The web UI is at `http://localhost:31323` (no port-forward needed — it's publi cd dev/docker-compose && docker compose down -v --remove-orphans ``` -`scripts/e2e.sh` does this automatically on exit unless you pass `--keep-up`. +`scripts/e2e.sh` does this automatically on exit. ## When you invoke /e2e @@ -88,4 +86,4 @@ When the user runs `/e2e`, default behavior: 3. If the run fails on a milestone, fetch the last ~50 lines of the relevant service: - `cd dev/docker-compose && docker compose logs --tail=50 ` 4. If the user asks to keep digging, expand the watch with `docker compose logs -f ` until the user is satisfied. -5. On success, summarize the milestones reached and remind the user the stack is already torn down (or still up, if `--keep-up`). +5. On success, summarize the milestones reached and remind the user the stack is already torn down. diff --git a/Makefile b/Makefile index ca083f9c..a5f0d445 100644 --- a/Makefile +++ b/Makefile @@ -152,7 +152,7 @@ send-libpng-task: kill $$PORT_FORWARD_PID 2>/dev/null || true # Docker-only end-to-end run against example-libpng. No Kubernetes required. -# Pass extra flags via E2E_ARGS, e.g.: make e2e E2E_ARGS="--keep-up --budget 5" +# Pass extra flags via E2E_ARGS, e.g.: make e2e E2E_ARGS="--budget 5 --no-pull" e2e: @./scripts/e2e.sh $(E2E_ARGS) diff --git a/scripts/e2e.sh b/scripts/e2e.sh index 9b2e3ecf..b25c3aaf 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -25,20 +25,19 @@ ENV_FILE="${COMPOSE_DIR}/.env" # Defaults — overridable via flags or environment. BUDGET="${LITELLM_MAX_BUDGET:-3}" TASK_DURATION="${E2E_TASK_DURATION:-1800}" -BUILD_TIMEOUT="${E2E_BUILD_TIMEOUT:-1800}" # seconds (fuzzer build) -VULN_TIMEOUT="${E2E_VULN_TIMEOUT:-1800}" -PATCH_TIMEOUT="${E2E_PATCH_TIMEOUT:-1800}" -BUNDLE_TIMEOUT="${E2E_BUNDLE_TIMEOUT:-300}" -SEED_GEN_TIMEOUT="${E2E_SEED_GEN_TIMEOUT:-1800}" # Prebuilt GHCR images instead of local builds (compose.prebuilt.yaml overlay). 
IMAGE_TAG="${BUTTERCUP_IMAGE_TAG:-main}" DO_PULL=1 -DO_TEARDOWN=1 -SKIP_WAIT=0 -TASK_JSON="" # if set, used instead of the canned libpng payload -SARIF_RUN=0 + +# Internal milestone timeouts (seconds). Bundle submission is quick; the rest +# (build, vuln, seed-gen, patch) can each take a while on a low-budget run. +MILESTONE_TIMEOUT=1800 +BUNDLE_TIMEOUT=300 + +# Temp file for the trigger_task HTTP response; cleaned up on exit. +TASK_RESP="" ############################################################################### # Logging @@ -72,20 +71,8 @@ milestones tracked by .github/workflows/system-integration.yml. Options: --budget DOLLARS LiteLLM per-user max budget (default: $BUDGET) --task-duration SECONDS How long the CRS should fuzz (default: $TASK_DURATION) - --task-json FILE Custom trigger_task payload (default: example-libpng) --image-tag TAG Prebuilt GHCR image tag to run (default: $IMAGE_TAG) --no-pull Skip 'docker compose pull' (use already-pulled images) - --no-build Deprecated alias for --no-pull (no local build happens) - --keep-up Don't tear the stack down on exit (for debugging) - --skip-wait Bring the stack up and submit the task, but don't - block waiting on milestones (returns immediately) - --sarif Also submit a SARIF broadcast after the patch - passes and wait for the matching SARIF response - --build-timeout SEC Override fuzzer-build milestone timeout (default $BUILD_TIMEOUT) - --vuln-timeout SEC Override vuln milestone timeout (default $VULN_TIMEOUT) - --patch-timeout SEC Override patch milestone timeout (default $PATCH_TIMEOUT) - --bundle-timeout SEC Override bundle milestone timeout (default $BUNDLE_TIMEOUT) - --seed-gen-timeout SEC Override seed-gen milestone timeout (default $SEED_GEN_TIMEOUT) -h, --help Print this help Required environment (at least one provider key, plus litellm master key): @@ -108,18 +95,8 @@ while [[ $# -gt 0 ]]; do case "$1" in --budget) BUDGET="$2"; shift 2 ;; --task-duration) TASK_DURATION="$2"; shift 2 ;; - 
--task-json) TASK_JSON="$(cat "$2")"; shift 2 ;; --image-tag) IMAGE_TAG="$2"; shift 2 ;; --no-pull) DO_PULL=0; shift ;; - --no-build) DO_PULL=0; shift ;; # deprecated alias - --keep-up) DO_TEARDOWN=0; shift ;; - --skip-wait) SKIP_WAIT=1; shift ;; - --sarif) SARIF_RUN=1; shift ;; - --build-timeout) BUILD_TIMEOUT="$2"; shift 2 ;; - --vuln-timeout) VULN_TIMEOUT="$2"; shift 2 ;; - --patch-timeout) PATCH_TIMEOUT="$2"; shift 2 ;; - --bundle-timeout) BUNDLE_TIMEOUT="$2"; shift 2 ;; - --seed-gen-timeout) SEED_GEN_TIMEOUT="$2"; shift 2 ;; -h|--help) usage; exit 0 ;; *) err "Unknown argument: $1"; usage; exit 2 ;; esac @@ -142,19 +119,6 @@ if ! command -v curl >/dev/null 2>&1; then exit 1 fi -provider_keys_set=0 -for v in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY; do - val="${!v:-}" - if [[ -n "$val" && "$val" != "" ]]; then - provider_keys_set=1 - fi -done -if [[ "$provider_keys_set" -eq 0 ]]; then - err "No LLM provider key found in env. Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY." - err "Tip: 'export ANTHROPIC_API_KEY=...; scripts/e2e.sh' or add to ${ENV_FILE} first." - exit 1 -fi - # Read a value already present in the existing .env. Used so that variables # not provided via the environment (e.g. LANGFUSE_*) are preserved across runs # instead of being clobbered with empty/placeholder values, since this script @@ -175,6 +139,21 @@ prev_env() { : "${LANGFUSE_PUBLIC_KEY:=$(prev_env LANGFUSE_PUBLIC_KEY)}" : "${LANGFUSE_SECRET_KEY:=$(prev_env LANGFUSE_SECRET_KEY)}" +# Require at least one usable provider key. Checked *after* the .env fallback +# above so a key saved to .env on a prior run still counts. +provider_keys_set=0 +for v in ANTHROPIC_API_KEY OPENAI_API_KEY GEMINI_API_KEY; do + val="${!v:-}" + if [[ -n "$val" && "$val" != "" ]]; then + provider_keys_set=1 + fi +done +if [[ "$provider_keys_set" -eq 0 ]]; then + err "No LLM provider key found. Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY." 
+ err "Tip: 'export ANTHROPIC_API_KEY=...; scripts/e2e.sh' or add it to ${ENV_FILE} first." + exit 1 +fi + # 3) Final placeholders if still unset after both env and .env. Keys left at # the placeholder so litellm still loads its config (some models will fail at # request time, others will succeed). LANGFUSE_* stay empty (telemetry off). @@ -220,18 +199,11 @@ dc() { docker compose -f compose.yaml -f compose.prebuilt.yaml "$@") } -teardown() { - if [[ "$DO_TEARDOWN" -eq 1 ]]; then - log "Tearing the stack down (docker compose down -v)" - dc down -v --remove-orphans || true - else - warn "Leaving the stack up (--keep-up). Tear down with: cd ${COMPOSE_DIR} && docker compose -f compose.yaml -f compose.prebuilt.yaml down -v" - fi -} - on_exit() { rc=$? - teardown + [[ -n "$TASK_RESP" ]] && rm -f "$TASK_RESP" + log "Tearing the stack down (docker compose down -v)" + dc down -v --remove-orphans || true if [[ $rc -ne 0 ]]; then err "e2e run finished with exit code $rc" fi @@ -280,8 +252,7 @@ ok "buttercup-ui is up." 
# Submit the task ############################################################################### -if [[ -z "$TASK_JSON" ]]; then - TASK_JSON=$(cat </dev/null 2>&1; then - record "sarif-send: ok" - else - record "sarif-send: HTTP fail" - fi - if wait_for scheduler \ - "Matching SARIF submission response" \ - "$BUNDLE_TIMEOUT" "SARIF accepted"; then - record "sarif-passed: ok" - else - record "sarif-passed: TIMEOUT" - fi - else - record "sarif: skipped (no task id)" - fi -fi - ############################################################################### # Summary ############################################################################### From acf39c237ab93a34747605807e3bbeace2eebde8 Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Tue, 19 May 2026 08:02:45 +0000 Subject: [PATCH 07/10] refactor(scripts): drop user-facing BUTTERCUP_LITELLM_KEY from e2e.sh The local litellm master key is an internal detail of the docker-compose stack, not something the user should set. Remove it from the usage text and the env/.env resolution; e2e.sh now just writes the local default (sk-1234) into the generated .env. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/e2e.sh | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/scripts/e2e.sh b/scripts/e2e.sh index b25c3aaf..82177c45 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -75,9 +75,8 @@ Options: --no-pull Skip 'docker compose pull' (use already-pulled images) -h, --help Print this help -Required environment (at least one provider key, plus litellm master key): +Required environment (at least one provider key): ANTHROPIC_API_KEY and/or OPENAI_API_KEY and/or GEMINI_API_KEY - BUTTERCUP_LITELLM_KEY (optional, defaults to sk-1234 for local runs) Optional: BUTTERCUP_IMAGE_TAG Prebuilt GHCR image tag (default: main; same as --image-tag) @@ -134,7 +133,6 @@ prev_env() { : "${GEMINI_API_KEY:=$(prev_env GEMINI_API_KEY)}" : "${AZURE_API_BASE:=$(prev_env AZURE_API_BASE)}" : "${AZURE_API_KEY:=$(prev_env AZURE_API_KEY)}" -: "${BUTTERCUP_LITELLM_KEY:=$(prev_env BUTTERCUP_LITELLM_KEY)}" : "${LANGFUSE_HOST:=$(prev_env LANGFUSE_HOST)}" : "${LANGFUSE_PUBLIC_KEY:=$(prev_env LANGFUSE_PUBLIC_KEY)}" : "${LANGFUSE_SECRET_KEY:=$(prev_env LANGFUSE_SECRET_KEY)}" @@ -162,7 +160,6 @@ fi : "${GEMINI_API_KEY:=}" : "${AZURE_API_BASE:=}" : "${AZURE_API_KEY:=}" -: "${BUTTERCUP_LITELLM_KEY:=sk-1234}" : "${LANGFUSE_HOST:=}" : "${LANGFUSE_PUBLIC_KEY:=}" : "${LANGFUSE_SECRET_KEY:=}" @@ -174,7 +171,8 @@ fi log "Writing ${ENV_FILE} (LITELLM_MAX_BUDGET=\$${BUDGET})" { echo "# Generated by scripts/e2e.sh on $(date -Is)" - echo "BUTTERCUP_LITELLM_KEY=${BUTTERCUP_LITELLM_KEY}" + # litellm master key — internal to the local stack, not user-facing. 
+ echo "BUTTERCUP_LITELLM_KEY=sk-1234" echo "ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}" echo "OPENAI_API_KEY=${OPENAI_API_KEY}" echo "GEMINI_API_KEY=${GEMINI_API_KEY}" From 4ec48c35328de4a1a2d87d3b418b8b6c274db302 Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Tue, 19 May 2026 08:11:54 +0000 Subject: [PATCH 08/10] fix(scripts): don't clobber LANGFUSE_* with empty values in e2e.sh e2e.sh regenerates dev/docker-compose/.env every run and was always writing LANGFUSE_HOST=/PUBLIC_KEY=/SECRET_KEY= even when unset. Since .env is loaded last in compose's env_file list, an empty value silently disabled Langfuse telemetry. Now resolved env -> existing .env, and the LANGFUSE_* lines are only written when non-empty, so values the user set in .env survive across runs. Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/e2e.sh | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/scripts/e2e.sh b/scripts/e2e.sh index 82177c45..5fb266fd 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -120,8 +120,8 @@ fi # Read a value already present in the existing .env. Used so that variables # not provided via the environment (e.g. LANGFUSE_*) are preserved across runs -# instead of being clobbered with empty/placeholder values, since this script -# regenerates .env from scratch on every run. +# instead of being clobbered, since this script regenerates .env from scratch +# on every run. prev_env() { [[ -f "$ENV_FILE" ]] || return 0 sed -n "s/^$1=//p" "$ENV_FILE" | head -n1 @@ -154,7 +154,9 @@ fi # 3) Final placeholders if still unset after both env and .env. Keys left at # the placeholder so litellm still loads its config (some models will fail at -# request time, others will succeed). LANGFUSE_* stay empty (telemetry off). +# request time, others will succeed). LANGFUSE_* are intentionally left unset +# here: empty lines are NOT written to .env below, so a run without them set +# never clobbers LANGFUSE_* the user previously had in .env. 
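The sketch below illustrates the env -> existing .env resolution order and shows that only non-empty optional values are written back. It is minimal and standalone, and a throwaway temp file stands in for `dev/docker-compose/.env`. `prev_env` mirrors the helper in e2e.sh. The Langfuse host URL is an illustrative placeholder, not a value from the real stack:

```bash
#!/usr/bin/env bash
# Sketch: resolve optional keys from env, then from the existing .env,
# and regenerate the file without empty KEY= lines. Uses a temp file only.
set -euo pipefail

ENV_FILE="$(mktemp)"
trap 'rm -f "$ENV_FILE"' EXIT

# Same shape as the e2e.sh helper: first value for KEY in the existing .env.
prev_env() {
  [[ -f "$ENV_FILE" ]] || return 0
  sed -n "s/^$1=//p" "$ENV_FILE" | head -n1
}

# A previous run (or the user) left a Langfuse host in .env.
printf '%s\n' 'LANGFUSE_HOST=https://langfuse.example' > "$ENV_FILE"

# This run exports neither variable.
unset LANGFUSE_HOST LANGFUSE_PUBLIC_KEY
: "${LANGFUSE_HOST:=$(prev_env LANGFUSE_HOST)}"             # recovered from .env
: "${LANGFUSE_PUBLIC_KEY:=$(prev_env LANGFUSE_PUBLIC_KEY)}" # stays empty

# Regenerate .env; optional keys are emitted only when non-empty.
{
  echo "LITELLM_MAX_BUDGET=10"
  [[ -n "$LANGFUSE_HOST" ]] && echo "LANGFUSE_HOST=${LANGFUSE_HOST}"
  [[ -n "$LANGFUSE_PUBLIC_KEY" ]] && echo "LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY}"
  true  # keep the group's exit status 0 when the last test is false
} > "$ENV_FILE"

cat "$ENV_FILE"
```

The regenerated file keeps `LANGFUSE_HOST` and has no `LANGFUSE_PUBLIC_KEY=` line. The trailing `true` mirrors the one in e2e.sh.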
: "${ANTHROPIC_API_KEY:=}" : "${OPENAI_API_KEY:=}" : "${GEMINI_API_KEY:=}" @@ -179,9 +181,12 @@ log "Writing ${ENV_FILE} (LITELLM_MAX_BUDGET=\$${BUDGET})" echo "AZURE_API_BASE=${AZURE_API_BASE}" echo "AZURE_API_KEY=${AZURE_API_KEY}" echo "LITELLM_MAX_BUDGET=${BUDGET}" - echo "LANGFUSE_HOST=${LANGFUSE_HOST}" - echo "LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY}" - echo "LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY}" + # Only emit LANGFUSE_* when we actually have a value, so a run without + # them set leaves no empty LANGFUSE_HOST= behind to disable telemetry. + [[ -n "$LANGFUSE_HOST" ]] && echo "LANGFUSE_HOST=${LANGFUSE_HOST}" + [[ -n "$LANGFUSE_PUBLIC_KEY" ]] && echo "LANGFUSE_PUBLIC_KEY=${LANGFUSE_PUBLIC_KEY}" + [[ -n "$LANGFUSE_SECRET_KEY" ]] && echo "LANGFUSE_SECRET_KEY=${LANGFUSE_SECRET_KEY}" + true } > "$ENV_FILE" ############################################################################### From dc77e02809260b48f3817e8878d21d245fd9ecd8 Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Tue, 19 May 2026 08:54:58 +0000 Subject: [PATCH 09/10] fix(scripts): match real summary log markers for POV/bundle milestones MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The pov-submit and bundle-submit waiters used "POV submission response: pov_id=" and "Bundle submission response: bundle_id=" which never match any rendered log line: the only "... submission response:" logs are logger.debug calls whose payload is an API object repr (no literal pov_id=/bundle_id=), while pov_id=/bundle_id= appear only in the separate structured summary line (logger.info) with a different prefix. Result: both milestones always timed out, so every run — including fully successful ones — wasted MILESTONE_TIMEOUT+BUNDLE_TIMEOUT and exited non-zero. Repoint both to the structured summary tokens (pov_id= / bundle_id=) and sync the marker list in .claude/commands/e2e.md. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/commands/e2e.md | 4 ++-- scripts/e2e.sh | 9 +++++++-- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/.claude/commands/e2e.md b/.claude/commands/e2e.md index d757fc7b..7c81d7e3 100644 --- a/.claude/commands/e2e.md +++ b/.claude/commands/e2e.md @@ -22,13 +22,13 @@ Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails 4. POSTs the canned libpng `trigger_task` payload to `http://localhost:31323/webhook/trigger_task`. 5. Waits, in order, for these scheduler/seed-gen log markers: - `Processing build output for type FUZZER` — fuzzer build done - - `POV submission response: pov_id=` — vulnerability found and POV submitted + - `pov_id=` — vulnerability found and POV submitted - `Updated POV status. New status PASSED` — POV accepted by competition API - `Copied N files to corpus` — seed-gen produced seeds - `Appending patch for task` — patch generated - approves the patch via `POST /v1/task//patch//approve` - `Patch passed` — patch accepted - - `Bundle submission response: bundle_id=` — bundle submitted + - `bundle_id=` — bundle submitted 6. Prints a colored summary and tears the stack down with `docker compose down -v`. ## Run it diff --git a/scripts/e2e.sh b/scripts/e2e.sh index 5fb266fd..5b08e428 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -339,8 +339,11 @@ else record "fuzzer-build: TIMEOUT" fi +# NOTE: match the structured summary line (`[i:task] pov_id= ...`, +# logger.info), NOT the "POV submission response:" debug line whose payload is +# an API object repr that never contains a literal `pov_id=`. if wait_for scheduler \ - "POV submission response: pov_id=" \ + "pov_id=" \ "$MILESTONE_TIMEOUT" "vulnerability (POV) submitted"; then record "pov-submit: ok" else @@ -404,8 +407,10 @@ else record "patch-passed: TIMEOUT" fi +# NOTE: same as POV above — match the structured summary `bundle_id=` +# (logger.info), not the "Bundle submission response:" debug object repr. 
if wait_for scheduler \ - "Bundle submission response: bundle_id=" \ + "bundle_id=" \ "$BUNDLE_TIMEOUT" "bundle submitted"; then record "bundle-submit: ok" else From c1856c4ae38f5e5b1445d904745d79db99337f3a Mon Sep 17 00:00:00 2001 From: Riccardo Schirone Date: Tue, 19 May 2026 11:10:48 +0000 Subject: [PATCH 10/10] fix(scripts): e2e.sh approval wait-loop + viable budget/duration defaults Three defects found while verifying the pipeline end-to-end: 1. Approval one-shot race: capture_line 'competition_patch_id=' ran once right after the patch-generated milestone, but the scheduler logs that id only minutes later (after it builds+verifies+submits the patch). The capture always lost the race, so approval was always skipped and the local stack never reached Patch passed / bundle. Replace with a wait_capture() poll loop (mirrors wait_for) so approval actually fires. 2. Default --task-duration 1800 is self-defeating: build->POV->seed-gen-> patch exceeds 30 min on normal hardware, so the task expires mid-patch ("task expired/cancelled? Will discard") and never reaches patch/bundle. Default to 7200 so the task outlives the pipeline. 3. Default --budget 3 cannot reach patch/bundle: a full run through patch generation costs ~$10; $3 is exhausted around POV. Default to 10. e2e.md updated to match (defaults, the cheap --budget 3 caveat, and the poll-then-approve description). 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .claude/commands/e2e.md | 13 ++++++------ scripts/e2e.sh | 47 +++++++++++++++++++++++++++++++++++++---- 2 files changed, 50 insertions(+), 10 deletions(-) diff --git a/.claude/commands/e2e.md b/.claude/commands/e2e.md index 7c81d7e3..a3da6ea2 100644 --- a/.claude/commands/e2e.md +++ b/.claude/commands/e2e.md @@ -6,7 +6,7 @@ allowed-tools: Bash(./scripts/e2e.sh:*), Bash(make e2e*), Bash(docker compose:*) # /e2e — Docker-only end-to-end Buttercup run (example-libpng) -This command exercises the full Buttercup pipeline on the [example-libpng](https://github.com/tob-challenges/example-libpng) challenge **using Docker only — no Kubernetes/minikube**. It uses the `dev/docker-compose/` stack with the **`compose.prebuilt.yaml` overlay** — every component runs from its prebuilt GHCR image (`ghcr.io/trailofbits/buttercup/*`, tag `main` by default), so **nothing is built locally**. A low LiteLLM budget (default **$3**) keeps an accidental run cheap. +This command exercises the full Buttercup pipeline on the [example-libpng](https://github.com/tob-challenges/example-libpng) challenge **using Docker only — no Kubernetes/minikube**. It uses the `dev/docker-compose/` stack with the **`compose.prebuilt.yaml` overlay** — every component runs from its prebuilt GHCR image (`ghcr.io/trailofbits/buttercup/*`, tag `main` by default), so **nothing is built locally**. A LiteLLM budget cap (default **$10**) bounds the spend — a full run through patch generation costs roughly that; a lower cap stops the pipeline before patch/bundle, so `--budget 3` only exercises up to seed-gen. > **Image tag:** defaults to `main`. Override with `--image-tag ` or `BUTTERCUP_IMAGE_TAG=...` to test a specific build. Private images require `docker login ghcr.io` first. > @@ -17,7 +17,7 @@ Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails ## What it does 1. 
Checks for `docker`, `docker compose`, `curl`, and at least one LLM provider key (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY`) in your env (or already saved in `dev/docker-compose/.env`). -2. Writes `dev/docker-compose/.env` with the provider keys and `LITELLM_MAX_BUDGET=$BUDGET` (default `3`). +2. Writes `dev/docker-compose/.env` with the provider keys and `LITELLM_MAX_BUDGET=$BUDGET` (default `10`). The submitted task's `duration` defaults to `7200`s (2h) — the CRS discards a task's work once its deadline passes, and the full pipeline can exceed 30 min, so a short duration would expire mid-patch. 3. Pulls the prebuilt component images (`docker compose -f compose.yaml -f compose.prebuilt.yaml pull`, skippable with `--no-pull`) and starts every service (redis, dind, litellm, task-server, task-downloader, scheduler, program-model, build-bot, fuzzer-bot, coverage-bot, tracer-bot, seed-gen, patcher, buttercup-ui). No local image build. 4. POSTs the canned libpng `trigger_task` payload to `http://localhost:31323/webhook/trigger_task`. 5. Waits, in order, for these scheduler/seed-gen log markers: @@ -26,7 +26,7 @@ Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails - `Updated POV status. New status PASSED` — POV accepted by competition API - `Copied N files to corpus` — seed-gen produced seeds - `Appending patch for task` — patch generated - - approves the patch via `POST /v1/task//patch//approve` + - polls for the `competition_patch_id=` summary line (logged only after the scheduler builds, verifies and submits the patch — minutes after the patch is generated), then approves via `POST /v1/task//patch//approve` - `Patch passed` — patch accepted - `bundle_id=` — bundle submitted 6. Prints a colored summary and tears the stack down with `docker compose down -v`. @@ -36,15 +36,16 @@ Mirrors the milestones in `.github/workflows/system-integration.yml`, but tails The driver is `scripts/e2e.sh`. The `Makefile` exposes `make e2e`. 
```bash -# Plain run with the $3 budget default +# Plain run with the $10 budget / 7200s task-duration defaults make e2e # Pass flags through the Makefile -make e2e E2E_ARGS="--budget 5 --no-pull" +make e2e E2E_ARGS="--budget 15 --no-pull" # Or call the script directly -./scripts/e2e.sh --budget 3 --task-duration 1800 +./scripts/e2e.sh --budget 10 --task-duration 7200 ./scripts/e2e.sh --image-tag my-branch --no-pull # run already-present images +./scripts/e2e.sh --budget 3 # cheap: only reaches ~seed-gen ``` The script writes/overwrites `dev/docker-compose/.env` on each run. diff --git a/scripts/e2e.sh b/scripts/e2e.sh index 5b08e428..84f93799 100755 --- a/scripts/e2e.sh +++ b/scripts/e2e.sh @@ -23,8 +23,17 @@ COMPOSE_DIR="${REPO_ROOT}/dev/docker-compose" ENV_FILE="${COMPOSE_DIR}/.env" # Defaults — overridable via flags or environment. -BUDGET="${LITELLM_MAX_BUDGET:-3}" -TASK_DURATION="${E2E_TASK_DURATION:-1800}" +# +# BUDGET: a full run through patch generation costs ~$10 of LLM spend; $3 is +# exhausted during/just after POV, so anything past seed-gen would always time +# out. Default to 10 so the whole pipeline (incl. patch+bundle) is reachable. +# +# TASK_DURATION: the CRS discards a task's work once its deadline passes. On +# normal hardware build->POV->seed-gen->patch exceeds 30 min, so an 1800s task +# expires mid-patch ("task expired/cancelled? Will discard") and never reaches +# patch/bundle. Default to 7200 (2h) so the task outlives the pipeline. +BUDGET="${LITELLM_MAX_BUDGET:-10}" +TASK_DURATION="${E2E_TASK_DURATION:-7200}" # Prebuilt GHCR images instead of local builds (compose.prebuilt.yaml overlay). IMAGE_TAG="${BUTTERCUP_IMAGE_TAG:-main}" @@ -324,6 +333,33 @@ capture_line() { | grep -E "$pattern" | head -n1 || true } +# wait_capture SERVICE PATTERN TIMEOUT_SEC LABEL +# +# Like capture_line, but polls until the pattern appears or TIMEOUT_SEC +# elapses, echoing the first matching line on stdout (empty on timeout). 
+# Progress goes to stderr so stdout stays just the captured line. +# +# Needed because `competition_patch_id=` is logged by the scheduler only +# *after* it builds, verifies and submits the patch — minutes after the +# "Appending patch for task" milestone. A one-shot capture right after that +# milestone always races and loses, so approval would always be skipped. +wait_capture() { + local service="$1" pattern="$2" timeout="$3" label="$4" + local deadline=$(( $(date +%s) + timeout )) + log "Waiting to capture: ${label} ${C_DIM}(service=${service}, timeout=${timeout}s)${C_RST}" >&2 + while [[ $(date +%s) -lt $deadline ]]; do + local match + match="$(dc logs --no-color --no-log-prefix --tail=all "$service" 2>/dev/null \ + | grep -m1 -E "$pattern" || true)" + if [[ -n "$match" ]]; then + printf '%s\n' "$match" + return 0 + fi + sleep 15 + done + return 1 +} + ############################################################################### # Walk through the pipeline ############################################################################### @@ -375,8 +411,11 @@ else fi # Approve the patch (the local UI requires explicit approval, unlike scored -# rounds where it is automatic). -PATCH_LINE="$(capture_line scheduler 'competition_patch_id=')" +# rounds where it is automatic). competition_patch_id= only appears once the +# scheduler has built+verified+submitted the patch, well after the patch was +# generated, so poll for it rather than capturing once (which always races). +PATCH_LINE="$(wait_capture scheduler 'competition_patch_id=[0-9a-fA-F-]' \ + "$MILESTONE_TIMEOUT" "competition_patch_id (for approval)" || true)" if [[ -n "$PATCH_LINE" ]]; then PATCH_ID=$(printf '%s' "$PATCH_LINE" | sed -n 's/.*competition_patch_id=\([^ ]*\).*/\1/p') # Task id is inside the first [...] block, after the last ':'.