diff --git a/.claude/skills/development-loop/SKILL.md b/.claude/skills/development-loop/SKILL.md
index be67437..b1bc639 100644
--- a/.claude/skills/development-loop/SKILL.md
+++ b/.claude/skills/development-loop/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: development-loop
-description: Red-green-refactor development loop for implementing Gateway API conformance tests. Use this skill when working on implementing new conformance tests for the multiway project. It guides the agent through selecting the next test to implement based on priority tiers, running the conformance suite, diagnosing failures, and implementing fixes.
+description: Red-green-refactor development loop for implementing Gateway API conformance tests. Use this skill when working on implementing new conformance tests for the multiway project. It guides the agent through selecting the next unblocked Linear ticket, creating a Graphite-tracked branch, running the conformance suite in an isolated Kubernetes namespace, diagnosing failures, and implementing fixes.
 ---
 
 # Development Loop for Gateway API Conformance Implementation
@@ -9,127 +9,158 @@ You are an expert in implementing Kubernetes Gateway API conformance tests. You
 
 ## Overview
 
-This skill guides you through a development loop for implementing conformance tests one at a time. Each iteration of the loop:
-1. Selects the highest-priority unimplemented test
-2. Verifies the test is currently skipped
-3. Enables the test and observes the failure
-4. Diagnoses the root cause
-5. Implements and verifies the fix
-6. Documents the results
+This skill guides you through a development loop for implementing conformance tests one ticket at a time. Each iteration:
 
-**CRITICAL**: All conformance tests MUST be run **locally** using the `gateway-conformance-runner` skill's local testing workflow. Never run tests in-cluster during development.
+1. Selects the next unblocked ticket from Linear
+2. Creates a Graphite-tracked branch so Linear auto-transitions the ticket
+3. Sets up an isolated namespace on the shared conformance cluster
+4. Runs conformance tests and observes failures
+5. Diagnoses root causes and implements fixes
+6. Tears down the namespace and submits the PR
 
-## Test Priority Tiers
+## Prerequisites
 
-Test cases have been prioritized into 7 tiers, stored in CSV files within this skill's directory:
+- **Linear MCP**: The Linear MCP tools must be available for querying tickets and checking dependencies
+- **Graphite CLI (`gt`)**: Must be installed for branch creation and PR stacking
+- **Shared cluster**: The `mw-conformance` DigitalOcean Kubernetes cluster is used for all conformance runs. If it doesn't exist yet, `cluster-up.sh` will create it automatically via `cargo make do-create`.
+- **Environment variables** (configured in `.envrc.local`):
+  - `GATEWAY_CONFORMANCE_SUITE` — path to the Gateway API repository clone
+  - `DOCKER_REGISTRY` — container registry URL (e.g., `ghcr.io/wack`)
 
-| File | Priority | Description |
-|------|----------|-------------|
-| `test-tiers/tier-1-essential.csv` | Highest | Core functionality that must work |
-| `test-tiers/tier-2-important-http.csv` | High | Important HTTP routing features |
-| `test-tiers/tier-3-production.csv` | Medium-High | Production-ready features |
-| `test-tiers/tier-4-advanced.csv` | Medium | Advanced routing capabilities |
-| `test-tiers/tier-5-observability.csv` | Medium-Low | Observability features |
-| `test-tiers/tier-6-validation.csv` | Low | Validation and edge cases |
-| `test-tiers/tier-7-not-relevant.csv` | Lowest | Tests not relevant to this implementation |
+## Development Loop Steps
 
-Each CSV file has the following columns:
-- `test_name`: The name of the conformance test
-- `description`: A brief description of what the test validates
-- `implemented`: Status - `false`, `in-progress`, or `true`
+### Step 1: Select the Next Ticket from Linear
 
-## Development Loop Steps
+Query Linear for the next ticket to work on:
 
-### Step 1: Select the Next Test
+1. Use the Linear MCP `list_issues` tool to fetch issues in the **"MultiWay: API Gateway"** project with state **"Todo"**
+2. For each candidate ticket, use `get_issue` with `includeRelations: true` to check its dependency graph
+3. **Skip any ticket whose blockers are still open** — examine the `blockedBy` relations and reject tickets where any blocking issue has a status other than "Done" or "Canceled"
+4. Select the first unblocked ticket (prefer lower issue numbers, as they represent foundational work that later tickets build upon)
 
-Use the `pick-next.sh` helper script to select and enable the next test:
+**IMPORTANT**: After selecting a ticket, clearly inform the user:
+- The **ticket ID** (e.g., `MULTI-1101`)
+- The **ticket title** (e.g., "Tier 1 — Core routing conformance (7 tests)")
+- A brief summary of what the ticket covers
 
-```bash
-# See what test is next without enabling it
-./pick-next.sh --show-next
+**Example output to user:**
+> The next ticket to work on is **MULTI-1101**: Tier 1 — Core routing conformance (7 tests). This covers the 7 most essential conformance tests including basic routing, path matching, weighted backends, and listener hostname matching.
 
-# Enable the next test (removes t.Skip() and marks as in-progress)
-./pick-next.sh
-```
+If no unblocked "Todo" tickets exist, inform the user and stop.
 
-The script will:
-1. Scan tier CSV files in priority order (tier-1 first, tier-7 last)
-2. Find the first test where `implemented` is `false` or `in-progress`
-3. If `false`, enable the test by removing `t.Skip()` from the conformance suite
-4. Update the CSV status to `in-progress`
+### Step 2: Create a Branch and Start Work
 
-**IMPORTANT**: After running `pick-next.sh`, you MUST inform the user which test was selected by clearly stating:
-- The **test name** (e.g., `HTTPRouteSimpleSameNamespace`)
-- The **test description** (e.g., "Basic HTTP routing from a route to a backend service in the same namespace")
+Every Linear issue has a `gitBranchName` field (e.g., `robbie/multi-1101`). Use this for the branch name so that Linear's GitHub integration can automatically track the ticket's lifecycle.
 
-This ensures the user understands what functionality is being implemented in this iteration.
+1. **Create the branch with Graphite** so PRs can be stacked:
+   ```bash
+   gt create
+   ```
+   This creates a new branch tracked by Graphite, branched from the current stack.
 
-**Example output to user:**
-> The next test to implement is **HTTPRouteSimpleSameNamespace**: Basic HTTP routing from a route to a backend service in the same namespace. This is the foundation of all routing functionality.
+2. **Push the branch to the remote** so Linear detects it and auto-transitions the ticket to "In Progress":
+   ```bash
+   git push -u origin
+   ```
+
+3. **Confirm the transition in Linear** — as a safety net, also update the ticket status via the Linear MCP:
+   ```
+   save_issue(id: "MULTI-XXXX", state: "In Progress")
+   ```
+
+### Step 3: Set Up the Conformance Namespace
+
+The project uses a single shared cluster (`mw-conformance`) instead of spinning up a new cluster for each ticket. Each ticket gets its own namespace for isolation, so multiple agents can run conformance suites in parallel without conflicting.
+
+The namespace name is the **lowercased ticket ID** (e.g., `multi-1101`). Kubernetes namespaces must be lowercase.
+
+1. **Switch to the conformance cluster context**:
+   ```bash
+   doctl kubernetes cluster kubeconfig save mw-conformance
+   ```
+
+2. **Verify the cluster is accessible**:
+   ```bash
+   kubectl get nodes
+   ```
+   If this fails, the cluster may not exist yet. The `cluster-up.sh` script handles this automatically — it will create the cluster if it's missing.
 
-### Step 2: Verify Test is Currently Skipped
+3. **Create a namespace for this ticket**:
+   ```bash
+   kubectl create namespace
+   ```
+   If the namespace already exists (e.g., from a previous attempt), delete it first to ensure a clean slate:
+   ```bash
+   kubectl delete namespace --wait=true --ignore-not-found
+   kubectl create namespace
+   ```
 
-Before making any code changes, verify the current state:
+4. **Install/update Gateway API CRDs** (idempotent, safe to run every time):
+   ```bash
+   cargo make gateway-api-install
+   ```
 
-1. Ensure the `GATEWAY_CONFORMANCE_SUITE` environment variable is set
-2. Navigate to `$GATEWAY_CONFORMANCE_SUITE`
-3. Use the `gateway-conformance-runner` skill to run the conformance suite locally
-4. Verify:
-   - The selected test is currently **skipped** (not running)
-   - All other enabled tests are **passing**
+### Step 4: Build, Deploy, and Run Conformance Tests
 
-If other tests are failing, stop and address those failures first before enabling a new test.
+Use the `gateway-conformance-runner` skill's `run-conformance.sh` script. It accepts `--namespace` to target your ticket's namespace and `--cluster-name` to specify the shared cluster.
 
-### Step 3: Enable the Test and Observe Failure
+**First run** (builds images, deploys, and runs tests):
+```bash
+.claude/skills/gateway-conformance-runner/run-conformance.sh \
+  --cluster-name mw-conformance \
+  --namespace
+```
 
-1. Enable the test by removing it from the skip list or adding it to the enabled tests in the conformance configuration
-2. Run the conformance suite again using `gateway-conformance-runner`
-3. Observe and capture the test failure output
-4. Document the specific failure message and any relevant stack traces
+**Subsequent runs** (skip the build if code hasn't changed):
+```bash
+.claude/skills/gateway-conformance-runner/run-conformance.sh \
+  --cluster-name mw-conformance \
+  --namespace \
+  --skip-build
+```
 
-### Step 4: Handle Test Results
+**Test-only re-runs** (controller is already deployed):
+```bash
+.claude/skills/gateway-conformance-runner/run-conformance.sh \
+  --cluster-name mw-conformance \
+  --namespace \
+  --skip-build --skip-deploy
+```
 
-**If the test passes immediately:**
-- Update the CSV file to change `implemented` from `in-progress` to `true`
-- Document this finding (the feature was already implemented)
-- Return to Step 1 to select the next test
+### Step 5: Handle Test Results
 
-**If the test fails:**
-- Proceed to Step 5 (Diagnosis)
+**If the target tests pass immediately:**
+- The feature was already implemented — document this finding
+- Proceed to Step 8 (Clean Up and Report)
 
-### Step 5: Diagnose the Failure
+**If tests fail:**
+- Proceed to Step 6 (Diagnosis)
 
-#### 5a: Attempt to Create a Unit Test (Recommended)
+### Step 6: Diagnose the Failure
 
-Before diving into the implementation, try to recreate the conformance test as a purely functional unit test within this repository:
+#### 6a: Create a Unit Test First (Recommended)
 
-1. Study the conformance test implementation in `$GATEWAY_CONFORMANCE_SUITE/conformance`
-2. Understand what scenario the test is validating
-3. Create a unit test using this project's testing patterns:
-   - Use `snapshot` semantics for expected outputs
-   - Use `world state` semantics for modeling the reconciler
-   - Implement as a purely functional controller test
+Before modifying production code, try to reproduce the failure as a fast, purely functional unit test. The multiway project follows a sans-I/O architecture, so most behavior can be tested without a cluster.
 
-Having a local unit test provides:
-- Faster iteration cycles
-- Easier debugging
-- Better test isolation
-- Documentation of the expected behavior
+1. Study the conformance test implementation in `$GATEWAY_CONFORMANCE_SUITE/conformance`
+2. Create a unit test using this project's patterns:
+   - Use `WorldSnapshotBuilder` to set up cluster state
+   - Call pure reconciliation functions (`reconcile_gateway`, `reconcile_httproute`, etc.)
+   - Assert on the returned `ReconcileResult`
 
-If you cannot successfully create a unit test, proceed to the next step.
+Unit tests run in milliseconds (no cluster, no async), which dramatically speeds up the fix-verify cycle.
 
-#### 5b: Investigate Root Cause
+#### 6b: Investigate Root Cause
 
 1. Analyze the failure message to identify the failing assertion
-2. Trace through the code to understand the request flow:
-   - Control plane: How are resources being reconciled?
-   - Data plane: How are requests being routed?
-3. Identify the specific code paths responsible for the failure
-4. Document your findings
+2. Trace through the code:
+   - **Control plane**: Is the ConfigMap being generated correctly?
+   - **Data plane**: Is the proxy routing requests correctly?
+3. Identify the specific code paths responsible
 
-#### 5c: File a Bug Report
+#### 6c: File a Bug Report
 
-Create a Markdown file in `./bug-reports/` documenting:
+Create a Markdown file in `.claude/skills/development-loop/bug-reports/` documenting:
 
 ```markdown
 # Bug Report: [Test Name]
@@ -151,157 +182,101 @@ Create a Markdown file in `./bug-reports/` documenting:
 [Your plan to address the issue]
 ```
 
-### Step 6: Implement the Fix
+### Step 7: Implement and Verify the Fix
+
+1. Make the necessary code changes — keep them minimal and focused
+2. Follow the project's coding conventions (see CLAUDE.md)
+3. If you created a unit test in Step 6a, run it first for fast feedback:
+   ```bash
+   cargo nextest run
+   ```
+4. Run the full conformance suite again:
+   ```bash
+   .claude/skills/gateway-conformance-runner/run-conformance.sh \
+     --cluster-name mw-conformance \
+     --namespace \
+     --skip-build
+   ```
+5. Verify:
+   - The previously failing tests now **pass**
+   - No other tests have **regressed**
+
+If verification fails, return to Step 6 to continue diagnosis.
 
-1. Make the necessary code changes to fix the identified issue
-2. Keep changes minimal and focused on the specific test
-3. Follow the project's coding conventions and patterns
+### Step 8: Clean Up and Report
 
-### Step 7: Verify the Fix
+Once all tests for the ticket pass:
 
-1. If you created a unit test in Step 5a, run it first:
+1. **Run formatting and linting** to ensure code quality:
    ```bash
-   cargo nextest run [test_name]
+   cargo make fmt
+   cargo make clippy-flow
    ```
-2. Run the full conformance suite using `gateway-conformance-runner`
-3. Verify:
-   - The previously failing test now **passes**
-   - No other tests have regressed
 
-If verification fails, return to Step 5 to continue diagnosis.
+2. **Tear down the namespace** to free cluster resources and avoid conflicts:
+   ```bash
+   kubectl delete namespace --wait=true
+   ```
+   This removes all resources (deployments, services, configmaps, etc.) created for this ticket. Always do this, even if the ticket isn't fully complete.
 
-### Step 8: Document and Report
+3. **Commit your changes and submit a PR** via Graphite:
+   ```bash
+   gt submit
+   ```
+   When the PR is merged, Linear's GitHub integration will automatically transition the ticket to "Done".
 
-Once the test passes:
+4. **Check in about cluster teardown** — after the PR is submitted, ask the user whether they'd like to tear down the shared `mw-conformance` cluster to save on DigitalOcean costs. If the user says yes, run:
+   ```bash
+   .claude/skills/gateway-conformance-runner/cluster-down.sh --destroy-cluster
+   ```
+   This destroys the DigitalOcean cluster **and** removes its context, cluster entry, and user entry from the local kubeconfig, so it no longer appears in `kubectl config get-contexts`. The cluster can be recreated automatically by `cluster-up.sh` on the next run.
 
-1. Update the CSV file to change `implemented` from `in-progress` to `true`
+   If the user says no (or wants to keep running more tickets), the cluster stays up and you can continue to Step 6.
 
-2. Create a summary report with the following format:
+5. **Create a summary report**:
 
 ```markdown
-## Test Completed: [Test Name]
+## Ticket Completed: [MULTI-XXXX] [Ticket Title]
 
 ### Summary
 [Brief description of what was implemented]
 
 ### Changes Made
-
-**Before:**
-[Code or behavior before the fix]
-
-**After:**
-[Code or behavior after the fix]
-
-### Files Modified
 - `path/to/file1.rs`: [description of changes]
 - `path/to/file2.rs`: [description of changes]
 
-### Unit Test Added
+### Unit Tests Added
 [Yes/No - if yes, describe the test]
 
 ### Lessons Learned
-[Any insights that might help with future tests]
+[Any insights that might help with future tickets]
 ```
 
-3. Return to Step 1 to continue with the next test
-
-## Running Conformance Tests Locally
-
-Always use the `gateway-conformance-runner` skill for running conformance tests. The local testing workflow provides:
-- Faster iteration cycles
-- Real-time output for debugging
-- Direct access to test logs
-- Ability to run individual tests
-
-Key commands:
-```bash
-# Verify environment
-echo $GATEWAY_CONFORMANCE_SUITE
-
-# Run conformance tests locally
-cd $GATEWAY_CONFORMANCE_SUITE && make conformance
-```
+6. Return to Step 1 to continue with the next ticket.
 
 ## Best Practices
 
-1. **One test at a time**: Focus on a single test per iteration
-2. **Verify first**: Always confirm the test is skipped before enabling
-3. **Minimal changes**: Make the smallest change needed to pass the test
-4. **Document everything**: Keep thorough records in bug reports and summaries
-5. **Unit tests preferred**: Local unit tests make debugging much faster
-6. **No regressions**: Ensure all previously passing tests continue to pass
+1. **One ticket at a time**: Focus on a single ticket per iteration
+2. **Namespace isolation**: Always work in a ticket-specific namespace to avoid conflicts with other agents
+3. **Minimal changes**: Make the smallest change needed to pass the tests
+4. **Unit tests preferred**: Local unit tests (milliseconds) are far faster than conformance runs (minutes)
+5. **No regressions**: Ensure all previously passing tests continue to pass
+6. **Always clean up**: Delete the namespace when done, even if you're abandoning the ticket
+7. **Document everything**: Bug reports and summaries help future iterations
 
 ## Error Recovery
 
-If you encounter issues:
-- **Wrong kubectl context**: Stop immediately, switch to the correct context
-- **Conformance suite not found**: Verify `GATEWAY_CONFORMANCE_SUITE` is set correctly
-- **Multiple tests failing**: Address failing tests before enabling new ones
-- **Stuck on a test**: Document findings, mark as `in-progress`, and consider moving to the next test with a note
-
-## Helper Script: pick-next.sh
-
-A helper script is provided to automate common development loop tasks:
-
-```bash
-# Location
-.claude/skills/development-loop/pick-next.sh
-```
-
-### Script Features
-
-The `pick-next.sh` script automates:
-1. **CSV Concatenation**: Combines all tier files in priority order (tier-1 first)
-2. **Next Test Selection**: Finds the first test with status `in-progress` or `false`
-3. **Test Enabling**: Uses AST-Grep to remove `t.Skip()` calls from conformance tests
-
-### Usage
-
-```bash
-# Show the next test to work on
-./pick-next.sh --show-next
-
-# Enable the next test (removes t.Skip() and updates CSV to in-progress)
-./pick-next.sh
-
-# Preview what would be done without making changes
-./pick-next.sh --dry-run
-
-# List all tests in priority order with their status
-./pick-next.sh --list-all
-
-# Show help
-./pick-next.sh --help
-```
-
-### Requirements
-
-- **GATEWAY_CONFORMANCE_SUITE**: Environment variable pointing to the Gateway API repository clone
-- **ast-grep** (optional): The script will install it via cargo if not available, or fall back to sed
-
-### Example Workflow
-
-```bash
-# 1. See what test to work on next
-./pick-next.sh --show-next
-
-# 2. Enable the test (removes t.Skip() and marks as in-progress)
-./pick-next.sh
-
-# 3. Run conformance tests to see the failure
-cd $GATEWAY_CONFORMANCE_SUITE && make conformance
-
-# 4. Implement the fix in the multiway codebase
-
-# 5. Verify the fix passes
-cd $GATEWAY_CONFORMANCE_SUITE && make conformance
-
-# 6. Manually update the CSV to mark as 'true' when complete
-```
+- **Wrong kubectl context**: Run `doctl kubernetes cluster kubeconfig save mw-conformance`
+- **Cluster not found**: The `cluster-up.sh` script will auto-create the cluster via `cargo make do-create`. If that fails, check DigitalOcean authentication (`doctl auth init`).
+- **Namespace conflicts**: If a namespace already exists from a previous attempt, delete it first:
+  ```bash
+  kubectl delete namespace --wait=true --ignore-not-found
+  ```
+- **Stuck on a ticket**: Document findings in a bug report, commit work-in-progress, tear down the namespace, and move to the next unblocked ticket
+- **No unblocked tickets**: Inform the user — all remaining "Todo" tickets are blocked by open dependencies
+- **Build failures**: Run `cargo check` first to catch compilation errors before building Docker images
 
 ## Files and Directories
 
-- `./pick-next.sh`: Helper script for development loop automation
-- `./test-tiers/*.csv`: Test priority lists and implementation status
 - `./bug-reports/`: Diagnostic reports for failing tests
 - `$GATEWAY_CONFORMANCE_SUITE/conformance`: The official conformance test suite
diff --git a/.claude/skills/development-loop/pick-next.sh b/.claude/skills/development-loop/pick-next.sh
deleted file mode 100755
index 575bb72..0000000
--- a/.claude/skills/development-loop/pick-next.sh
+++ /dev/null
@@ -1,585 +0,0 @@
-#!/usr/bin/env bash
-#
-# pick-next.sh
-#
-# Development loop helper script for implementing Gateway API conformance tests.
-# This script:
-#   1. Concatenates all tier CSV files in priority order (1 = highest)
-#   2. Finds the first test with status "in-progress" or "false"
-#   3. If status is "false", enables the test by removing t.Skip() from the conformance suite
-#
-# Usage:
-#   ./pick-next.sh [OPTIONS]
-#
-# Options:
-#   --dry-run      Print what would be done without making changes
-#   --show-next    Show the next test to work on without enabling it
-#   --list-all     List all tests in priority order with their status
-#   --help         Show this help message
-#
-# Environment Variables:
-#   GATEWAY_CONFORMANCE_SUITE  Path to the Gateway API repository root (required for enabling tests)
-#
-
-set -euo pipefail
-
-# =============================================================================
-# CONFIGURATION
-# =============================================================================
-
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-TIERS_DIR="${SCRIPT_DIR}/test-tiers"
-
-# Tier files in priority order (1 = highest priority)
-TIER_FILES=(
-    "tier-1-essential.csv"
-    "tier-2-important-http.csv"
-    "tier-3-production.csv"
-    "tier-4-advanced.csv"
-    "tier-5-observability.csv"
-    "tier-6-validation.csv"
-    "tier-7-not-relevant.csv"
-)
-
-# Colors for output
-COLOR_RED='\033[0;31m'
-COLOR_GREEN='\033[0;32m'
-COLOR_YELLOW='\033[0;33m'
-COLOR_BLUE='\033[0;34m'
-COLOR_RESET='\033[0m'
-
-# =============================================================================
-# GLOBAL STATE
-# =============================================================================
-
-DRY_RUN=false
-SHOW_NEXT=false
-LIST_ALL=false
-
-# =============================================================================
-# LOGGING FUNCTIONS
-# =============================================================================
-
-info() {
-    echo -e "${COLOR_BLUE}[INFO]${COLOR_RESET} $*"
-}
-
-success() {
-    echo -e "${COLOR_GREEN}[SUCCESS]${COLOR_RESET} $*"
-}
-
-warn() {
-    echo -e "${COLOR_YELLOW}[WARN]${COLOR_RESET} $*"
-}
-
-error() {
-    echo -e "${COLOR_RED}[ERROR]${COLOR_RESET} $*" >&2
-}
-
-error_exit() {
-    error "$*"
-    exit 1
-}
-
-# =============================================================================
-# HELP
-# =============================================================================
-
-show_help() {
-    cat << 'EOF'
-Usage: pick-next.sh [OPTIONS]
-
-Development loop helper for Gateway API conformance test implementation.
-
-This script manages the test implementation workflow by:
-  1. Concatenating all tier CSV files in priority order (tier-1 = highest)
-  2. Finding the first test with status "in-progress" or "false"
-  3. If status is "false", enabling the test by removing t.Skip() in the conformance suite
-
-Options:
-  --dry-run      Print what would be done without making changes
-  --show-next    Show the next test to work on without enabling it
-  --list-all     List all tests in priority order with their status
-  --help         Show this help message
-
-Environment Variables:
-  GATEWAY_CONFORMANCE_SUITE  Path to the Gateway API repository root (required for enabling tests)
-
-Examples:
-  # Show the next test to implement
-  ./pick-next.sh --show-next
-
-  # Enable the next test (remove t.Skip())
-  ./pick-next.sh
-
-  # See what would be done without making changes
-  ./pick-next.sh --dry-run
-
-  # List all tests with their current status
-  ./pick-next.sh --list-all
-EOF
-    exit 0
-}
-
-# =============================================================================
-# ARGUMENT PARSING
-# =============================================================================
-
-parse_arguments() {
-    while [[ $# -gt 0 ]]; do
-        case "$1" in
-            --dry-run)
-                DRY_RUN=true
-                shift
-                ;;
-            --show-next)
-                SHOW_NEXT=true
-                shift
-                ;;
-            --list-all)
-                LIST_ALL=true
-                shift
-                ;;
-            --help|-h)
-                show_help
-                ;;
-            *)
-                error_exit "Unknown option: $1. Use --help for usage information."
-                ;;
-        esac
-    done
-}
-
-# =============================================================================
-# CSV PROCESSING FUNCTIONS
-# =============================================================================
-
-#######################################
-# Concatenates all tier CSV files in priority order.
-# Strips the header row from all files except the first.
-# Outputs to stdout.
-#######################################
-concatenate_tier_files() {
-    local first_file=true
-
-    for tier_file in "${TIER_FILES[@]}"; do
-        local file_path="${TIERS_DIR}/${tier_file}"
-
-        if [[ ! -f "${file_path}" ]]; then
-            warn "Tier file not found: ${file_path}"
-            continue
-        fi
-
-        if [[ "${first_file}" == true ]]; then
-            # Include header from first file
-            cat "${file_path}"
-            first_file=false
-        else
-            # Skip header (first line) from subsequent files
-            tail -n +2 "${file_path}"
-        fi
-    done
-}
-
-#######################################
-# Finds the first test with status "in-progress" or "false".
-# Returns: test_name,description,status (CSV format)
-# Exit code: 0 if found, 1 if not found
-#######################################
-find_next_test() {
-    local csv_data
-    csv_data=$(concatenate_tier_files)
-
-    # Skip header and find first row with in-progress or false
-    echo "${csv_data}" | tail -n +2 | while IFS=',' read -r test_name description implemented; do
-        # Trim whitespace from implemented status
-        implemented=$(echo "${implemented}" | tr -d '[:space:]')
-
-        if [[ "${implemented}" == "in-progress" ]] || [[ "${implemented}" == "false" ]]; then
-            echo "${test_name},${description},${implemented}"
-            return 0
-        fi
-    done
-}
-
-#######################################
-# Lists all tests with their status in priority order.
-#######################################
-list_all_tests() {
-    local csv_data
-    csv_data=$(concatenate_tier_files)
-
-    echo ""
-    echo "All tests in priority order:"
-    echo "============================"
-    echo ""
-    printf "%-40s %-15s %s\n" "TEST NAME" "STATUS" "DESCRIPTION"
-    printf "%-40s %-15s %s\n" "---------" "------" "-----------"
-
-    # Skip header and print all rows
-    echo "${csv_data}" | tail -n +2 | while IFS=',' read -r test_name description implemented; do
-        implemented=$(echo "${implemented}" | tr -d '[:space:]')
-
-        # Color code the status
-        local status_colored
-        case "${implemented}" in
-            true)
-                status_colored="${COLOR_GREEN}${implemented}${COLOR_RESET}"
-                ;;
-            in-progress)
-                status_colored="${COLOR_YELLOW}${implemented}${COLOR_RESET}"
-                ;;
-            false)
-                status_colored="${COLOR_RED}${implemented}${COLOR_RESET}"
-                ;;
-            *)
-                status_colored="${implemented}"
-                ;;
-        esac
-
-        # Truncate description if too long
-        if [[ ${#description} -gt 50 ]]; then
-            description="${description:0:47}..."
-        fi
-
-        printf "%-40s %-15b %s\n" "${test_name}" "${status_colored}" "${description}"
-    done
-
-    echo ""
-}
-
-# =============================================================================
-# PORTABLE UTILITIES
-# =============================================================================
-
-#######################################
-# Portable sed -i that works on both macOS and Linux.
-# On macOS, sed -i requires an argument; on Linux it doesn't.
-# Arguments:
-#   $1 - sed expression
-#   $2 - file to edit
-#######################################
-sed_inplace() {
-    local expression="$1"
-    local file="$2"
-
-    if [[ "$(uname)" == "Darwin" ]]; then
-        sed -i '' "${expression}" "${file}"
-    else
-        sed -i "${expression}" "${file}"
-    fi
-}
-
-# =============================================================================
-# AST-GREP FUNCTIONS
-# =============================================================================
-
-#######################################
-# Checks if ast-grep is available.
-# Returns: 0 if available, 1 if not
-#######################################
-check_ast_grep() {
-    if command -v ast-grep &>/dev/null; then
-        return 0
-    fi
-
-    # Check if it's available via cargo
-    if [[ -f "${HOME}/.cargo/bin/ast-grep" ]]; then
-        export PATH="${HOME}/.cargo/bin:${PATH}"
-        return 0
-    fi
-
-    return 1
-}
-
-#######################################
-# Installs ast-grep via cargo if not available.
-#######################################
-install_ast_grep() {
-    info "Installing ast-grep via cargo..."
-
-    if ! command -v cargo &>/dev/null; then
-        error_exit "cargo is required to install ast-grep. Please install Rust first."
-    fi
-
-    if ! cargo install ast-grep; then
-        error_exit "Failed to install ast-grep"
-    fi
-
-    export PATH="${HOME}/.cargo/bin:${PATH}"
-    success "ast-grep installed successfully"
-}
-
-#######################################
-# Finds the test file containing the specified test function.
-# Arguments:
-#   $1 - Test name (e.g., HTTPRouteSimpleSameNamespace)
-# Returns: Path to the test file containing the test
-#######################################
-find_test_file() {
-    local test_name="$1"
-    local conformance_dir="${GATEWAY_CONFORMANCE_SUITE}/conformance"
-
-    if [[ ! -d "${conformance_dir}" ]]; then
-        error_exit "Conformance directory not found: ${conformance_dir}"
-    fi
-
-    # Search for the test name in Go files
-    local test_file
-    test_file=$(grep -rl "\"${test_name}\"" "${conformance_dir}" --include="*.go" 2>/dev/null | head -1)
-
-    if [[ -z "${test_file}" ]]; then
-        # Try searching for the test function directly
-        test_file=$(grep -rl "func.*${test_name}" "${conformance_dir}" --include="*.go" 2>/dev/null | head -1)
-    fi
-
-    if [[ -z "${test_file}" ]]; then
-        return 1
-    fi
-
-    echo "${test_file}"
-}
-
-#######################################
-# Removes t.Skip() call from a test using ast-grep.
-# Arguments:
-#   $1 - Test name
-#   $2 - Test file path
-#######################################
-remove_skip_with_ast_grep() {
-    local test_name="$1"
-    local test_file="$2"
-
-    info "Using ast-grep to remove t.Skip() for test: ${test_name}"
-
-    if [[ "${DRY_RUN}" == true ]]; then
-        echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} Would remove t.Skip() from: ${test_file}"
-        return 0
-    fi
-
-    # Create a temporary file for the ast-grep rule
-    # Use a portable approach that works on both macOS and Linux
-    local rule_file
-    rule_file="${TMPDIR:-/tmp}/ast-grep-rule-$$.yaml"
-
-    # Write the ast-grep rule to find and remove t.Skip() calls
-    # This pattern matches t.Skip("reason") statements
-    cat > "${rule_file}" << 'RULE'
-id: remove-t-skip
-language: go
-rule:
-  any:
-    - pattern: t.Skip($$$)
-    - pattern: t.Skipf($$$)
-    - pattern: t.SkipNow()
-fix: ""
-RULE
-
-    # Run ast-grep to remove t.Skip() calls
-    # We use --rewrite mode to apply the fix
-    if ! ast-grep scan --rule "${rule_file}" --update-all "${test_file}" 2>/dev/null; then
-        # If ast-grep fails, fall back to sed
-        warn "ast-grep failed, falling back to sed-based removal"
-        remove_skip_with_sed "${test_name}" "${test_file}"
-    fi
-
-    rm -f "${rule_file}"
-}
-
-#######################################
-# Fallback: Removes t.Skip() call using sed.
-# Arguments: -# $1 - Test name -# $2 - Test file path -####################################### -remove_skip_with_sed() { - local test_name="$1" - local test_file="$2" - - info "Using sed to remove t.Skip() for test: ${test_name}" - - if [[ "${DRY_RUN}" == true ]]; then - echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} Would use sed to remove t.Skip() from: ${test_file}" - return 0 - fi - - # Remove lines containing t.Skip, t.Skipf, or t.SkipNow - # This is a simple approach - ast-grep is more precise - sed_inplace '/[[:space:]]*t\.Skip\(f\?\|Now\)(/d' "${test_file}" -} - -####################################### -# Enables a test by removing its t.Skip() call. -# Arguments: -# $1 - Test name -####################################### -enable_test() { - local test_name="$1" - - info "Enabling test: ${test_name}" - - # Verify GATEWAY_CONFORMANCE_SUITE is set - if [[ -z "${GATEWAY_CONFORMANCE_SUITE:-}" ]]; then - error_exit "GATEWAY_CONFORMANCE_SUITE environment variable is not set. - -Please set it to the path of your Gateway API repository clone: - export GATEWAY_CONFORMANCE_SUITE=/path/to/gateway-api" - fi - - # Find the test file - local test_file - if ! test_file=$(find_test_file "${test_name}"); then - error_exit "Could not find test file for: ${test_name} - -The test may not exist in the conformance suite, or it may use a different name. -Please check the conformance suite at: ${GATEWAY_CONFORMANCE_SUITE}/conformance" - fi - - info "Found test in file: ${test_file}" - - # Check if ast-grep is available - if ! check_ast_grep; then - warn "ast-grep not found, attempting to install..." 
- install_ast_grep - fi - - # Remove t.Skip() using ast-grep - if check_ast_grep; then - remove_skip_with_ast_grep "${test_name}" "${test_file}" - else - warn "ast-grep not available, using sed fallback" - remove_skip_with_sed "${test_name}" "${test_file}" - fi - - success "Test enabled: ${test_name}" - info "File modified: ${test_file}" -} - -# ============================================================================= -# CSV UPDATE FUNCTIONS -# ============================================================================= - -####################################### -# Updates a test's status in its tier CSV file. -# Arguments: -# $1 - Test name -# $2 - New status (in-progress, true, false) -####################################### -update_test_status() { - local test_name="$1" - local new_status="$2" - - info "Updating status of ${test_name} to: ${new_status}" - - if [[ "${DRY_RUN}" == true ]]; then - echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} Would update ${test_name} status to: ${new_status}" - return 0 - fi - - # Find which tier file contains this test - for tier_file in "${TIER_FILES[@]}"; do - local file_path="${TIERS_DIR}/${tier_file}" - - if [[ ! -f "${file_path}" ]]; then - continue - fi - - if grep -q "^${test_name}," "${file_path}"; then - # Update the status in place - sed_inplace "s/^\(${test_name},.*,\)[^,]*$/\1${new_status}/" "${file_path}" - success "Updated ${test_name} in ${tier_file}" - return 0 - fi - done - - error "Test not found in any tier file: ${test_name}" - return 1 -} - -# ============================================================================= -# MAIN WORKFLOW -# ============================================================================= - -####################################### -# Main development loop workflow. 
-####################################### -main() { - parse_arguments "$@" - - # Handle --list-all option - if [[ "${LIST_ALL}" == true ]]; then - list_all_tests - exit 0 - fi - - echo "" - info "===========================================" - info "Development Loop Helper" - info "===========================================" - echo "" - - # Find the next test to work on - local next_test - next_test=$(find_next_test) - - if [[ -z "${next_test}" ]]; then - success "All tests are implemented! No more tests to work on." - exit 0 - fi - - # Parse the test info - local test_name description status - IFS=',' read -r test_name description status <<< "${next_test}" - - echo "" - info "Next test to work on:" - echo "" - echo " Test Name: ${test_name}" - echo " Description: ${description}" - echo " Status: ${status}" - echo "" - - # Handle --show-next option - if [[ "${SHOW_NEXT}" == true ]]; then - if [[ "${status}" == "false" ]]; then - info "To enable this test, run: ./pick-next.sh" - elif [[ "${status}" == "in-progress" ]]; then - info "This test is already in-progress. Continue working on it." - fi - exit 0 - fi - - # If status is "in-progress", just report it - if [[ "${status}" == "in-progress" ]]; then - warn "Test '${test_name}' is already in-progress." - info "Continue working on this test, or manually update its status to 'true' when complete." - exit 0 - fi - - # Status is "false" - enable the test - info "Enabling test: ${test_name}" - echo "" - - # Enable the test (remove t.Skip()) - enable_test "${test_name}" - - # Update the CSV status to "in-progress" - update_test_status "${test_name}" "in-progress" - - echo "" - success "===========================================" - success "Test enabled and marked as in-progress" - success "===========================================" - echo "" - echo " Test Name: ${test_name}" - echo " Description: ${description}" - echo "" - info "Next steps:" - echo " 1. Run the conformance tests to see the failure" - echo " 2. 
Diagnose and implement the fix" - echo " 3. Verify the test passes" - echo " 4. Update the CSV status to 'true' when complete" - echo "" -} - -main "$@" diff --git a/.claude/skills/development-loop/test-tiers/tier-1-essential.csv b/.claude/skills/development-loop/test-tiers/tier-1-essential.csv deleted file mode 100644 index 0184a40..0000000 --- a/.claude/skills/development-loop/test-tiers/tier-1-essential.csv +++ /dev/null @@ -1,8 +0,0 @@ -test_name,description,implemented -HTTPRouteSimpleSameNamespace,Basic HTTP routing from a route to a backend service in the same namespace. Foundation of all routing.,true -HTTPRouteMatching,Path and header matching for routing requests to different backends based on request criteria.,false -HTTPRouteExactPathMatching,Exact path matching where /foo matches only /foo and not /foo/bar.,false -HTTPRouteWeight,Traffic distribution across multiple backends based on specified weights for load balancing.,false -GatewayWithAttachedRoutes,Core Gateway-Route attachment model verifying routes attach correctly and status is tracked.,false -HTTPRouteListenerHostnameMatching,Multiple listeners with different hostnames routing to different backends.,false -HTTPRouteListenerPortMatching,HTTP listeners on different ports with port-based routing.,false diff --git a/.claude/skills/development-loop/test-tiers/tier-2-important-http.csv b/.claude/skills/development-loop/test-tiers/tier-2-important-http.csv deleted file mode 100644 index c915d6c..0000000 --- a/.claude/skills/development-loop/test-tiers/tier-2-important-http.csv +++ /dev/null @@ -1,11 +0,0 @@ -test_name,description,implemented -HTTPRouteHeaderMatching,Header-based routing rules enabling request filtering and routing based on HTTP headers.,false -HTTPRouteMethodMatching,HTTP method-based routing (GET/POST/PUT/DELETE etc) for REST API routing.,false -HTTPRouteQueryParamMatching,Query parameter-based routing for feature flags and conditional routing.,false 
-HTTPRouteRequestHeaderModifier,Adding/removing/replacing request headers before forwarding to backends.,false -HTTPRouteResponseHeaderModifier,Response header modification for security headers and CORS.,false -HTTPRouteRewritePath,Path rewriting and prefix stripping for backend URL compatibility.,false -HTTPRouteRewriteHost,Host header rewriting for multi-domain backend support.,false -HTTPRouteHostnameIntersection,Hostname matching with wildcard and specific hostname handling and precedence.,false -HTTPRouteMatchingAcrossRoutes,Routing rules across multiple HTTPRoutes on same gateway verifying rule precedence.,false -HTTPRoutePathMatchOrder,Correct matching order for path-based rules ensuring predictable routing behavior.,false diff --git a/.claude/skills/development-loop/test-tiers/tier-3-production.csv b/.claude/skills/development-loop/test-tiers/tier-3-production.csv deleted file mode 100644 index 855bab7..0000000 --- a/.claude/skills/development-loop/test-tiers/tier-3-production.csv +++ /dev/null @@ -1,11 +0,0 @@ -test_name,description,implemented -HTTPRouteHTTPSListener,HTTPS listener support with TLS certificate management and termination.,false -HTTPRouteRedirectScheme,HTTP to HTTPS redirect for standard security enforcement.,false -HTTPRouteRedirectPath,Path-based redirects for URL migration and restructuring.,false -HTTPRouteRedirectPort,Port-based redirects for traffic management.,false -HTTPRouteRedirectHostAndStatus,Host redirect with HTTP status code control (301/302/etc).,false -HTTPRouteRedirectPortAndScheme,Combined port and scheme redirects in a single rule.,false -HTTPRouteTimeoutRequest,Request timeout handling to prevent hung connections.,false -HTTPRouteTimeoutBackendRequest,Backend connection timeout to handle slow backends.,false -HTTPRouteCrossNamespace,Cross-namespace route attachment with ReferenceGrant for namespace isolation.,false -GatewayWithAttachedRoutesWithPort8080,Gateway with routes on non-standard ports (8080).,false diff --git 
a/.claude/skills/development-loop/test-tiers/tier-4-advanced.csv b/.claude/skills/development-loop/test-tiers/tier-4-advanced.csv deleted file mode 100644 index 4842fc7..0000000 --- a/.claude/skills/development-loop/test-tiers/tier-4-advanced.csv +++ /dev/null @@ -1,13 +0,0 @@ -test_name,description,implemented -HTTPRouteRequestMirror,Request mirroring/shadowing to additional backends for canary testing.,false -HTTPRouteRequestMultipleMirrors,Multiple request mirrors to several backends simultaneously.,false -HTTPRouteRequestPercentageMirror,Percentage-based traffic mirroring for gradual testing rollout.,false -HTTPRouteBackendRequestHeaderModifier,Per-backend header modification separate from route-level modification.,false -HTTPRouteRequestHeaderModifierBackendWeights,Header modification combined with weighted backend distribution.,false -HTTPRouteCORSAllowCredentialsBehavior,CORS credential handling for web applications with authentication.,false -HTTPRouteNamedRule,Named HTTPRoute rules for better observability and metrics reference.,false -HTTPRouteServiceTypes,Support for various Kubernetes service types (headless/manual endpoint slices).,false -GatewayModifyListeners,Dynamic listener modification and status updates at runtime.,false -GatewayHTTPListenerIsolation,Listener isolation ensuring requests don't cross listener boundaries.,false -GatewayStaticAddresses,Gateway static IP address assignment and management.,false -GatewayOptionalAddressValue,Optional gateway address handling when address is not specified.,false diff --git a/.claude/skills/development-loop/test-tiers/tier-5-observability.csv b/.claude/skills/development-loop/test-tiers/tier-5-observability.csv deleted file mode 100644 index ad5ff4d..0000000 --- a/.claude/skills/development-loop/test-tiers/tier-5-observability.csv +++ /dev/null @@ -1,5 +0,0 @@ -test_name,description,implemented -HTTPRouteObservedGenerationBump,Route status condition tracking for configuration change detection.,false 
-GatewayObservedGenerationBump,Gateway status generation tracking for state synchronization.,false -GatewayClassObservedGenerationBump,GatewayClass status tracking for control plane consistency verification.,false -GatewayInfrastructure,Infrastructure metadata propagation to provisioned resources.,false diff --git a/.claude/skills/development-loop/test-tiers/tier-6-validation.csv b/.claude/skills/development-loop/test-tiers/tier-6-validation.csv deleted file mode 100644 index 54e13c9..0000000 --- a/.claude/skills/development-loop/test-tiers/tier-6-validation.csv +++ /dev/null @@ -1,18 +0,0 @@ -test_name,description,implemented -HTTPRouteInvalidBackendRefUnknownKind,Verification that unknown backend reference kinds are properly rejected.,false -HTTPRouteInvalidNonExistentBackendRef,Non-existent backend reference rejection with proper status.,false -HTTPRouteInvalidCrossNamespaceBackendRef,Cross-namespace backend ref without ReferenceGrant permission rejection.,false -HTTPRouteInvalidCrossNamespaceParentRef,Cross-namespace parent ref validation and rejection.,false -HTTPRouteInvalidParentRefNotMatchingListenerPort,Port matching validation when parent ref port doesn't match listener.,false -HTTPRouteInvalidParentRefNotMatchingSectionName,Section name matching validation for parent references.,false -HTTPRouteInvalidParentRefSectionNameNotMatchingPort,Combined port/section name validation for parent refs.,false -HTTPRouteInvalidReferenceGrant,Invalid ReferenceGrant handling and rejection.,false -HTTPRoutePartiallyInvalidViaInvalidReferenceGrant,Partial route acceptance when some refs are invalid.,false -HTTPRouteDisallowedKind,Route kind rejection when listener doesn't allow it.,false -HTTPRouteReferenceGrant,Valid ReferenceGrant acceptance for cross-namespace access.,false -GatewaySecretInvalidReferenceGrant,Invalid secret ReferenceGrant handling for TLS secrets.,false -GatewaySecretMissingReferenceGrant,Missing ReferenceGrant rejection for cross-namespace 
secrets.,false -GatewaySecretReferenceGrantAllInNamespace,ReferenceGrant with all-in-namespace permissions for secrets.,false -GatewaySecretReferenceGrantSpecific,Specific ReferenceGrant for particular secrets only.,false -GatewayInvalidRouteKind,Invalid route kind rejection at Gateway level.,false -GatewayInvalidTLSConfiguration,TLS configuration validation and error reporting.,false diff --git a/.claude/skills/development-loop/test-tiers/tier-7-not-relevant.csv b/.claude/skills/development-loop/test-tiers/tier-7-not-relevant.csv deleted file mode 100644 index d857166..0000000 --- a/.claude/skills/development-loop/test-tiers/tier-7-not-relevant.csv +++ /dev/null @@ -1,17 +0,0 @@ -test_name,description,implemented -GRPCExactMethodMatching,gRPC method-based routing with exact service/method matching.,false -GRPCRouteHeaderMatching,gRPC route header matching for gRPC metadata routing.,false -GRPCRouteListenerHostnameMatching,gRPC hostname matching for multi-tenant gRPC services.,false -GRPCRouteNamedRule,Named gRPC route rules for observability.,false -GRPCRouteWeight,gRPC traffic weighting for load balancing gRPC services.,false -TLSRouteSimpleSameNamespace,TLS passthrough routing based on SNI without termination.,false -TLSRouteInvalidReferenceGrant,TLS route reference grant validation.,false -UDPRoute,UDP protocol routing for non-HTTP traffic.,false -HTTPRouteBackendProtocolWebSocket,WebSocket protocol support and upgrade handling.,false -HTTPRouteBackendProtocolH2C,HTTP/2 Cleartext (h2c) backend protocol support.,false -BackendTLSPolicy,TLS configuration for gateway-to-backend connections (mTLS).,false -BackendTLSPolicyConflictResolution,Handling conflicting TLS policies on same backend.,false -BackendTLSPolicyInvalidCACertificateRef,Invalid CA certificate reference handling.,false -BackendTLSPolicyInvalidKind,Invalid backend reference kind rejection in TLS policy.,false -BackendTLSPolicyObservedGenerationBump,TLS policy status generation tracking.,false 
-BackendTLSPolicySANValidation,SubjectAltName validation in backend TLS certificates.,false diff --git a/.claude/skills/gateway-api/SKILL.md b/.claude/skills/gateway-api/SKILL.md deleted file mode 100644 index 75ca934..0000000 --- a/.claude/skills/gateway-api/SKILL.md +++ /dev/null @@ -1,33 +0,0 @@ ---- -name: gateway-api -description: contains reference documentation for implementing the Kubernetes Gateway API ---- - -# Kubernetes Gateway API Reference - -## Description - -Complete reference documentation for the Kubernetes Gateway API specification. This skill provides the full specification of all API types, objects, and conformance requirements for Gateway implementations. - -** All information about the spceification is stored in `reference.md`** - -## Skills - -- Complete API type definitions and object specifications -- Gateway conformance requirements and test profiles -- Detailed field specifications and validation rules -- Resource relationships and dependencies -- Standard and extended features matrix - -## Instructions - -This skill provides authoritative reference documentation for the Kubernetes Gateway API specification. Use when you need: - -- Exact API type definitions for Gateway, GatewayClass, HTTPRoute, TCPRoute, TLSRoute, UDPRoute, and ReferenceGrant -- Complete field specifications, types, and validation requirements -- Gateway conformance requirements and implementation standards -- Resource status conditions and their meanings -- Cross-reference relationships between API objects -- Standard vs extended feature classifications - -All information is based on the official Kubernetes Gateway API specification and covers the complete set of requirements for conformant Gateway implementations. 
diff --git a/.claude/skills/gateway-conformance-runner/SKILL.md b/.claude/skills/gateway-conformance-runner/SKILL.md index 14d1ae4..c61cb69 100644 --- a/.claude/skills/gateway-conformance-runner/SKILL.md +++ b/.claude/skills/gateway-conformance-runner/SKILL.md @@ -1,6 +1,6 @@ --- name: gateway-conformance-runner -description: Use this skill when you need to run the Gateway API conformance test suite for the multiway project. It includes setting up a DigitalOcean Kubernetes cluster, building and deploying the gateway controller, running the official conformance tests, and analyzing the results. The skill handles the complete workflow from cluster creation to test execution and log retrieval. +description: Use this skill when you need to run the Gateway API conformance test suite for the multiway project. It handles building and deploying the gateway controller to a shared DigitalOcean cluster, running the official conformance tests in an isolated namespace, and analyzing the results. Use this skill whenever the user mentions conformance tests, test suite, or wants to verify Gateway API compliance. hooks: pre: .claude/skills/gateway-conformance-runner/cluster-up.sh post: .claude/skills/gateway-conformance-runner/cluster-down.sh @@ -8,42 +8,48 @@ hooks: You are responsible for running the Gateway API conformance test suite for the multiway project and reporting the results. -# Cluster Lifecycle +# Cluster and Namespace Model -This skill includes automatic cluster management through hooks: +The conformance suite runs on a **shared, long-lived** DigitalOcean cluster called `mw-conformance`. Each test run gets its own **isolated namespace** so that multiple agents or branches can run conformance tests in parallel without conflicting. -- **Startup Hook** (`cluster-up.sh`): Runs before the skill starts. Creates the DigitalOcean Kubernetes cluster if it doesn't exist, or clears the namespace if it does. Ensures kubectl context is properly configured. 
-- **Shutdown Hook** (`cluster-down.sh`): Runs after the skill completes. Destroys the DigitalOcean cluster and cleans up kubectl context to avoid unnecessary costs. +## Automatic Lifecycle via Hooks -## Cluster Naming +- **Startup Hook** (`cluster-up.sh`): Runs before the skill starts. Verifies the shared cluster is accessible (creating it automatically if it doesn't exist) and creates a fresh namespace derived from the current git branch (e.g., branch `robbie/multi-1101` creates namespace `multi-1101`). +- **Shutdown Hook** (`cluster-down.sh`): Runs after the skill completes. Deletes the namespace and all its resources. By default the shared cluster is preserved; use `--destroy-cluster` to also tear it down and clean up kubeconfig. -The cluster name defaults to a sanitized version of the current git branch, prefixed with `mw-`. For example: -- Branch `main` → cluster `mw-main` -- Branch `feature/my-test` → cluster `mw-feature-my-test` -- Branch `claude/migrate-kind-to-digitalocean-Ytxrr` → cluster `mw-claude-migrate-kind-to-digitalocean-ytxrr` +## Namespace Naming -This allows multiple developers or branches to have isolated clusters without conflicts. +The namespace defaults to the last path component of the current git branch, lowercased. For example: +- Branch `robbie/multi-1101` → namespace `multi-1101` +- Branch `feature/my-test` → namespace `my-test` +- Branch `main` → namespace `main` -You can override the cluster name with `--cluster-name` or the `DO_CLUSTER_NAME` environment variable. +You can override the namespace with `--namespace` or the `CONFORMANCE_NAMESPACE` environment variable. 
-You can also run these scripts manually: +You can also run the hook scripts manually: ```bash -# Start or prepare the cluster for current branch +# Set up namespace for current branch .claude/skills/gateway-conformance-runner/cluster-up.sh -# Use a specific cluster name -.claude/skills/gateway-conformance-runner/cluster-up.sh --cluster-name my-cluster +# Use a specific namespace (e.g., for a Linear ticket) +.claude/skills/gateway-conformance-runner/cluster-up.sh --namespace multi-1101 -# Destroy the cluster when done +# Tear down the namespace when done .claude/skills/gateway-conformance-runner/cluster-down.sh + +# Tear down a specific namespace +.claude/skills/gateway-conformance-runner/cluster-down.sh --namespace multi-1101 + +# Tear down namespace AND destroy the cluster (saves DigitalOcean costs) +.claude/skills/gateway-conformance-runner/cluster-down.sh --destroy-cluster ``` # Running Conformance Tests Use the automated script located at `.claude/skills/gateway-conformance-runner/run-conformance.sh` to run the conformance tests. -**Note**: The cluster must be running before executing this script. If using the skill hooks, the cluster is started automatically. If running manually, use `cluster-up.sh` first. +**Note**: The namespace must exist before running this script. If using the skill hooks, the namespace is created automatically. If running manually, use `cluster-up.sh` first. 
## Basic Usage @@ -57,18 +63,19 @@ Use the automated script located at `.claude/skills/gateway-conformance-runner/r |--------|-------------| | `--skip-build` | Skip Rust compilation and Docker image building (use when images already exist) | | `--skip-deploy` | Skip gateway controller deployment (use when controller is already running) | -| `--cluster-name NAME` | Specify cluster name (default: derived from git branch, e.g., `mw-main`) | +| `--cluster-name NAME` | Specify cluster name (default: `mw-conformance`) | +| `--namespace NAME` | Kubernetes namespace for deployment (default: `multiway-system`) | | `--dry-run` | Print commands without executing them | | `--help` | Show help message | ## What the Script Does -The script automates the conformance testing workflow (cluster is managed separately): +The script automates the conformance testing workflow: 1. **Prerequisites Check**: Verifies Docker and kubectl are available, cluster is accessible 2. **Environment Verification**: Checks that `GATEWAY_CONFORMANCE_SUITE` and `DOCKER_REGISTRY` environment variables are set 3. **Build & Push**: Compiles the Rust project, builds Docker images, and pushes them to the container registry -4. **Deploy**: Cleans up any existing deployments, installs Gateway API CRDs, creates a fresh namespace, deploys the gateway controller, and waits for pods to be ready +4. **Deploy**: Cleans up any existing deployments in the namespace, installs Gateway API CRDs, deploys the gateway controller, and waits for pods to be ready 5. **Test Execution**: Runs the conformance tests from the local Gateway API repository ## Prerequisites @@ -79,30 +86,30 @@ Before running the skill, ensure: 2. **kubectl** is installed 3. **doctl** (DigitalOcean CLI) is installed and authenticated 4. **Gateway API repository** is cloned locally -5. **`GATEWAY_CONFORMANCE_SUITE`** environment variable is set in `.envrc.local`: +5. 
**`mw-conformance` cluster** on DigitalOcean — created automatically by `cluster-up.sh` if it doesn't exist +6. **`GATEWAY_CONFORMANCE_SUITE`** environment variable is set in `.envrc.local`: ```bash export GATEWAY_CONFORMANCE_SUITE=/path/to/gateway-api ``` -6. **`DOCKER_REGISTRY`** environment variable is set in `.envrc.local`: +7. **`DOCKER_REGISTRY`** environment variable is set in `.envrc.local`: ```bash export DOCKER_REGISTRY=ghcr.io/myorg ``` -7. **`DO_REGION`** (optional) environment variable for the cluster region (default: `nyc3`): - ```bash - export DO_REGION=nyc3 - ``` ## Example Commands ```bash -# Full test run (build, deploy, test) +# Full test run (build, deploy, test) in default namespace .claude/skills/gateway-conformance-runner/run-conformance.sh +# Run in a specific namespace (e.g., for a Linear ticket) +.claude/skills/gateway-conformance-runner/run-conformance.sh --namespace multi-1101 + # Quick re-test (skip build, images already exist) -.claude/skills/gateway-conformance-runner/run-conformance.sh --skip-build +.claude/skills/gateway-conformance-runner/run-conformance.sh --skip-build --namespace multi-1101 # Just run tests (controller already deployed) -.claude/skills/gateway-conformance-runner/run-conformance.sh --skip-build --skip-deploy +.claude/skills/gateway-conformance-runner/run-conformance.sh --skip-build --skip-deploy --namespace multi-1101 # Preview what would be executed .claude/skills/gateway-conformance-runner/run-conformance.sh --dry-run @@ -121,12 +128,9 @@ Your job is to run the conformance tests and report the results. Do not debug er # Error Handling & Troubleshooting -If the script fails to execute, there are a handful of tools that -you may wish to invoke to get conformance testing back on track. - -If the script fails, consider these responsibilities: +If the script fails, consider these areas: -1. **Cluster Management**: The `cluster-up.sh` and `cluster-down.sh` scripts handle cluster lifecycle +1. 
**Cluster Access**: The `mw-conformance` cluster is created automatically by `cluster-up.sh` if it doesn't exist. If auto-creation fails, check DigitalOcean authentication (`doctl auth init`). 2. **Build Pipeline**: Build Docker images for the gateway controller and ensure they're properly pushed to the container registry 3. **Deployment**: Deploy the gateway controller and all necessary CRDs following the project's established patterns 4. **Test Execution**: Run the official Gateway API conformance test suite with appropriate configuration @@ -135,8 +139,8 @@ If the script fails, consider these responsibilities: When encountering issues: - If `GATEWAY_CONFORMANCE_SUITE` is not set, guide the user to configure it in `.envrc.local` - If `DOCKER_REGISTRY` is not set, guide the user to configure it in `.envrc.local` -- If kubectl context is wrong, run `cluster-up.sh` to reset the context -- If DigitalOcean cluster creation fails, check doctl authentication and DigitalOcean account quotas +- If kubectl context is wrong, run `doctl kubernetes cluster kubeconfig save mw-conformance` +- If the shared cluster doesn't exist, `cluster-up.sh` creates it automatically. If auto-creation fails, check DigitalOcean auth. - If image push fails, verify registry authentication (e.g., `docker login ghcr.io`) - If tests fail, analyze logs for specific failure points but do not suggest fixes - If deployment fails, check resource definitions and cluster state @@ -144,12 +148,11 @@ When encountering issues: **Best Practices**: 1. Always verify kubectl context before running tests to avoid running against production clusters -2. Ensure the DigitalOcean cluster is clean before running tests to avoid state pollution +2. Use namespace isolation to avoid state pollution between test runs 3. Verify all prerequisites (Docker, doctl, kubectl, Go) are installed and functioning 4. Verify that the Rust project will compile before building Docker images: `cargo check` -5. 
Use `cargo make conformance-cleanup` between test runs to ensure clean state (in-cluster only) -6. Check that the gateway controller is fully deployed before running tests -7. The shutdown hook will delete the cluster automatically; if running manually, use `cluster-down.sh` +5. Check that the gateway controller is fully deployed before running tests +6. The shutdown hook will delete the namespace automatically; if running manually, use `cluster-down.sh` **Available Docker Build Commands**: The project provides several cargo make tasks for building Docker images: @@ -159,9 +162,9 @@ The project provides several cargo make tasks for building Docker images: - `cargo make docker-build-all-cross` - Build for both amd64 and arm64 (multi-platform) - `cargo make docker-build-all-push` - Build and push multi-platform images (requires DOCKER_REGISTRY env var) -**Cluster Lifecycle Commands**: -- `cluster-up.sh` - Create or prepare the DigitalOcean cluster -- `cluster-down.sh` - Destroy the DigitalOcean cluster -- `cargo make do-create` - Create a new cluster (called by cluster-up.sh) -- `cargo make do-delete` - Delete the cluster (called by cluster-down.sh) -- `cargo make do-use` - Save kubeconfig for the cluster +**Namespace Lifecycle Commands**: +- `cluster-up.sh` - Verify cluster access (auto-create if needed) and create namespace +- `cluster-down.sh` - Delete namespace (cluster is preserved) +- `cluster-down.sh --destroy-cluster` - Delete namespace AND destroy the DigitalOcean cluster + clean up kubeconfig +- `cluster-up.sh --namespace ` - Create a specific namespace +- `cluster-down.sh --namespace ` - Delete a specific namespace diff --git a/.claude/skills/gateway-conformance-runner/cluster-down.sh b/.claude/skills/gateway-conformance-runner/cluster-down.sh index 083eb5b..b5aae03 100755 --- a/.claude/skills/gateway-conformance-runner/cluster-down.sh +++ b/.claude/skills/gateway-conformance-runner/cluster-down.sh @@ -3,17 +3,20 @@ # # cluster-down.sh # -# Destroys the 
DigitalOcean Kubernetes cluster and cleans up the kubectl context. -# This script is designed to be run when the conformance testing session is complete. +# Tears down the conformance test namespace, cleaning up all resources deployed +# during the test run. By default the shared cluster is preserved and only the +# namespace is deleted; pass --destroy-cluster to remove the cluster as well. # -# The cluster name defaults to a sanitized version of the current git branch, -# prefixed with "mw-" (e.g., branch "feature/my-test" becomes "mw-feature-my-test"). +# The namespace defaults to the last path component of the current git branch, +# sanitized and lowercased (e.g., branch "robbie/multi-1101" becomes "multi-1101"). # # Usage: # ./cluster-down.sh [OPTIONS] # # Options: -# --cluster-name NAME Name of the DigitalOcean cluster (default: derived from git branch) +# --cluster-name NAME Name of the DigitalOcean cluster (default: mw-conformance) +# --namespace NAME Namespace to delete (default: derived from git branch) +# --destroy-cluster Also destroy the DigitalOcean cluster and remove kubeconfig # --dry-run Print commands without executing them # --help Show this help message # @@ -29,12 +32,21 @@ set -euo pipefail SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" source "${SCRIPT_DIR}/lib.sh" +# ============================================================================= +# CONFIGURATION +# ============================================================================= + +# The shared conformance cluster. It is long-lived and preserved unless --destroy-cluster is passed. +readonly DEFAULT_CLUSTER_NAME="mw-conformance" + # ============================================================================= # GLOBAL STATE # ============================================================================= # These variables are set by parse_arguments() and used throughout the script. 
-CLUSTER_NAME="" +CLUSTER_NAME="${DEFAULT_CLUSTER_NAME}" +NAMESPACE="" +DESTROY_CLUSTER=false DRY_RUN=false # ============================================================================= @@ -45,31 +57,36 @@ DRY_RUN=false # Prints the help message and exits. ####################################### show_help() { - local default_name - default_name=$(get_default_cluster_name) - cat << EOF Usage: $(basename "$0") [OPTIONS] -Destroys the DigitalOcean Kubernetes cluster and cleans up the kubectl context. +Tears down the conformance test namespace. By default the cluster is preserved. +Use --destroy-cluster to also destroy the DigitalOcean cluster and clean up +the local kubeconfig. -The cluster name defaults to a sanitized version of the current git branch, -prefixed with "${CLUSTER_NAME_PREFIX}-" (e.g., "feature/my-test" becomes "${CLUSTER_NAME_PREFIX}-feature-my-test"). +The namespace defaults to a sanitized version of the current git branch name, +lowercased (e.g., branch "robbie/multi-1101" becomes "multi-1101"). 
Options: - --cluster-name NAME Name of the DigitalOcean cluster (default: ${default_name}) + --cluster-name NAME Name of the DigitalOcean cluster (default: ${DEFAULT_CLUSTER_NAME}) + --namespace NAME Namespace to delete (default: derived from git branch) + --destroy-cluster Also destroy the DigitalOcean cluster and remove kubeconfig --dry-run Print commands without executing them --help Show this help message Environment Variables: - DO_CLUSTER_NAME Override the cluster name + DO_CLUSTER_NAME Override the cluster name + CONFORMANCE_NAMESPACE Override the namespace Examples: - # Destroy cluster for current branch + # Delete namespace for current branch ./cluster-down.sh - # Destroy a specific cluster - ./cluster-down.sh --cluster-name my-test-cluster + # Delete a specific namespace + ./cluster-down.sh --namespace multi-1101 + + # Delete namespace AND destroy the cluster to save costs + ./cluster-down.sh --destroy-cluster # See what commands would be run ./cluster-down.sh --dry-run @@ -77,6 +94,23 @@ EOF exit 0 } +####################################### +# Derives a namespace name from the current git branch. +# Extracts the last path component (e.g., "robbie/multi-1101" -> "multi-1101") +# and lowercases it. 
+####################################### +get_default_namespace() { + local branch_name + if branch_name=$(git rev-parse --abbrev-ref HEAD 2>/dev/null); then + # Take the last path component (after the last /) + local base_name="${branch_name##*/}" + # Lowercase and sanitize for Kubernetes namespace rules + echo "${base_name}" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]/-/g; s/^-//; s/-$//' + else + echo "multiway-system" + fi +} + # ============================================================================= # ARGUMENT PARSING # ============================================================================= @@ -87,11 +121,16 @@ EOF # $@ - All command-line arguments passed to the script ####################################### parse_arguments() { - # Default to environment variable, then git branch-based name + # Default cluster name from environment variable if [[ -n "${DO_CLUSTER_NAME:-}" ]]; then CLUSTER_NAME="${DO_CLUSTER_NAME}" + fi + + # Default namespace from environment variable, then git branch + if [[ -n "${CONFORMANCE_NAMESPACE:-}" ]]; then + NAMESPACE="${CONFORMANCE_NAMESPACE}" else - CLUSTER_NAME=$(get_default_cluster_name) + NAMESPACE=$(get_default_namespace) fi while [[ $# -gt 0 ]]; do @@ -103,6 +142,17 @@ parse_arguments() { CLUSTER_NAME="$2" shift 2 ;; + --namespace) + if [[ -z "${2:-}" ]]; then + error_exit "--namespace requires a value" + fi + NAMESPACE="$2" + shift 2 + ;; + --destroy-cluster) + DESTROY_CLUSTER=true + shift + ;; --dry-run) DRY_RUN=true shift @@ -120,7 +170,7 @@ parse_arguments() { # ============================================================================= # PREREQUISITE CHECKS # -# Note: check_doctl_available and check_kubectl_available are provided by lib.sh +# Note: check_kubectl_available is provided by lib.sh # ============================================================================= ####################################### @@ -128,21 +178,65 @@ parse_arguments() { ####################################### 
check_prerequisites() { info "=== Checking Prerequisites ===" - check_doctl_available check_kubectl_available + if [[ "${DESTROY_CLUSTER}" == true ]]; then + check_doctl_available + fi success "All prerequisites satisfied" echo "" } # ============================================================================= -# CLUSTER MANAGEMENT +# NAMESPACE TEARDOWN +# ============================================================================= + +####################################### +# Deletes the conformance test namespace and all its resources. +# This is the primary cleanup mechanism — deleting the namespace removes +# all deployments, services, configmaps, and other resources within it. +####################################### +delete_namespace() { + info "=== Namespace Teardown ===" + + if [[ "${DRY_RUN}" == true ]]; then + echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl delete namespace ${NAMESPACE} --wait=true --ignore-not-found" + success "Namespace deletion skipped (dry-run mode)" + return 0 + fi + + # Check if namespace exists + if ! kubectl get namespace "${NAMESPACE}" &>/dev/null; then + success "Namespace '${NAMESPACE}' does not exist, nothing to clean up" + return 0 + fi + + info "Deleting namespace '${NAMESPACE}' and all its resources..." + if ! kubectl delete namespace "${NAMESPACE}" --wait=true --ignore-not-found; then + error_exit "Failed to delete namespace '${NAMESPACE}'" + fi + + # Wait for deletion to fully complete + info "Waiting for namespace deletion to complete..." + if ! 
kubectl wait --for=delete namespace/"${NAMESPACE}" --timeout=120s 2>/dev/null; then + # The wait command may fail if the namespace is already gone + if kubectl get namespace "${NAMESPACE}" &>/dev/null; then + error_exit "Namespace '${NAMESPACE}' was not deleted within timeout" + fi + fi + + success "Namespace '${NAMESPACE}' deleted" + echo "" +} + +# ============================================================================= +# CLUSTER DESTRUCTION (optional, triggered by --destroy-cluster) # # Note: do_cluster_exists is provided by lib.sh # ============================================================================= ####################################### # Gets the kubectl context name for a DigitalOcean cluster. -# DigitalOcean contexts are typically named do-<region>-<cluster-name> +# DigitalOcean contexts are typically named do-<region>-<cluster-name>. # Returns the context name via stdout, or empty string if not found. ####################################### get_kubectl_context_name() { @@ -151,10 +245,12 @@ get_kubectl_context_name() { } ####################################### -# Deletes the kubectl context for the cluster. +# Deletes the kubectl context, cluster entry, and user entry for the cluster. +# This cleans up the local kubeconfig so the destroyed cluster no longer +# appears in `kubectl config get-contexts`. ####################################### delete_kubectl_context() { - info "Cleaning up kubectl context..." + info "Cleaning up kubectl context for cluster '${CLUSTER_NAME}'..."
if [[ "${DRY_RUN}" == true ]]; then echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl config delete-context " @@ -174,33 +270,26 @@ delete_kubectl_context() { info "Found kubectl context: ${context_name}" - # Get the current context to check if we need to switch + # If this is the current context, unset it first local current_context current_context=$(kubectl config current-context 2>/dev/null || echo "") - - # If the context we're deleting is the current one, unset it first if [[ "${current_context}" == "${context_name}" ]]; then warn "Unsetting current kubectl context..." kubectl config unset current-context 2>/dev/null || true fi - # Delete the context - info "Deleting context '${context_name}'..." + # Delete context, cluster entry, and user entry kubectl config delete-context "${context_name}" 2>/dev/null || true - # Try to delete associated cluster and user entries - # These are typically named with the same pattern local cluster_entry cluster_entry=$(kubectl config get-clusters 2>/dev/null | grep "${CLUSTER_NAME}" | head -1 || echo "") if [[ -n "${cluster_entry}" ]]; then - info "Deleting cluster entry '${cluster_entry}'..." kubectl config delete-cluster "${cluster_entry}" 2>/dev/null || true fi local user_entry user_entry=$(kubectl config get-users 2>/dev/null | grep "${CLUSTER_NAME}" | head -1 || echo "") if [[ -n "${user_entry}" ]]; then - info "Deleting user entry '${user_entry}'..." kubectl config delete-user "${user_entry}" 2>/dev/null || true fi @@ -208,24 +297,18 @@ delete_kubectl_context() { } ####################################### -# Destroys the DigitalOcean Kubernetes cluster. +# Destroys the DigitalOcean Kubernetes cluster and cleans up kubeconfig. 
####################################### destroy_cluster() { info "=== Cluster Destruction ===" if [[ "${DRY_RUN}" == true ]]; then - if do_cluster_exists "${CLUSTER_NAME}"; then - info "Cluster '${CLUSTER_NAME}' exists" - else - info "Cluster '${CLUSTER_NAME}' does not exist" - fi echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} doctl kubernetes cluster delete ${CLUSTER_NAME} --force --dangerous" success "Cluster destruction skipped (dry-run mode)" delete_kubectl_context return 0 fi - # Check if cluster exists if ! do_cluster_exists "${CLUSTER_NAME}"; then warn "Cluster '${CLUSTER_NAME}' does not exist, nothing to destroy" delete_kubectl_context @@ -241,9 +324,7 @@ destroy_cluster() { success "Cluster '${CLUSTER_NAME}' destroyed" - # Clean up kubectl context delete_kubectl_context - echo "" } @@ -256,24 +337,34 @@ main() { echo "" info "===========================================" - info "DigitalOcean Kubernetes Cluster Shutdown" + info "Conformance Teardown" info "===========================================" echo "" info "Configuration:" - info " Cluster name: ${CLUSTER_NAME}" - info " Dry run: ${DRY_RUN}" + info " Cluster name: ${CLUSTER_NAME}" + info " Namespace: ${NAMESPACE} (will be deleted)" + info " Destroy cluster: ${DESTROY_CLUSTER}" + info " Dry run: ${DRY_RUN}" echo "" check_prerequisites - destroy_cluster + delete_namespace + + if [[ "${DESTROY_CLUSTER}" == true ]]; then + destroy_cluster + fi echo "" success "===========================================" - success "Cluster shutdown complete" + success "Teardown complete" success "===========================================" echo "" - info "The DigitalOcean cluster has been destroyed." - info "You will no longer be charged for cluster resources." + info "Namespace '${NAMESPACE}' has been deleted." + if [[ "${DESTROY_CLUSTER}" == true ]]; then + info "Cluster '${CLUSTER_NAME}' has been destroyed and removed from kubeconfig." + else + info "Cluster '${CLUSTER_NAME}' remains running." 
+ fi } main "$@" diff --git a/.claude/skills/gateway-conformance-runner/cluster-up.sh b/.claude/skills/gateway-conformance-runner/cluster-up.sh index cd36747..249615e 100755 --- a/.claude/skills/gateway-conformance-runner/cluster-up.sh +++ b/.claude/skills/gateway-conformance-runner/cluster-up.sh @@ -3,19 +3,17 @@ # # cluster-up.sh # -# Ensures the DigitalOcean Kubernetes cluster is running and ready for use. -# If the cluster exists, it clears the gateway namespace for a fresh state. -# If the cluster doesn't exist, it creates one. -# Either way, it ensures the kubectl context is properly configured. -# -# The cluster name defaults to a sanitized version of the current git branch, -# prefixed with "mw-" (e.g., branch "feature/my-test" becomes "mw-feature-my-test"). +# Ensures the DigitalOcean Kubernetes cluster is accessible and creates +# an isolated namespace for conformance testing. If the cluster doesn't exist, +# it is created automatically. The namespace is derived from the current git +# branch name (lowercased, sanitized). # # Usage: # ./cluster-up.sh [OPTIONS] # # Options: -# --cluster-name NAME Name of the DigitalOcean cluster (default: derived from git branch) +# --cluster-name NAME Name of the DigitalOcean cluster (default: mw-conformance) +# --namespace NAME Namespace to create (default: derived from git branch) # --dry-run Print commands without executing them # --help Show this help message # @@ -35,15 +33,17 @@ source "${SCRIPT_DIR}/lib.sh" # CONFIGURATION # ============================================================================= -# Script-specific constants. -readonly DEFAULT_NAMESPACE="multiway-system" +# The default conformance cluster. This cluster is long-lived and shared across +# all conformance test runs. If it doesn't exist, it will be created automatically. 
+readonly DEFAULT_CLUSTER_NAME="mw-conformance" # ============================================================================= # GLOBAL STATE # ============================================================================= # These variables are set by parse_arguments() and used throughout the script. -CLUSTER_NAME="" +CLUSTER_NAME="${DEFAULT_CLUSTER_NAME}" +NAMESPACE="" DRY_RUN=false # ============================================================================= @@ -54,34 +54,32 @@ DRY_RUN=false # Prints the help message and exits. ####################################### show_help() { - local default_name - default_name=$(get_default_cluster_name) - cat << EOF Usage: $(basename "$0") [OPTIONS] -Ensures the DigitalOcean Kubernetes cluster is running and ready for use. -If the cluster exists, clears the gateway namespace for a fresh state. -If the cluster doesn't exist, creates a new one. +Ensures the DigitalOcean Kubernetes cluster is accessible and creates +an isolated namespace for conformance testing. If the cluster doesn't +exist, it will be created automatically. -The cluster name defaults to a sanitized version of the current git branch, -prefixed with "${CLUSTER_NAME_PREFIX}-" (e.g., "feature/my-test" becomes "${CLUSTER_NAME_PREFIX}-feature-my-test"). +The namespace defaults to a sanitized version of the current git branch name, +lowercased (e.g., branch "robbie/multi-1101" becomes "multi-1101"). 
Options: - --cluster-name NAME Name of the DigitalOcean cluster (default: ${default_name}) + --cluster-name NAME Name of the DigitalOcean cluster (default: ${DEFAULT_CLUSTER_NAME}) + --namespace NAME Namespace to create (default: derived from git branch) --dry-run Print commands without executing them --help Show this help message Environment Variables: - DO_CLUSTER_NAME Override the cluster name - DO_REGION DigitalOcean region for cluster (default: nyc3) + DO_CLUSTER_NAME Override the cluster name + CONFORMANCE_NAMESPACE Override the namespace Examples: - # Start or prepare cluster for current branch + # Prepare namespace for current branch ./cluster-up.sh - # Use a specific cluster name - ./cluster-up.sh --cluster-name my-test-cluster + # Use a specific namespace (e.g., for a Linear ticket) + ./cluster-up.sh --namespace multi-1101 # See what commands would be run ./cluster-up.sh --dry-run @@ -89,6 +87,23 @@ EOF exit 0 } +####################################### +# Derives a namespace name from the current git branch. +# Extracts the last path component (e.g., "robbie/multi-1101" -> "multi-1101") +# and lowercases it. 
+####################################### +get_default_namespace() { + local branch_name + if branch_name=$(git rev-parse --abbrev-ref HEAD 2>/dev/null); then + # Take the last path component (after the last /) + local base_name="${branch_name##*/}" + # Lowercase and sanitize for Kubernetes namespace rules + echo "${base_name}" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9-]/-/g; s/^-//; s/-$//' + else + echo "multiway-system" + fi +} + # ============================================================================= # ARGUMENT PARSING # ============================================================================= @@ -99,11 +114,16 @@ EOF # $@ - All command-line arguments passed to the script ####################################### parse_arguments() { - # Default to environment variable, then git branch-based name + # Default cluster name from environment variable if [[ -n "${DO_CLUSTER_NAME:-}" ]]; then CLUSTER_NAME="${DO_CLUSTER_NAME}" + fi + + # Default namespace from environment variable, then git branch + if [[ -n "${CONFORMANCE_NAMESPACE:-}" ]]; then + NAMESPACE="${CONFORMANCE_NAMESPACE}" else - CLUSTER_NAME=$(get_default_cluster_name) + NAMESPACE=$(get_default_namespace) fi while [[ $# -gt 0 ]]; do @@ -115,6 +135,13 @@ parse_arguments() { CLUSTER_NAME="$2" shift 2 ;; + --namespace) + if [[ -z "${2:-}" ]]; then + error_exit "--namespace requires a value" + fi + NAMESPACE="$2" + shift 2 + ;; --dry-run) DRY_RUN=true shift @@ -147,28 +174,11 @@ check_prerequisites() { } # ============================================================================= -# CLUSTER MANAGEMENT +# CLUSTER AND NAMESPACE MANAGEMENT # # Note: do_cluster_exists is provided by lib.sh # ============================================================================= -####################################### -# Creates a new DigitalOcean Kubernetes cluster. -# Uses cargo make do-create which handles all configuration. 
-####################################### -create_cluster() { - info "Creating DigitalOcean Kubernetes cluster '${CLUSTER_NAME}'..." - - # Export the cluster name so cargo make can use it - export DO_CLUSTER_NAME="${CLUSTER_NAME}" - - if ! run_cmd cargo make do-create; then - error_exit "Failed to create DigitalOcean cluster '${CLUSTER_NAME}'" - fi - - success "Cluster '${CLUSTER_NAME}' created successfully" -} - ####################################### # Saves the kubeconfig for the cluster. ####################################### @@ -210,63 +220,77 @@ verify_cluster_accessible() { } ####################################### -# Clears the gateway namespace to ensure a fresh state. -# Deletes the namespace if it exists, which removes all resources within it. +# Creates a fresh namespace for this conformance run. +# Deletes the namespace first if it already exists, to ensure a clean slate. ####################################### -clear_namespace() { - info "Clearing namespace '${DEFAULT_NAMESPACE}' for fresh state..." +create_namespace() { + info "Setting up namespace '${NAMESPACE}'..." if [[ "${DRY_RUN}" == true ]]; then - echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl delete namespace ${DEFAULT_NAMESPACE} --ignore-not-found" - echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl wait --for=delete namespace/${DEFAULT_NAMESPACE} --timeout=60s" - success "Namespace cleanup skipped (dry-run mode)" + echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl delete namespace ${NAMESPACE} --ignore-not-found --wait=true" + echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl create namespace ${NAMESPACE}" + success "Namespace setup skipped (dry-run mode)" return 0 fi - # Check if namespace exists - if ! 
kubectl get namespace "${DEFAULT_NAMESPACE}" &>/dev/null; then - success "Namespace '${DEFAULT_NAMESPACE}' does not exist, nothing to clear" - return 0 + # Delete existing namespace if present (ensures clean state) + if kubectl get namespace "${NAMESPACE}" &>/dev/null; then + warn "Namespace '${NAMESPACE}' already exists, deleting for clean state..." + if ! kubectl delete namespace "${NAMESPACE}" --wait=true; then + error_exit "Failed to delete existing namespace '${NAMESPACE}'" + fi + # Wait for deletion to fully complete + if ! kubectl wait --for=delete namespace/"${NAMESPACE}" --timeout=60s 2>/dev/null; then + if kubectl get namespace "${NAMESPACE}" &>/dev/null; then + error_exit "Namespace '${NAMESPACE}' was not deleted within timeout" + fi + fi fi - # Delete the namespace - info "Deleting namespace '${DEFAULT_NAMESPACE}' and all its resources..." - if ! kubectl delete namespace "${DEFAULT_NAMESPACE}" --ignore-not-found; then - error_exit "Failed to delete namespace '${DEFAULT_NAMESPACE}'" + # Create fresh namespace + if ! kubectl create namespace "${NAMESPACE}"; then + error_exit "Failed to create namespace '${NAMESPACE}'" fi - # Wait for deletion to complete - info "Waiting for namespace deletion to complete..." - if ! kubectl wait --for=delete namespace/"${DEFAULT_NAMESPACE}" --timeout=60s 2>/dev/null; then - # The wait command may fail if the namespace is already gone - if kubectl get namespace "${DEFAULT_NAMESPACE}" &>/dev/null; then - error_exit "Namespace '${DEFAULT_NAMESPACE}' was not deleted within timeout" - fi + success "Namespace '${NAMESPACE}' created" +} + +####################################### +# Creates a new DigitalOcean Kubernetes cluster. +# Uses cargo make do-create which handles all configuration. +####################################### +create_cluster() { + info "Creating DigitalOcean Kubernetes cluster '${CLUSTER_NAME}'..." + + # Export the cluster name so cargo make can use it + export DO_CLUSTER_NAME="${CLUSTER_NAME}" + + if ! 
run_cmd cargo make do-create; then + error_exit "Failed to create DigitalOcean cluster '${CLUSTER_NAME}'" fi - success "Namespace '${DEFAULT_NAMESPACE}' cleared" + success "Cluster '${CLUSTER_NAME}' created successfully" } ####################################### -# Ensures the cluster is up and ready. -# Creates if missing, clears namespace if existing. +# Ensures the cluster is accessible and namespace is ready. +# If the cluster doesn't exist, creates it automatically. ####################################### -ensure_cluster_ready() { - info "=== Cluster Setup ===" +ensure_ready() { + info "=== Cluster and Namespace Setup ===" if do_cluster_exists "${CLUSTER_NAME}"; then - success "DigitalOcean cluster '${CLUSTER_NAME}' already exists" - save_kubeconfig - verify_cluster_accessible - clear_namespace + success "Cluster '${CLUSTER_NAME}' already exists" else - warn "DigitalOcean cluster '${CLUSTER_NAME}' does not exist" + warn "Cluster '${CLUSTER_NAME}' does not exist, creating it..." create_cluster - save_kubeconfig - verify_cluster_accessible fi - success "Cluster is ready for use" + save_kubeconfig + verify_cluster_accessible + create_namespace + + success "Cluster and namespace ready" echo "" } @@ -279,24 +303,25 @@ main() { echo "" info "===========================================" - info "DigitalOcean Kubernetes Cluster Startup" + info "Conformance Namespace Setup" info "===========================================" echo "" info "Configuration:" info " Cluster name: ${CLUSTER_NAME}" + info " Namespace: ${NAMESPACE}" info " Dry run: ${DRY_RUN}" echo "" check_prerequisites - ensure_cluster_ready + ensure_ready echo "" success "===========================================" - success "Cluster startup complete" + success "Namespace setup complete" success "===========================================" echo "" - info "kubectl context is now set to the cluster." - info "You can now run conformance tests or deploy applications." 
+ info "kubectl context is set to cluster '${CLUSTER_NAME}'." + info "Namespace '${NAMESPACE}' is ready for conformance testing." } main "$@" diff --git a/.claude/skills/gateway-conformance-runner/run-conformance.sh b/.claude/skills/gateway-conformance-runner/run-conformance.sh index 8a1b298..5114822 100755 --- a/.claude/skills/gateway-conformance-runner/run-conformance.sh +++ b/.claude/skills/gateway-conformance-runner/run-conformance.sh @@ -20,7 +20,8 @@ # --release Use production Dockerfile (higher optimization, slower builds) # --skip-build Skip the Rust compilation and Docker image build steps # --skip-deploy Skip the gateway controller deployment step -# --cluster-name NAME Name of the cluster (default: derived from git branch) +# --cluster-name NAME Name of the cluster (default: mw-conformance) +# --namespace NAME Kubernetes namespace for deployment (default: multiway-system) # --dry-run Print commands without executing them # --help Show this help message # @@ -41,7 +42,6 @@ source "${SCRIPT_DIR}/lib.sh" # ============================================================================= # Script-specific constants for gateway deployment and pod management. -readonly DEFAULT_NAMESPACE="multiway-system" readonly DEFAULT_POD_LABEL="app.kubernetes.io/name=multiway" readonly DEFAULT_POD_READY_TIMEOUT="120s" readonly DEFAULT_EXTENDED_POD_READY_TIMEOUT="300s" @@ -54,6 +54,7 @@ readonly DEFAULT_EXTENDED_POD_READY_TIMEOUT="300s" # They control which phases of the workflow are executed and how the script # identifies the target cluster. 
CLUSTER_NAME="" +NAMESPACE="multiway-system" DEV_BUILD=true SKIP_BUILD=false SKIP_DEPLOY=false @@ -85,6 +86,7 @@ Options: --skip-build Skip the Rust compilation and Docker image build steps --skip-deploy Skip the gateway controller deployment step --cluster-name NAME Name of the cluster (default: ${default_name}) + --namespace NAME Kubernetes namespace for deployment (default: multiway-system) --dry-run Print commands without executing them --help Show this help message @@ -92,6 +94,7 @@ Environment Variables: GATEWAY_CONFORMANCE_SUITE Path to the Gateway API repository root (required) DOCKER_REGISTRY Container registry URL (required for build, e.g., ghcr.io/myorg) DO_CLUSTER_NAME Override the cluster name + CONFORMANCE_NAMESPACE Override the namespace Examples: # Run full conformance test workflow @@ -134,6 +137,11 @@ parse_arguments() { CLUSTER_NAME=$(get_default_cluster_name) fi + # Default namespace to environment variable if set + if [[ -n "${CONFORMANCE_NAMESPACE:-}" ]]; then + NAMESPACE="${CONFORMANCE_NAMESPACE}" + fi + while [[ $# -gt 0 ]]; do case "$1" in --release) @@ -155,6 +163,13 @@ parse_arguments() { CLUSTER_NAME="$2" shift 2 ;; + --namespace) + if [[ -z "${2:-}" ]]; then + error_exit "--namespace requires a value" + fi + NAMESPACE="$2" + shift 2 + ;; --dry-run) DRY_RUN=true shift @@ -395,29 +410,29 @@ namespace_exists() { # resources within it (deployments, services, configmaps, etc.). ####################################### cleanup_existing_deployment() { - info "Cleaning up existing deployments in namespace '${DEFAULT_NAMESPACE}'..." + info "Cleaning up existing deployments in namespace '${NAMESPACE}'..." 
if [[ "${DRY_RUN}" == true ]]; then - echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl delete namespace ${DEFAULT_NAMESPACE} --ignore-not-found" - echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl wait --for=delete namespace/${DEFAULT_NAMESPACE} --timeout=60s" + echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl delete namespace ${NAMESPACE} --ignore-not-found" + echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl wait --for=delete namespace/${NAMESPACE} --timeout=60s" success "Cleanup skipped (dry-run mode)" return 0 fi - if ! namespace_exists "${DEFAULT_NAMESPACE}"; then - success "Namespace '${DEFAULT_NAMESPACE}' does not exist, nothing to clean up" + if ! namespace_exists "${NAMESPACE}"; then + success "Namespace '${NAMESPACE}' does not exist, nothing to clean up" return 0 fi - info "Deleting namespace '${DEFAULT_NAMESPACE}' and all its resources..." - if ! kubectl delete namespace "${DEFAULT_NAMESPACE}" --ignore-not-found; then - error_exit "Failed to delete namespace '${DEFAULT_NAMESPACE}'" + info "Deleting namespace '${NAMESPACE}' and all its resources..." + if ! kubectl delete namespace "${NAMESPACE}" --ignore-not-found; then + error_exit "Failed to delete namespace '${NAMESPACE}'" fi info "Waiting for namespace deletion to complete..." - if ! kubectl wait --for=delete namespace/"${DEFAULT_NAMESPACE}" --timeout=60s 2>/dev/null; then - if namespace_exists "${DEFAULT_NAMESPACE}"; then - error_exit "Namespace '${DEFAULT_NAMESPACE}' was not deleted within timeout" + if ! kubectl wait --for=delete namespace/"${NAMESPACE}" --timeout=60s 2>/dev/null; then + if namespace_exists "${NAMESPACE}"; then + error_exit "Namespace '${NAMESPACE}' was not deleted within timeout" fi fi @@ -430,19 +445,19 @@ cleanup_existing_deployment() { # cluster and makes cleanup straightforward (delete the namespace). ####################################### create_fresh_namespace() { - info "Creating fresh namespace '${DEFAULT_NAMESPACE}'..." 
+ info "Creating fresh namespace '${NAMESPACE}'..." if [[ "${DRY_RUN}" == true ]]; then - echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl create namespace ${DEFAULT_NAMESPACE}" + echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl create namespace ${NAMESPACE}" success "Namespace creation skipped (dry-run mode)" return 0 fi - if ! kubectl create namespace "${DEFAULT_NAMESPACE}"; then - error_exit "Failed to create namespace '${DEFAULT_NAMESPACE}'" + if ! kubectl create namespace "${NAMESPACE}"; then + error_exit "Failed to create namespace '${NAMESPACE}'" fi - success "Namespace '${DEFAULT_NAMESPACE}' created" + success "Namespace '${NAMESPACE}' created" } ####################################### @@ -486,29 +501,29 @@ wait_for_controller_ready() { info "Waiting for controller pods to be ready..." if [[ "${DRY_RUN}" == true ]]; then - echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl wait --for=condition=Ready pods -l ${DEFAULT_POD_LABEL} -n ${DEFAULT_NAMESPACE} --timeout=${DEFAULT_POD_READY_TIMEOUT}" + echo -e "${COLOR_YELLOW}[DRY-RUN]${COLOR_RESET} kubectl wait --for=condition=Ready pods -l ${DEFAULT_POD_LABEL} -n ${NAMESPACE} --timeout=${DEFAULT_POD_READY_TIMEOUT}" success "Pod readiness check skipped (dry-run mode)" return 0 fi - if kubectl wait --for=condition=Ready pods -l "${DEFAULT_POD_LABEL}" -n "${DEFAULT_NAMESPACE}" --timeout="${DEFAULT_POD_READY_TIMEOUT}" 2>/dev/null; then + if kubectl wait --for=condition=Ready pods -l "${DEFAULT_POD_LABEL}" -n "${NAMESPACE}" --timeout="${DEFAULT_POD_READY_TIMEOUT}" 2>/dev/null; then success "Controller pods are ready" return 0 fi warn "Pods not ready within ${DEFAULT_POD_READY_TIMEOUT}, extending wait to ${DEFAULT_EXTENDED_POD_READY_TIMEOUT}..." 
- if kubectl wait --for=condition=Ready pods -l "${DEFAULT_POD_LABEL}" -n "${DEFAULT_NAMESPACE}" --timeout="${DEFAULT_EXTENDED_POD_READY_TIMEOUT}" 2>/dev/null; then + if kubectl wait --for=condition=Ready pods -l "${DEFAULT_POD_LABEL}" -n "${NAMESPACE}" --timeout="${DEFAULT_EXTENDED_POD_READY_TIMEOUT}" 2>/dev/null; then success "Controller pods are ready (after extended wait)" return 0 fi warn "Pods still not ready. Current pod status:" - kubectl get pods -n "${DEFAULT_NAMESPACE}" -l "${DEFAULT_POD_LABEL}" + kubectl get pods -n "${NAMESPACE}" -l "${DEFAULT_POD_LABEL}" error_exit "Controller pods failed to become ready within ${DEFAULT_EXTENDED_POD_READY_TIMEOUT}. Please check the pod logs for errors: - kubectl logs -n ${DEFAULT_NAMESPACE} -l ${DEFAULT_POD_LABEL}" + kubectl logs -n ${NAMESPACE} -l ${DEFAULT_POD_LABEL}" } ####################################### @@ -595,6 +610,7 @@ main() { echo "" info "Configuration:" info " Cluster name: ${CLUSTER_NAME}" + info " Namespace: ${NAMESPACE}" info " Dev build: ${DEV_BUILD}" info " Skip build: ${SKIP_BUILD}" info " Skip deploy: ${SKIP_DEPLOY}"
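Review note on the branch-to-namespace derivation used by `get_default_namespace` in `cluster-up.sh` and `cluster-down.sh`: the `sed` expression in the patch strips only a single leading or trailing hyphen, does not cap the length, and would turn a detached checkout (where `git rev-parse --abbrev-ref HEAD` prints `HEAD`) into a namespace called `head`. The sketch below shows one possible hardening. It is not part of this diff; the function name `sanitize_namespace` and its argument order are hypothetical, and it takes the branch name as a parameter so it can be exercised without a git checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the patch): turn a git branch name into a value
# that satisfies Kubernetes namespace naming rules (RFC 1123 label:
# lowercase alphanumerics and '-', at most 63 characters, alphanumeric at
# both ends).
#
# Arguments:
#   $1 - branch name (e.g., "robbie/multi-1101")
#   $2 - fallback namespace (assumed default: "multiway-system", as in the patch)
sanitize_namespace() {
    local branch_name="${1:-}"
    local fallback="${2:-multiway-system}"

    # Keep only the last path component: "robbie/multi-1101" -> "multi-1101".
    local name="${branch_name##*/}"

    # Lowercase, replace invalid characters with '-', squeeze runs of '-',
    # and trim every leading and trailing '-'.
    name=$(printf '%s' "${name}" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E 's/[^a-z0-9-]+/-/g; s/-+/-/g; s/^-+//; s/-+$//')

    # Enforce the 63-character limit, then drop a hyphen exposed by the cut.
    name="${name:0:63}"
    name="${name%-}"

    # An empty result or a detached HEAD ("HEAD" lowercases to "head") falls
    # back to the default. A branch literally named "head" also falls back.
    if [[ -z "${name}" || "${name}" == "head" ]]; then
        name="${fallback}"
    fi

    printf '%s\n' "${name}"
}
```

If adopted, `get_default_namespace` could pass the output of `git rev-parse --abbrev-ref HEAD` to this helper and keep its existing `multiway-system` fallback for the case where the command fails outside a repository.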