New use case: CI-failure-driven agent remediation with check status filtering on githubPullRequests #809

@kelos-bot

Description

🤖 Kelos Strategist Agent @gjkim42

Area: New Use Cases + API Extension

Summary

One of the most requested AI agent use cases in 2025-2026 is automatic CI failure diagnosis and remediation: when CI fails on a PR, an agent investigates the failure, proposes a fix, and pushes it to the branch. Kelos is uniquely positioned to serve this use case but currently lacks the API surface to trigger agents based on CI check outcomes. This proposal describes the use case, identifies the specific gap in githubPullRequests, and proposes a checkConclusion filter and a {{.FailedChecks}} template variable to enable it.

The Use Case: Auto-Fix Broken CI

Who benefits

  • Any team with CI pipelines: the most common developer friction point is a red CI check that blocks a PR
  • Open source maintainers: contributor PRs frequently fail CI due to lint, formatting, or test issues that are mechanical to fix
  • Large organizations: at scale, CI failures consume significant developer time on repetitive debugging

What the workflow looks like

  1. A developer opens a PR
  2. CI runs and fails (lint error, test failure, build break, etc.)
  3. Kelos discovers the PR has a failing check and spawns an agent
  4. The agent reads the CI failure logs, diagnoses the problem, and pushes a fix commit
  5. CI re-runs on the updated branch
  6. If CI passes, the PR is ready for review and developer time is saved

Why Kelos is uniquely suited

Unlike standalone CI-fix bots, Kelos provides:

  • Full codebase context: Agents clone the repo and understand the full project, not just the diff
  • Configurable agent types and models: Use a fast/cheap model for lint fixes, a capable model for test failures
  • Concurrency controls: maxConcurrency prevents overwhelming CI with fix attempts
  • Branch locking: Only one agent works on a branch at a time (already implemented)
  • Task pipelines: Can chain diagnosis → fix → verification steps via dependsOn
  • Cost governance: maxTotalTasks, ttlSecondsAfterFinished, and proposed costBudget (API: Add costBudget to TaskSpawner for spending limits based on actual token usage and USD cost #788)
  • Priority scheduling: priorityLabels can prioritize fixing important PRs first

Current Gap

1. No CI check status filtering on githubPullRequests

The GitHubPullRequestsSpec currently filters by: labels, excludeLabels, state, reviewState, commentPolicy, author, draft, priorityLabels (api/v1alpha1/taskspawner_types.go:185-260).

There is no way to filter PRs by their CI check conclusion. A spawner watching for open PRs will discover ALL PRs regardless of their CI status, which means:

  • Agents would be spawned for PRs where CI hasn't even started yet
  • Agents would be spawned for PRs where CI is passing (wasting credits)
  • No way to distinguish between different failure types

2. No check failure details in prompt template variables

The WorkItem struct (internal/source/source.go:10-33) provides template variables for GitHub PRs: {{.Branch}}, {{.ReviewState}}, {{.ReviewComments}}. There are no variables for CI check status or failure details. An agent spawned to fix CI would need to discover the failure independently, rather than receiving it in the prompt.

3. The workaround is fragile

Today, the closest approximation is:

  1. Use a GitHub Action that comments /kelos fix-ci when checks fail
  2. Use triggerComment: "/kelos fix-ci" on the spawner

This is fragile (requires maintaining a separate GHA workflow), doesn't pass failure context to the agent, and loses the structured auditability that a first-class source filter would provide.
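For illustration, the GitHub Actions side of this workaround might look roughly like the following. This is a hypothetical sketch (workflow name, job name, and comment body are illustrative); the `check_suite` event and `github-script` action are standard GitHub features, but note that `check_suite.pull_requests` is empty for PRs from forks, which is one of the ways this approach breaks down.

```yaml
# .github/workflows/kelos-fix-ci.yml -- hypothetical workaround sketch
name: trigger-kelos-on-ci-failure
on:
  check_suite:
    types: [completed]
jobs:
  trigger:
    if: github.event.check_suite.conclusion == 'failure'
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - name: Comment on affected PRs to trigger the spawner
        uses: actions/github-script@v7
        with:
          script: |
            // Empty for fork PRs -- a known gap in this workaround.
            for (const pr of context.payload.check_suite.pull_requests) {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: pr.number,
                body: "/kelos fix-ci",
              });
            }
```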

Proposed API Changes

1. Add checkConclusion filter to GitHubPullRequestsSpec

// In api/v1alpha1/taskspawner_types.go, add to GitHubPullRequestsSpec:

// CheckConclusion filters pull requests by the conclusions of the
// check runs on their head commit. A PR is discovered only when at
// least one considered check run matches the specified conclusion.
// Supported values: "failure", "success", "neutral", "cancelled",
// "timed_out", "action_required", "any".
// When unset or "any", check conclusion does not gate discovery.
// +kubebuilder:validation:Enum=failure;success;neutral;cancelled;timed_out;action_required;any
// +optional
CheckConclusion string `json:"checkConclusion,omitempty"`

// CheckNames optionally restricts which check runs are considered
// when evaluating checkConclusion. When empty, all check runs are
// considered. When set, only checks whose name matches one of these
// values are evaluated.
// Example: ["ci", "lint", "test"] would only trigger on failures
// from checks named "ci", "lint", or "test".
// +optional
CheckNames []string `json:"checkNames,omitempty"`

2. Add check failure details to WorkItem and prompt templates

// In internal/source/source.go, add to WorkItem:

// FailedChecks contains formatted details of failed check runs for
// GitHub PR sources when checkConclusion filtering is enabled.
// Includes check name, conclusion, and output summary.
FailedChecks string

// CheckConclusion is the aggregate check conclusion for GitHub PR sources.
CheckConclusion string

This exposes {{.FailedChecks}} and {{.CheckConclusion}} in prompt templates.

3. Implementation in GitHub PR source

The github_pr.go source already calls the GitHub API to discover PRs. To implement check filtering:

  1. After fetching PRs, call GET /repos/{owner}/{repo}/commits/{ref}/check-runs for each PR's head SHA
  2. Filter check runs by CheckNames if specified
  3. Evaluate aggregate conclusion against CheckConclusion filter
  4. Populate WorkItem.FailedChecks with failure details (check name, conclusion, output title/summary)
  5. Use conditional requests (ETag, already implemented via ETagTransport) to minimize API calls
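The filtering and formatting steps above (2-4) could be sketched as follows. This is a hedged outline under stated assumptions: the `checkRun` type, `matchingRuns`, and `formatFailedChecks` are hypothetical names, and the real implementation would operate on the GitHub client's check-run types rather than this minimal struct.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// checkRun is a minimal stand-in for the fields returned by
// GET /repos/{owner}/{repo}/commits/{ref}/check-runs.
type checkRun struct {
	Name       string
	Conclusion string // "failure", "success", "neutral", ...
	Summary    string
}

// matchingRuns returns the check runs that gate discovery:
// restricted to checkNames when non-empty, and matching the
// configured conclusion unless it is "" or "any". A PR is
// discovered when this returns at least one run.
func matchingRuns(runs []checkRun, checkNames []string, conclusion string) []checkRun {
	var out []checkRun
	for _, r := range runs {
		if len(checkNames) > 0 && !slices.Contains(checkNames, r.Name) {
			continue
		}
		if conclusion != "" && conclusion != "any" && r.Conclusion != conclusion {
			continue
		}
		out = append(out, r)
	}
	return out
}

// formatFailedChecks renders matched runs into the proposed
// {{.FailedChecks}} template variable, one line per check.
func formatFailedChecks(runs []checkRun) string {
	var b strings.Builder
	for _, r := range runs {
		fmt.Fprintf(&b, "- %s: %s (%s)\n", r.Name, r.Conclusion, r.Summary)
	}
	return b.String()
}
```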

API call budget: one additional API call per PR per poll cycle. As a rough upper bound, 50 open PRs polled every 2 minutes adds about 1,500 calls per hour, comfortably within GitHub's 5,000 requests/hour authenticated REST limit. The check-runs endpoint also supports conditional requests (ETag, already implemented via ETagTransport), and 304 Not Modified responses do not count against the rate limit.

Example Configurations

Example 1: Auto-fix lint failures

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: ci-lint-fixer
spec:
  when:
    githubPullRequests:
      labels: ["ok-to-autofix"]
      checkConclusion: failure
      checkNames: ["lint", "fmt", "vet"]
      reporting:
        enabled: true
  maxConcurrency: 3
  taskTemplate:
    type: claude-code
    credentials:
      type: oauth
      secretRef:
        name: claude-credentials
    workspaceRef:
      name: my-workspace
    branch: "{{.Branch}}"
    ttlSecondsAfterFinished: 3600
    promptTemplate: |
      A CI check has failed on PR #{{.Number}}: {{.Title}}

      Failed checks:
      {{.FailedChecks}}

      Your job:
      1. Check out the PR branch (already done)
      2. Run the failing check locally to reproduce
      3. Fix the issue (lint, formatting, or vet errors)
      4. Commit and push the fix

      Do NOT change any logic or behavior. Only fix the specific
      lint/format/vet errors reported by CI.

Example 2: Diagnose and fix test failures with a pipeline

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: ci-test-fixer
spec:
  when:
    githubPullRequests:
      checkConclusion: failure
      checkNames: ["test", "unit-tests", "integration-tests"]
      commentPolicy:
        triggerComment: "/kelos fix-tests"
        minimumPermission: write
      reporting:
        enabled: true
  maxConcurrency: 1
  taskTemplate:
    type: claude-code
    credentials:
      type: oauth
      secretRef:
        name: claude-credentials
    workspaceRef:
      name: my-workspace
    branch: "{{.Branch}}"
    ttlSecondsAfterFinished: 7200
    promptTemplate: |
      CI test failure on PR #{{.Number}}: {{.Title}}

      Failed checks:
      {{.FailedChecks}}

      Steps:
      1. Read the failing test output above
      2. Reproduce the failure locally with `make test`
      3. Determine if this is a test bug or a code bug
      4. If it's a code bug, fix the code
      5. If it's a test bug (test needs updating for new behavior), fix the test
      6. Run `make test` to verify your fix
      7. Commit and push

Example 3: Auto-fix contributor PRs (open-source maintainers)

apiVersion: kelos.dev/v1alpha1
kind: TaskSpawner
metadata:
  name: contributor-ci-helper
spec:
  when:
    githubPullRequests:
      checkConclusion: failure
      excludeLabels: ["do-not-autofix"]
      draft: false
      reporting:
        enabled: true
  maxConcurrency: 2
  taskTemplate:
    type: claude-code
    model: sonnet
    credentials:
      type: oauth
      secretRef:
        name: claude-credentials
    workspaceRef:
      name: my-workspace
    branch: "{{.Branch}}"
    ttlSecondsAfterFinished: 3600
    promptTemplate: |
      A contributor's PR #{{.Number}} ("{{.Title}}") has failing CI.

      Failed checks:
      {{.FailedChecks}}

      Help the contributor by fixing the CI failure. Focus only on
      mechanical fixes (formatting, lint, missing imports, test updates
      for API changes). If the failure requires design decisions or
      significant code changes, leave a comment explaining the issue
      instead of attempting a fix.

Interaction with Existing and Proposed Features

| Feature | How it works with CI-failure remediation |
| --- | --- |
| maxConcurrency (implemented) | Prevents spawning too many fix agents at once |
| Branch locking (implemented) | Ensures only one fix agent per branch |
| ttlSecondsAfterFinished (implemented) | Cleans up completed fix tasks |
| reporting (implemented) | Posts progress comments on the PR |
| priorityLabels (implemented) | Prioritizes fixing high-priority PRs first |
| retriggerOnPush (#752) | Would re-engage the agent when CI fails again after a fix attempt |
| filePatterns (#778) | Could combine: only trigger when changed files match patterns AND CI fails |
| costBudget (#788) | Limits spending on CI fix attempts |
| retryStrategy (#730) | Could retry with a stronger model if a fix attempt fails |
| checkConclusion (this proposal) | The core enabler: gates discovery on CI outcome |

Why This Is a Growth Opportunity

  1. Broad appeal: Every team with CI/CD pipelines is a potential user. CI failure is the single most common developer friction point.
  2. Easy to demo: "Watch this: I push broken code, and Kelos fixes it automatically" is a compelling pitch.
  3. Incremental adoption: Teams can start by auto-fixing lint/format issues (low risk), then expand to test failures (medium risk). No big-bang adoption needed.
  4. Differentiator: Most AI coding tools operate at the IDE level. CI-failure remediation operates at the pipeline level, a space where Kelos's Kubernetes-native, event-driven architecture is a natural fit.
  5. Complements existing use cases: Can be combined with the PR review spawner (#kelos-reviewer) and dependency upgrade validation (New use case: Intelligent dependency upgrade validation with Dependabot/Renovate PR agent workflows #722) for a comprehensive PR automation story.

/kind feature
