
feat(ci): allow for supporting multi-arch images to be built and shipped #787

Open
knechtionscoding wants to merge 43 commits into kelos-dev:main from datagravity-ai:feat/multi-arch-images-upstream

Conversation

@knechtionscoding (Contributor) commented Mar 24, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

We want to run Kelos on our ARM nodes as well as our AMD64 nodes. Because Kelos is primarily an interface to the Kubernetes and LLM APIs and does no ML work locally, supporting both architectures is relatively easy.

Updates the Dockerfiles to build the binaries in multi-stage builds and publishes all of the images as multi-arch images.

Which issue(s) this PR is related to:

N/A

Special notes for your reviewer:

Does this PR introduce a user-facing change?

feat(ci): support running kelos on ARM 

Summary by cubic

Adds multi-arch Docker images and switches defaults to public ECR, plus webhook-based task discovery (GitHub and Linear) with a new CRD and receiver. Also includes Bedrock credentials with IRSA support and Helm/CI updates.

  • New Features

    • Multi-arch builds via docker buildx and make push-multiarch (linux/amd64, linux/arm64) with cache.
    • Default image registry moved to public.ecr.aws/anomalo/kelos (Makefile, defaults, Helm values, CI).
    • Webhook system: WebhookEvent CRD, kelos-webhook-receiver service (HMAC validation), and new TaskSpawner triggers when.githubWebhook and when.linearWebhook.
    • Linear webhook support with state/label filters; GitHub webhook support with label filters.
    • Agent and binaries now built in multi-stage Dockerfiles for reproducible cross-arch images.
    • New bedrock and none credential types; IRSA support for Bedrock (region and SA fields), centralized env var injection.
    • Dev tooling: Codespaces .devcontainer, pre-commit secrets check hook.
    • Docs and examples for webhooks (GitHub/Linear) and Helm chart templates/RBAC for the receiver.
  • Migration

    • Update deployments/Helm values to use public.ecr.aws/anomalo/kelos/* images.
    • Enable the webhook receiver in Helm (.Values.webhook.enabled=true), set GITHUB_WEBHOOK_SECRET / LINEAR_WEBHOOK_SECRET, and expose endpoints per docs.
    • Use when.githubWebhook or when.linearWebhook in TaskSpawner specs as needed.
    • For Bedrock via IRSA: set credentials type bedrock, region, and the service account; omit secretRef.

Written for commit 9ba75ee. Summary will update on new commits.

knechtionscoding and others added 20 commits March 24, 2026 17:06
Add a new `bedrock` credential type that injects AWS environment
variables (CLAUDE_CODE_USE_BEDROCK, AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY, AWS_REGION) from a referenced Secret, with
optional support for AWS_SESSION_TOKEN and ANTHROPIC_BEDROCK_BASE_URL.

Refactor credential injection into a centralized credentialEnvVars()
function so that adding future providers (e.g. Vertex) requires only
a new case block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make Credentials.SecretRef a pointer (*SecretReference) so it can be
omitted for bedrock credentials using IAM Roles for Service Accounts.
Add Region and ServiceAccountName fields to Credentials for IRSA mode.

CEL validation ensures secretRef remains required for api-key and oauth
credential types. In IRSA mode, only CLAUDE_CODE_USE_BEDROCK=1 and
AWS_REGION are injected — the AWS SDK handles auth via the projected
service account token.
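
The two commit messages above describe a single switch that maps a credential type to environment variables, with an IRSA branch when `secretRef` is omitted. A minimal Go sketch of that idea follows. The `EnvVar` and `Credentials` types here are simplified stand-ins invented for illustration (the real Kelos types are not shown in this PR page), the Secret key names are assumed to match the environment variable names, and requiring `Region` in IRSA mode is also an assumption. The optional `AWS_SESSION_TOKEN` and `ANTHROPIC_BEDROCK_BASE_URL` variables are left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// EnvVar is a hypothetical, simplified stand-in for a container env var.
// When FromSecret is set, the value is read from key FromSecretKey of that Secret.
type EnvVar struct {
	Name          string
	Value         string
	FromSecret    string
	FromSecretKey string
}

// Credentials mirrors the field names mentioned in the commit messages.
// The exact Go types used by Kelos may differ.
type Credentials struct {
	Type               string  // e.g. "bedrock" or "none"
	SecretRef          *string // nil selects IRSA mode for bedrock
	Region             string
	ServiceAccountName string
}

// credentialEnvVars returns the env vars to inject for one credential type.
// Adding a provider means adding one more case to the switch.
func credentialEnvVars(c Credentials) ([]EnvVar, error) {
	switch c.Type {
	case "bedrock":
		env := []EnvVar{{Name: "CLAUDE_CODE_USE_BEDROCK", Value: "1"}}
		if c.SecretRef == nil {
			// IRSA mode: only the flag and the region are injected; the AWS SDK
			// authenticates via the projected service account token.
			if c.Region == "" {
				return nil, errors.New("bedrock without secretRef requires region (assumed rule)")
			}
			return append(env, EnvVar{Name: "AWS_REGION", Value: c.Region}), nil
		}
		// Secret mode: each variable is read from the referenced Secret.
		for _, key := range []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"} {
			env = append(env, EnvVar{Name: key, FromSecret: *c.SecretRef, FromSecretKey: key})
		}
		return env, nil
	case "none":
		return nil, nil
	default:
		// api-key and oauth are intentionally out of scope for this sketch.
		return nil, fmt.Errorf("credential type %q is not covered by this sketch", c.Type)
	}
}

func main() {
	env, err := credentialEnvVars(Credentials{Type: "bedrock", Region: "us-east-1"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range env {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

Keeping `SecretRef` as a pointer lets the nil case carry meaning (IRSA) instead of relying on an empty string.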

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: add bedrock credential type for AWS Bedrock authentication
feat: swap from ghcr to ecr so we can cleanly run this in our own env
Added installation for Claude Code and a helper function for AWS Bedrock in the Dockerfile.
chore: run verify and update properly and format
feat: add first-class IRSA support for bedrock credentials
fix: push prod rather than main on each branch
@cubic-dev-ai cubic-dev-ai bot left a comment

4 issues found across 9 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cursor/Dockerfile">

<violation number="1" location="cursor/Dockerfile:1">
P2: Builder toolchain is decoupled from `GO_VERSION` and uses a floating Go tag, reducing build reproducibility and risking version drift.</violation>
</file>

<file name="cmd/kelos-spawner/Dockerfile">

<violation number="1" location="cmd/kelos-spawner/Dockerfile:1">
P2: New builder base image uses a floating tag (`golang:1.25`), which can cause non-reproducible builds and external version drift.</violation>
</file>

<file name="gemini/Dockerfile">

<violation number="1" location="gemini/Dockerfile:1">
P2: Builder stage uses a floating Go image tag, making release artifacts non-reproducible across rebuilds.</violation>
</file>

<file name="claude-code/Dockerfile">

<violation number="1" location="claude-code/Dockerfile:1">
P2: New builder stage uses an unpinned `golang` image tag, making shipped binary builds non-deterministic and vulnerable to upstream image drift.</violation>
</file>

tmarshall and others added 13 commits March 25, 2026 13:19
Add devcontainer configuration for Codespaces
Removed Oh My Zsh installation from Dockerfile.
Add environment variables for Teleport authentication and proxy.
Fix quirks with the codespace dockerfile
Add remoteUser configuration to devcontainer.json
feat(ci): support multi-arch images by building in container
* feat(ci): add pre-commit hook that checks for secrets

* fix: more keys

* chore: format
* session: add webhook support plan

* feat: add WebhookEvent CRD type definition

Defines WebhookEvent custom resource for storing incoming webhooks.
Uses CRD-based queue pattern consistent with Kelos architecture.

Note: Requires 'make update' in Go environment to generate deepcopy
and CRD manifests.

* feat: add webhook receiver HTTP server

Implements HTTP endpoint at /webhook/:source that:
- Receives webhook payloads
- Validates GitHub signatures via HMAC-SHA256
- Creates WebhookEvent CRD instances
- Returns 202 Accepted

Supports multiple sources (github, slack, linear, etc.) via URL path.
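
As an illustration of the signature check listed above, here is a minimal Go sketch of HMAC-SHA256 validation for a GitHub delivery. GitHub sends the digest in the `X-Hub-Signature-256` header as `sha256=` followed by the hex-encoded HMAC of the raw request body. The function names `signGitHub` and `validGitHubSignature` are hypothetical and are not taken from the Kelos receiver.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

const githubSigPrefix = "sha256="

// signGitHub computes the header value GitHub would send for body and secret.
// It is only used here to produce a test signature.
func signGitHub(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return githubSigPrefix + hex.EncodeToString(mac.Sum(nil))
}

// validGitHubSignature reports whether header is a valid signature of body.
// hmac.Equal compares in constant time to avoid leaking timing information.
func validGitHubSignature(secret, body []byte, header string) bool {
	if !strings.HasPrefix(header, githubSigPrefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, githubSigPrefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret")
	body := []byte(`{"action":"opened"}`)
	header := signGitHub(secret, body)

	fmt.Println("valid signature accepted:", validGitHubSignature(secret, body, header))
	fmt.Println("tampered body accepted:", validGitHubSignature(secret, []byte("tampered"), header))
}
```

The check has to run over the exact bytes received, so a handler would read the body once and validate before parsing any JSON.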

* feat: add GitHub webhook source implementation

Implements Source interface for webhook-based discovery:
- Reads unprocessed WebhookEvent CRDs with source=github
- Parses GitHub issue and pull_request webhook payloads
- Converts to WorkItem format
- Marks events as processed after discovery
- Supports label-based filtering (client-side)
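
The client-side label filter mentioned in the last bullet could look roughly like the Go sketch below. The helper name `matchesLabels` is hypothetical, and the semantics are an assumption: every required label must be present, and any excluded label rejects the item. The commit message does not spell out how Kelos combines multiple labels.

```go
package main

import "fmt"

// matchesLabels reports whether an item passes the filter.
// Assumed semantics: all required labels present, no excluded label present.
func matchesLabels(itemLabels, required, excluded []string) bool {
	have := make(map[string]bool, len(itemLabels))
	for _, l := range itemLabels {
		have[l] = true
	}
	for _, l := range excluded {
		if have[l] {
			return false
		}
	}
	for _, l := range required {
		if !have[l] {
			return false
		}
	}
	return true
}

func main() {
	labels := []string{"bug", "agent"}
	fmt.Println(matchesLabels(labels, []string{"agent"}, nil))
	fmt.Println(matchesLabels(labels, []string{"agent"}, []string{"bug"}))
}
```

Because the filter runs after the event is stored, a filtered-out event still needs to be marked as processed so it is not re-read on the next discovery pass.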

* feat: add GitHubWebhook to TaskSpawner When options

Adds githubWebhook field to TaskSpawner CRD allowing webhook-based
discovery as an alternative to API polling. The spawner will watch
WebhookEvent resources in the specified namespace and convert GitHub
webhook payloads to tasks.

* feat: integrate GitHubWebhookSource into spawner

Updates spawner to create GitHubWebhookSource instances when
TaskSpawner has githubWebhook configured. Passes k8s client to
source so it can list and update WebhookEvent resources.

* docs: add webhook documentation and examples

Includes:
- Complete TaskSpawner example using githubWebhook
- Webhook receiver deployment manifests
- Architecture documentation explaining CRD-based queue
- GitHub webhook setup instructions
- Comparison with API polling approach

* session: update plan with completion status

Marks completed phases and notes what still needs to be done:
- CRD manifest generation (needs Go environment)
- Unit and integration tests
- Live cluster testing

* test: add unit tests for webhook receiver and source

Webhook receiver tests:
- Valid/invalid GitHub signature validation
- Missing signature header handling
- No secret configured (dev mode)
- HTTP method validation
- Missing source in path

GitHub webhook source tests:
- Issue payload parsing
- Pull request payload parsing
- Closed issues/PRs are skipped
- Label filtering (required, excluded, multiple)
- End-to-end Discover with fake k8s client
- Events marked as processed after discovery

* session: mark unit tests as complete in plan

* fix: resolve test compilation errors

- Fix logger type in webhook receiver (logr.Logger)
- Add missing parameters to buildSource calls in tests
- Remove field selectors for fake client compatibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: correct test expectations

- Fix HMAC-SHA256 signature in webhook receiver test
- Enable status subresource in fake client for webhook source test

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* make update

* feat: add RBAC permissions for WebhookEvent resources

- Add webhookevents permissions to kelos-spawner-role
- Create kelos-webhook-receiver-role for webhook receiver

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: focus webhook documentation on GitHub support

- Remove mentions of future sources (Slack, Linear, etc.)
- Remove comparison table with API polling
- Keep documentation focused on current implementation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: add integration tests for webhook flow

- Test WebhookEvent discovery and processing
- Test label filtering (required and excluded labels)
- Test issue and pull request payloads
- Test source filtering (github vs other sources)
- Test skipping closed issues/PRs
- Verify events are marked as processed after discovery

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: update webhook support plan

Mark all tasks as completed:
- RBAC permissions added
- Integration tests implemented
- All tests passing

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* session: remove webhook support plan

* fix: use DeepCopy for status updates in webhook discovery

- Change from &eventList.Items[i] to DeepCopy() to fix status updates
- Mark events as processed even when filtered out or invalid
- Ensures all processed events have status updated in integration tests

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add helm chart support for webhooks

* fix: add webhook receiver docker

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Hans Knecht <hans@anomalo.com>
* session: add Linear webhook support plan

* feat: add Linear webhook signature validation

- Add validateLinearSignature() function with HMAC-SHA256
- Optional validation via LINEAR_WEBHOOK_SECRET env var
- Validate X-Linear-Signature header (no sha256= prefix)
- Add unit tests for valid/invalid/missing signatures

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: implement LinearWebhookSource for webhook discovery

- Add LinearWebhookSource implementing Source interface
- Parse Linear Issue webhooks (create/update actions)
- Filter by states, labels, and excludeLabels
- Exclude terminal states (completed/canceled) by default
- Use DeepCopy pattern for status updates
- Add comprehensive unit tests
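
To make the default exclusion of terminal states concrete, here is a small Go sketch. The helper name `stateAllowed` is hypothetical, and the rule that an explicit `states` list replaces the default exclusion is an assumption rather than something this commit message confirms.

```go
package main

import "fmt"

// terminalStates are the states the commit message says are excluded by default.
var terminalStates = map[string]bool{"completed": true, "canceled": true}

// stateAllowed decides whether an issue in the given state should be discovered.
// With no explicit filter, terminal states are skipped; with a filter, only the
// listed states pass (assumed behavior).
func stateAllowed(state string, allowed []string) bool {
	if len(allowed) == 0 {
		return !terminalStates[state]
	}
	for _, s := range allowed {
		if s == state {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(stateAllowed("started", nil))
	fmt.Println(stateAllowed("completed", nil))
	fmt.Println(stateAllowed("completed", []string{"completed"}))
}
```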

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add LinearWebhook type to TaskSpawner CRD

- Add LinearWebhook struct with namespace, states, labels filters
- Add When.LinearWebhook field for webhook-based Linear discovery
- Mirror GitHub webhook pattern for consistency

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* feat: add LinearWebhook support to spawner

- Extend buildSource() to create LinearWebhookSource
- Pass k8s client, namespace, and filters to source

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* test: add integration tests for Linear webhook flow

* docs: add Linear webhook documentation and example configuration

* session: note Go environment requirements for Phase 7

* session: remove pr-session plan

* chore: make update

* fix: passing tests

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Hans Knecht <hans@anomalo.com>
@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cmd/kelos-token-refresher/Dockerfile">

<violation number="1" location="cmd/kelos-token-refresher/Dockerfile:10">
P2: The Dockerfile now builds `kelos-token-refresher` twice: once via `go build` and again via `make build`, which overwrites the same output. This makes the new build step redundant and risks different build flags being used in the final binary.</violation>
</file>

@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ knechtionscoding
❌ tmarshall


3 participants