feat(kubernetes): add agent execution mode, executor-agent, sidecar refactor, GKE Sandbox, image & deps upgrades by gafda · Pull Request #33 · aron-muon/KubeCodeRun

gafda · 2026-02-18T17:51:32Z

This pull request updates the development and runtime environments for multiple languages, modernizes package dependencies, and enhances configuration flexibility, particularly for Redis and Kubernetes. The changes include updates to Dockerfiles for newer base images and language versions, significant dependency version bumps and additions, and expanded sample configuration for advanced deployment scenarios.

Key changes:

1. Dependency and Environment Updates

Dockerfiles for C/C++, D, Fortran, and R: Switched to trixie-debian13-dev base images to ensure compilers and development libraries are available at runtime, improving compatibility and security. (docker/c-cpp.Dockerfile, docker/d.Dockerfile, docker/fortran.Dockerfile, docker/r.Dockerfile) [1] [2] [3] [4] [5]
Go Environment: Upgraded Go from 1.25 to 1.26, updated the Go Dockerfile stages, and refreshed Go module dependencies with new and reorganized packages. (docker/go.Dockerfile, docker/requirements/go.mod) [1] [2] [3] [4]
PHP: Bumped PHP version from 8.4.17 to 8.5.3 for the latest features and security patches. (docker/php.Dockerfile)
Python: Modernized and reorganized Python package requirements across core, analysis, documents, utilities, and visualization, with many version bumps and new packages for better functionality and compatibility. (docker/requirements/python-core.txt, docker/requirements/python-analysis.txt, docker/requirements/python-documents.txt, docker/requirements/python-utilities.txt, docker/requirements/python-visualization.txt) [1] [2] [3] [4] [5]
Node.js: Expanded and reorganized the list of global packages for a more comprehensive JavaScript environment. (docker/requirements/nodejs.txt)
Rust: Updated Rust version from 1.92 to 1.93 and refreshed dependencies in Cargo.toml for new features and bug fixes. (docker/rust.Dockerfile, docker/requirements/rust-Cargo.toml) [1] [2] [3]

2. Configuration Enhancements

Redis Configuration: The .env.example file now documents advanced Redis deployment options, including cluster and sentinel modes, TLS/SSL settings, and key prefixing, making it easier to configure Redis in complex environments. [1] [2]
Kubernetes Execution: Added detailed Kubernetes execution configuration options to .env.example, including support for agent and nsenter modes, sidecar image selection, image pull policies, and GKE Sandbox compatibility notes.

Mode	Description
`agent` (default)	Executor agent runs inside the main container. No nsenter, no extra capabilities. Compatible with gVisor/GKE Sandbox.
`nsenter` (legacy)	Sidecar uses nsenter to enter the main container namespace. Requires SYS_PTRACE, SYS_ADMIN, SYS_CHROOT capabilities.
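
As a rough illustration of the table above, the difference between the two modes can be expressed as the container security context each one needs. This is a sketch rather than the project's code: the function name `security_context_for_mode` is hypothetical, and only the capability names and the "drop all capabilities, no privilege escalation" behaviour are taken from the PR description.

```python
from typing import Any

# Capabilities the PR description lists as required by the legacy nsenter mode.
NSENTER_CAPABILITIES = ["SYS_PTRACE", "SYS_ADMIN", "SYS_CHROOT"]


def security_context_for_mode(mode: str) -> dict[str, Any]:
    """Return a Kubernetes container securityContext dict for an execution mode.

    Hypothetical helper: the real manifest builder lives in
    src/services/kubernetes/client.py and may be structured differently.
    """
    if mode == "agent":
        # Agent mode: no extra capabilities and no privilege escalation.
        return {
            "allowPrivilegeEscalation": False,
            "capabilities": {"drop": ["ALL"]},
        }
    if mode == "nsenter":
        # Legacy mode: the sidecar needs these capabilities to enter
        # the main container's namespaces.
        return {"capabilities": {"add": list(NSENTER_CAPABILITIES)}}
    raise ValueError(f"unknown execution mode: {mode!r}")
```

Rejecting unknown modes with an exception, instead of silently falling back to one of the two, keeps a typo in configuration from quietly selecting the more privileged path.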

3. Docker Image Naming and CI/CD

Sidecar Image Naming: Updated GitHub Actions workflow to use the new sidecar image name kubecoderun-sidecar-agent instead of kubecoderun-sidecar, aligning with the agent-based execution model. (.github/workflows/docker-publish.yml) [1] [2]

4. Python Runtime Optimization

Python Dockerfile: Cleaned up and reorganized installed runtime libraries, focusing on core utilities, image processing, XML/HTML processing, cryptography, and font support. Removed unnecessary tools from the final image for a leaner runtime. (docker/python.Dockerfile) [1] [2]

These changes collectively modernize the development environments, improve documentation and configuration for advanced deployments, and ensure compatibility with newer language and library versions.

- Add configuration options for GKE Sandbox in KubernetesConfig
- Update pod/job manifest creation to support:
  * runtimeClassName for gVisor runtime
  * sandbox.gke.io/runtime annotation
  * nodeSelector for sandbox-enabled nodes
  * tolerations for GKE sandbox and custom taints
- Add GKE Sandbox settings to Helm values.yaml and configmap
- Update KubernetesManager, PodSpec, and PoolConfig models
- Parse JSON configuration for node selectors and tolerations
- Enable easy activation/deactivation via configuration flags

GKE Sandbox provides additional kernel isolation using gVisor for untrusted workloads. When enabled, execution pods will:

- Run with gVisor runtime (runtimeClassName: gvisor)
- Be scheduled on sandbox-enabled nodes
- Tolerate GKE sandbox taints automatically
- Support custom node pool taints for dedicated execution nodes

Configuration example in values.yaml:

  execution:
    gkeSandbox:
      enabled: true
      runtimeClassName: gvisor
      nodeSelector: {}
      customTolerations:
        - key: pool
          operator: Equal
          value: sandbox
          effect: NoSchedule
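
The manifest changes listed in this commit message can be sketched as a function over a plain pod dict. This is an illustration under stated assumptions, not the code in `src/services/kubernetes/client.py`: the function name `apply_gke_sandbox` and its parameters are hypothetical, and using the runtime class name as the annotation value is an assumption. The toleration key and effect follow the taint that GKE places on sandbox node pools (`sandbox.gke.io/runtime=gvisor:NoSchedule`).

```python
import copy
from typing import Any, Optional


def apply_gke_sandbox(
    pod: dict[str, Any],
    runtime_class_name: str = "gvisor",
    node_selector: Optional[dict[str, str]] = None,
    custom_tolerations: Optional[list[dict[str, str]]] = None,
) -> dict[str, Any]:
    """Return a copy of `pod` with GKE Sandbox settings applied (hypothetical helper)."""
    # Work on a copy so the caller's manifest (and its annotations) is not mutated.
    result = copy.deepcopy(pod)

    annotations = result.setdefault("metadata", {}).setdefault("annotations", {})
    # Assumption: the annotation value mirrors the runtime class name.
    annotations["sandbox.gke.io/runtime"] = runtime_class_name

    spec = result.setdefault("spec", {})
    spec["runtimeClassName"] = runtime_class_name
    if node_selector:
        spec.setdefault("nodeSelector", {}).update(node_selector)

    tolerations = spec.setdefault("tolerations", [])
    # Tolerate the taint on sandbox-enabled nodes.
    tolerations.append(
        {
            "key": "sandbox.gke.io/runtime",
            "operator": "Equal",
            "value": runtime_class_name,
            "effect": "NoSchedule",
        }
    )
    # Extra taints for dedicated execution node pools.
    tolerations.extend(custom_tolerations or [])
    return result
```

For example, passing `custom_tolerations=[{"key": "pool", "operator": "Equal", "value": "sandbox", "effect": "NoSchedule"}]` corresponds to the `customTolerations` entry in the values.yaml example.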

* Introduce to support agent and nsenter modes.
* Implement agent mode with a lightweight executor agent running in the main container.
* Add for configuring the executor agent's HTTP server port.
* Enhance security by dropping all capabilities in agent mode and ensuring no privilege escalation.
* Support image pull secrets for private registries via .
* Update documentation to reflect new execution modes and security configurations.
* Modify Helm chart to include image pull secrets configuration.

* Change default sidecar image from to .
* Update environment variable names for executor port from to .
* Add documentation for building sidecar images and configuring Helm charts for execution modes.
* Introduce GKE Sandbox support with configuration details and limitations.
* Update related code and tests to reflect changes in image names and environment variables.

- Updated Dockerfiles for C/C++, D, Fortran, R, and Sidecar to use the trixie-dev variant.
- Ensures compilers and development libraries are available at runtime.

* Updated base images to trixie-debian13-dev for C/C++, D, Fortran, R, and Rust.
* Upgraded PHP version to 8.5.3.
* Enhanced Node.js and Python requirements with new packages and versions.
* Improved Rust dependencies for better compatibility and performance.
* Updated Go version in executor-agent to 1.26.

* Introduced K8S_IMAGE_PULL_POLICY and K8S_IMAGE_PULL_SECRETS in configuration.
* Updated relevant classes and methods to handle new fields.
* Enhanced validation for execution mode and sidecar image consistency.
* Added unit tests to ensure correct handling of image pull settings.

- Add REDIS_MODE (standalone/cluster/sentinel) to RedisConfig
- Add TLS/SSL configuration (REDIS_TLS_ENABLED, certs, CA, insecure)
- Add Redis Cluster support (REDIS_CLUSTER_NODES) via RedisCluster client
- Add Redis Sentinel support (REDIS_SENTINEL_NODES/MASTER/PASSWORD)
- Update RedisPool to support all three modes with TLS
- Migrate FileService to use shared RedisPool instead of standalone client
- Update Settings class with all new Redis fields
- Update .env.example with new Redis configuration options
- Update docs/CONFIGURATION.md with Cluster, Sentinel, and TLS sections
- Update docs/SECURITY.md with TLS configuration reference
- Update Helm values.yaml, configmap.yaml, and _helpers.tpl
- Default remains standalone Redis for full backward compatibility

feat: add optional Redis key prefix support (REDIS_KEY_PREFIX)

- Add key_prefix field to RedisConfig and Settings
- Add make_key() helper to RedisPool for centralized key prefixing
- Update all services to use prefixed keys: session, state, file, health, api_key_manager, detailed_metrics, metrics
- Update .env.example, docs, and Helm chart with new setting
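
The idea of centralized key prefixing can be sketched in a few lines. The class below is a stand-in, not the project's `RedisPool`: the commit message names a `make_key()` helper but does not show its signature, so the constructor argument, the method signature, and the choice to use the prefix verbatim (with no separator added) are all assumptions.

```python
class KeyPrefixer:
    """Minimal stand-in for the key-prefixing part of a Redis pool (hypothetical)."""

    def __init__(self, key_prefix: str = "") -> None:
        # An empty prefix means "no prefixing", which keeps existing keys valid.
        self.key_prefix = key_prefix

    def make_key(self, key: str) -> str:
        """Return `key` with the configured prefix prepended, if any."""
        if not self.key_prefix:
            return key
        return f"{self.key_prefix}{key}"


# Usage: services build keys through the helper instead of formatting them inline.
prefixer = KeyPrefixer("tenant-a:")
session_key = prefixer.make_key("session:1234")
```

Routing every key through one helper means a later change to the prefix format touches a single place instead of each service.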

Copilot

Pull request overview

This pull request introduces a significant architectural enhancement by adding an agent-based execution mode as the default, alongside comprehensive Redis deployment mode support and extensive dependency upgrades. The changes modernize the security model by eliminating the need for Linux capabilities in the default execution path, enable GKE Sandbox (gVisor) compatibility, and provide flexibility for Redis clustering and high-availability deployments.

Changes:

Introduced agent execution mode (default) that eliminates nsenter, Linux capabilities, and privilege escalation requirements, with nsenter mode retained for backward compatibility
Added comprehensive Redis deployment modes (standalone, cluster, sentinel) with TLS/SSL support and optional key prefixing for multi-tenant deployments
Implemented GKE Sandbox (gVisor) support with runtime class, node selectors, and tolerations for kernel-level isolation
Upgraded language runtimes and dependencies: Go 1.25→1.26, PHP 8.4.17→8.5.3, Rust 1.92→1.93, Python packages modernized
Refactored sidecar to multi-target Docker build producing both agent and nsenter variants from a single Dockerfile

Reviewed changes

Copilot reviewed 49 out of 50 changed files in this pull request and generated 5 comments.


File	Description
`src/services/kubernetes/client.py`	Enhanced pod manifest creation with agent/nsenter mode support, GKE Sandbox configuration, and image pull secrets
`docker/sidecar/main.py`	Added `execute_via_agent()` function and mode routing logic for dual execution mode support
`docker/sidecar/executor-agent/main.go`	New Go HTTP server for agent mode execution without nsenter or capabilities
`docker/sidecar/Dockerfile`	Refactored to multi-target build: `sidecar-agent` (default) and `sidecar-nsenter` (legacy)
`src/core/pool.py`	Complete rewrite supporting Redis standalone/cluster/sentinel modes with TLS and key prefixing
`src/config/redis.py`	New configuration model for Redis deployment modes, TLS, and advanced features
`src/services/*.py`	Updated all Redis-using services (state, session, metrics, api_key_manager) to use key prefixing
`src/main.py`	Added validation for execution mode/sidecar image consistency and image pull secrets parsing
`tests/unit/test_kubernetes_client.py`	Comprehensive tests for agent/nsenter modes and GKE Sandbox configuration
`scripts/build-images.sh`	Enhanced to support Docker multi-target builds with `--target` flag
`.github/workflows/docker-publish.yml`	Updated sidecar image name to `kubecoderun-sidecar-agent`
`helm-deployments/kubecoderun/values.yaml`	Added Redis mode configuration, GKE Sandbox settings, and execution mode options
`docs/SECURITY.md`, `docs/CONFIGURATION.md`, `docs/ARCHITECTURE.md`	Extensive documentation updates for new execution modes and Redis features

Comments suppressed due to low confidence (1)

.github/workflows/docker-publish.yml:153

The CI/CD workflow only builds the kubecoderun-sidecar-agent image but not the kubecoderun-sidecar-nsenter variant. While the build script supports both targets and the Dockerfile defines both, the nsenter sidecar won't be available in the registry for users who need legacy nsenter mode. Consider adding a separate job to build the nsenter variant, or document that users must build it locally if needed.

  sidecar:
    needs: changes
    if: |
      needs.changes.outputs.is_cross_repo_pr != 'true' &&
      (needs.changes.outputs.sidecar == 'true' || needs.changes.outputs.force_all == 'true')
    uses: ./.github/workflows/docker-build-reusable.yml
    secrets: inherit
    with:
      image_name: kubecoderun-sidecar-agent
      dockerfile: docker/sidecar/Dockerfile
      context: docker/sidecar
      image_tag: ${{ needs.changes.outputs.image_tag }}
      is_release: ${{ needs.changes.outputs.is_release == 'true' }}
      version: ${{ needs.changes.outputs.version }}


src/main.py

docker/sidecar/executor-agent/main.go

tests/unit/test_kubernetes_client.py

docker/sidecar/Dockerfile

docs/SECURITY.md

* Validate GKE Sandbox compatibility with nsenter execution mode
* Log warnings when incompatible execution modes are used
* Update Dockerfile capabilities for nsenter
* Enhance RedisPool type hints for better type checking
* Add unit tests for GKE Sandbox and nsenter mode interactions
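
A compatibility check of the kind this commit describes might look like the sketch below. The function name, logger name, and message wording are hypothetical. The underlying rule comes from the PR description: nsenter mode needs extra Linux capabilities, and only agent mode is listed as compatible with gVisor/GKE Sandbox.

```python
import logging

logger = logging.getLogger("config_validation_sketch")


def check_sandbox_compat(execution_mode: str, gke_sandbox_enabled: bool) -> list[str]:
    """Return (and log) warnings for execution mode / GKE Sandbox mismatches."""
    warnings: list[str] = []
    if gke_sandbox_enabled and execution_mode == "nsenter":
        message = (
            "GKE Sandbox is enabled but execution mode is 'nsenter'; "
            "nsenter requires capabilities that gVisor does not allow. "
            "Use 'agent' mode instead."
        )
        logger.warning(message)
        warnings.append(message)
    return warnings
```

Returning the warnings in addition to logging them makes the check easy to assert on in unit tests.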

…me by default

Three issues prevented connecting to GCP Memorystore Redis with TLS:

1. _validate_redis_connection() used redis.from_url() without passing ssl_ca_certs / ssl_cert_reqs, so certificate verification fell back to the system CA bundle, which doesn't include managed-service CAs.
2. get_tls_kwargs() set ssl_check_hostname=True when tls_insecure=False. Managed Redis services (GCP Memorystore, AWS ElastiCache) and Redis Cluster node discovery return IPs that don't match the certificate CN/SAN, causing CERTIFICATE_VERIFY_FAILED. Hostname checking is now off by default (matching redis-py) and controlled by the new REDIS_TLS_CHECK_HOSTNAME setting.
3. REDIS_HOST could contain a URL scheme (rediss://host), which was passed through to ClusterNode or URL construction. A field validator now strips accidental schemes from the host value.
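
The second and third fixes can be sketched as two small pure functions. Both are illustrations, not the project's `get_tls_kwargs()` or its field validator: the names `build_tls_kwargs` and `strip_scheme` and their parameters are hypothetical. The dictionary keys (`ssl`, `ssl_ca_certs`, `ssl_cert_reqs`, `ssl_check_hostname`) are keyword arguments that redis-py's client constructors accept.

```python
from typing import Any, Optional


def build_tls_kwargs(
    enabled: bool,
    ca_certs: Optional[str] = None,
    insecure: bool = False,
    check_hostname: bool = False,
) -> dict[str, Any]:
    """Build TLS keyword arguments for a redis-py client (hypothetical helper)."""
    if not enabled:
        return {}
    kwargs: dict[str, Any] = {
        "ssl": True,
        # "none" disables certificate verification entirely (insecure mode).
        "ssl_cert_reqs": "none" if insecure else "required",
        # Hostname checking is opt-in and cannot be combined with "none".
        "ssl_check_hostname": check_hostname and not insecure,
    }
    if ca_certs:
        kwargs["ssl_ca_certs"] = ca_certs
    return kwargs


def strip_scheme(host: str) -> str:
    """Drop an accidental URL scheme such as 'rediss://' from a host value."""
    host = host.strip()
    if "://" in host:
        host = host.split("://", 1)[1]
    return host
```

Passing the same kwargs to every client type (standalone, cluster, sentinel) is what keeps startup validation and the runtime pool from disagreeing about TLS settings.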

gafda · 2026-02-26T18:14:06Z

@aron-muon

aron-muon · 2026-02-26T18:22:50Z

@aron-muon

Hello, let me take a look here. Nice to see that Nos engineers are using Kubecoderun - I am a Nos customer myself.

Copilot

Pull request overview

Copilot reviewed 50 out of 51 changed files in this pull request and generated 9 comments.


src/services/kubernetes/client.py

src/config/__init__.py

helm-deployments/kubecoderun/values.yaml

src/config/kubernetes.py

docker/sidecar/executor-agent/main.go

docs/CONFIGURATION.md

.env.example

…host fallback

The startup config validator always used redis.from_url() with a standalone-style URL regardless of REDIS_MODE. In cluster or sentinel mode this connected to the wrong host (typically localhost:6379) and failed, blocking startup.

- Rewrite _validate_redis_connection() to build the correct client type per mode: RedisCluster for cluster, Sentinel for sentinel, and redis.from_url() for standalone, all with proper TLS kwargs.
- Remove the silent localhost:6379 fallback in RedisPool._initialize() that masked real connection errors and caused confusing log messages.
- Update the corresponding unit test to expect the error to propagate.

- Shallow-copy annotations dict to prevent mutation by GKE Sandbox
- Validate 'key' field in custom tolerations, skip invalid with warning
- Log warning on invalid JSON for GKE_SANDBOX_NODE_SELECTOR/TOLERATIONS
- Align image_pull_policy default to 'Always' in KubernetesConfig
- Fix path traversal in executor-agent workdir validation (/mnt/data2)
- Replace env override append with key replacement in executor-agent
- Default gkeSandbox.enabled to false in Helm values
- Add security warning for REDIS_TLS_CHECK_HOSTNAME in docs and .env
- Add unit tests for all fixes (7 new tests)
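
The workdir fix mentions `/mnt/data2`, which suggests the earlier check was a plain string-prefix comparison that let a sibling directory pass as if it were inside `/mnt/data`. The executor-agent itself is written in Go; the Python sketch below only illustrates the general technique, and the function name `is_within` is hypothetical.

```python
import posixpath


def is_within(base: str, candidate: str) -> bool:
    """Return True if `candidate` resolves to `base` or a path beneath it.

    Purely lexical: '..' segments are collapsed, but symlinks are not resolved.
    """
    base_norm = posixpath.normpath(base)
    # Relative candidates are interpreted relative to the base directory.
    cand_norm = posixpath.normpath(posixpath.join(base_norm, candidate))
    # Compare whole path components: "/mnt/data2" must not match "/mnt/data".
    return cand_norm == base_norm or cand_norm.startswith(base_norm + "/")
```

Appending the separator before the prefix comparison is the key step; normalizing first handles `..` segments that would otherwise escape the base.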

…tests

- Fix empty REDIS_PASSWORD sending AUTH by converting to None
- Fix empty REDIS_CLUSTER_NODES treated as truthy, falling back to host:port
- Add missing REDIS_HOST/REDIS_PORT/REDIS_DB to Helm configmap
- Add REDIS_PASSWORD to Helm secret for cluster/sentinel modes
- Add 6-node Redis Cluster docker-compose (non-TLS and TLS variants)
- Add TLS cert generation/cleanup scripts for local testing
- Add 11 non-TLS + 14 TLS cluster integration tests (RedisPool, ConfigValidator, sync/async clients, key prefix operations)
- Add 20 new unit tests for Settings/RedisConfig validators
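
The first two fixes concern how empty environment values are normalized. The sketch below reads the commit message as meaning that an empty REDIS_CLUSTER_NODES should fall back to REDIS_HOST:REDIS_PORT. The function names are hypothetical, and the comma-separated `host:port` format is an assumption about how the node list is written.

```python
from typing import Optional


def normalize_password(value: Optional[str]) -> Optional[str]:
    """Treat an empty password as 'no password' so no AUTH command is sent."""
    return value if value else None


def parse_nodes(raw: Optional[str]) -> list[tuple[str, int]]:
    """Parse 'host:port,host:port' into (host, port) pairs, skipping blanks."""
    nodes: list[tuple[str, int]] = []
    for item in (raw or "").split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.rpartition(":")
        if not host:
            raise ValueError(f"expected host:port, got {item!r}")
        nodes.append((host, int(port)))
    return nodes


def resolve_cluster_nodes(raw: Optional[str], host: str, port: int) -> list[tuple[str, int]]:
    """Use the configured node list, or fall back to the single host and port."""
    return parse_nodes(raw) or [(host, port)]
```

Parsing before testing for emptiness is what avoids the "truthy" trap: a value such as `" "` or `","` is a non-empty string but contains no nodes.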

All Redis pipelines that operate on keys in different hash slots now use transaction=False instead of transaction=True. Redis Cluster cannot wrap MULTI/EXEC around keys on different nodes. ClusterPipeline with transaction=False still batches commands but splits them by node.

Fixed files:

- session.py: create_session(), delete_session()
- api_key_manager.py: _ensure_single_env_key_record(), create_key(), revoke_key()
- state.py: save_state()

Also fixes version display showing '0.0.0.dev0' in production:

- build-images.sh: pass --build-arg VERSION=$TAG to docker build
- config: add SERVICE_VERSION env var for runtime version override
- main.py + logging.py: prefer SERVICE_VERSION over build-time _version.py

Added 8 unit tests (test_cluster_pipeline_compat.py) verifying:

- All 6 pipelines use transaction=False
- SERVICE_VERSION override and fallback behavior

Tested against standalone Redis, Cluster (no TLS), and Cluster (TLS). All 1352 unit tests + 178 integration tests pass.
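
To see why keys in one pipeline can end up on different nodes, it helps to compute hash slots directly. Redis Cluster maps each key to one of 16384 slots using CRC16 (XMODEM variant), hashing only the part inside `{...}` when a non-empty hash tag is present. The sketch below is a standalone illustration: the function name `key_slot` and the example keys are hypothetical and are not taken from the project.

```python
import binascii


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end > start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is CRC16/XMODEM.
    return binascii.crc_hqx(data, 0) % 16384


# Two unrelated keys usually land in different slots...
different = key_slot("session:abc") != key_slot("session_index:abc")
# ...while keys sharing a hash tag always land in the same slot.
same = key_slot("{abc}:session") == key_slot("{abc}:index")
```

Hash tags are the alternative to `transaction=False` when atomicity across several keys is required: keys that share a tag live in one slot, so MULTI/EXEC can cover them.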

gafda · 2026-02-27T18:38:32Z

@aron-muon
Made a few more fixes, mostly around REDIS in CLUSTER mode.

Copilot

Pull request overview

Copilot reviewed 61 out of 62 changed files in this pull request and generated 3 comments.


src/services/kubernetes/client.py

src/services/session.py

tests/tls-certs/generate.sh

- Pass --port to executor-agent binary so configured executor_port is actually used (fixes agent-mode when port != 9090)
- Remove incorrect await on redis.pipeline() in SessionService to align with redis-py asyncio API (pipeline() is synchronous)
- Restrict CA private key to 600 in TLS cert generator (redis.key stays 644 for container access)

aron-muon

Could you separate out the dockerfile changes (e.g. updating the python/go Dockerfiles and dependencies) from this PR? I'd like such a large change to be as isolated as possible

aron-muon · 2026-03-02T11:06:49Z

.github/workflows/docker-publish.yml

    secrets: inherit
    with:
-      image_name: kubecoderun-sidecar
+      image_name: kubecoderun-sidecar-agent


It appears you are no longer building sidecar-nsenter?

aron-muon · 2026-03-02T11:20:27Z

tests/tls-certs/cleanup.sh

@@ -0,0 +1,12 @@
+#!/usr/bin/env bash


I don't see these scripts being used anywhere in these changes - can we remove them?

aron-muon · 2026-03-02T11:22:58Z

tests/integration/test_redis_cluster_tls.py

@@ -0,0 +1,454 @@
+"""Integration tests for Redis Cluster with TLS.
+
+Mirrors the user's production GCP Memorystore configuration:


don't need this comment indicating an AI agent is referring to your setup

gafda · 2026-03-02T11:35:31Z

> Could you separate out the dockerfile changes (e.g. updating the python/go Dockerfiles and dependencies) from this PR? I'd like such a large change to be as isolated as possible

So to confirm: do you want me to move all changes under the docker folder tree into a separate PR, leaving this PR focused on the agent execution mode, executor-agent/sidecar refactor, GKE Sandbox support, and Redis Cluster/Sentinel/TLS changes? Did I understand correctly?

aron-muon · 2026-03-02T11:50:53Z

> > Could you separate out the dockerfile changes (e.g. updating the python/go Dockerfiles and dependencies) from this PR? I'd like such a large change to be as isolated as possible
>
> So to confirm: do you want me to move all changes under the docker folder tree into a separate PR, leaving this PR focused on the agent execution mode, executor-agent/sidecar refactor, GKE Sandbox support, and Redis Cluster/Sentinel/TLS changes? Did I understand correctly?

Yes - if you have ideas for splitting this further along logical silos, beyond just 2 PRs, that would be helpful as well. I am reviewing 4000+ lines of code manually (as well as with an AI agent, but I do the review manually to make sure that the changes support an architectural vision for the project).

Also - don't forget to run linters/tests etc using the justfile https://github.com/aron-muon/KubeCodeRun/blob/main/justfile#L12C1-L28C56
e.g. just lint

gafda · 2026-03-02T16:24:56Z

Hi @aron-muon,

Thank you for your feedback on this PR. Per our discussion about breaking this down into smaller chunks, I am closing PR #33.

I have split the functionality into the following three new pull requests. All feedback you previously provided has been incorporated.

PR #35 - task-docker-image-deps-upgrade: Focuses on upgrading all language runtimes to DHI base images and bumping dependency versions (Go 1.26, PHP 8.5.3, Rust 1.93, etc.).
PR #36 - feat-agent-execution-mode: Introduces agent-based execution, sidecar builds, GKE Sandbox support, and CI for both sidecar variants.
PR #37 - feat-redis-cluster-sentinel-tls: Adds support for Redis Cluster, Sentinel, TLS/SSL, key prefixing, and related integration test environments.

Important Note: This work was developed as a single, cohesive feature set. As such, these three PRs are interdependent and are intended to be reviewed and merged together to ensure full functionality.

Looking forward to your review of the new PRs.

gafda added 7 commits February 12, 2026 18:39

feat(docker): upgrade base image to trixie-dev

317db84


gafda marked this pull request as ready for review February 23, 2026 10:16

gafda requested a review from aron-muon as a code owner February 23, 2026 10:16

Copilot AI review requested due to automatic review settings February 23, 2026 10:16

Copilot started reviewing on behalf of gafda February 23, 2026 10:16

Copilot AI reviewed Feb 23, 2026


gafda added 2 commits February 23, 2026 13:22

Copilot AI review requested due to automatic review settings February 26, 2026 18:13

Copilot started reviewing on behalf of gafda February 26, 2026 18:14

Copilot AI reviewed Feb 26, 2026


gafda added 4 commits February 26, 2026 18:33

Copilot AI review requested due to automatic review settings February 27, 2026 18:37

Copilot started reviewing on behalf of gafda February 27, 2026 18:38

Copilot AI reviewed Feb 27, 2026


aron-muon requested changes Mar 2, 2026


gafda closed this Mar 2, 2026

This was referenced Mar 2, 2026

feat(docker): upgrade base images and dependencies to DHI trixie #35

Merged

feat(kubernetes): add agent execution mode, GKE Sandbox, and image pu… #36

Open

feat(redis): add Cluster, Sentinel, and TLS/SSL support #37

Open

