feat: Python version-aware Docker image selection#261
Conversation
Auto-detect user's Python version at build/deploy time and select the matching runtime image. Supports 3.10-3.12 (3.13+ not stable). SDK changes: - Add get_image_name() for versioned tag resolution (py3.11-latest) - Add python_version field to ServerlessResource (in _hashed_fields) - Wire LiveServerless/LiveLoadBalancer to use get_image_name() - Add build-time validation with actionable error for unsupported versions - Record python_version in flash_manifest.json - Narrow pyproject.toml to >=3.10,<3.13 GPU images require Python 3.11+ (no PyTorch CUDA 12.8 image for 3.10). CPU images support 3.10-3.12. Env var overrides (FLASH_GPU_IMAGE etc.) bypass version logic entirely. Tag format: runpod/flash:py3.11-latest (latest alias kept for py3.12)
There was a problem hiding this comment.
Pull request overview
This PR introduces a python_version concept across serverless resource models and build tooling to support Python-version-aware runtime Docker image selection (motivated by avoiding local build/runtime ABI mismatches).
Changes:
- Added Python version support + validation (
python_versiononServerlessResource, helpers inconstants.py) and switched Live* resources to resolve versioned image names dynamically. - Updated CLI build/manifest generation to record and validate the local Python minor version.
- Expanded/adjusted unit tests to cover image resolution, version validation, and manifest traceability.
Reviewed changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/runpod_flash/core/resources/constants.py |
Adds supported Python versions and get_image_name()/validate_python_version() helpers; updates default image constants to include pyX.Y. |
src/runpod_flash/core/resources/serverless.py |
Adds python_version field, validation, and includes it in hashed/input-only sets for drift/redeploy behavior. |
src/runpod_flash/core/resources/live_serverless.py |
Live* classes now derive imageName via get_image_name() using python_version or default. |
src/runpod_flash/cli/commands/build.py |
Validates local Python minor version before dependency installation. |
src/runpod_flash/cli/commands/build_utils/manifest.py |
Adds python_version field to the generated manifest for traceability. |
pyproject.toml |
Narrows supported Python range to >=3.10,<3.13. |
tests/unit/core/resources/test_constants.py |
New tests for version support + image name resolution + env var override behavior. |
tests/unit/resources/test_serverless.py |
Adds tests for ServerlessResource.python_version validation and hashing/input-only membership. |
tests/unit/resources/test_live_serverless.py |
Adds tests asserting Live* default/explicit python-version image selection and GPU constraints. |
tests/unit/resources/test_live_load_balancer.py |
Updates expected LiveLoadBalancer image tag format to include py3.11-.... |
tests/unit/cli/commands/build_utils/test_manifest.py |
Ensures manifest includes the current Python minor version. |
tests/unit/test_dotenv_loading.py |
Updates env-var import location for image constants. |
tests/unit/core/__init__.py / tests/unit/core/resources/__init__.py |
Adds package initializers to support test module imports. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
KAJdev
left a comment
There was a problem hiding this comment.
when deploying I'm always getting the default 3.11 image (even when deploying within a 3.12 environment with 3.12 in the manifest)
what i think is happening is when running flash deploy, create_resource_from_manifest doesn't extract python_version from the manifest and pass it to the ServerlessResource constructors, so they default to 3.11.
It also seems like its not exposed on Endpoint so users wouldn't be able to override it if they wanted to (not sure if this is intentional or not, just something to note)
- Move env-var override check before version validation in get_image_name - Add regression tests for override bypassing unsupported versions - Fix python_version field description (no auto-detection at field level) - Fix build error message to recommend switching local Python - Rename mismatched test (test_default_python_version_is_3_11)
runpod-Henrik
left a comment
There was a problem hiding this comment.
Bug: Python version check silently skipped for projects with no dependencies
The version validation only runs when requirements is non-empty — it lives inside install_dependencies(), which is gated behind if requirements: at build.py:312.
A user on Python 3.13 with no requirements.txt and no dependencies=[...] on their @remote decorator will run flash build, see it succeed, deploy their worker, and then hit cryptic cloudpickle deserialization failures at runtime. No warning anywhere. The entire point of this PR is to catch exactly this mismatch, but it misses the most common case — a simple worker with no extra deps.
We can demonstrate this directly: calling install_dependencies(build_dir, [], no_deps=False) with sys.version_info mocked to 3.13 returns True. The same call with ["requests"] returns False. Same Python version, opposite outcomes, depending purely on whether the requirements list is empty.
Fix: call validate_python_version(f"{sys.version_info.major}.{sys.version_info.minor}") unconditionally before the if requirements: block.
Issue: Manifest records the build machine's Python, not the deployed image's Python
manifest.py writes sys.version_info (local Python) as python_version. But if a user explicitly sets python_version="3.12" on their resource config and builds on a 3.11 machine, the manifest records 3.11 while the deployed image is runpod/flash:py3.12-latest.
We can show this concretely: create a LiveServerless with python_version="3.12" (image correctly resolves to py3.12), call ManifestBuilder.build() with sys.version_info patched to 3.11, and the manifest comes back with python_version: "3.11" — not "3.12". The manifest is wrong from the moment it's written.
This is a new field on a new schema — the moment any tooling reads manifest.python_version to understand what's running in production, it gets the wrong answer. Easier to record it correctly now than to migrate it later.
Fix: pass the resource's python_version (or DEFAULT_PYTHON_VERSION if unset) into the manifest builder rather than using sys.version_info.
Nit: Two error messages shown for one failure
When Python version validation fails inside install_dependencies(), the function prints a detailed 3-line explanation and returns False. The caller then prints a second message: "Error: Failed to install dependencies". Calling install_dependencies directly with 3.13 and a non-empty requirements list confirms both: the version error is printed and the function returns False, at which point run_build adds its own redundant banner.
Fix: since install_dependencies() already handles its own error output on version failure, remove the generic "Error: Failed to install dependencies" print from the caller, or raise an exception instead of returning False so the caller can distinguish version errors from pip errors.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
Prevents python_version from leaking into GraphQL payloads for CpuServerlessEndpoint and CpuLoadBalancerSlsResource.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| @field_validator("python_version") | ||
| @classmethod | ||
| def validate_python_version(cls, v: Optional[str]) -> Optional[str]: | ||
| if v is not None and v not in SUPPORTED_PYTHON_VERSIONS: | ||
| supported = ", ".join(SUPPORTED_PYTHON_VERSIONS) | ||
| raise ValueError( | ||
| f"Python {v} is not supported. Supported versions: {supported}" | ||
| ) | ||
| return v |
There was a problem hiding this comment.
ServerlessResource.validate_python_version() duplicates the validation logic already implemented in runpod_flash.core.resources.constants.validate_python_version(). To avoid the supported-version list and error message drifting over time, consider importing and calling the shared helper here (or otherwise centralizing this validation in one place).
- Move Python version validation before `if requirements:` so no-dep projects are also checked (was silently skipped) - Pass python_version to ManifestBuilder instead of reading sys.version_info at manifest build time - Remove duplicate version check from install_dependencies (now done unconditionally in run_build)
|
@runpod-Henrik All three issues from your review addressed in commit 1c85da4: Bug (version check skipped for no-dep projects): Moved Issue (manifest records wrong Python): Nit (duplicate error messages): Removed the version check from |
create_resource_from_manifest now accepts python_version and includes it in deployment_kwargs. All three call sites in deployment.py extract python_version from the manifest and pass it through. Fixes the bug where deploying from a 3.12 environment always selected the default 3.11 image because python_version was never propagated.
|
@KAJdev Fixed in commit e3f90cf. The bug was exactly as you described: Fix: |
runpod-Henrik
left a comment
There was a problem hiding this comment.
Test Plan Results — PR #261
Ran the full test plan against the PR branch across Python 3.10, 3.11, 3.12, and 3.13.
46 passed, 1 xfailed — results identical across all four Python versions.
What passes
- Scenarios 1, 4, 5, 6, 7, existing users upgrading: all green — features implemented correctly
- GPU rejects
python_version='3.10', CPU accepts it — correct validate_python_versionrejects patch-level strings, empty string, shell metacharacters — correct- Config hash changes when
python_versionchanges — correct DEFAULT_PYTHON_VERSIONused when no version specified — correct
BUG-261-A fixed ✓
validate_python_version() now runs in run_build() before the requirements gate. Python 3.13 + no-deps correctly exits 1 — the silent success bug is gone.
Implementation note: the check lives in run_build(), not install_dependencies(). install_dependencies() is version-agnostic by design — run_build() rejects unsupported versions before that function is ever called. Tests updated to match this design.
1 remaining gap (xfail)
No warning when local Python is higher than the target image Python (e.g. building on 3.12, targeting py3.11). Python 3.11 containers cannot load 3.12 bytecode — the worker will fail at runtime with a cloudpickle deserialization error. Documented as xfail for a follow-up PR.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
runpod-Henrik
left a comment
There was a problem hiding this comment.
Code Review — PR #261: Python Version-Aware Docker Images
Reviewer: Henrik's AI-Powered Bug Finder
Testing: Full unit suite (46/46 passed, 1 xfailed pre-existing), all plan tests pass on pr-261 branch. Tested across Python 3.10.19, 3.11.14, 3.12.12, 3.13.11 — identical results on all four versions.
1. Context
Short-term guardrail for the Python version mismatch problem: user builds on Python 3.12, deployed runtime is Python 3.11 (Conda in the PyTorch base image), binary-incompatible packages (e.g. numpy) silently fail. This PR makes the SDK version-aware so the correct runtime image is selected and the manifest records what the build was compiled against. PR #260 (process injection) is the longer-term architectural solution and is intended to merge after this one.
Bug: Version check bypassed for no-deps projects — fixed correctly
On main, install_dependencies gates the Python version check behind if requirements:. A project with no requirements.txt skips the check entirely, exits 0, and produces an artifact for any Python version including unsupported ones. This PR fixes it by running validate_python_version in run_build() before install_dependencies is called — the check runs regardless of whether there are dependencies. Verified by the test suite.
2. Manifest python_version field
Recording the build machine's Python version in flash_manifest.json is the right call and remains useful even after PR #260 merges — it gives an audit trail of what the artifact was compiled against, independent of the image strategy. No concerns here.
Note: Image selection logic will be superseded by PR #260
The version-aware image picker (selecting the correct 3.10/3.11/3.12 image variant) becomes the non-primary path once injection is the default. Worth a comment in the image selection code noting this is the pre-injection path, so future readers understand why it exists alongside the injection mechanism.
Note: Supported version range
PR description mentions adjusting support from 3.10–3.14 down to 3.10–3.12 since 3.13 is not stable yet. The test suite confirms validate_python_version rejects 3.13. Looks correct.
Nits
DEFAULT_PYTHON_VERSION— worth a comment explaining how this value is chosen and when it should be updated (e.g. tied to the LTS image, bumped manually per release).validate_python_versionrejects patch-level strings (e.g."3.11.14") — good defensive behaviour, but the error message could tell the user what format is expected ("3.11"not"3.11.14").
Verdict: PASS WITH NITS
The version check fix is correct and well-placed. Manifest field is useful long-term. Image selection logic is appropriately scoped as a short-term guardrail ahead of PR #260.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
…tion LiveServerless used DEFAULT_PYTHON_VERSION (3.11) when python_version was not set, causing cloudpickle serialization mismatches for users on other Python versions. Now uses local_python_version() to detect the running interpreter's version.
Tests were asserting GPU_PYTHON_VERSIONS == ("3.11", "3.12") but the
source has ("3.10", "3.11", "3.12"). Updated tests to match: GPU 3.10
is valid, not a raise.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 18 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
src/runpod_flash/runtime/resource_provisioner.py:22
- The create_resource_from_manifest docstring’s Args list doesn’t mention the new python_version parameter. Please update the docstring to describe how python_version is used (and what happens when it’s None) so callers understand the behavior.
def create_resource_from_manifest(
resource_name: str,
resource_data: Dict[str, Any],
flash_environment_id: Optional[str] = None,
python_version: Optional[str] = None,
) -> Any:
"""Create a deployable resource configuration from a manifest entry.
Args:
resource_name: Name of the resource
resource_data: Resource configuration from manifest
flash_environment_id: Optional flash environment ID to attach
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| def test_gpu_python_versions(self): | ||
| assert GPU_PYTHON_VERSIONS == ("3.10", "3.11", "3.12") | ||
|
|
There was a problem hiding this comment.
These assertions pin GPU_PYTHON_VERSIONS to include "3.10". If GPU images are intended to require Python 3.11+ (per PR description), update the expected GPU_PYTHON_VERSIONS here (and add a dedicated test that GPU+3.10 raises) so the test suite enforces the intended constraint.
| SUPPORTED_PYTHON_VERSIONS: tuple[str, ...] = ("3.10", "3.11", "3.12") | ||
| GPU_PYTHON_VERSIONS: tuple[str, ...] = ("3.10", "3.11", "3.12") | ||
| CPU_PYTHON_VERSIONS: tuple[str, ...] = ("3.10", "3.11", "3.12") |
There was a problem hiding this comment.
GPU_PYTHON_VERSIONS currently includes "3.10", but the PR description states GPU images require Python 3.11+ (no CUDA 12.8 support for 3.10). If GPU images truly start at 3.11, this constant (and related tests/logic) should be updated to exclude 3.10; otherwise the PR description (and any docs) should be corrected to avoid misleading users and selecting non-existent/unsupported images.
| lb = ls_module.LiveLoadBalancer(name="test-lb") | ||
| expected_image = "runpod/flash-lb:local" | ||
| expected_image = "runpod/flash-lb:py3.11-local" | ||
| assert lb.imageName == expected_image | ||
| assert lb.template is not None | ||
| assert lb.template.imageName == expected_image |
There was a problem hiding this comment.
These tests hardcode a py3.11 image name, but LiveLoadBalancer/CpuLiveLoadBalancer default to local_python_version() when python_version isn’t provided. Since CI runs the test suite across multiple Python versions, this will fail on 3.10/3.12. Compute expected_image using local_python_version() (or sys.version_info) instead of hardcoding 3.11.
| lb = ls_module.CpuLiveLoadBalancer(name="test-lb") | ||
| expected_image = "runpod/flash-lb-cpu:local" | ||
| expected_image = "runpod/flash-lb-cpu:py3.11-local" | ||
| assert lb.imageName == expected_image | ||
| assert lb.template is not None | ||
| assert lb.template.imageName == expected_image |
There was a problem hiding this comment.
Same issue as above: expected_image is hardcoded to py3.11, but the code under test selects the tag based on local_python_version() by default. This will fail under the CI Python version matrix; derive expected_image dynamically.
| def test_gpu_explicit_python_310(self): | ||
| ls = LiveServerless(name="test", python_version="3.10") | ||
| assert "py3.10" in ls.imageName | ||
| assert "runpod/flash:" in ls.imageName |
There was a problem hiding this comment.
This test suite asserts GPU endpoints support python_version="3.10". That conflicts with the PR description’s stated constraint that GPU images require Python 3.11+. Once the intended constraint is clarified, update these assertions accordingly (e.g., expect a ValueError for GPU+3.10 if 3.10 is no longer supported for GPU images).
| "Operating System :: OS Independent", | ||
| ] | ||
| requires-python = ">=3.10,<3.15" | ||
| requires-python = ">=3.10,<3.13" |
There was a problem hiding this comment.
Narrowing requires-python to <3.13 will likely break the existing CI matrix entries that run on Python 3.13 and 3.14 (see .github/workflows/ci.yml). Either update the workflow matrix in this PR (preferred) or keep requires-python compatible with the versions CI is still running.
| requires-python = ">=3.10,<3.13" | |
| requires-python = ">=3.10,<3.15" |
runpod-Henrik
left a comment
There was a problem hiding this comment.
Code Review — PR #261: Python Version-Aware Docker Image Selection
Reviewer: Henrik's AI-Powered Bug Finder
Testing: 46/46 tests pass on pr-261 branch across Python 3.10, 3.11, 3.12, 3.13. Full unit suite: no new failures.
1. The core fix
Version check moved to run_build() before install_dependencies is called — correct placement. A project with no requirements.txt no longer slips through; the check runs unconditionally regardless of deps. Verified.
Issue: GPU_PYTHON_VERSIONS guard is dead code — contradicts PR description
The PR description says "GPU images require Python 3.11+ (no PyTorch CUDA 12.8 for 3.10)", but:
GPU_PYTHON_VERSIONS: tuple[str, ...] = ("3.10", "3.11", "3.12") # includes 3.10
SUPPORTED_PYTHON_VERSIONS: tuple[str, ...] = ("3.10", "3.11", "3.12") # identicalThe GPU-specific guard in get_image_name:
if image_type in _GPU_IMAGE_TYPES and python_version not in GPU_PYTHON_VERSIONS:
raise ValueError(...)can never trigger — validate_python_version runs first and already ensures the version is in SUPPORTED_PYTHON_VERSIONS, which equals GPU_PYTHON_VERSIONS. The tests confirm GPU+3.10 is intentionally allowed (test_gpu_3_10, test_gpu_explicit_python_310).
Either the PR description is stale and GPU does support 3.10, or GPU_PYTHON_VERSIONS should be ("3.11", "3.12"). Either way, the dead check should be removed or corrected.
Question: requires-python narrowed to <3.13 — breaks existing 3.13 users
-requires-python = ">=3.10,<3.15"
+requires-python = ">=3.10,<3.13"A user on Python 3.13 running flash run (local dev, no deployment) would be broken by this upgrade. The deploy-time check in run_build() already handles unsupported versions with a clear error message. Should requires-python stay at <3.15 and let the deploy-time check enforce the supported build range, rather than blocking install entirely?
Nit: Docstrings removed from LiveLoadBalancer and CpuLiveLoadBalancer
Both classes lost their usage examples and development flow documentation, replaced with one-liners ("""Live load-balanced endpoint."""). The usage examples were useful for new contributors — worth preserving in a condensed form.
Nit: local_python_version() defers import sys unnecessarily
def local_python_version() -> str:
import sys # stdlib, always available
return f"{sys.version_info.major}.{sys.version_info.minor}"sys is stdlib — no reason to defer this import.
Verdict: PASS WITH NITS
The version check placement is correct, manifest tracing is solid, image selection logic is clean. Clarify the GPU_PYTHON_VERSIONS / 3.10 question before merge. The requires-python narrowing is worth a deliberate decision.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
Summary
python_versionfield toServerlessResourcebase model with validationget_image_name()andvalidate_python_version()to constants.pyget_image_name()for dynamic, version-aware image resolutionpython_versionincluded in_hashed_fields(forces re-deploy on version change)flash buildvalidates local Python version against supported rangepython_versionfor traceabilitypyproject.tomlnarrowed to>=3.10,<3.13FLASH_GPU_IMAGE, etc.) continue to bypass version logicContext
AE-2391: Python version mismatch between user build environment and worker runtime causes silent failures with binary packages. This PR adds version-aware image selection so the SDK picks the correct runtime image for the user's Python version.
Companion PR: runpod-workers/flash (versioned Docker images)
Test plan
get_image_name()covering all type/version combinations and env var overridespython_versionvalidation on ServerlessResourcepython_versionfieldmake quality-checkpasses (84.71% coverage)