feat!: update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support #262
feat!: update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support

BREAKING CHANGE: The default GPU inference image is now server-cuda13 (CUDA 13) instead of server-cuda (CUDA 12). This requires NVIDIA driver 590+ (CUDA 13.1). Users on older drivers must specify `--image ghcr.io/ggml-org/llama.cpp:server-cuda` to use the CUDA 12 image.

This enables:
- Qwen3.5 model architecture support (previously an unknown architecture)
- Native SM 120 (Blackwell) GPU support for RTX 50-series
- The latest llama.cpp optimizations and model architecture support

Updated across the CLI, benchmark tool, sample manifests, and docs.

Fixes #261

Signed-off-by: Christopher Maher <chris@mahercode.io>
Summary

Updates the default GPU inference image from `server-cuda` (CUDA 12) to `server-cuda13` (CUDA 13) across the codebase. This is a one-line fix that enables support for Qwen3.5 models and native Blackwell GPU acceleration.

What this fixes:
- `unknown model architecture: 'qwen35'` on the old image

What changes:
- GPU deployments default to `server-cuda13` instead of `server-cuda`
- The CLI, benchmark tool, sample manifests, and docs all reference `server-cuda13`
- `--image` help text documents how to override with an older image

Driver requirement: NVIDIA driver 590+ (CUDA 13.1). Users on older drivers can still use `--image ghcr.io/ggml-org/llama.cpp:server-cuda`.

Validated on: Shadowstack (RTX 5060 Ti, driver 590.48.01) with Qwen3.5-9B at 61.5 tok/s generation.
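The driver cutoff above can be sketched as a small shell helper. This is a minimal sketch, not part of the PR: the `pick_image` function name and the 590 threshold encoding are illustrative, though `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is the standard way to read the installed driver version.

```shell
# pick_image DRIVER_VERSION
# Echoes the llama.cpp server image tag appropriate for the given
# NVIDIA driver version: CUDA 13 needs driver 590+, else fall back
# to the CUDA 12 image.
pick_image() {
  major="${1%%.*}"   # "590.48.01" -> "590"
  if [ "${major:-0}" -ge 590 ]; then
    echo "ghcr.io/ggml-org/llama.cpp:server-cuda13"
  else
    echo "ghcr.io/ggml-org/llama.cpp:server-cuda"
  fi
}

# On a real host the version would come from:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
pick_image "590.48.01"
```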
Test plan
- `make test` passes
- `--image ghcr.io/ggml-org/llama.cpp:server-cuda` override works for older drivers

Fixes #261
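A manual smoke test of either image can be sketched as below. This is an assumption about how the GHCR server images are typically run (the image entrypoint is the llama.cpp server binary); the model filename and mount path are placeholders, not files from this PR.

```shell
# run_cmd MODEL_DIR IMAGE_TAG MODEL_FILE
# Builds (and echoes, rather than executes) the docker invocation for a
# llama.cpp GPU server smoke test, so the command can be inspected first.
run_cmd() {
  echo "docker run --rm --gpus all -p 8080:8080 -v $1:/models" \
       "ghcr.io/ggml-org/llama.cpp:$2 -m /models/$3 --host 0.0.0.0 --port 8080"
}

run_cmd /path/to/models server-cuda13 qwen3.5-9b.gguf
# After starting the container, poll http://localhost:8080/health
# until the server reports ready, then send a test completion request.
```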