
feat!: update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support#262

Open
Defilan wants to merge 1 commit into main from fix/update-cuda-image

Conversation

Member

@Defilan Defilan commented Apr 3, 2026

Summary

Updates the default GPU inference image from server-cuda (CUDA 12) to server-cuda13 (CUDA 13) across the codebase. The core change is a one-line default swap that enables support for Qwen3.5 models and native Blackwell GPU acceleration.

What this fixes:

  • Qwen3.5 models crash with unknown model architecture: 'qwen35' on the old image
  • RTX 50-series GPUs (5060 Ti, 5070, 5080, 5090) run via PTX JIT fallback instead of native SM 120

What changes:

  • CLI auto-detects server-cuda13 instead of server-cuda for GPU deployments
  • Benchmark tool uses server-cuda13
  • All sample YAML manifests updated
  • Air-gapped deployment docs updated
  • CLI --image help text documents how to override with an older image
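The auto-detection change above can be sketched as follows. This is an illustrative sketch only; the names DEFAULT_GPU_IMAGE and resolve_image are hypothetical, not the project's actual identifiers:

```python
# Hypothetical sketch of the default-image selection described in this PR.
# New default: the CUDA 13 image, for Qwen3.5 and native Blackwell (SM 120).
DEFAULT_GPU_IMAGE = "ghcr.io/ggml-org/llama.cpp:server-cuda13"


def resolve_image(image_flag):
    """Return the value of --image if the user supplied one,
    otherwise fall back to the CUDA 13 default."""
    return image_flag if image_flag else DEFAULT_GPU_IMAGE
```

Passing --image ghcr.io/ggml-org/llama.cpp:server-cuda would keep the CUDA 12 image for users on older drivers; with no flag, the CUDA 13 default is used.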

Driver requirement: NVIDIA driver 590+ (CUDA 13.1). Users on older drivers can still use --image ghcr.io/ggml-org/llama.cpp:server-cuda.
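The driver gate can be verified by parsing the version string the driver reports (e.g. 590.48.01, obtainable via nvidia-smi --query-gpu=driver_version --format=csv,noheader). The helper below is a minimal sketch of that check, not part of the CLI:

```python
def meets_cuda13_driver(version, minimum_major=590):
    """True if an NVIDIA driver version string such as '590.48.01'
    satisfies the CUDA 13.1 requirement of driver 590+."""
    major = int(version.split(".")[0])
    return major >= minimum_major
```

For example, the driver validated in this PR (590.48.01) passes, while a CUDA 12-era driver such as 560.x does not and would need the --image override.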

Validated on: Shadowstack (RTX 5060 Ti, driver 590.48.01) with Qwen3.5-9B at 61.5 tok/s generation.

Test plan

  • make test passes
  • Qwen3.5-9B deploys and serves inference with server-cuda13
  • Existing Qwen2.5 models still work
  • --image ghcr.io/ggml-org/llama.cpp:server-cuda override works for older drivers

Fixes #261

feat!: update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support

BREAKING CHANGE: The default GPU inference image is now server-cuda13
(CUDA 13) instead of server-cuda (CUDA 12). This requires NVIDIA
driver 590+ (CUDA 13.1). Users on older drivers must specify
--image ghcr.io/ggml-org/llama.cpp:server-cuda to use the CUDA 12 image.

This enables:
- Qwen3.5 model architecture support (previously unknown architecture)
- Native SM 120 (Blackwell) GPU support for RTX 50-series
- Latest llama.cpp optimizations and model architecture support

Updated across CLI, benchmark tool, sample manifests, and docs.

Fixes #261

Signed-off-by: Christopher Maher <chris@mahercode.io>
@Defilan Defilan force-pushed the fix/update-cuda-image branch from 54ed14b to 2fab52e Compare April 3, 2026 02:42
@Defilan Defilan changed the title fix: update default CUDA image to server-cuda13 feat!: update default CUDA image to server-cuda13 for Qwen3.5 and Blackwell support Apr 3, 2026

Development

Successfully merging this pull request may close these issues.

bug: stock llama.cpp server-cuda image doesn't support Qwen3.5 architecture
