feat(docker): add CUDA 12.8 support for RTX 50-series GPUs by sh3ll3x3c · Pull Request #1856 · roboflow/inference

sh3ll3x3c · 2025-12-28T08:29:53Z

Description

Add CUDA 12.8 support for NVIDIA RTX 50-series (Blackwell/sm_120) GPUs.

The current Dockerfile.onnx.gpu uses CUDA 12.4, which doesn't support the new RTX 50-series architecture (sm_120). This PR adds a new Dockerfile that enables GPU inference on RTX 5090, 5080, 5070 Ti, and 5070 cards.

Key changes:

- CUDA 12.8.1 base image (required for sm_120 architecture)
- PyTorch nightly with cu128 support (stable PyTorch doesn't support sm_120 yet)
- onnxruntime-gpu from Microsoft's CUDA 12 index (default PyPI package lacks CUDAExecutionProvider for CUDA 12)
- flash_attn build skipped by default (optional, significantly reduces build time)
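
The sm_120 requirement in the list above can be checked from inside the container. The following is a minimal sketch, assuming PyTorch is installed in the image; `supports_arch` is a hypothetical helper name, not part of this PR, and the exact-match test on `sm_XY` is a simplification that ignores PTX (`compute_XY`) entries.

```python
def supports_arch(arch_list, major, minor):
    """Return True if an arch list (e.g. from torch.cuda.get_arch_list())
    contains a compiled kernel target for compute capability major.minor."""
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch  # guarded: the script still runs where PyTorch is absent
    except ImportError:
        print("PyTorch is not installed; nothing to check")
    else:
        if not torch.cuda.is_available():
            print("No CUDA device is visible to PyTorch")
        else:
            # Compute capability of the first GPU, e.g. (12, 0) for sm_120.
            major, minor = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print(f"device capability: sm_{major}{minor}")
            print(f"architectures in this PyTorch build: {archs}")
            if supports_arch(archs, major, minor):
                print("this build includes kernels for the device")
            else:
                print("this build was not compiled for the device")
```

If the device reports sm_120 but the build's arch list does not include it, that would be consistent with the stable-PyTorch limitation described above.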

Related issue: Users with RTX 50-series GPUs cannot use GPU acceleration with the current Docker images.

Type of change

New feature (non-breaking change which adds functionality)

How has this change been tested, please provide a testcase or example of how you tested the change?

Tested on NVIDIA GeForce RTX 5090:

Built the image:

docker build -f docker/dockerfiles/Dockerfile.onnx.gpu.cuda128 \
  -t roboflow/roboflow-inference-server-gpu-cuda128 .

Verified CUDA provider is available:

docker exec <container> python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# Output: ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
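
The same check can be scripted so that it fails loudly when a provider is missing. This is a sketch only: `missing_providers` is a hypothetical helper, and `get_available_providers()` reports what the installed onnxruntime build offers, which does not by itself prove that a session will run on the GPU.

```python
def missing_providers(available, required):
    """Return the required execution providers that are not in `available`."""
    return [name for name in required if name not in available]


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # guarded so the script runs without it
    except ImportError:
        print("onnxruntime is not installed")
    else:
        available = ort.get_available_providers()
        missing = missing_providers(available, ["CUDAExecutionProvider"])
        print(f"available providers: {available}")
        if missing:
            print(f"missing providers: {missing}")
        else:
            print("CUDAExecutionProvider is available")
```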

Tested inference speed (1080p image, object detection):
- First request: ~54s (model download + load)
- Subsequent requests: 65-100ms (GPU inference)

Verified GPU memory allocation via nvidia-smi.
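
The nvidia-smi check can also be done programmatically. The sketch below is one possible way, not the procedure used for this PR: it calls nvidia-smi with its CSV query flags and parses the result. `parse_gpu_memory` is a hypothetical helper, and it assumes numeric memory fields (some devices report `[N/A]`, which this sketch does not handle).

```python
import subprocess

# Real nvidia-smi flags: query selected fields, CSV output without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'name, used, total' CSV lines into a list of dicts (values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps any comma inside the GPU name in the first field.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        rows.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return rows


if __name__ == "__main__":
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi query failed: {exc}")
    else:
        for gpu in parse_gpu_memory(result.stdout):
            print(f"{gpu['name']}: {gpu['used_mib']} / {gpu['total_mib']} MiB used")
```

Running it before and after the first inference request should show memory use rising once the model is loaded onto the GPU.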

Any specific deployment considerations

- This Dockerfile uses PyTorch nightly builds (cu128) since stable PyTorch doesn't yet support sm_120.
- Once PyTorch stable releases with CUDA 12.8/sm_120 support, the Dockerfile can be updated to use stable builds.
- Users who need Paligemma/Florence2 support can uncomment the flash_attn build step (adds significant build time).
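
Because flash_attn is optional in this image, it can help to confirm whether it was built before loading a model that needs it. A minimal sketch, assuming the package's import name is `flash_attn`; `module_installed` is a hypothetical helper.

```python
import importlib.util


def module_installed(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_installed("flash_attn"):
        print("flash_attn is installed")
    else:
        print("flash_attn is not installed; rebuild with the flash_attn step enabled")
```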

Docs

Docs updated? What were the changes:

Documentation update suggested: Add a note in the Docker deployment docs mentioning Dockerfile.onnx.gpu.cuda128 for RTX 50-series GPU users.

Add new Dockerfile.onnx.gpu.cuda128 to enable GPU inference on NVIDIA RTX 50-series (Blackwell/sm_120) GPUs including RTX 5090, 5080, 5070 Ti, and 5070.

Key changes:
- Use CUDA 12.8.1 base image (required for sm_120 architecture)
- Install PyTorch nightly with cu128 support
- Install onnxruntime-gpu from Microsoft's CUDA 12 index to enable CUDAExecutionProvider (default PyPI package lacks CUDA 12 support)
- Skip flash_attn build by default (optional, reduces build time)

Build:

docker build -f docker/dockerfiles/Dockerfile.onnx.gpu.cuda128 \
  -t roboflow/roboflow-inference-server-gpu-cuda128 .

Run:

docker run --gpus all -p 9001:9001 \
  roboflow/roboflow-inference-server-gpu-cuda128

Tested on RTX 5090 with ~65-100ms inference times.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
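
After the `docker run` command above, a quick way to confirm the container is listening is to probe the published port. This sketch only checks that TCP port 9001 accepts connections on the host; it does not call any HTTP route of the server, and `port_open` is a hypothetical helper.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 9001 is the host port published by `-p 9001:9001` in the run command.
    if port_open("localhost", 9001):
        print("port 9001 is accepting connections")
    else:
        print("port 9001 is not reachable")
```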

CLAassistant · 2025-12-28T08:29:58Z

All committers have signed the CLA.

sh3ll3x3c requested review from PawelPeczek-Roboflow, grzegorz-roboflow, hansent, probicheaux and yeldarby as code owners December 28, 2025 08:29

sh3ll3x3c added 3 commits December 28, 2025 09:31

Merge branch 'main' into feat/cu-128-support

8e7cd51

Merge branch 'main' into feat/cu-128-support

99fdb4f

Merge branch 'main' into feat/cu-128-support

038b3e1

sh3ll3x3c wants to merge 4 commits into roboflow:main from sh3ll3x3c:feat/cu-128-support