The vLLM backend is a pure Python backend — it contains no C/C++ or HIP/CUDA code. It acts as a thin wrapper that bridges Triton's Python backend interface with the vLLM inference engine. All heavy lifting (inference, paged attention, continuous batching) is performed by the vLLM engine.
| File | Purpose |
|---|---|
| `src/model.py` | `TritonPythonModel` class — the entry point that Triton loads. Receives requests, forwards them to the vLLM AsyncEngine, and streams responses back (sketched below). |
| `src/utils/metrics.py` | vLLM statistics and metrics integration with Triton. |
| `src/utils/request.py` | Request handling utilities for generate and embed operations. |
There is no CMake build or compilation step. The build process (driven by the
Triton server's `build.py`) is:
- Git clone this repository.
- Copy `src/model.py` and `src/utils/` into `/opt/tritonserver/backends/vllm/` (see the sketch after this list).
- Install the vLLM engine separately.
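For illustration, the copy step amounts to roughly the following. This is a hedged sketch: `vllm_backend` stands in for wherever the repository was cloned, and the destination is the conventional backend directory named above.

```python
# Rough sketch of the file-copy step performed by the server's build.py.
# The source path ("vllm_backend") is a placeholder for the cloned repository.
import shutil
from pathlib import Path

repo = Path("vllm_backend")
dest = Path("/opt/tritonserver/backends/vllm")

dest.mkdir(parents=True, exist_ok=True)
shutil.copy2(repo / "src" / "model.py", dest / "model.py")
shutil.copytree(repo / "src" / "utils", dest / "utils", dirs_exist_ok=True)
```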
The Python `model.py` itself is hardware-agnostic — it calls vLLM's Python API
(`AsyncEngineArgs`, `build_async_engine_client_from_engine_args`), and vLLM
internally handles whether it is running on CUDA or ROCm.
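As a rough illustration of that call path, the snippet below builds an engine client from `AsyncEngineArgs`. It is a hedged sketch: the import path of `build_async_engine_client_from_engine_args` has moved between vLLM releases (it sits under the OpenAI entrypoints in recent versions), and the model name is a placeholder. The point is that nothing in this code mentions CUDA or ROCm; device selection happens inside vLLM.

```python
import asyncio

from vllm.engine.arg_utils import AsyncEngineArgs
# Module path may differ between vLLM versions; it is an assumption here.
from vllm.entrypoints.openai.api_server import build_async_engine_client_from_engine_args


async def main():
    engine_args = AsyncEngineArgs(model="facebook/opt-125m")  # placeholder model

    # The helper is an async context manager that yields an engine client.
    # Whether the engine runs on CUDA or ROCm is decided by vLLM internally.
    async with build_async_engine_client_from_engine_args(engine_args) as client:
        print(type(client).__name__)


asyncio.run(main())
```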
Since this backend is pure Python, ROCm support does not require hipification or any C/C++ changes in this repository. The ROCm enablement happens in two places outside this repo:
- vLLM engine — vLLM has its own ROCm support.
- Triton server — the server's own C++ code (shared memory manager, gRPC/HTTP
  endpoints, etc.) has `#ifdef TRITON_ENABLE_ROCM` guards that swap CUDA API
  calls for HIP equivalents. Those changes live in the server repository, not here.