ROCm/triton-inference-server-vllm_backend
# vLLM Backend - ROCm Edition

## Architecture

The vLLM backend is a pure Python backend — it contains no C/C++ or HIP/CUDA code. It acts as a thin wrapper that bridges Triton's Python backend interface with the vLLM inference engine. All heavy lifting (inference, paged attention, continuous batching) is performed by the vLLM engine.
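The thin-wrapper pattern described above can be illustrated with a minimal, self-contained asyncio sketch. The names `DummyEngine`, `ThinWrapper`, and `handle_request` are hypothetical stand-ins for vLLM's `AsyncEngine` and Triton's `TritonPythonModel`; the sketch only demonstrates the request-forwarding and streaming shape, not the real APIs.

```python
import asyncio


class DummyEngine:
    """Stand-in for vLLM's AsyncEngine: asynchronously yields partial outputs."""

    async def generate(self, prompt: str):
        for token in prompt.split():
            await asyncio.sleep(0)  # simulate asynchronous token generation
            yield token


class ThinWrapper:
    """Stand-in for the backend's TritonPythonModel: holds no inference logic,
    just forwards the request to the engine and relays streamed outputs."""

    def __init__(self, engine: DummyEngine):
        self.engine = engine

    async def handle_request(self, prompt: str):
        responses = []
        # The real backend converts each vLLM output into a Triton response
        # and sends it on the response stream; here we simply collect them.
        async for partial in self.engine.generate(prompt):
            responses.append(partial)
        return responses


async def main():
    wrapper = ThinWrapper(DummyEngine())
    return await wrapper.handle_request("hello from the vllm backend")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of the sketch is that the wrapper itself performs no computation: everything between receiving a request and emitting responses is delegated to the engine.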

## Repository contents

| File | Purpose |
| --- | --- |
| `src/model.py` | `TritonPythonModel` class, the entry point that Triton loads. Receives requests, forwards them to the vLLM `AsyncEngine`, and streams responses back. |
| `src/utils/metrics.py` | vLLM statistics and metrics integration with Triton. |
| `src/utils/request.py` | Request-handling utilities for generate and embed operations. |

## How the backend is built and deployed

There is no CMake build or compilation step. The build process (driven by the Triton server's build.py) is:

1. Git-clone this repository.
2. Copy `src/model.py` and `src/utils/` into `/opt/tritonserver/backends/vllm/`.
3. Install the vLLM engine separately.
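The steps above can be sketched as shell commands. This is an illustrative manual equivalent only: the real process is orchestrated by Triton's `build.py`, and the exact vLLM install command (pinned version, ROCm wheel index, etc.) is deployment-specific.

```shell
set -e

# Step 1: clone this repository.
git clone https://github.com/ROCm/triton-inference-server-vllm_backend.git

# Step 2: copy the Python sources into the Triton backends directory.
mkdir -p /opt/tritonserver/backends/vllm
cp triton-inference-server-vllm_backend/src/model.py /opt/tritonserver/backends/vllm/
cp -r triton-inference-server-vllm_backend/src/utils /opt/tritonserver/backends/vllm/

# Step 3: install the vLLM engine separately (command is an assumption;
# use the ROCm-appropriate build of vLLM for your environment).
pip install vllm
```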

The Python `model.py` itself is hardware-agnostic — it calls vLLM's Python API (`AsyncEngineArgs`, `build_async_engine_client_from_engine_args`), and vLLM internally handles whether it is running on CUDA or ROCm.
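At serving time, the engine arguments come from the model repository rather than from code. The upstream Triton vLLM backend reads a `model.json` whose keys map onto `AsyncEngineArgs` fields; assuming this ROCm edition follows the same convention, a representative file looks like the following (the model name and values are illustrative, not prescribed by this repository):

```json
{
    "model": "facebook/opt-125m",
    "disable_log_requests": true,
    "gpu_memory_utilization": 0.5
}
```

Because the keys are passed through to `AsyncEngineArgs`, any engine option vLLM accepts can in principle be set here without touching the backend code.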

## ROCm enablement

Since this backend is pure Python, ROCm support does not require hipification or any C/C++ changes in this repository. The ROCm enablement happens in two places outside this repo:

1. **vLLM engine** — vLLM has its own ROCm support.
2. **Triton server** — the server's own C++ code (shared memory manager, gRPC/HTTP endpoints, etc.) has `#ifdef TRITON_ENABLE_ROCM` guards that swap CUDA API calls for HIP equivalents. Those changes live in the server repository, not here.
