RoleBasedGroup (RBG) is a custom resource that models a group of roles (each role represents a workload type and set of pods) and the relationships between them. It is intended to manage multi-role applications that may require coordinated scheduling, lifecycle management, rolling updates, and optional gang-scheduling (PodGroup) support.
When a request comes into an LLM inference engine, the system first takes the user input and generates the first token (prefill), then generates output tokens one by one autoregressively (decode). A request usually consists of one prefill step and multiple decoding steps until termination.
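The two phases above can be sketched as a simple loop; the `next_token` stub below is a stand-in assumption for a real model forward pass, which in practice runs batched on GPUs with a KV cache:

```python
EOS = 0  # assumed end-of-sequence token id for this toy example

def next_token(tokens):
    # Toy "model": emits decreasing token ids until EOS.
    # Stands in for a forward pass over the full context.
    return max(tokens[-1] - 1, EOS)

def generate(prompt_tokens, max_new_tokens=16):
    # Prefill: one pass over the whole prompt produces the first output token.
    tokens = list(prompt_tokens)
    tokens.append(next_token(tokens))
    # Decode: autoregressive, one token per step, until EOS or the budget runs out.
    for _ in range(max_new_tokens - 1):
        if tokens[-1] == EOS:
            break
        tokens.append(next_token(tokens))
    return tokens[len(prompt_tokens):]

print(generate([5, 4, 3]))  # -> [2, 1, 0]
```

Note that prefill is a single pass over many prompt tokens (compute-bound), while each decode step processes one token at a time (memory-bandwidth-bound) — this asymmetry is what motivates the deployment choices below.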
When the model is small enough that a single Kubernetes Node can load all model files, you can deploy the LLM inference
service on a single node.

When the model is too large for a single Node to load all files, use multi-node distributed inference.

Colocating the two phases and batching the prefill and decoding computation across all users and requests not only
leads to strong prefill-decoding interference but also couples the resource allocation and parallelism plans for both
phases. Disaggregating the prefill and decoding computation improves the serving performance of large language models (LLMs).
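In an RBG, the two phases can be modeled as separate roles within one group. A minimal sketch follows; the API group/version and field names are illustrative assumptions, not the authoritative RBG schema:

```yaml
apiVersion: workloads.x-k8s.io/v1alpha1   # assumed group/version
kind: RoleBasedGroup
metadata:
  name: llm-pd-disaggregated
spec:
  roles:
    - name: prefill            # processes the prompt for each request
      replicas: 2
      template: {}             # pod template omitted for brevity
    - name: decode             # generates output tokens autoregressively
      replicas: 4
      template: {}
```

Because each phase is its own role, the prefill and decode workers can be scaled, scheduled, and updated independently while RBG manages them as one coordinated group.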



