feat: Add DisaggregatedService CRD for multi-role inference pipelines #1

Open
dittops wants to merge 1 commit into main from feat/disaggregated-service

dittops commented Jan 9, 2026

Summary

  • Adds DisaggregatedService CRD for deploying multi-role inference pipelines
  • Enables per-role autoscaling via BudAIScaler with subTargetSelector.roleName
  • Supports embedding pipelines (Router → Tokenizer → Inference) and LLM P/D disaggregation

Features

DisaggregatedService CRD

  • Multi-role deployment with independent replicas per role
  • Role types: Router, Tokenizer, Inference, Prefill, Decode, Custom
  • Role dependency management (dependsOn)
  • Automatic deployment and service creation per role
  • Headless services for worker roles, ClusterIP for routers

BudAIScaler Integration

  • Target specific roles via subTargetSelector.roleName
  • Multiple BudAIScalers can target different roles of the same DisaggregatedService
  • Conflict detection updated to allow per-role scaling
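
The per-role conflict rule can be illustrated with a small sketch. The `Scaler` type and `conflicts` function below are hypothetical and ignore namespaces. Treating a scaler with no `roleName` as overlapping every role of the same target is an assumption, not something the PR states.

```go
package main

import "fmt"

// Scaler is a hypothetical, minimal view of a BudAIScaler: the object it
// targets plus the optional role from subTargetSelector.roleName.
type Scaler struct {
	Name       string
	TargetKind string
	TargetName string
	RoleName   string // empty means "no role selected" (assumed to cover the whole target)
}

// conflicts reports whether two scalers would manage the same replicas.
func conflicts(a, b Scaler) bool {
	if a.TargetKind != b.TargetKind || a.TargetName != b.TargetName {
		return false // different targets never conflict
	}
	// Assumption: a scaler without a role overlaps every role of the target.
	if a.RoleName == "" || b.RoleName == "" {
		return true
	}
	return a.RoleName == b.RoleName
}

func main() {
	inference := Scaler{"inference-scaler", "DisaggregatedService", "embedding-pipeline", "inference"}
	tokenizer := Scaler{"tokenizer-scaler", "DisaggregatedService", "embedding-pipeline", "tokenizer"}
	duplicate := Scaler{"inference-scaler-2", "DisaggregatedService", "embedding-pipeline", "inference"}

	fmt.Println(conflicts(inference, tokenizer)) // different roles
	fmt.Println(conflicts(inference, duplicate)) // same role
}
```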

Example Usage

# Deploy a disaggregated embedding pipeline
apiVersion: scaler.bud.studio/v1alpha1
kind: DisaggregatedService
metadata:
  name: embedding-pipeline
spec:
  serviceType: Embedding
  roles:
    - name: router
      roleType: Router
      replicas: 1
    - name: tokenizer
      roleType: Tokenizer
      replicas: 2
    - name: inference
      roleType: Inference
      replicas: 3
      dependsOn: [tokenizer]
---
# Scale a specific role independently
apiVersion: scaler.bud.studio/v1alpha1
kind: BudAIScaler
metadata:
  name: inference-scaler
spec:
  scaleTargetRef:
    apiVersion: scaler.bud.studio/v1alpha1
    kind: DisaggregatedService
    name: embedding-pipeline
  subTargetSelector:
    roleName: inference
  minReplicas: 1
  maxReplicas: 10

Files Changed

  • New CRD: api/scaler/v1alpha1/disaggregatedservice_types.go
  • New Controller: pkg/controller/disaggregatedservice/controller.go
  • BudAIScaler Updates: pkg/controller/budaiscaler/workload_scale.go, budaiscaler_controller.go
  • Samples: 4 new sample configurations
  • Design Doc: docs/design/disaggregated-service-crd.md
  • README: Updated with feature documentation

Test plan

  • DisaggregatedService creates deployments for each role
  • DisaggregatedService creates services (ClusterIP for router, headless for workers)
  • BudAIScaler can read scale from DisaggregatedService roles
  • BudAIScaler can update replicas for specific roles
  • Multiple BudAIScalers can target different roles without conflict
  • E2E tests for disaggregated scaling

🤖 Generated with Claude Code

This adds support for disaggregated deployments of embedding models and LLMs
with independent autoscaling per role.

Features:
- DisaggregatedService CRD with multi-role support (Router, Tokenizer, Inference, Prefill, Decode)
- Per-role deployments with dedicated services
- Role dependency management (dependsOn)
- BudAIScaler integration via subTargetSelector.roleName
- Per-role autoscaling with independent scaling policies
- Conflict detection allows multiple scalers on same DS with different roles

Example usage:
- DisaggregatedService defines the pipeline (router -> tokenizer -> inference)
- Each role gets a separate BudAIScaler targeting subTargetSelector.roleName
- Roles scale independently based on their own metrics

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello @dittops, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a powerful new capability for orchestrating and scaling complex AI inference pipelines on Kubernetes. By defining a DisaggregatedService CRD, users can now declaratively manage multi-component workloads, where each component (or 'role') can have its own resource requirements, scaling policies, and dependencies. This significantly simplifies the deployment and operational overhead for advanced AI models like embedding pipelines and large language models, allowing for more efficient resource utilization and tailored scaling strategies across heterogeneous inference stages.

Highlights

  • New DisaggregatedService CRD: Introduced a new Custom Resource Definition (CRD) named DisaggregatedService to manage multi-role inference pipelines, allowing for deployment of complex AI workloads with distinct components (roles).
  • Per-Role Autoscaling Integration: Enabled BudAIScaler to target and autoscale individual roles within a DisaggregatedService using the subTargetSelector.roleName field, providing granular control over scaling behavior for different pipeline stages.
  • Automated Role Management: A new controller for DisaggregatedService automates the creation and management of Kubernetes Deployments and Services for each defined role, including handling role dependencies and service types (e.g., ClusterIP, Headless).
  • Enhanced BudAIScaler Conflict Detection: Updated the BudAIScaler's conflict detection logic to correctly allow multiple BudAIScalers to target different roles of the same DisaggregatedService without false positives.
  • Comprehensive Documentation and Examples: Added a detailed design document for the DisaggregatedService CRD, updated the main README.md with usage examples, and provided new sample configurations for embedding pipelines, LLM prefill/decode, and per-role autoscaling.

gemini-code-assist bot left a comment

Code Review

This pull request introduces the DisaggregatedService CRD, a significant new feature for managing multi-role inference pipelines and enabling per-role autoscaling with BudAIScaler. The implementation is comprehensive, including the new CRD, a dedicated controller, updates to BudAIScaler logic, and thorough documentation with examples. My review focuses on correctness, consistency, and potential edge cases. I've identified a critical issue in the readiness-check logic that affects scale-to-zero scenarios, along with some inconsistencies between the implementation, validation rules, and the design documentation that should be addressed.

Comment on lines +216 to +225
	if deployment.Status.ReadyReplicas == deployment.Status.Replicas && deployment.Status.Replicas > 0 {
		status.Conditions = []metav1.Condition{
			{
				Type:               "Ready",
				Status:             metav1.ConditionTrue,
				Reason:             "AllReplicasReady",
				Message:            "All replicas are ready",
				LastTransitionTime: metav1.Now(),
			},
		}

high

The readiness check for a role is incorrect for scenarios where a role can be scaled to zero replicas. The condition deployment.Status.Replicas > 0 will always be false if a role is scaled to zero, preventing the role from ever being considered "Ready". This will block any dependent roles from starting. A role with zero desired replicas should be considered ready. The check should compare the number of ready replicas against the desired number of replicas from the deployment's spec.

Suggested change
-	if deployment.Status.ReadyReplicas == deployment.Status.Replicas && deployment.Status.Replicas > 0 {
+	if deployment.Status.ReadyReplicas == *deployment.Spec.Replicas {
 		status.Conditions = []metav1.Condition{
 			{
 				Type:               "Ready",
 				Status:             metav1.ConditionTrue,
 				Reason:             "AllReplicasReady",
 				Message:            "All replicas are ready",
 				LastTransitionTime: metav1.Now(),
 			},
 		}

return false, err
}

return deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas, nil

high

The readiness check deployment.Status.ReadyReplicas > 0 is incorrect for roles that can be scaled to zero. If a role has zero replicas, this condition will always evaluate to false, and the role will never be considered ready. This will prevent any dependent roles from starting up. The check should instead compare the number of ready replicas with the desired number of replicas from the deployment's spec to correctly handle the scale-to-zero case.

Suggested change
-	return deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas, nil
+	return deployment.Status.ReadyReplicas == *deployment.Spec.Replicas, nil
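
Both suggestions dereference `Spec.Replicas`, which is a `*int32` in the `apps/v1` Deployment type. The API server defaults it to 1, so it is normally set on objects read from the cluster, but a nil check is cheap. The sketch below shows a nil-safe form of the same comparison; `deploymentView` and `roleReady` are hypothetical names that mirror only the two fields involved, not the PR's code.

```go
package main

import "fmt"

// deploymentView is a hypothetical stand-in for the two appsv1.Deployment
// fields that the readiness check reads.
type deploymentView struct {
	SpecReplicas  *int32 // mirrors Deployment.Spec.Replicas (pointer, may be nil)
	ReadyReplicas int32  // mirrors Deployment.Status.ReadyReplicas
}

// roleReady compares ready replicas against the desired count, so a role
// scaled to zero counts as ready once no replicas remain.
func roleReady(d deploymentView) bool {
	desired := int32(1) // Kubernetes defaults Spec.Replicas to 1 when unset
	if d.SpecReplicas != nil {
		desired = *d.SpecReplicas
	}
	return d.ReadyReplicas == desired
}

func main() {
	zero, three := int32(0), int32(3)
	fmt.Println(roleReady(deploymentView{SpecReplicas: &zero, ReadyReplicas: 0}))  // scaled to zero
	fmt.Println(roleReady(deploymentView{SpecReplicas: &three, ReadyReplicas: 2})) // still rolling out
}
```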

Comment on lines +59 to +60
// +kubebuilder:validation:MinItems=1
Roles []RoleSpec `json:"roles"`

medium

The validation MinItems=1 for Roles seems too permissive for a "disaggregated" service. A service with a single role is not truly disaggregated. The accompanying design document (docs/design/disaggregated-service-crd.md at line 74) specifies MinItems=2, which is more aligned with the purpose of this CRD. I recommend changing this to MinItems=2 to enforce that a DisaggregatedService always has at least two roles.

Suggested change
-	// +kubebuilder:validation:MinItems=1
+	// +kubebuilder:validation:MinItems=2
 	Roles []RoleSpec `json:"roles"`
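
For illustration, the effect of the proposed `MinItems=2` marker can be mirrored in plain Go. This is only a sketch: the real check would be enforced by the CRD's OpenAPI schema, `validateRoleNames` is a hypothetical helper, and the uniqueness check is an extra assumption not mentioned in the review.

```go
package main

import "fmt"

// minRoles is the value proposed in the review; the PR currently uses 1.
const minRoles = 2

// validateRoleNames is a hypothetical helper that rejects role lists the
// proposed schema would refuse, plus empty and duplicate names (assumed).
func validateRoleNames(names []string) error {
	if len(names) < minRoles {
		return fmt.Errorf("spec.roles must contain at least %d roles, got %d", minRoles, len(names))
	}
	seen := make(map[string]bool, len(names))
	for _, n := range names {
		if n == "" {
			return fmt.Errorf("role name must not be empty")
		}
		if seen[n] {
			return fmt.Errorf("duplicate role name %q", n)
		}
		seen[n] = true
	}
	return nil
}

func main() {
	fmt.Println(validateRoleNames([]string{"router"}))
	fmt.Println(validateRoleNames([]string{"router", "tokenizer", "inference"}))
}
```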

// +genclient
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.totalReplicas,statuspath=.status.totalReplicas,selectorpath=.status.selector

medium

The design document includes a scale subresource definition for the DisaggregatedService CRD. However, the implementation in api/scaler/v1alpha1/disaggregatedservice_types.go does not include this marker, and the DisaggregatedServiceSpec lacks a totalReplicas field. The implementation seems more appropriate since scaling is handled on a per-role basis via BudAIScaler and subTargetSelector. To avoid confusion, please remove the scale subresource definition from the design document to align it with the actual implementation.
