feat: Add DisaggregatedService CRD for multi-role inference pipelines by dittops · Pull Request #1 · BudEcosystem/scaler

dittops · 2026-01-09T13:36:03Z

Summary

Adds DisaggregatedService CRD for deploying multi-role inference pipelines
Enables per-role autoscaling via BudAIScaler with subTargetSelector.roleName
Supports embedding pipelines (Router → Tokenizer → Inference) and LLM P/D disaggregation

Features

DisaggregatedService CRD

Multi-role deployment with independent replicas per role
Role types: Router, Tokenizer, Inference, Prefill, Decode, Custom
Role dependency management (dependsOn)
Automatic deployment and service creation per role
Headless services for worker roles, ClusterIP for routers
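The dependsOn handling listed above can be pictured as an ordering problem. The following is a minimal sketch, not the controller's actual code: the `Role` type and `startOrder` function are hypothetical names, and it assumes role names are unique and that a role may start only after every role in its dependsOn list.

```go
package main

import "fmt"

// Role is a hypothetical stand-in for one entry of spec.roles.
type Role struct {
	Name      string
	DependsOn []string
}

// startOrder returns role names so that every role appears after all roles
// it depends on (Kahn's algorithm). It assumes role names are unique and
// returns an error for unknown dependencies or dependency cycles.
func startOrder(roles []Role) ([]string, error) {
	indegree := make(map[string]int, len(roles))
	dependents := make(map[string][]string, len(roles))
	for _, r := range roles {
		indegree[r.Name] = 0
	}
	for _, r := range roles {
		for _, dep := range r.DependsOn {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("role %q depends on unknown role %q", r.Name, dep)
			}
			indegree[r.Name]++
			dependents[dep] = append(dependents[dep], r.Name)
		}
	}
	// Seed the queue in declaration order so the result is deterministic.
	var queue []string
	for _, r := range roles {
		if indegree[r.Name] == 0 {
			queue = append(queue, r.Name)
		}
	}
	order := make([]string, 0, len(roles))
	for len(queue) > 0 {
		name := queue[0]
		queue = queue[1:]
		order = append(order, name)
		for _, d := range dependents[name] {
			indegree[d]--
			if indegree[d] == 0 {
				queue = append(queue, d)
			}
		}
	}
	if len(order) != len(roles) {
		return nil, fmt.Errorf("dependency cycle among roles")
	}
	return order, nil
}

func main() {
	order, err := startOrder([]Role{
		{Name: "router"},
		{Name: "tokenizer"},
		{Name: "inference", DependsOn: []string{"tokenizer"}},
	})
	fmt.Println(order, err)
}
```

Rejecting cycles up front is one possible design choice; a controller could instead surface the cycle as a status condition and leave the affected roles unstarted.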

BudAIScaler Integration

Target specific roles via subTargetSelector.roleName
Multiple BudAIScalers can target different roles of the same DisaggregatedService
Conflict detection updated to allow per-role scaling
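To illustrate the per-role conflict rule described above, here is a hedged sketch. `ScalerTarget` and `conflicts` are hypothetical names, not the types in budaiscaler_controller.go; the sketch assumes both scalers live in the same namespace and that a scaler with an empty role name claims the whole workload.

```go
package main

import "fmt"

// ScalerTarget is a hypothetical summary of what one BudAIScaler points at.
type ScalerTarget struct {
	ScalerName string
	Kind       string
	Name       string
	RoleName   string // empty means the scaler targets the whole workload (assumption)
}

// conflicts reports whether two scalers would compete for the same replicas.
// Scalers on different workloads never conflict. On the same workload they
// conflict if they name the same role, or if either names no role at all.
func conflicts(a, b ScalerTarget) bool {
	if a.Kind != b.Kind || a.Name != b.Name {
		return false
	}
	if a.RoleName == "" || b.RoleName == "" {
		return true
	}
	return a.RoleName == b.RoleName
}

func main() {
	inference := ScalerTarget{ScalerName: "inference-scaler", Kind: "DisaggregatedService", Name: "embedding-pipeline", RoleName: "inference"}
	tokenizer := ScalerTarget{ScalerName: "tokenizer-scaler", Kind: "DisaggregatedService", Name: "embedding-pipeline", RoleName: "tokenizer"}
	duplicate := ScalerTarget{ScalerName: "inference-scaler-2", Kind: "DisaggregatedService", Name: "embedding-pipeline", RoleName: "inference"}

	fmt.Println(conflicts(inference, tokenizer)) // different roles of the same service
	fmt.Println(conflicts(inference, duplicate)) // same role of the same service
}
```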

Example Usage

# Deploy a disaggregated embedding pipeline
apiVersion: scaler.bud.studio/v1alpha1
kind: DisaggregatedService
metadata:
  name: embedding-pipeline
spec:
  serviceType: Embedding
  roles:
    - name: router
      roleType: Router
      replicas: 1
    - name: tokenizer
      roleType: Tokenizer
      replicas: 2
    - name: inference
      roleType: Inference
      replicas: 3
      dependsOn: [tokenizer]

# Scale a specific role independently
apiVersion: scaler.bud.studio/v1alpha1
kind: BudAIScaler
metadata:
  name: inference-scaler
spec:
  scaleTargetRef:
    apiVersion: scaler.bud.studio/v1alpha1
    kind: DisaggregatedService
    name: embedding-pipeline
  subTargetSelector:
    roleName: inference
  minReplicas: 1
  maxReplicas: 10
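As a rough illustration of how a scaler might apply a recommendation to only the role named in subTargetSelector.roleName, consider the sketch below. `RoleReplicas`, `clamp`, and `scaleRole` are hypothetical helpers, not code from workload_scale.go; the only behavior assumed is that the result stays within minReplicas and maxReplicas and that other roles are left untouched.

```go
package main

import (
	"errors"
	"fmt"
)

// RoleReplicas is a hypothetical stand-in for the per-role replica count
// stored on a DisaggregatedService.
type RoleReplicas struct {
	Name     string
	Replicas int32
}

// clamp bounds desired to the inclusive range [lo, hi].
func clamp(desired, lo, hi int32) int32 {
	if desired < lo {
		return lo
	}
	if desired > hi {
		return hi
	}
	return desired
}

// scaleRole updates only the role called roleName, bounding the new count by
// minReplicas and maxReplicas. All other roles keep their current replicas.
func scaleRole(roles []RoleReplicas, roleName string, desired, minReplicas, maxReplicas int32) error {
	if minReplicas > maxReplicas {
		return errors.New("minReplicas must not exceed maxReplicas")
	}
	for i := range roles {
		if roles[i].Name == roleName {
			roles[i].Replicas = clamp(desired, minReplicas, maxReplicas)
			return nil
		}
	}
	return fmt.Errorf("role %q not found", roleName)
}

func main() {
	roles := []RoleReplicas{
		{Name: "router", Replicas: 1},
		{Name: "tokenizer", Replicas: 2},
		{Name: "inference", Replicas: 3},
	}
	// A recommendation of 25 replicas is bounded by maxReplicas: 10.
	if err := scaleRole(roles, "inference", 25, 1, 10); err != nil {
		fmt.Println("scale failed:", err)
	}
	fmt.Println(roles)
}
```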

Files Changed

New CRD: api/scaler/v1alpha1/disaggregatedservice_types.go
New Controller: pkg/controller/disaggregatedservice/controller.go
BudAIScaler Updates: pkg/controller/budaiscaler/workload_scale.go, budaiscaler_controller.go
Samples: 4 new sample configurations
Design Doc: docs/design/disaggregated-service-crd.md
README: Updated with feature documentation

Test plan

DisaggregatedService creates deployments for each role
DisaggregatedService creates services (ClusterIP for router, headless for workers)
BudAIScaler can read scale from DisaggregatedService roles
BudAIScaler can update replicas for specific roles
Multiple BudAIScalers can target different roles without conflict
E2E tests for disaggregated scaling
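The service-type item in the test plan can be sketched as a small decision function. This is an illustration under stated assumptions, not the controller's code: `ServicePlan` and `servicePlanFor` are hypothetical, and every role type other than Router is treated as a worker. In Kubernetes, a headless Service is a ClusterIP-type Service whose clusterIP field is set to None.

```go
package main

import "fmt"

// ServicePlan is a hypothetical summary of the Service to create for a role.
type ServicePlan struct {
	Type      string // Kubernetes Service type
	ClusterIP string // "None" marks a headless Service; empty lets the cluster assign an IP
}

// servicePlanFor picks a regular ClusterIP Service for routers and a headless
// Service for every other role type (assumption: non-router roles are workers).
func servicePlanFor(roleType string) ServicePlan {
	if roleType == "Router" {
		return ServicePlan{Type: "ClusterIP"}
	}
	return ServicePlan{Type: "ClusterIP", ClusterIP: "None"}
}

func main() {
	for _, rt := range []string{"Router", "Tokenizer", "Inference"} {
		fmt.Printf("%s: %+v\n", rt, servicePlanFor(rt))
	}
}
```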

🤖 Generated with Claude Code

This adds support for disaggregated deployments of embedding models and LLMs with independent autoscaling per role.

Features:
- DisaggregatedService CRD with multi-role support (Router, Tokenizer, Inference, Prefill, Decode)
- Per-role deployments with dedicated services
- Role dependency management (dependsOn)
- BudAIScaler integration via subTargetSelector.roleName
- Per-role autoscaling with independent scaling policies
- Conflict detection allows multiple scalers on same DS with different roles

Example usage:
- DisaggregatedService defines the pipeline (router -> tokenizer -> inference)
- Each role gets a separate BudAIScaler targeting subTargetSelector.roleName
- Roles scale independently based on their own metrics

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

gemini-code-assist · 2026-01-09T13:36:29Z

Summary of Changes

This pull request introduces a powerful new capability for orchestrating and scaling complex AI inference pipelines on Kubernetes. By defining a DisaggregatedService CRD, users can now declaratively manage multi-component workloads, where each component (or 'role') can have its own resource requirements, scaling policies, and dependencies. This significantly simplifies the deployment and operational overhead for advanced AI models like embedding pipelines and large language models, allowing for more efficient resource utilization and tailored scaling strategies across heterogeneous inference stages.

Highlights

New DisaggregatedService CRD: Introduced a new Custom Resource Definition (CRD) named DisaggregatedService to manage multi-role inference pipelines, allowing for deployment of complex AI workloads with distinct components (roles).
Per-Role Autoscaling Integration: Enabled BudAIScaler to target and autoscale individual roles within a DisaggregatedService using the subTargetSelector.roleName field, providing granular control over scaling behavior for different pipeline stages.
Automated Role Management: A new controller for DisaggregatedService automates the creation and management of Kubernetes Deployments and Services for each defined role, including handling role dependencies and service types (e.g., ClusterIP, Headless).
Enhanced BudAIScaler Conflict Detection: Updated the BudAIScaler's conflict detection logic to correctly allow multiple BudAIScalers to target different roles of the same DisaggregatedService without false positives.
Comprehensive Documentation and Examples: Added a detailed design document for the DisaggregatedService CRD, updated the main README.md with usage examples, and provided new sample configurations for embedding pipelines, LLM prefill/decode, and per-role autoscaling.

gemini-code-assist

Code Review

This pull request introduces the DisaggregatedService CRD, a significant new feature for managing multi-role inference pipelines and enabling per-role autoscaling with BudAIScaler. The implementation is comprehensive, including the new CRD, a dedicated controller, updates to BudAIScaler logic, and thorough documentation with examples. My review focuses on correctness, consistency, and potential edge cases. I've identified a critical issue in the readiness-check logic that affects scale-to-zero scenarios, along with some inconsistencies between the implementation, validation rules, and the design documentation that should be addressed.

gemini-code-assist · 2026-01-09T13:38:55Z

pkg/controller/disaggregatedservice/controller.go

+	if deployment.Status.ReadyReplicas == deployment.Status.Replicas && deployment.Status.Replicas > 0 {
+		status.Conditions = []metav1.Condition{
+			{
+				Type:               "Ready",
+				Status:             metav1.ConditionTrue,
+				Reason:             "AllReplicasReady",
+				Message:            "All replicas are ready",
+				LastTransitionTime: metav1.Now(),
+			},
+		}


The readiness check for a role is incorrect for scenarios where a role can be scaled to zero replicas. The condition deployment.Status.Replicas > 0 will always be false if a role is scaled to zero, preventing the role from ever being considered "Ready". This will block any dependent roles from starting. A role with zero desired replicas should be considered ready. The check should compare the number of ready replicas against the desired number of replicas from the deployment's spec.

Suggested change

-	if deployment.Status.ReadyReplicas == deployment.Status.Replicas && deployment.Status.Replicas > 0 {
-		status.Conditions = []metav1.Condition{
-			{
-				Type:               "Ready",
-				Status:             metav1.ConditionTrue,
-				Reason:             "AllReplicasReady",
-				Message:            "All replicas are ready",
-				LastTransitionTime: metav1.Now(),
-			},
-		}
+	if deployment.Status.ReadyReplicas == *deployment.Spec.Replicas {
+		status.Conditions = []metav1.Condition{
+			{
+				Type:               "Ready",
+				Status:             metav1.ConditionTrue,
+				Reason:             "AllReplicasReady",
+				Message:            "All replicas are ready",
+				LastTransitionTime: metav1.Now(),
+			},
+		}

gemini-code-assist · 2026-01-09T13:38:55Z

pkg/controller/disaggregatedservice/controller.go

+		return false, err
+	}
+
+	return deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas, nil


The readiness check deployment.Status.ReadyReplicas > 0 is incorrect for roles that can be scaled to zero. If a role has zero replicas, this condition will always evaluate to false, and the role will never be considered ready. This will prevent any dependent roles from starting up. The check should instead compare the number of ready replicas with the desired number of replicas from the deployment's spec to correctly handle the scale-to-zero case.

Suggested change

-	return deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas, nil
+	return deployment.Status.ReadyReplicas == *deployment.Spec.Replicas, nil
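The suggested check dereferences Spec.Replicas, which is a pointer in the apps/v1 Deployment type. A defensive variant might look like the sketch below. `DeploymentView` and `roleReady` are hypothetical stand-ins that mirror only the fields involved, so the example runs without the Kubernetes libraries; falling back to 1 for a nil pointer follows the apps/v1 default for an unset replica count.

```go
package main

import "fmt"

// DeploymentView is a hypothetical stand-in mirroring the two Deployment
// fields the readiness check needs.
type DeploymentView struct {
	SpecReplicas  *int32 // mirrors Spec.Replicas (pointer, may be nil)
	ReadyReplicas int32  // mirrors Status.ReadyReplicas
}

// roleReady reports whether the number of ready replicas matches the desired
// count. A role scaled to zero is ready as soon as no replicas remain.
func roleReady(d DeploymentView) bool {
	desired := int32(1) // apps/v1 defaults an unset replica count to 1
	if d.SpecReplicas != nil {
		desired = *d.SpecReplicas
	}
	return d.ReadyReplicas == desired
}

func main() {
	zero, three := int32(0), int32(3)
	fmt.Println(roleReady(DeploymentView{SpecReplicas: &zero, ReadyReplicas: 0}))  // scaled to zero
	fmt.Println(roleReady(DeploymentView{SpecReplicas: &three, ReadyReplicas: 2})) // still rolling out
	fmt.Println(roleReady(DeploymentView{SpecReplicas: nil, ReadyReplicas: 1}))    // unset replicas
}
```

Comparing against the desired count rather than Status.Replicas also avoids reporting a role as ready while it is still scaling up toward a larger target.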

gemini-code-assist · 2026-01-09T13:38:55Z

api/scaler/v1alpha1/disaggregatedservice_types.go

+	// +kubebuilder:validation:MinItems=1
+	Roles []RoleSpec `json:"roles"`


The validation MinItems=1 for Roles seems too permissive for a "disaggregated" service. A service with a single role is not truly disaggregated. The accompanying design document (docs/design/disaggregated-service-crd.md at line 74) specifies MinItems=2, which is more aligned with the purpose of this CRD. I recommend changing this to MinItems=2 to enforce that a DisaggregatedService always has at least two roles.

Suggested change

-	// +kubebuilder:validation:MinItems=1
-	Roles []RoleSpec `json:"roles"`
+	// +kubebuilder:validation:MinItems=2
+	Roles []RoleSpec `json:"roles"`

gemini-code-assist · 2026-01-09T13:38:55Z

docs/design/disaggregated-service-crd.md

+// +genclient
+// +kubebuilder:object:root=true
+// +kubebuilder:subresource:status
+// +kubebuilder:subresource:scale:specpath=.spec.totalReplicas,statuspath=.status.totalReplicas,selectorpath=.status.selector


The design document includes a scale subresource definition for the DisaggregatedService CRD. However, the implementation in api/scaler/v1alpha1/disaggregatedservice_types.go does not include this marker, and the DisaggregatedServiceSpec lacks a totalReplicas field. The implementation seems more appropriate since scaling is handled on a per-role basis via BudAIScaler and subTargetSelector. To avoid confusion, please remove the scale subresource definition from the design document to align it with the actual implementation.

dittops wants to merge 1 commit into main from feat/disaggregated-service
