feat: Add DisaggregatedService CRD for multi-role inference pipelines#1
feat: Add DisaggregatedService CRD for multi-role inference pipelines#1
Conversation
This adds support for disaggregated deployments of embedding models and LLMs with independent autoscaling per role. Features: - DisaggregatedService CRD with multi-role support (Router, Tokenizer, Inference, Prefill, Decode) - Per-role deployments with dedicated services - Role dependency management (dependsOn) - BudAIScaler integration via subTargetSelector.roleName - Per-role autoscaling with independent scaling policies - Conflict detection allows multiple scalers on same DS with different roles Example usage: - DisaggregatedService defines the pipeline (router -> tokenizer -> inference) - Each role gets a separate BudAIScaler targeting subTargetSelector.roleName - Roles scale independently based on their own metrics Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary of ChangesHello @dittops, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a powerful new capability for orchestrating and scaling complex AI inference pipelines on Kubernetes. By defining a Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces the DisaggregatedService CRD, a significant new feature for managing multi-role inference pipelines and enabling per-role autoscaling with BudAIScaler. The implementation is comprehensive, including the new CRD, a dedicated controller, updates to BudAIScaler logic, and thorough documentation with examples. My review focuses on correctness, consistency, and potential edge cases. I've identified a critical issue in the readiness-check logic that affects scale-to-zero scenarios, along with some inconsistencies between the implementation, validation rules, and the design documentation that should be addressed.
| if deployment.Status.ReadyReplicas == deployment.Status.Replicas && deployment.Status.Replicas > 0 { | ||
| status.Conditions = []metav1.Condition{ | ||
| { | ||
| Type: "Ready", | ||
| Status: metav1.ConditionTrue, | ||
| Reason: "AllReplicasReady", | ||
| Message: "All replicas are ready", | ||
| LastTransitionTime: metav1.Now(), | ||
| }, | ||
| } |
There was a problem hiding this comment.
The readiness check for a role is incorrect for scenarios where a role can be scaled to zero replicas. The condition deployment.Status.Replicas > 0 will always be false if a role is scaled to zero, preventing the role from ever being considered "Ready". This will block any dependent roles from starting. A role with zero desired replicas should be considered ready. The check should compare the number of ready replicas against the desired number of replicas from the deployment's spec.
| if deployment.Status.ReadyReplicas == deployment.Status.Replicas && deployment.Status.Replicas > 0 { | |
| status.Conditions = []metav1.Condition{ | |
| { | |
| Type: "Ready", | |
| Status: metav1.ConditionTrue, | |
| Reason: "AllReplicasReady", | |
| Message: "All replicas are ready", | |
| LastTransitionTime: metav1.Now(), | |
| }, | |
| } | |
| if deployment.Status.ReadyReplicas == *deployment.Spec.Replicas { | |
| status.Conditions = []metav1.Condition{ | |
| { | |
| Type: "Ready", | |
| Status: metav1.ConditionTrue, | |
| Reason: "AllReplicasReady", | |
| Message: "All replicas are ready", | |
| LastTransitionTime: metav1.Now(), | |
| }, | |
| } | |
| return false, err | ||
| } | ||
|
|
||
| return deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas, nil |
There was a problem hiding this comment.
The readiness check deployment.Status.ReadyReplicas > 0 is incorrect for roles that can be scaled to zero. If a role has zero replicas, this condition will always evaluate to false, and the role will never be considered ready. This will prevent any dependent roles from starting up. The check should instead compare the number of ready replicas with the desired number of replicas from the deployment's spec to correctly handle the scale-to-zero case.
| return deployment.Status.ReadyReplicas > 0 && deployment.Status.ReadyReplicas == deployment.Status.Replicas, nil | |
| return deployment.Status.ReadyReplicas == *deployment.Spec.Replicas, nil |
| // +kubebuilder:validation:MinItems=1 | ||
| Roles []RoleSpec `json:"roles"` |
There was a problem hiding this comment.
The validation MinItems=1 for Roles seems too permissive for a "disaggregated" service. A service with a single role is not truly disaggregated. The accompanying design document (docs/design/disaggregated-service-crd.md at line 74) specifies MinItems=2, which is more aligned with the purpose of this CRD. I recommend changing this to MinItems=2 to enforce that a DisaggregatedService always has at least two roles.
| // +kubebuilder:validation:MinItems=1 | |
| Roles []RoleSpec `json:"roles"` | |
| // +kubebuilder:validation:MinItems=2 | |
| Roles []RoleSpec `json:"roles"` |
| // +genclient | ||
| // +kubebuilder:object:root=true | ||
| // +kubebuilder:subresource:status | ||
| // +kubebuilder:subresource:scale:specpath=.spec.totalReplicas,statuspath=.status.totalReplicas,selectorpath=.status.selector |
There was a problem hiding this comment.
The design document includes a scale subresource definition for the DisaggregatedService CRD. However, the implementation in api/scaler/v1alpha1/disaggregatedservice_types.go does not include this marker, and the DisaggregatedServiceSpec lacks a totalReplicas field. The implementation seems more appropriate since scaling is handled on a per-role basis via BudAIScaler and subTargetSelector. To avoid confusion, please remove the scale subresource definition from the design document to align it with the actual implementation.
Summary
DisaggregatedServiceCRD for deploying multi-role inference pipelinessubTargetSelector.roleNameFeatures
DisaggregatedService CRD
dependsOn)BudAIScaler Integration
subTargetSelector.roleNameExample Usage
Files Changed
api/scaler/v1alpha1/disaggregatedservice_types.gopkg/controller/disaggregatedservice/controller.gopkg/controller/budaiscaler/workload_scale.go,budaiscaler_controller.godocs/design/disaggregated-service-crd.mdTest plan
🤖 Generated with Claude Code