
feat(scheduler): Decoupled Scheduler with Router and Adding Router Readiness Check#354

Merged
christ-tt merged 18 commits into GradientHQ:main from christ-tt:pipeline_refactored
Feb 9, 2026

christ-tt commented Dec 25, 2025

  • Add a check, before dispatching requests, that at least one pipeline is ready. A pipeline is ready when every node in it has is_active set.
  • Decouple the router from the scheduler, so the scheduler no longer branches on the routing strategy and the router handles all routing-related tasks.
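
The two changes above can be illustrated with a minimal sketch. This is not the code from this PR: the class names `Node`, `Pipeline`, `Router`, and `Scheduler`, and the methods `has_ready_pipeline`, `select_pipeline`, and `dispatch`, are hypothetical; only `is_active` and the "every node active" definition of readiness come from the description. The selection policy (first ready pipeline) is a placeholder assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    is_active: bool = False  # flag named in the PR; set once the node can serve


@dataclass
class Pipeline:
    nodes: List[Node] = field(default_factory=list)

    @property
    def ready(self) -> bool:
        # Ready only if the pipeline has nodes and every one of them is active.
        return bool(self.nodes) and all(n.is_active for n in self.nodes)


class Router:
    """Hypothetical router that owns readiness checks and pipeline selection."""

    def __init__(self, pipelines: List[Pipeline]) -> None:
        self.pipelines = pipelines

    def has_ready_pipeline(self) -> bool:
        return any(p.ready for p in self.pipelines)

    def select_pipeline(self) -> Optional[Pipeline]:
        # Placeholder policy (an assumption): pick the first ready pipeline.
        for pipeline in self.pipelines:
            if pipeline.ready:
                return pipeline
        return None


class Scheduler:
    """Hypothetical scheduler that delegates all routing to the router."""

    def __init__(self, router: Router) -> None:
        self.router = router

    def dispatch(self, request_id: str) -> Optional[Pipeline]:
        # No strategy-specific branching here; None means nothing is ready yet.
        return self.router.select_pipeline()
```

With this shape, a caller can hold or reject a request whenever `dispatch` returns `None`, and swapping the routing strategy only touches the router.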

TODO

  • Maybe rename node.is_active. Even after layers are assigned and the pipeline is formed, a node still needs time for parameter loading, engine setup, CUDA graph initialization, etc. before it counts as active here. The current name may cause confusion about how we define active and standby nodes.

Current display info:

Feb 09 05:37:58.041 [scheduling] [DEBUG   ] scheduler.py:406          Allocation snapshot
Registered pipelines (2)
------------------------
Capacity: total=16 current capacity=0 per_pipeline={0: (8, 0), 1: (8, 0)}
  pipeline 0   | stages=2 | ready=False | cap=8 cur=8
    [00] 12D3KooWBcWf7v4kK9VT2dYh7N8mFvTo67BYeaeoAujxDJKyx8hd layers [  0,  18) | load   0/8   | latency    0.25 ms | active False
    [01] 12D3KooWC1nAdPXTsfQtYwNn28Vv61umqvdo8Edyzk4kUjnBnrYJ layers [ 18,  36) | load   0/8   | latency    0.25 ms | active False
  pipeline 1   | stages=2 | ready=False | cap=8 cur=8
    [00] 12D3KooWJmEVbSNJUF5U4jkan4BRXKT52f5sKzaNCwGer8UiCDJp layers [  0,  20) | load   0/8   | latency    0.25 ms | active False
    [01] 12D3KooWA4uVmdKSgDN6pmTxKTtaozqQZswYBUrjDoyP3LUkV2Bg layers [ 20,  36) | load   0/8   | latency    0.46 ms | active False

Standby nodes (0)
-----------------
  (none)

christ-tt changed the title from "feat(scheduler): Router Ready Check" to "feat(scheduler): Decoupled Scheduler with Router and Adding Router Readiness Check" Dec 25, 2025
christ-tt marked this pull request as ready for review December 25, 2025 02:24
christ-tt requested review from a team and gufengc December 25, 2025 02:24
christ-tt enabled auto-merge (squash) February 9, 2026 05:40
christ-tt merged commit 0bad018 into GradientHQ:main Feb 9, 2026
5 checks passed

3 participants