feat: port dynamic load-based idle process scaling from Python SDK #1512

Draft
KrishnaShuk wants to merge 2 commits into livekit:main from KrishnaShuk:feat/dynamic-idle-process-scaling

@KrishnaShuk

Description

ref: #1449
This PR ports the dynamic idle process scaling logic from the Python SDK (livekit-agents) into the JS SDK, so that the worker scales down its idle process pool as load rises and avoids over-allocating processes on large multicore systems.

Pre-Review Checklist

  • Build passes: All builds (lint, typecheck, tests) pass locally
  • AI-generated code reviewed: Removed unnecessary comments and ensured code quality
  • Changes explained: All changes are properly documented and justified above
  • Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included
  • Video demo: A small video demo showing changes works as expected and did not break any existing functionality using Agent Playground (if applicable)

Testing

  • Automated tests added/updated (if applicable)
  • All tests pass
  • Make sure both restaurant_agent.ts and realtime_agent.ts work properly (for major changes)

Additional Notes


Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

@changeset-bot

changeset-bot Bot commented May 15, 2026

🦋 Changeset detected

Latest commit: 89d525f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 31 packages
Name Type
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugins-test Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch



@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.

Comment thread agents/src/worker.ts
Comment on lines +686 to +687
if (isFull) {
  this.#procPool.setTargetIdleProcesses(this.#opts.numIdleProcesses);

🔴 When the worker is at full load, the target idle process count is set to the maximum instead of zero

In worker.ts:686-687, when isFull is true (worker load >= threshold), the code sets targetIdleProcesses to this.#opts.numIdleProcesses, the configured maximum. This is inverted: when the worker is overloaded, the idle target should drop to 0, not jump to the maximum. The else branch correctly lowers the target as load increases (proportional to remaining capacity), but the isFull branch defeats the purpose of load-based scaling. The pool keeps spawning and maintaining idle processes while the worker is at full capacity, which wastes resources and may worsen the overload.

Suggested change
- if (isFull) {
-   this.#procPool.setTargetIdleProcesses(this.#opts.numIdleProcesses);
+ if (isFull) {
+   this.#procPool.setTargetIdleProcesses(0);
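
For reference, a minimal sketch of the behavior this suggestion aims for. It assumes a simple linear rule for the below-threshold case; the helper name `targetIdleProcesses` and the exact formula are illustrative and are not taken from the PR or from the Python SDK.

```typescript
// Hypothetical helper, not the PR's code. The linear "remaining capacity"
// rule is an assumption used only to illustrate the expected shape:
// full pool when idle, shrinking toward 0 as load approaches the threshold.
function targetIdleProcesses(
  workerLoad: number, // current worker load, expected in [0, 1]
  loadThreshold: number, // load at which the worker counts as full
  numIdleProcesses: number, // configured maximum idle pool size
): number {
  // At or above the threshold (the isFull case) keep no idle processes.
  if (loadThreshold <= 0 || workerLoad >= loadThreshold) {
    return 0;
  }
  // Fraction of capacity still available below the threshold.
  const remaining = 1 - Math.max(workerLoad, 0) / loadThreshold;
  // Round up so a lightly loaded worker keeps at least one warm process.
  return Math.min(numIdleProcesses, Math.ceil(numIdleProcesses * remaining));
}
```

With a threshold of 0.5 and numIdleProcesses=4, this sketch yields 4 at zero load, 2 at a load of 0.25, and 0 once the load reaches 0.5.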

}
});
while (!signal.aborted) {
const currentPending = this.warmedProcQueue.items.length + this.spawnTasks.size;

🔴 spawnTasks overcounts by including tasks for processes already running jobs, preventing idle pool replenishment

In proc_pool.ts:142, currentPending is computed as warmedProcQueue.items.length + spawnTasks.size. However, spawnTasks includes tasks for processes that have already been consumed from the queue by launchJob() and are currently running jobs (still awaiting proc.join() at proc_pool.ts:115). This causes double-counting: idle processes appear both in items.length and spawnTasks, and running processes appear in spawnTasks despite no longer being idle. As a result, toSpawn = target - currentPending is systematically too low, and the pool never spawns replacement idle processes after jobs consume them.

Concrete scenario showing the bug

With numIdleProcesses=4, after 4 processes are warmed: queue=4, spawnTasks=4 (waiting on join). After 4 jobs consume all processes: queue=0, spawnTasks=4. A 5th job calls launchJob(), where the check spawnTasks.size (4) < jobsWaitingForProcess (1) is false, so no demand spawn occurs. The run() loop computes currentPending=4, target=4, toSpawn=0. The 5th job therefore blocks on warmedProcQueue.get() until a prior job finishes. This is a regression from the old MultiMutex approach, where entry.unlock() immediately freed a slot.
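
The arithmetic in this scenario can be replayed with a small model. This is only a sketch: `PoolSnapshot` and `toSpawn` are hypothetical names, and clamping the result at zero is an assumption rather than something confirmed from proc_pool.ts.

```typescript
// Illustrative model of the counting described above; not the actual
// proc_pool.ts code. Field comments name the values they stand in for.
interface PoolSnapshot {
  queued: number; // warmedProcQueue.items.length
  spawnTasks: number; // spawnTasks.size, which still includes processes running jobs
}

// toSpawn = target - currentPending, clamped at zero (clamp is an assumption).
function toSpawn(target: number, s: PoolSnapshot): number {
  const currentPending = s.queued + s.spawnTasks;
  return Math.max(0, target - currentPending);
}

// After warm-up: 4 idle processes, each still tracked by a spawn task.
const afterWarmup: PoolSnapshot = { queued: 4, spawnTasks: 4 };
// After 4 jobs consumed them: the queue is empty but the tasks remain.
const afterFourJobs: PoolSnapshot = { queued: 0, spawnTasks: 4 };

console.log(toSpawn(4, afterWarmup)); // 0
console.log(toSpawn(4, afterFourJobs)); // 0, although no idle process is left
```

Both snapshots yield 0, which is the problem: the second one has an empty queue, yet the count gives the pool no reason to spawn a replacement.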

Prompt for agents
The root cause is that spawnTasks includes tasks for processes currently running jobs (they stay in spawnTasks until proc.join() resolves), so spawnTasks.size conflates initializing processes, idle processes, and running-job processes.

In the old code, the MultiMutex slot was released via entry.unlock() the moment a process was consumed from the queue, immediately allowing a replacement to be spawned. The new code needs an equivalent mechanism.

Possible approaches:
1. Track a separate counter or set for processes that are actively running jobs (post-consumption). Subtract those from currentPending or exclude them from spawnTasks.
2. Remove the spawn task from spawnTasks when its process is consumed from warmedProcQueue, and track the running-job lifecycle separately.
3. Use a different signal (like the queue's consumption event) to trigger replacement spawning, rather than relying on polling with spawnTasks.size.

The fix should also update the demand-driven check in launchJob() (line 69) which has the same overcounting issue: spawnTasks.size < this.jobsWaitingForProcess should compare against tasks that haven't yet produced a consumable process, not all spawn tasks.
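
As a rough sketch of approach 1, the pool could keep separate counts per lifecycle stage and derive the spawn deficit only from processes that are initializing or idle. The class `PoolAccounting` and its method names are hypothetical and would need to be wired into the real spawn, queue, and join paths.

```typescript
// Hypothetical bookkeeping for approach 1; not part of the PR.
// Each process moves initializing -> idle -> running -> exited.
class PoolAccounting {
  private initializing = 0;
  private idle = 0;
  private running = 0;

  // A spawn task was started.
  startSpawn(): void {
    this.initializing += 1;
  }

  // The process finished initializing and entered the warmed queue.
  markWarmed(): void {
    if (this.initializing === 0) throw new Error('no initializing process');
    this.initializing -= 1;
    this.idle += 1;
  }

  // A job took the process from the warmed queue.
  markConsumed(): void {
    if (this.idle === 0) throw new Error('no idle process');
    this.idle -= 1;
    this.running += 1;
  }

  // The process exited (the point where join() would resolve).
  markExited(): void {
    if (this.running === 0) throw new Error('no running process');
    this.running -= 1;
  }

  // Only processes that are, or will become, idle count toward the target.
  pendingIdle(): number {
    return this.initializing + this.idle;
  }

  toSpawn(target: number): number {
    return Math.max(0, target - this.pendingIdle());
  }
}

// Replaying the scenario above with numIdleProcesses=4.
const acct = new PoolAccounting();
for (let i = 0; i < 4; i++) acct.startSpawn();
for (let i = 0; i < 4; i++) acct.markWarmed();
const beforeJobs = acct.toSpawn(4); // 0: the pool is full
for (let i = 0; i < 4; i++) acct.markConsumed();
const afterJobs = acct.toSpawn(4); // 4: running jobs no longer block replenishment
```

The same `pendingIdle()` value could also back the demand-driven check in launchJob(), though whether that matches the intended semantics there is a judgment for the PR author.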

@KrishnaShuk marked this pull request as draft on May 15, 2026, 12:35