[scheduler/cuebot] Bulk resource accounting #2198

DiegoTavares wants to merge 13 commits into AcademySoftwareFoundation:master
Conversation
Having multiple frames starting on the same subscription leads to lock contention when trying to update the subscription table. There was already a cache on the scheduler being used for reads, but writes were still being dispatched on each frame update. To prevent lock contention, the cache is now updated on each dispatch, and a flush happens on each cache-update tick (defaults to every 3 seconds). When running multiple instances, this can lead to running slightly above allocation limits, but the recalculate_subs scheduled function and the trigger__verify_subscription trigger should prevent large drifts in the long run.

Entire-Checkpoint: 059ff47f5f92
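The accumulate-then-flush pattern described above can be sketched roughly as follows. This is a hypothetical, simplified model (the names `SubscriptionCache`, `record`, and `flush` are illustrative, not the scheduler's actual API): dispatches record core deltas in memory, and a periodic tick drains them for a single batched write, so concurrent dispatches never contend on subscription rows directly.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical sketch of a write-through delta cache keyed by subscription id.
struct SubscriptionCache {
    deltas: Mutex<HashMap<u64, i64>>, // subscription id -> pending core delta
}

impl SubscriptionCache {
    fn new() -> Self {
        Self { deltas: Mutex::new(HashMap::new()) }
    }

    // Called on every dispatch/release instead of an inline UPDATE.
    fn record(&self, sub_id: u64, cores: i64) {
        *self.deltas.lock().unwrap().entry(sub_id).or_insert(0) += cores;
    }

    // Called from the periodic cache-update tick (every ~3 seconds in the PR
    // description): drains accumulated deltas for one bulk database write.
    fn flush(&self) -> HashMap<u64, i64> {
        std::mem::take(&mut *self.deltas.lock().unwrap())
    }
}
```

Because the flush is asynchronous, the database view can lag real usage by up to one tick, which is exactly why the PR leans on recalculate_subs and trigger__verify_subscription to bound the drift.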
Entire-Checkpoint: 57fcdec6f3b4
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
This change shifts resource accounting (subscription, layer_resource, job_resource, folder_resource, point tables) from incremental delta updates at dispatch/release time to periodic bulk recomputation from the proc table. This affects both the Java cuebot and the Rust scheduler.

Key changes:
1. Java (cuebot): Wraps existing incremental resource updates behind a dispatcher.scheduler_manages_resources feature flag
2. Rust (scheduler): Replaces the delta-accumulate-and-flush pattern with periodic recompute_all_from_proc() and recalculate_subs()
3. New ResourceAccountingService: Periodic loop recomputing layer/job/folder/point resource tables
4. Simplified AllocationService: Removes the pending_deltas mutex, the DeltaKey/DeltaValue types, retry logic, and delta re-application after cache refresh
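The bulk-recomputation approach can be illustrated with a small sketch. In the real scheduler this is presumably a SQL aggregation over the proc table; the in-memory model below (the `Proc` struct and `recompute_from_procs` function are hypothetical names) only shows the shape of the idea: every resource counter is rebuilt from the procs that currently exist, rather than maintained by +/- deltas.

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for rows of the proc table.
struct Proc {
    layer_id: u64,
    job_id: u64,
    cores: i64,
}

// Periodic recompute: rebuild layer and job core totals from scratch.
// Any drift introduced by lost or duplicated deltas disappears on the
// next pass, because the proc table is the single source of truth.
fn recompute_from_procs(procs: &[Proc]) -> (HashMap<u64, i64>, HashMap<u64, i64>) {
    let mut layer_cores = HashMap::new();
    let mut job_cores = HashMap::new();
    for p in procs {
        *layer_cores.entry(p.layer_id).or_insert(0) += p.cores;
        *job_cores.entry(p.job_id).or_insert(0) += p.cores;
    }
    (layer_cores, job_cores)
}
```

This is why the PR can delete the pending_deltas machinery entirely: a full recompute is idempotent and self-correcting, whereas delta accounting must never lose or double-apply an update.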
ef9d907 to 13bdb9f
PR Assessment using Claude Code

PR Evaluation: Bulk Resource Accounting Effectiveness

Context

This PR replaces per-frame resource accounting updates with periodic bulk recomputation.

What the PR Removes from the Dispatch Path

Each frame dispatch previously executed 4-5 inline UPDATEs within the dispatch transaction; these are now gone from the per-frame transaction. This is the core fix: it eliminates the primary source of row contention that scales with dispatch throughput.

What Remains in the Per-Frame Transaction

The layer_stat/job_stat trigger is the remaining hotspot: every frame in the same layer/job contends on these rows. However, Cuebot has the same trigger and handles 10 shows without issues, so this alone isn't the destabilizing factor. It was the combination of trigger contention and resource accounting contention that overwhelmed the database.

Effectiveness Assessment

Overall: The PR is effective. It addresses the root cause (high-concurrency dispatch generating O(dispatch_rate) contention on resource tables), and the shorter transactions also reduce lock hold time on layer_stat/job_stat.

Remaining Concerns (Minor)
When a show or show.allocation is being served by the scheduler, only resources for that show should be recomputed on a schedule. Refactor allocation_dao into resource_accounting, as both serve a similar purpose.
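Scoping the recompute to served shows could look something like the sketch below. This is a hypothetical illustration (the `ShowProc` struct and `recompute_for_served_shows` function are invented names, not the PR's code): procs belonging to shows this scheduler instance does not serve are simply skipped, so each instance only touches the resource rows it owns.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical proc row reduced to the fields the sketch needs.
struct ShowProc {
    show_id: u64,
    cores: i64,
}

// Recompute core totals only for shows currently served by this scheduler,
// leaving other shows' resource rows untouched.
fn recompute_for_served_shows(
    procs: &[ShowProc],
    served_shows: &HashSet<u64>,
) -> HashMap<u64, i64> {
    let mut totals = HashMap::new();
    for p in procs.iter().filter(|p| served_shows.contains(&p.show_id)) {
        *totals.entry(p.show_id).or_insert(0) += p.cores;
    }
    totals
}
```

The filtering also keeps the periodic job cheap when many shows exist but only a few are routed through the scheduler.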
13bdb9f to 984da29
This change shifts resource accounting (subscription, layer_resource, job_resource, folder_resource,
point tables) from incremental delta updates at dispatch/release time to periodic bulk recomputation
from the proc table. This affects both the Java cuebot and the Rust scheduler.
Key changes:
Attention: When dispatcher.scheduler_manages_resources=true, the scheduler service needs to be active to make sure resources (subscription, layer_resource, job_resource, folder_resource, point tables) are updated from the proc table periodically.
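For reference, a minimal sketch of how the flag might be set, assuming an opencue.properties-style configuration file as used by cuebot (the file location and surrounding defaults may differ per deployment; only the flag name comes from this PR):

```
# Hand resource accounting over to the scheduler service.
# With this enabled, cuebot skips the incremental per-frame updates,
# so the scheduler's periodic recompute loop MUST be running, or the
# resource tables will go stale.
dispatcher.scheduler_manages_resources=true
```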
Motivation
Each frame dispatch currently triggers updates across 5 resource-accounting tables, where concurrent dispatches contend for the same rows. This creates lock contention on the database that scales with dispatch volume. During crunch times, this contention has led to instability (deadlocks, slow dispatches, cascading timeouts).