perf(runner): critical-path scheduling + io=max(2,j//4) — ΔT=-38.7s (-11.8%) at j=16#43
Open
KorsarOfficial wants to merge 1 commit intoyandex:mainfrom
Open
Conversation
Author
Evidence ReportFull statistical analysis for this optimization: Also available:
All reports: https://github.com/KorsarOfficial/yatool/releases/tag/v1.0-perf-analysis |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Critical-path-aware scheduling and I/O thread auto-scaling
Summary
Two-part improvement to
yarunner parallelism:Critical-path scheduling: Replace the default
deeper_first__lptstrategy with a BFS
bottom_levelpriority heuristic. Each node isassigned a weight equal to its longest-path distance to any leaf,
using type-based weights (LD=5, AR=2, default=1). The scheduler
dispatches the highest
bottom_levelready node first, keeping thecritical path hot and reducing makespan.
I/O thread auto-scaling: Replace the fixed
link_threads=2defaultwith
max(2, build_threads // 4). At -j16 this sets io_workers=4,matching the linker concurrency that saturates without over-subscribing.
Evidence
DES simulation on the 4053-node yatool build graph (resource-typed model,
260 strategy combinations, n=1 each -- simulation is deterministic):
At j=16: parallel efficiency 70.8% -> 80.2%, Amdahl serial fraction
f = 0.986 -> 0.992 (critical_path makes more of the graph effectively
parallelisable by prioritizing bottleneck nodes).
DES validation (HIGH confidence):
Amdahl fit (nonlinear grid search on speedup values, not linearized 1/S):
io_workers: flat model (no linker serialization) used for strategy ranking;
io_workers formula validated by DES sweep (io=4 optimal at j=16, io=8
regresses due to I/O contention). Formula max(2, j//4) is conservative
and correct for all j.
Supplementary: full simulation data in
data/18-benchmark-results.json,data/18-strategy-sweep.json,data/18-simulation-validation.json.Changes
Net change: 3 files modified, 1 file added.
Patch
patches/18-combined.patchTesting
See
upstream/test-results.logfor test execution status and environmentalconstraints. Historical validation: patch applied in yatool Docker container
during Phase 18 implementation;
ya make devtools/ya/binsucceeded.CLA
I hereby agree to the terms of the CLA available at:
https://yandex.ru/legal/cla/?lang=en