feat(lora): fix LoRA weight-sync deadlock and add stable LoRA RL recipe by WWWjiahui · Pull Request #20 · Infini-AI-Lab/astraflow

WWWjiahui · 2026-06-18T08:34:36Z

Summary

Makes LoRA RL train end-to-end. Two independent fixes: a weight-sync deadlock in the RaaS engine, and a training recipe that keeps LoRA from collapsing. Validated on Qwen3-1.7B / math with a clean rising eval curve.

1. Weight-sync

On every training step the freshly trained LoRA adapter is pushed to the SGLang inference server. The swap follows a fixed sequence so that no request ever runs against half-updated weights or stale cache:

Pause — the server stops accepting new requests (/pause_generation).
Drain / abort — it waits out a short grace period for in-flight requests to finish, aborting any still running, so nothing is mid-generation on the old adapter.
Load — the new adapter is loaded onto every server.
Flush — the KV cache is flushed to drop entries computed with the old adapter's weights.
Resume — generation continues; new requests now use the new adapter.

The bug. The old code reloaded the adapter under a fixed name (lora_1), which required unloading the previous one before step 3. SGLang's /unload_lora_adapter calls wait_for_unload, which blocks until the adapter's usage counter reaches zero, But the requests aborted during the drain never released their hold on that counter, so it stayed non-zero and the unload hung forever, freezing the entire pipeline. This only struck LoRA runs under load (full fine-tuning takes a different, unload-free path), which made it look intermittent.

The fix. Never unload. Each sync loads the new adapter under a fresh versioned name (lora_v1, lora_v2, …) and lets SGLang's memory-pool LRU reclaim the stale ones. The deadlock-prone unload is removed entirely, while the pause → drain/abort → load → flush → resume sequence above is unchanged, so the swap stays correct. A lingering old adapter only costs GPU memory, which is bounded by max_loras_per_batch / max_loaded_loras.

Validation

Qwen3-1.7B, math DP-scaled SGLang: clean monotonic rise over the first ~100 steps with importance weight pinned at 1.0 throughout —

metric	base	step 100
`eval-avg/math500/avg@4`	70.0	77.55
`eval-avg/overall_avg`	30.2	36.58

Make LoRA RL (FSDP2 trainer + SGLang inference) train end-to-end. Weight-sync deadlock (RaaS engine): - Each sync used to unload+reload the adapter under a fixed name (lora_1). SGLang's /unload_lora_adapter blocks in wait_for_unload until every in-flight request releases the adapter's usage counter; paused/aborted requests at a sync never release it, so the unload hangs forever and the pipeline deadlocks. - Fix: load each sync under a fresh versioned name (lora_v{seq}) and never unload; SGLang's mem-pool LRU evicts stale adapters. Track the active name (_current_lora_name), thread it through generation requests, and mirror it to the eval engines that share the server. Stable LoRA RL recipe (examples/math/qwen3-1.7b-m2po-2gpus-lora): - LoRA's alpha/rank scaling makes each weight update much larger than full-FT at the same lr, so the off-policy / multi-minibatch settings full-FT tolerates collapse the policy under LoRA. Use near-on-policy: ppo_n_minibatches=1, max_staleness=1, recompute_logprob=true (lr 5e-6). - Validated: clean rising eval, math500 avg@4 70 -> 77.5, overall 30 -> 36.6.

Add a LoRA subsection to the math recipes doc: the 2-GPU LoRA variant, the near-on-policy settings LoRA needs (ppo_n_minibatches=1, max_staleness=1, recompute_logprob=true), and the pause/drain/abort/load sequence used to swap the adapter on each weight sync.

… add stable LoRA RL recipe

… deadlock at its source PR Infini-AI-Lab#20 avoided the LoRA weight-sync deadlock by loading each step's adapter under a fresh versioned name and never unloading, letting SGLang's LRU reclaim old ones. That only DEFERS the deadlock: SGLang's LoRARegistry LRU eviction (fires once num_registered > max_loaded_loras) calls wait_for_unload, which blocks on a per-adapter usage counter that aborted requests leak. Root cause (SGLang tokenizer_manager): the LoRARegistry counter is acquire()d per request but release()d only on normal completion and one scheduler-abort branch -- never on client-disconnect / waiting-queue aborts, which the RaaS per-step drain routinely creates. The leaked counter never reaches zero, so wait_for_unload (both /unload_lora_adapter and the max_loaded_loras eviction) hangs forever while holding lora_update_lock, freezing all further LoRA ops. Fix: LoRACounterLeakPatch wraps TokenizerManager.generate_request with a finally that releases the counter on every teardown path, idempotently -- release iff the rid is still in rid_to_state (atomic check-then-pop; double-release is fatal since ConcurrentCounter has no floor). Delivered via the existing SGLang monkey-patch framework (no rebuild); gated by ASTRAFLOW_DISABLE_LORA_LEAK_FIX. The versioned-name scheme is kept (harmless once the leak is fixed); the now-false "deadlock" comments in remote_inf_engine.py and raas.yaml are corrected. Verified A/B at max_loaded_loras=4 on AMD/ROCm: control (fix off) deadlocks at /load_lora_adapter(lora_v5) evicting lora_v1; treatment (fix on) completes all weight-syncs and finishes. Unit test astraflow/raas/patch/tests/test_lora_counter_leak.py covers release idempotency and the double-release guard (4/4 pass). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

WWWjiahui requested a review from haizhongzheng as a code owner June 18, 2026 08:34

WWWjiahui force-pushed the chore/bump-sglang-0.5.12 branch from 5f9c0a1 to cbd7f97 Compare June 19, 2026 06:38

WWWjiahui changed the base branch from dev to main June 19, 2026 07:12

haizhongzheng added a commit that referenced this pull request Jun 19, 2026

Merge pull request #20: feat(lora): fix LoRA weight-sync deadlock and…

227d52f

… add stable LoRA RL recipe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(lora): fix LoRA weight-sync deadlock and add stable LoRA RL recipe#20

feat(lora): fix LoRA weight-sync deadlock and add stable LoRA RL recipe#20
WWWjiahui wants to merge 3 commits into
Infini-AI-Lab:mainfrom
WWWjiahui:chore/bump-sglang-0.5.12

WWWjiahui commented Jun 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

WWWjiahui commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Weight-sync

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

WWWjiahui commented Jun 18, 2026 •

edited

Loading