LoRA bridge & merge for Qwen3.5 #2736
Conversation
- handle HF base names without .weight for LoRA suffixing
- add GDN in-proj split logic for fused adapters
- support packed expert LoRA stacking in streaming

Signed-off-by: Hollow Man <hollowman@opensuse.org>
📝 Walkthrough

The changes extend LoRA/PEFT conversion logic to support GDN in_proj weight handling by introducing splitting and merging of fused Megatron tensors into per-base HF weight components (qkv, z, b, a) during streaming and merging operations, alongside comprehensive test coverage for these conversion paths.
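The split/merge the walkthrough describes can be pictured as cutting a fused in_proj tensor into the four per-base components and concatenating them back. This is only a sketch: the component widths, layout order, and helper names below are assumptions for illustration; the real bridge derives sizes from the model config.

```python
import numpy as np

# Sketch of the fused-tensor split described above: a Megatron GDN in_proj
# weight is assumed to be laid out as [qkv | z | b | a] along dim 0 and is
# cut into the four per-base HF components. Widths and helper names are
# hypothetical; the actual bridge computes them from the config.
GDN_COMPONENTS = ("in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a")

def split_gdn_in_proj(fused: np.ndarray, widths: dict) -> dict:
    parts, offset = {}, 0
    for name in GDN_COMPONENTS:
        parts[name] = fused[offset : offset + widths[name]]
        offset += widths[name]
    assert offset == fused.shape[0], "widths must exactly cover the fused dim"
    return parts

def merge_gdn_in_proj(parts: dict) -> np.ndarray:
    # Inverse operation, as used on the HF-export/merge path.
    return np.concatenate([parts[name] for name in GDN_COMPONENTS], axis=0)
```

Concatenating the split components in order reproduces the fused tensor, which is the roundtrip property the new tests exercise.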
🧹 Nitpick comments (3)
src/megatron/bridge/models/conversion/peft_bridge.py (3)
**730-731: Defensive check is appropriate, but consider logging for debuggability.**

The `continue` silently skips when expert tensors are unexpectedly empty. While this prevents crashes, it could mask configuration issues in production.

🔧 Add debug logging

```diff
 if not per_expert_linear_in or not per_expert_linear_out:
+    # Unexpected: packed expert mode but no expert tensors found
     continue
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/megatron/bridge/models/conversion/peft_bridge.py` around lines 730-731: replace the silent `continue` with a debug or warning log that names which tensor(s) are empty and includes contextual identifiers (e.g. layer name, expert index, or surrounding module names available in that scope), then continue. Use the existing logger (or add one) and include the values or shapes of `per_expert_linear_in` and `per_expert_linear_out` to aid troubleshooting.
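A minimal sketch of the suggested fix, assuming a module-level logger; the function wrapper and layer identifier are illustrative, not the bridge's actual code:

```python
import logging

logger = logging.getLogger(__name__)

# Sketch of the review suggestion: log what is missing before skipping,
# instead of continuing silently. Variable names mirror those quoted in the
# comment; the layer name argument is illustrative.
def expert_tensors_present(per_expert_linear_in, per_expert_linear_out, layer_name):
    if not per_expert_linear_in or not per_expert_linear_out:
        logger.warning(
            "Skipping %s: packed expert mode but no expert tensors found "
            "(linear_in=%d, linear_out=%d)",
            layer_name, len(per_expert_linear_in), len(per_expert_linear_out),
        )
        return False
    return True
```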
**234-240: Strengthen detection by requiring all four GDN components.**

The current check verifies the required tokens exist across any names, but doesn't ensure each token appears exactly once. This could produce false positives if a single name contains multiple component substrings.

🔧 Suggested refinement

```diff
 def _is_gdn_in_proj_split(self, hf_weight_names: Iterable[str]) -> bool:
     """Check whether the provided HF names correspond to split GDN in_proj weights."""
     names = list(hf_weight_names)
+    if len(names) != 4:
+        return False
     required = {"in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a"}
     discovered = {token for name in names for token in required if token in name}
     return discovered == required and all("linear_attn" in name for name in names)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/megatron/bridge/models/conversion/peft_bridge.py` around lines 234-240: change `_is_gdn_in_proj_split` to count occurrences of each token in `{"in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a"}` across `hf_weight_names` and return True only if every token's count is exactly 1, no single name contains more than one required token, and all names include `linear_attn`. This ensures all four components are present exactly once and avoids false positives when multiple tokens appear in one weight name.
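The exact-once counting the prompt describes could look like the following standalone sketch (a free function rather than the bridge's method, with the token set taken from the diff above):

```python
GDN_TOKENS = ("in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a")

# Standalone sketch of the tightened check: every token appears exactly once
# across the names, no single name carries more than one token, and all
# names are linear_attn weights.
def is_gdn_in_proj_split(hf_weight_names) -> bool:
    names = list(hf_weight_names)
    if len(names) != len(GDN_TOKENS):
        return False
    # Each name must contain exactly one component token.
    if any(sum(tok in name for tok in GDN_TOKENS) != 1 for name in names):
        return False
    # Each component token must appear in exactly one name.
    counts = {tok: sum(tok in name for name in names) for tok in GDN_TOKENS}
    return all(c == 1 for c in counts.values()) and all(
        "linear_attn" in name for name in names
    )
```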
**754: Assertion could be more informative for debugging.**

If this assertion fails in production, the error message won't indicate which base name or adapter caused the issue.

🔧 Enhanced assertion message

```diff
-assert per_base is not None, "Expected fused adapter split for expert LoRA"
+assert per_base is not None, (
+    f"Expected fused adapter split for expert LoRA, "
+    f"base_prefix={adapter_task.global_base_prefix}"
+)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `src/megatron/bridge/models/conversion/peft_bridge.py` at line 754: replace the bare assertion with one that interpolates identifying context (e.g. the base name, the adapter name, or the key used to look up `per_base`) so the failure reads like "Expected fused adapter split for expert LoRA: missing per_base for base '<base_name>' adapter '<adapter_name>'". Raising a `ValueError` with the same message is an acceptable alternative to an assert.
📒 Files selected for processing (2)

- src/megatron/bridge/models/conversion/peft_bridge.py
- tests/unit_tests/models/test_model_bridge_lora.py
Pull request overview

Adds Qwen3.5-specific LoRA handling to the Megatron↔HF bridge, covering HF parameter naming that omits `.weight`, GDN in-proj fused/split adapter behavior, and packed-expert LoRA stacking during streaming export.

Changes:
- Allow LoRA param naming/resolution when HF base names don't end in `.weight` (notably Qwen3.5 MoE expert params).
- Add GDN in-proj fused-adapter split/merge support (split in streaming; merge path for HF export).
- Add streaming support for "packed expert" LoRA weights (stacked along the expert dim) when HF expert names don't include `experts.N`.
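The "packed expert" streaming in the last bullet can be pictured as stacking per-expert adapter matrices along a new leading expert dimension. The helper name and the [r, in_features] shapes below are illustrative assumptions, not the bridge's API:

```python
import numpy as np

# Illustrative sketch: when HF expert weights are packed (no `experts.N` in
# the weight name), per-expert LoRA matrices are stacked along a new leading
# expert dimension instead of being emitted one tensor per expert.
def stack_packed_expert_lora(per_expert_mats):
    assert per_expert_mats, "expected at least one expert matrix"
    shape = per_expert_mats[0].shape
    assert all(m.shape == shape for m in per_expert_mats), "experts must agree in shape"
    return np.stack(per_expert_mats, axis=0)
```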
Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| `src/megatron/bridge/models/conversion/peft_bridge.py` | Extends adapter name resolution, adds GDN in-proj split/merge paths, and implements packed-expert LoRA streaming logic. |
| `tests/unit_tests/models/test_model_bridge_lora.py` | Adds unit tests for missing `.weight` suffix cases, packed expert streaming, grouped-expert merge edge case, and GDN split roundtrip. |
What does this PR do?
Support LoRA bridge & merge for Qwen3.5
Tested with target_modules='["in_proj","out_proj","linear_proj","linear_fc1","linear_fc2","router"]'
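The merge half of the bridge follows the standard LoRA identity; a minimal sketch under illustrative shapes and hyperparameters (this is the textbook formula, not the bridge's actual API, which additionally has to split fused tensors such as GDN in_proj before applying it):

```python
import numpy as np

# Standard LoRA merge identity: W' = W + (alpha / r) * (B @ A),
# with A of shape [r, in_features] and B of shape [out_features, r].
# Sketch only; not the bridge's actual merge routine.
def merge_lora(base_w, lora_a, lora_b, alpha, r):
    return base_w + (alpha / r) * (lora_b @ lora_a)
```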
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit

New Features
- LoRA/PEFT conversion now handles GDN in_proj weights (fused split/merge), packed-expert LoRA stacking in streaming, and HF base names without `.weight`.

Tests
- Added unit tests covering the new conversion paths.