
Fix Stage 0 + Ulysses crash: make bwc_tensor_model_parallel_rank() resilient to MP API absence #7888

Open
nathon-lee wants to merge 8 commits into deepspeedai:master from nathon-lee:fix_iss_7833
Conversation

@nathon-lee
Contributor

Title

Fix Stage 0 + Ulysses crash: make bwc_tensor_model_parallel_rank() resilient to MP API absence

Summary

This PR fixes a hard crash when using Ulysses sequence parallelism with ZeRO Stage 0 (BF16_Optimizer).
In this configuration, DeepSpeed calls deepspeed.utils.bwc.bwc_tensor_model_parallel_rank(mpu=...), and the mpu object passed in can be deepspeed.runtime.sequence_parallel.parallel_state_sp, which does not implement the deprecated get_model_parallel_rank() API. The current fallback path calls mpu.get_model_parallel_rank() unconditionally, raising AttributeError.

The fix adds a defensive capability check before calling the deprecated API. If the provided mpu does not expose any known tensor/model-parallel rank API, we treat it as “no tensor model parallelism” and return rank 0.

Motivation / Context

  • Affected scenario: Ulysses sequence parallel + ZeRO Stage 0
  • Failure mode: AttributeError: ... parallel_state_sp has no attribute get_model_parallel_rank
  • Root cause: bwc_tensor_model_parallel_rank() falls back to a deprecated API without a hasattr() check.

This change keeps the original priority order intact:

  1. get_tensor_model_parallel_rank()
  2. get_slice_parallel_rank()
  3. get_model_parallel_rank() (deprecated)
  4. fallback to 0 if none exist
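
The four-step priority order above can be sketched as a single guarded chain. This is an illustrative reconstruction, not the literal deepspeed/utils/bwc.py source:

```python
def bwc_tensor_model_parallel_rank(mpu=None):
    """Sketch of the patched fallback chain: try each known
    tensor/model-parallel rank API in priority order, and degrade
    to rank 0 when the mpu exposes none of them."""
    if mpu is None:
        return 0  # no model-parallel unit configured at all
    if hasattr(mpu, "get_tensor_model_parallel_rank"):
        return mpu.get_tensor_model_parallel_rank()   # 1. preferred API
    if hasattr(mpu, "get_slice_parallel_rank"):
        return mpu.get_slice_parallel_rank()          # 2. slice-parallel API
    if hasattr(mpu, "get_model_parallel_rank"):
        return mpu.get_model_parallel_rank()          # 3. deprecated API
    return 0  # 4. no TP rank API at all: treat as no tensor parallelism
```

The only behavioral change is step 3's hasattr() guard plus the final return 0; mpus implementing any of the three APIs still hit the same branch as before.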

Changes

  • deepspeed/utils/bwc.py
    • Update bwc_tensor_model_parallel_rank() to check hasattr(mpu, "get_model_parallel_rank") before calling it.
    • If mpu provides none of the expected tensor/model-parallel rank APIs, return 0 (no TP).

Why this is safe

  • For Megatron, DeepSpeed Topology, or any existing MPU that already implements get_tensor_model_parallel_rank(), get_slice_parallel_rank(), or get_model_parallel_rank(), behavior is unchanged.
  • The new code path only affects the previously-crashing case where the mpu object does not provide any of these methods.

Reproduction

Using the Ulysses ALST tutorial flow, switching the ZeRO stage from 3 to 0 triggers the crash during the optimizer step, when the grad norm is computed.

Testing

  • Existing unit tests should continue to pass.
  • Minimal repro: calling bwc_tensor_model_parallel_rank(mpu=deepspeed.runtime.sequence_parallel.parallel_state_sp) should no longer raise.
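The repro can be simulated without a distributed setup, since the crash is purely an attribute-lookup failure. Here parallel_state_sp is a hypothetical stand-in for deepspeed.runtime.sequence_parallel.parallel_state_sp: a plain module object exposing none of the tensor/model-parallel rank APIs:

```python
import types

# Stand-in for the real parallel_state_sp module (assumption: it lacks
# get_model_parallel_rank and the other TP rank APIs, which is the
# condition that triggers the reported AttributeError).
parallel_state_sp = types.ModuleType("parallel_state_sp")

# Pre-fix behavior: the fallback path calls the deprecated API blindly.
try:
    parallel_state_sp.get_model_parallel_rank()
    crashed = False
except AttributeError:
    crashed = True  # this is the Stage 0 + Ulysses crash

# Post-fix behavior: a capability check degrades gracefully to rank 0.
if hasattr(parallel_state_sp, "get_model_parallel_rank"):
    rank = parallel_state_sp.get_model_parallel_rank()
else:
    rank = 0

print(crashed, rank)
```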

Copilot AI and others added 8 commits February 27, 2026 06:30
This reverts commit ff88670.

Co-authored-by: nathon-lee <248585198+nathon-lee@users.noreply.github.com>
Revert "fix: update 1 file reformatted." (ff88670)
Revert accidental Muon optimizer code re-introduction from copilot PRs
Add check for model parallel rank in mpu.
@nathon-lee nathon-lee changed the title Fix iss 7833 Fix issue 7833 Mar 6, 2026
@nathon-lee nathon-lee changed the title Fix issue 7833 Fix issue #7833 Mar 6, 2026
@nathon-lee nathon-lee changed the title Fix issue #7833 Fix Stage 0 + Ulysses crash: make bwc_tensor_model_parallel_rank() resilient to MP API absence Mar 6, 2026
@tohtana
Collaborator

tohtana commented Mar 6, 2026

Hi @nathon-lee,
Thank you for reporting!

I found that we already have a fallback from get_model_parallel_world_size to get_sequence_parallel_world_size.
This was introduced in #7649. Can you confirm that the latest version still raises the error?
