Correctly handle `ds_grad_is_ready` in ZeRO2
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fd07c93a5e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cabfebcdca
```python
if self.return_router_logits:
    logits = self._cached_router_logits
    self._cached_router_logits = None
```
Populate router logits when returning tuple output
When `_detect_forward_contract` sets `return_router_logits=True` for legacy MoE blocks (`router_logits_capture_target == "moe_block"`), `_register_logit_hook` is not installed, so `_cached_router_logits` is never set. The forward path then returns `(output, None)` here, which breaks callers that expect real router logits (e.g., OutputRecorder or z-loss paths that rely on the second return value). This only affects models using the MoE-block tuple contract, but for those models the logits are silently missing.
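A minimal sketch of one possible fix, assuming the replaced MoE module computes the router logits inside its own forward: fall back to those locally computed logits when no hook has populated the cache, so the tuple contract never yields `(output, None)`. `MoEWrapperSketch`, `use_logit_hook`, and `_on_router_output` are hypothetical names for illustration; only `return_router_logits` and `_cached_router_logits` come from the diff, and the real AutoEP code may be structured differently.

```python
from typing import Callable


class MoEWrapperSketch:
    """Hypothetical stand-in for the replaced MoE module (illustration only)."""

    def __init__(self, router: Callable, experts: Callable,
                 return_router_logits: bool, use_logit_hook: bool):
        self.router = router
        self.experts = experts
        self.return_router_logits = return_router_logits
        # Hypothetical flag: False models the "moe_block" case where no hook is installed.
        self.use_logit_hook = use_logit_hook
        self._cached_router_logits = None

    def _on_router_output(self, logits):
        # Plays the role of the hook that _register_logit_hook would install.
        self._cached_router_logits = logits

    def forward(self, x):
        logits = self.router(x)
        if self.use_logit_hook:
            self._on_router_output(logits)
        output = self.experts(x, logits)
        if self.return_router_logits:
            # Prefer the hook-cached value, but fall back to the logits computed
            # in this forward so callers never receive None as the second element.
            cached = self._cached_router_logits
            self._cached_router_logits = None
            return output, (cached if cached is not None else logits)
        return output
```

With toy callables such as `router=lambda x: x * 2` and `experts=lambda x, l: x + l`, `forward(1)` returns `(3, 2)` whether or not the hook path is enabled.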
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
Add AutoEP
@codex