Fix performance regression in grad_handling_hook #7882
rraminen wants to merge 2 commits into deepspeedai:master from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2560cf45a9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
if self._hooks_fired_this_backward == 0:
    self.current_expected_hooks = count_used_parameters_in_backward(all_params_requiring_grad)
self.update_hook_state_and_maybe_run_epilogue(self.current_expected_hooks)
```
Recompute expected hook count for each backward phase
Caching count_used_parameters_in_backward() only when _hooks_fired_this_backward == 0 makes current_expected_hooks fixed to the first phase’s value, but reentrant checkpointing can introduce additional participating params in later phases. In that case update_hook_state_and_maybe_run_epilogue() never sees the higher expected count, so remaining_grad_acc_hooks can drop to zero too early and run the epilogue before all gradients for the backward are processed, which risks incomplete reduction/accumulation for checkpointed models.
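The hazard described above can be illustrated with a minimal, hypothetical simulation (the function and names below are simplified stand-ins, not the DeepSpeed implementation). It assumes that at the time the first hook fires, only the first phase's parameters are visible to the count, so a later reentrant phase never raises the cached expected count:

```python
def simulate(phase_sizes):
    """Return the 1-based hook index at which the epilogue fires.

    phase_sizes: number of participating params in each backward phase
    of a single backward pass (reentrant checkpointing can add phases).
    """
    expected = None
    fired = 0
    for i in range(1, sum(phase_sizes) + 1):
        if fired == 0:
            # Cached only once: at this point only the first phase's
            # params are counted, so later phases are invisible here.
            expected = phase_sizes[0]
        fired += 1
        if fired == expected:
            return i  # epilogue runs now
    return None  # epilogue never fires

# One backward made of two phases (the second added by reentrant
# checkpointing): the epilogue fires after hook 2 even though 5 hooks
# belong to this backward, so 3 gradients are processed too late.
print(simulate([2, 3]))  # -> 2
```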
@codex review
💡 Codex Review
Reviewed commit: 51bec2ba76
```python
if self._remaining_grad_acc_hooks == 0:
    self.current_expected_hooks = count_used_parameters_in_backward(all_params_requiring_grad)
self.update_hook_state_and_maybe_run_epilogue(self.current_expected_hooks)
self._remaining_grad_acc_hooks -= 1
```
Stop decrementing hook-state counter manually
update_hook_state_and_maybe_run_epilogue() already computes and stores _remaining_grad_acc_hooks, so subtracting again here can drive the counter negative after the last hook. In environments that hit the fallback path (e.g., the callback API is unavailable, or hooks fire with backward_active_depth == 0), the next backward then skips the == 0 branch and reuses a stale current_expected_hooks, and reenter_backward_if_needed() no longer sees the zero sentinel it relies on. This can mis-track the expected hook count and prevent the epilogue from running at the correct time for reentrant or dynamic backward graphs.
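The double-decrement concern above can be sketched in isolation (class and method names are simplified stand-ins for illustration, not DeepSpeed's actual code). If the helper owns the decrement, an extra decrement in the hook drives the counter negative, so a later `== 0` check never matches:

```python
class HookState:
    """Toy stand-in for the hook bookkeeping discussed in the review."""

    def __init__(self, expected):
        self.remaining = expected

    def update_hook_state_and_maybe_run_epilogue(self):
        # The helper owns the bookkeeping: it decrements and checks zero.
        self.remaining -= 1
        return self.remaining == 0  # True -> epilogue would run here

def buggy_hook(state):
    ran_epilogue = state.update_hook_state_and_maybe_run_epilogue()
    state.remaining -= 1  # BUG: second decrement per hook firing
    return ran_epilogue

state = HookState(expected=2)
buggy_hook(state)  # remaining: 1 after helper, 0 after extra decrement
buggy_hook(state)  # remaining: -1 after helper (zero check misses), then -2

# The counter is -2 instead of 0, so a subsequent `remaining == 0`
# branch is skipped and stale state is reused on the next backward.
print(state.remaining)  # -> -2
```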
Hi @rraminen,
This PR fixes a performance drop introduced by calling count_used_parameters_in_backward() inside every gradient hook.
In the previous implementation, the expected hook count was computed once per backward phase. After a recent change (311674f#diff-99dcf26ea2876ff5bbf05b5165c4133eaa0d0f36b170685643c2f7e2eb566addL1002-L1010), it is recomputed on every hook invocation, which reduces throughput (samples/sec).
With the fix in this PR, performance returns to the pre-regression samples/sec values.
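The shape of the fix can be sketched as follows (class and method names here are simplified assumptions for illustration, not DeepSpeed's actual implementation): the potentially expensive count runs once at the start of each backward phase, and every subsequent per-parameter hook reuses the cached value.

```python
class GradHookOwner:
    """Toy sketch: cache the expected hook count once per backward phase."""

    def __init__(self, params_requiring_grad):
        self.params = params_requiring_grad
        self.current_expected_hooks = 0
        self.hooks_fired_this_backward = 0

    def count_used_parameters_in_backward(self):
        # Stand-in for the (potentially expensive) per-backward count.
        return len(self.params)

    def grad_hook(self):
        if self.hooks_fired_this_backward == 0:
            # Recomputed once per backward phase, not on every hook.
            self.current_expected_hooks = self.count_used_parameters_in_backward()
        self.hooks_fired_this_backward += 1
        if self.hooks_fired_this_backward == self.current_expected_hooks:
            self.run_epilogue()

    def run_epilogue(self):
        # Reset so the next backward phase recomputes the count.
        self.hooks_fired_this_backward = 0

owner = GradHookOwner(["w1", "w2", "w3"])
for _ in owner.params:
    owner.grad_hook()  # the count runs only on the first of the 3 hooks

print(owner.hooks_fired_this_backward)  # -> 0 (epilogue ran, state reset)
```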