
Add NVTE_BACKWARD_MODE=default|unquant|dequant#2644

Open
zianglih wants to merge 47 commits into NVIDIA:main from zianglih:keep-bwd

Conversation


@zianglih zianglih commented Feb 3, 2026

Description


Add an NVTE_KEEP_BACKWARD_UNQUANTIZED env var for quantized fprop + high precision wgrad & dgrad.

Add NVTE_BACKWARD_MODE=default|unquant|dequant env var:

  • default: existing default quantization behavior
  • unquant: quantized fprop + high-precision wgrad & dgrad using the unquantized activation and weight
  • dequant: quantized fprop + high-precision wgrad & dgrad using activation and weight dequantized directly from the fprop quantized values
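As a rough illustration of the mode selection described above, the snippet below sketches one plausible precedence (an explicit recipe setting first, then the environment variable, then "default"). `resolve_backward_mode` and its argument are hypothetical names for this sketch, not the PR's actual API:

```python
import os

VALID_MODES = ("default", "unquant", "dequant")

def resolve_backward_mode(recipe_mode=None):
    # Hypothetical sketch: an explicit recipe setting wins, otherwise
    # fall back to NVTE_BACKWARD_MODE, otherwise "default".
    mode = recipe_mode or os.environ.get("NVTE_BACKWARD_MODE", "default")
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported NVTE_BACKWARD_MODE: {mode!r}")
    return mode
```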

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@greptile-apps
Contributor

greptile-apps bot commented Feb 3, 2026

Greptile Summary

This PR introduces the NVTE_BACKWARD_MODE environment variable with three modes (default, unquant, dequant) to control backward pass quantization behavior in Linear, LayerNormLinear, and GroupedLinear modules. The implementation is sound:

  • Backward mode propagation: Correctly reads from recipe and environment, defaults to "default"
  • Tensor saving: For "unquant" mode, saves original high-precision tensors; for "dequant", saves quantized tensors with proper dequantization in backward via maybe_dequantize()
  • Context flags: Properly sets ctx.with_quantized_compute = False for non-default modes, which triggers the high-precision backward paths
  • Module coverage: Linear, LayerNormLinear, and GroupedLinear all implement the feature; LayerNormMLP explicitly blocks non-default modes with clear error message
  • Quantizer configuration: Correctly disables columnwise quantization for input/grad_output when using non-default modes, and disables optimize_for_gemm for MXFP8/NVFP4 in dequant mode
  • Test coverage: Comprehensive test suite in test_backward_mode.py validates both unquant and dequant modes across recipe types

The feature is properly gated and safe to merge.
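The dequant-mode behavior described above (quantized tensors saved in forward, dequantized before the high-precision backward GEMMs) can be sketched in toy form. `FakeQuantized` and this standalone `maybe_dequantize` are illustrative stand-ins, not Transformer Engine's actual classes:

```python
class FakeQuantized:
    """Toy stand-in for a quantized tensor (illustration only)."""

    def __init__(self, codes, scale):
        self.codes = codes    # low-precision integer codes
        self.scale = scale    # per-tensor scale factor

    def dequantize(self):
        # Recover approximate high-precision values.
        return [c * self.scale for c in self.codes]


def maybe_dequantize(t):
    # Sketch of the maybe_dequantize() behavior described above:
    # quantized tensors are dequantized, plain tensors pass through,
    # so the backward GEMMs always see high-precision operands.
    return t.dequantize() if hasattr(t, "dequantize") else t
```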

Confidence Score: 5/5

  • Safe to merge - implementation is correct with proper guards and dequantization logic
  • The prior review flagged three concerns, but detailed code analysis shows all three were based on incorrect analysis:
      1. get_fp8_recipe() at line 336 is properly guarded by the is_fp8_enabled() check from line 330;
      2. get_fp8_recipe() never returns None: the method returns either the stored recipe or the default recipe, with return type Recipe, not Optional[Recipe];
      3. dequantization for wgrad in dequant mode does occur, via the maybe_dequantize() call at line 828 in _functional_backward when with_quantized_compute=False.
    The implementation correctly handles backward mode propagation, tensor saving/loading, context flags, and quantizer configuration across all supported modules. Test coverage is comprehensive.
  • No files require special attention

Last reviewed commit: 1edb2c3

@greptile-apps greptile-apps bot left a comment

17 files reviewed, 3 comments

@greptile-apps greptile-apps bot left a comment

6 files reviewed, no comments


zianglih commented Feb 3, 2026

I'll work on potential unit test breakage.

@greptile-apps greptile-apps bot left a comment

5 files reviewed, no comments

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 4 comments

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 1 comment

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 2 comments

@greptile-apps greptile-apps bot left a comment

4 files reviewed, no comments

@greptile-apps greptile-apps bot left a comment

5 files reviewed, no comments

@greptile-apps greptile-apps bot left a comment

5 files reviewed, no comments

@greptile-apps greptile-apps bot left a comment

5 files reviewed, no comments

  # Note: dgrad GEMM requires row-wise usage, wgrad GEMM
  # requires column-wise usage
- if ctx.grad_output_quantizer is not None:
+ if ctx.grad_output_quantizer is not None and use_fp8_bwd:
Collaborator left a comment:
this line seems redundant since you already skip the quantization step in base.py grad_output_preprocess?

not ctx.use_bias
and not ctx.requires_wgrad
and ctx.grad_output_quantizer is not None
and use_fp8_bwd
Collaborator left a comment:
same comment as above

recipe = cls.get_fp8_recipe()
if recipe is not None and recipe.delayed():
# Ignore NVTE_KEEP_BACKWARD_UNQUANTIZED when delayed scaling is used
return False
Collaborator left a comment:
Maybe it's better to assert an error for delayed scaling? Okay with both.

Collaborator left a comment:
I agree. If the user specifies an unsupported combination, I think it's better to fail loudly than to secretly disobey their instructions.
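A fail-loudly version of the guard, along the lines the reviewers suggest, might look like the following. The function name and arguments are hypothetical, not the PR's actual code:

```python
def check_backward_mode(mode, recipe_is_delayed):
    # Hypothetical guard: instead of silently falling back to the
    # default path, reject the unsupported combination outright.
    if mode != "default" and recipe_is_delayed:
        raise RuntimeError(
            f"NVTE_BACKWARD_MODE={mode} is not supported with delayed scaling"
        )
    return mode != "default"
```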

  # Note: dgrad GEMM requires row-wise usage, wgrad GEMM
  # requires column-wise usage
- if ctx.grad_output_quantizer is not None:
+ if ctx.grad_output_quantizer is not None and use_fp8_bwd:
Collaborator left a comment:
this seems redundant too if we skip quant in grad_output_preprocess

zianglih and others added 26 commits March 5, 2026 12:56
Signed-off-by: Ziang Li <ziangli@umich.edu>
@ptrendx ptrendx added the community-contribution PRs from external contributor outside the core maintainers, representing community-driven work. label Mar 5, 2026
@zianglih zianglih marked this pull request as ready for review March 5, 2026 21:58