[PyTorch] Refactor C++ quantizer infrastructure (NVIDIA#1952)
* remove reciprocal op
Signed-off-by: zhongboz <zhongboz@nvidia.com>
* Refactor Quantizer::create_tensor function
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix bug when constructing FP8 tensor
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add quantize function to C++ quantizers
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Prototype function to coerce Python quantized tensors to match quantizer
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use quantizer class in tex.quantize
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add FP8 current scaling support for activation backward
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Disable quantized GEMM output with FP8 current scaling
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add coerce_tensor functions for MXFP8 and DSv3
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Avoid quantizing empty tensors
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use consistent shapes for FP8 transposes
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* In attention impl, construct FP8 tensors with pre-initialized scale-invs
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Initialize MXFP8 scales to zero
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Store copy of quantizer when creating quantized tensors
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix linter warnings
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Make sure quantized tensors have private quantizer
Avoids problems with in-place ops when the quantizer is modified externally.
Signed-off-by: Tim Moon <tmoon@nvidia.com>
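The two commits above ("Store copy of quantizer" and "Make sure quantized tensors have private quantizer") share one idea: a quantized tensor keeps its own copy of the quantizer at construction time, so a later external change to the quantizer (e.g. an amax/scale update) cannot silently alter how in-place ops re-quantize the tensor. A minimal sketch of that ownership pattern, using hypothetical toy `Quantizer`/`QuantizedTensor` classes (not the actual Transformer Engine API):

```python
import copy

class Quantizer:
    """Toy stand-in for a quantizer holding a scale (hypothetical)."""
    def __init__(self, scale):
        self.scale = scale

class QuantizedTensor:
    """Keeps a private copy of its quantizer, so external mutation of the
    original quantizer after construction is not observed by this tensor."""
    def __init__(self, data, quantizer):
        # Private copy made at construction time.
        self.quantizer = copy.deepcopy(quantizer)
        self.data = [round(x / self.quantizer.scale) for x in data]

    def add_(self, other):
        # In-place op re-quantizes with the tensor's own, stable quantizer.
        self.data = [d + round(x / self.quantizer.scale)
                     for d, x in zip(self.data, other)]
        return self

q = Quantizer(scale=0.5)
t = QuantizedTensor([1.0, 2.0], q)
q.scale = 100.0          # external change after tensor construction
t.add_([1.0, 2.0])       # unaffected: still quantizes with scale=0.5
print(t.data)            # [4, 8]
```

Without the `deepcopy`, the `add_` call above would pick up `scale=100.0` and corrupt the accumulated values, which is the failure mode the commit message describes.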
* Rename "coerce_tensor" to "convert_and_update_tensor"
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Make sure CUDA context is available when launching NVRTC kernel
Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Expose CUDA context creation function externally
Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: zhongboz <zhongboz@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: zhongboz <zhongboz@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>