Conversation
Signed-off-by: yiliu30 <yi4.liu@intel.com>
for more information, see https://pre-commit.ci
Pull request overview
Enable loading FP8 models on Habana HPU by faking CUDA capability checks during from_pretrained, and add an HPU-focused FP8 quantization test.
Changes:
- Add a context manager to temporarily report CUDA availability on HPU and override CUDA device capability checks.
- Wrap model loading with these context managers when HPEx is available to support FP8 model load on HPU.
- Add an HPU test validating FP8 quantization output weights and basic numerics.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test/test_hpu/test_quant_fp8.py | New HPU test that quantizes small models to FP8 and validates dtype + NaN/Inf. |
| auto_round/utils/model.py | Wraps from_pretrained with HPU/CUDA-override contexts to enable FP8 loading on HPU. |
| auto_round/utils/device.py | Introduces fake_cuda_for_hpu context manager to temporarily force torch.cuda.is_available() true on HPU. |
| auto_round/compressors/base.py | Removes HPU-specific exclusion of FP8 layers to allow FP8 on HPU. |
def test_small_model_rtn_generation(self, model_name):
    ar = AutoRound(model_name, iters=0, scheme="FP8_STATIC", nsamples=16)
    model, folder = ar.quantize_and_save(output_dir=self.save_dir, format="llm_compressor")
This test will likely fail in environments without HPU/HPEx because it unconditionally runs and attempts an FP8/HPU-specific flow. Add a skipif (or importorskip) guard so the test only runs when the HPU runtime is available (e.g., based on is_hpex_available() / HPU availability).
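A minimal sketch of such a guard, assuming `is_hpex_available` can be imported from `auto_round.utils` (the exact import path is an assumption):

```python
import pytest

# Assumed import path for the availability helper; adjust to where it actually lives.
from auto_round.utils import is_hpex_available

# Skip every test in this module when the HPU/HPEx runtime is not present.
pytestmark = pytest.mark.skipif(not is_hpex_available(), reason="requires HPU/HPEx runtime")
```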
@@ -0,0 +1,35 @@
import os
os is imported but not used in this new test file. Please remove it to keep the test minimal and avoid lint warnings.
Suggested change:
- import os
    trust_remote_code=trust_remote_code,
    device_map="auto" if use_auto_mapping else None,
)
if is_hpex_available():
The CUDA-faking/capability-override is applied whenever HPEx is available, regardless of the selected device_str. This can unintentionally alter load-time behavior for non-HPU runs on machines that have HPEx installed. Consider additionally gating this block on device_str (e.g., only apply when loading for HPU) so other device paths aren’t affected.
Suggested change:
- if is_hpex_available():
+ if is_hpex_available() and device_str is not None and "hpu" in device_str:
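A hedged sketch of how that gate could wrap the load call so non-HPU paths are untouched; `fake_cuda_for_hpu` is the helper this PR adds in auto_round/utils/device.py, while the loader function, its arguments, and the `is_hpex_available` import path below are placeholders/assumptions:

```python
import contextlib

from transformers import AutoModelForCausalLM

from auto_round.utils import is_hpex_available  # assumed import path
from auto_round.utils.device import fake_cuda_for_hpu  # helper added in this PR


def load_model(model_name, device_str, trust_remote_code=False, use_auto_mapping=False):
    """Hypothetical loader illustrating the HPU-only gating."""
    with contextlib.ExitStack() as stack:
        # Only fake CUDA availability when HPEx is present and the target device is HPU.
        if is_hpex_available() and device_str is not None and "hpu" in device_str:
            stack.enter_context(fake_cuda_for_hpu())
        return AutoModelForCausalLM.from_pretrained(
            model_name,
            trust_remote_code=trust_remote_code,
            device_map="auto" if use_auto_mapping else None,
        )
```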
@@ -339,6 +339,25 @@ def __exit__(self, exc_type, exc, exc_tb):
        return False
This is a class but is named like a function (lower_snake_case). For clarity and consistency, consider either renaming it to a CapWords class name (e.g., FakeCudaForHpu) or converting it into a @contextmanager function named fake_cuda_for_hpu.
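A minimal sketch of the `@contextmanager` variant, keeping the same save-and-restore behavior as the existing class; the `is_hpex_available` import path is an assumption:

```python
import contextlib

import torch

from auto_round.utils import is_hpex_available  # assumed import path


@contextlib.contextmanager
def fake_cuda_for_hpu():
    """Temporarily report CUDA as available so FP8 checkpoints can load on HPU."""
    if not is_hpex_available():
        # No-op outside HPU environments.
        yield
        return
    orig_is_available = torch.cuda.is_available
    torch.cuda.is_available = lambda: True
    try:
        yield
    finally:
        # Always restore the original function, even if loading raises.
        torch.cuda.is_available = orig_is_available
```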
if is_hpex_available():
    self._orig_is_available = torch.cuda.is_available
This mutates a global function (torch.cuda.is_available) process-wide, which can cause surprising behavior if other threads/tasks call CUDA checks while this context is active. If possible, prefer a safer patching approach (e.g., unittest.mock.patch scoped to the smallest block) and keep the patched window as short as possible.
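One way to keep the patched window small is unittest.mock.patch.object around just the load call; this sketch uses a placeholder model id and still patches a process-wide attribute, but mock guarantees restoration on exit:

```python
from unittest import mock

import torch
from transformers import AutoModelForCausalLM

# Patch only around the from_pretrained call; mock restores torch.cuda.is_available
# on exit even if loading raises. The model id below is a placeholder.
with mock.patch.object(torch.cuda, "is_available", return_value=True):
    model = AutoModelForCausalLM.from_pretrained("some-org/some-fp8-model")
```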
Signed-off-by: yiliu30 <yi4.liu@intel.com>
for more information, see https://pre-commit.ci
Signed-off-by: yiliu30 <yi4.liu@intel.com>
for more information, see https://pre-commit.ci
Description
Please briefly describe your main changes and the motivation behind them.
Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting