fix: FP8 fallback for AIU addons running on CPU #200

Open

andrea-fasoli wants to merge 9 commits into main from fp8_cpu

Conversation

@andrea-fasoli (Collaborator) commented Mar 19, 2026

Description of the change

Starting from PyTorch 2.10, torch._scaled_mm no longer supports FP8 matmul on CPU for any quantization scheme other than per-tensor. The FP8 AIU addons currently call torch._scaled_mm (via addmm_float8_unwrapped_inference) when the model runs on CPU.

This PR implements a fallback for this scenario: we perform a mock FP8 x FP8 matmul on CPU using torch.nn.functional.linear between quantized-then-dequantized activations and dequantized weights. Note that we do not simply dequantize the weights: the activations also go through a quantize/dequantize round trip, so the result reflects FP8 precision loss.
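The fallback can be sketched in plain Python. This is an illustrative, simplified model only: a uniform symmetric quantizer stands in for true FP8 E4M3 rounding, and the helper names (make_scale, quant_dequant, mock_fp8_linear) are hypothetical; the actual PR operates on torch tensors via qx.dequantize(), qweight.dequantize(), and torch.nn.functional.linear.

```python
Q_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def make_scale(values):
    # Per-tensor scale mapping the max magnitude onto the FP8 range.
    # (The PR also handles per-channel scales: one scale per weight row.)
    return max(abs(v) for v in values) / Q_MAX or 1.0

def quant_dequant(values, scale):
    # Snap each value to an FP8-like grid, then map it back to float.
    # A uniform grid is used here for simplicity; real FP8 is non-uniform.
    return [round(v / scale) * scale for v in values]

def mock_fp8_linear(x, w_rows, x_scale, w_scales):
    # Mock FP8 x FP8 matmul: quantize/dequantize the activations (so FP8
    # precision loss is reflected), dequantize the weights per channel,
    # then run an ordinary float linear layer.
    x_qdq = quant_dequant(x, x_scale)
    out = []
    for row, row_scale in zip(w_rows, w_scales):
        w_dq = quant_dequant(row, row_scale)  # dequantized weight row
        out.append(sum(a * b for a, b in zip(x_qdq, w_dq)))
    return out

x = [0.5, -1.25, 2.0]                      # activation vector
w = [[1.0, 0.0, -0.5], [0.25, 0.75, 1.5]]  # two output channels
y = mock_fp8_linear(x, w, make_scale(x), [make_scale(r) for r in w])
```

The key point mirrors the PR: the activations are not used at full precision; they go through a quantize/dequantize round trip first, so the CPU result tracks what an FP8 matmul would produce.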

Related issues or PRs

[internal issue]

How to verify the PR

Example of a test that should pass, run on a pod with 4 AIUs, in PF mode, in a PyTorch 2.10 env (set up env vars according to your case; AFTU = aiu-fms-testing-utils repo):

torchrun --nproc-per-node 4 ${AFTU_PATH}/scripts/drive_paged_programs.py --model_variant ${FP8_MODEL_PATH} --max_new_tokens 128 --timing per-token --dataset_type sharegpt --dataset_path ${DATASET_PATH} --test_type metrics --program_criteria_json_path ${PROGRAMS_FILE} --programs ${SELECTED_PROGRAM} --attention_type paged_fp8 --save_validation_info_outputs --validation_info_outputs_dir ${OUTPUT_DIR} --prefill_chunk_size 1024 --cross_entropy_threshold 2.6 --failure_rate_threshold 0.1 --prioritize_large_batch_sizes --enforce_homogeneous_prompt_programs --distributed

Was the PR tested

  • I have ensured all unit tests pass

Checklist for passing CI/CD:

  • All commits are signed showing "Signed-off-by: Name <email@domain.com>" with git commit --signoff or equivalent
  • PR title and commit messages adhere to Conventional Commits
  • Contribution is formatted with pre-commit
  • Contribution passes all unit tests with tox -e unit

Signed-off-by: Andrea Fasoli <andrea.fasoli@ibm.com>
@andrea-fasoli (Collaborator, Author):

@ani300 need your eyes on this

# Perform mock FP8xFP8 matmul
if is_cpu and not is_per_tensor and not SUPPORTS_CPU_PER_CHANNEL_FP8:
    x_dequant = qx.dequantize()
    w_dequant = qweight.dequantize()
Contributor:

Do we expect this to affect the quality significantly?

Contributor:

If anything, it'll improve it on CPU.

Contributor:

Makes sense. Since we use these numbers to compare against accelerator results, could this cause wider deviation between those results? Unless the diff is quite small.

Contributor:

we're downcasting back to fp8 anyways, so it shouldn't be too different.

Collaborator (Author):

I would also expect a very minimal discrepancy in terms of generation compared to the earlier operation.

There may be some runtime overhead, as this new fallback is likely less performant than calling torch._scaled_mm. To clarify: potential overheads on CPU validation only, no impact at all on AIU runtime.
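To illustrate why the discrepancy should stay small, here is a simplified sketch (again a uniform quantizer standing in for FP8 E4M3 rounding; variable names are illustrative, not from the PR):

```python
Q_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quant_dequant(values, scale):
    # Snap values to a uniform FP8-like grid, then map them back to float.
    return [round(v / scale) * scale for v in values]

scale = 2.0 / Q_MAX  # per-tensor scale for data with max magnitude 2.0

# Values already on the quantization grid survive the round trip unchanged.
on_grid = [i * scale for i in (-448, -100, 0, 64, 448)]
assert quant_dequant(on_grid, scale) == on_grid

# Off-grid values move by at most half a quantization step, which bounds
# the per-element deviation the fallback can introduce.
off_grid = [0.1234, -1.9876, 0.5555]
err = max(abs(a - b) for a, b in zip(off_grid, quant_dequant(off_grid, scale)))
```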

@ani300 (Contributor) left a review:

lgtm! the fix makes sense

@ani300 (Contributor) commented Mar 19, 2026

Is it worth adding a test to check that the combination that was failing before works now and in the future?
