feat: enable `QASYMM8_SIGNED`→`F32` in `CpuGemmAssemblyDispatch` by alvoron · Pull Request #1297 · ARM-software/ComputeLibrary

alvoron · 2026-06-17T14:20:13Z

Two gaps in CpuGemmAssemblyDispatch blocked QASYMM8_SIGNED input from validating with an F32 output tensor, even though the underlying arm_gemm kernel GemmInterleaved<int8_t, int8_t, float, DequantizeFloat> already supports this combination on AArch64.

Gap 1 - has_opt_impl: The QASYMM8_SIGNED branch only tested for S32 and S8 outputs. Passing F32 fell through to the S8→S8 Requantize32 check, which fails because has_opt_gemm<int8_t, int8_t, int8_t, Requantize32> and has_opt_gemm<int8_t, int8_t, float, DequantizeFloat> are different instantiations.
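The fall-through described in Gap 1 can be illustrated with a small, self-contained mock. This is only a sketch: `mock_has_opt_gemm`, `OutTypeOf`, `has_opt_impl_before`, and `has_opt_impl_after` are hypothetical names, and the mock lookup simply reports whether the requested output type matches the template instantiation. It is not the Compute Library code.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the library's types (assumptions, not real declarations).
enum class DataType { QASYMM8_SIGNED, S32, F32 };
struct Nothing {};         // tag: no output stage
struct Requantize32 {};    // tag: requantize to 8-bit
struct DequantizeFloat {}; // tag: dequantize to float

// Maps a C++ output element type to the runtime data type it serves.
template <typename T> struct OutTypeOf;
template <> struct OutTypeOf<int8_t>  { static constexpr DataType value = DataType::QASYMM8_SIGNED; };
template <> struct OutTypeOf<int32_t> { static constexpr DataType value = DataType::S32; };
template <> struct OutTypeOf<float>   { static constexpr DataType value = DataType::F32; };

// Mock kernel lookup: an instantiation only "matches" its own output type.
template <typename Tout, typename Stage>
bool mock_has_opt_gemm(DataType requested_out)
{
    return requested_out == OutTypeOf<Tout>::value;
}

// Before: only S32 is tested explicitly, everything else goes to S8 -> S8.
bool has_opt_impl_before(DataType out)
{
    if (out == DataType::S32)
    {
        return mock_has_opt_gemm<int32_t, Nothing>(out);
    }
    // F32 lands here and is checked against the wrong instantiation.
    return mock_has_opt_gemm<int8_t, Requantize32>(out);
}

// After: an explicit F32 branch selects the DequantizeFloat instantiation.
bool has_opt_impl_after(DataType out)
{
    if (out == DataType::S32)
    {
        return mock_has_opt_gemm<int32_t, Nothing>(out);
    }
    if (out == DataType::F32)
    {
        return mock_has_opt_gemm<float, DequantizeFloat>(out);
    }
    return mock_has_opt_gemm<int8_t, Requantize32>(out);
}
```

In this mock, `has_opt_impl_before(DataType::F32)` reports no kernel, while `has_opt_impl_after(DataType::F32)` finds one; the S32 and 8-bit paths behave the same in both versions.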

Gap 2 - validate: There was no output-type guard for QASYMM8_SIGNED input at all. The equivalent guard for QASYMM8 explicitly allowed QASYMM8/S32/F32; QASYMM8_SIGNED had no such allowance, so F32 output reached downstream checks with no clear error.
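The kind of output-type guard described in Gap 2 might look roughly like the sketch below. The function name `validate_output_type_sketch`, the string-based error return, and the extra `F16` enumerator are assumptions made for illustration; the real `validate` returns the library's own status type and checks much more than this.

```cpp
#include <string>

// Hypothetical subset of data types (assumption, not the library's enum).
enum class DataType { QASYMM8, QASYMM8_SIGNED, S32, F32, F16 };

// Returns an empty string when the input/output pairing is allowed,
// otherwise a short message naming the rejected combination.
std::string validate_output_type_sketch(DataType in, DataType out)
{
    // Guard the PR describes as already present for unsigned quantized input.
    if (in == DataType::QASYMM8 &&
        out != DataType::QASYMM8 && out != DataType::S32 && out != DataType::F32)
    {
        return "QASYMM8 input only supports QASYMM8/S32/F32 output";
    }
    // Mirrored guard for signed quantized input, as the PR proposes.
    if (in == DataType::QASYMM8_SIGNED &&
        out != DataType::QASYMM8_SIGNED && out != DataType::S32 && out != DataType::F32)
    {
        return "QASYMM8_SIGNED input only supports QASYMM8_SIGNED/S32/F32 output";
    }
    return std::string();
}
```

With a guard of this shape, an unsupported pairing such as QASYMM8_SIGNED input with F16 output is reported at validation time with a specific message instead of surfacing later in an unrelated check.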

Two gaps in the assembly dispatch layer prevented QASYMM8_SIGNED input from producing F32 output:

1. has_opt_impl() had no branch for F32 output when input is S8/QASYMM8_SIGNED, causing spurious kernel-not-found errors. Add a DequantizeFloat branch mirroring the existing S32 branch.
2. validate() rejected F32 output for QASYMM8_SIGNED input because it had no explicit allowance for that combination. Add a guard that permits QASYMM8_SIGNED/S32/F32 as output types (matching the already-existing QASYMM8 guard).
3. AsmGemmInfo gains dequant_a_offset / dequant_b_offset fields so that callers can supply quantization zero-points to create_arm_gemm_dequant without touching existing callers.

Also fix the __aarch64_ typo in the DequantFP32_SupportedTypes test guard so that the test now actually executes on AArch64 targets.

Signed-off-by: Aleksandr Voron <aleksandr.voron@intel.com>
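Two points from the commit message can be sketched in isolation: adding defaulted fields leaves existing callers unaffected, and a misspelled preprocessor macro silently compiles a guarded block out. Everything below is an assumption made for illustration: the struct `AsmGemmInfoSketch`, the `int32_t` field types, the zero defaults, and the reference formula `scale_a * scale_b * sum((a - offset_a) * (b - offset_b))` are not taken from the PR's diff, and the sign convention the library applies to its offsets may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for AsmGemmInfo: new fields default to zero, so code
// that never sets them behaves as it did before the fields existed.
struct AsmGemmInfoSketch
{
    int32_t dequant_a_offset = 0; // assumed: zero-point of the LHS tensor
    int32_t dequant_b_offset = 0; // assumed: zero-point of the RHS tensor
};

// Reference dequantized dot product of two int8 vectors (illustrative math only).
float dequantized_dot_sketch(const std::vector<int8_t> &a,
                             const std::vector<int8_t> &b,
                             float scale_a, float scale_b,
                             const AsmGemmInfoSketch &info)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        acc += (static_cast<int32_t>(a[i]) - info.dequant_a_offset) *
               (static_cast<int32_t>(b[i]) - info.dequant_b_offset);
    }
    return scale_a * scale_b * static_cast<float>(acc);
}

// The misspelled macro (one trailing underscore) is not defined by compilers,
// so anything guarded by it is skipped on every target, including AArch64.
#if defined(__aarch64_)
constexpr bool kTypoGuardActive = true;
#else
constexpr bool kTypoGuardActive = false;
#endif
```

A default-constructed `AsmGemmInfoSketch` yields the plain scaled dot product, and setting the two offsets subtracts the zero-points before accumulation. `kTypoGuardActive` is `false` regardless of the target architecture, which matches the commit's remark that the test only starts executing once the guard is spelled `__aarch64__`.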

alvoron mentioned this pull request Jun 17, 2026

feat: optimized QASYMM8_SIGNED->F32 direct convolution path #1298

Open

alvoron wants to merge 1 commit into ARM-software:main from alvoron:alvoron_qasymm8_signed_f32_dispatch
