feat: enable QASYMM8_SIGNED→F32 in CpuGemmAssemblyDispatch #1297
Open
alvoron wants to merge 1 commit into
Two gaps in the assembly dispatch layer prevented QASYMM8_SIGNED input from producing F32 output:

1. has_opt_impl() had no branch for F32 output when the input is S8/QASYMM8_SIGNED, causing spurious kernel-not-found errors. Add a DequantizeFloat branch mirroring the existing S32 branch.
2. validate() rejected F32 output for QASYMM8_SIGNED input because it had no explicit allowance for that combination. Add a guard that permits QASYMM8_SIGNED/S32/F32 as output types (matching the already-existing QASYMM8 guard).

In addition, AsmGemmInfo gains dequant_a_offset / dequant_b_offset fields so that callers can supply quantization zero-points to create_arm_gemm_dequant without touching existing callers.

Also fix the __aarch64_ typo in the DequantFP32_SupportedTypes test guard so that the test now actually executes on AArch64 targets.

Signed-off-by: Aleksandr Voron <aleksandr.voron@intel.com>
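The claim that new fields can be added "without touching existing callers" can be illustrated with default member initializers. The following is only a sketch: `AsmGemmInfoSketch`, `DequantOffsets` and `offsets_for` are hypothetical stand-ins (not the library's types), and the `int32_t` field type and zero defaults are assumptions rather than details taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AsmGemmInfo; only the two new fields are modelled.
// ASSUMPTION: the int32_t type and the zero defaults are illustrative only.
struct AsmGemmInfoSketch
{
    std::int32_t dequant_a_offset = 0; // zero-point of operand A
    std::int32_t dequant_b_offset = 0; // zero-point of operand B
};

// Hypothetical value type describing what a consumer would forward to the kernel.
struct DequantOffsets
{
    std::int32_t a;
    std::int32_t b;
};

// Hypothetical consumer standing in for create_arm_gemm_dequant: it simply
// reads the offsets from the info struct.
inline DequantOffsets offsets_for(const AsmGemmInfoSketch &info)
{
    return DequantOffsets{info.dequant_a_offset, info.dequant_b_offset};
}

// An existing caller that never mentions the new fields still compiles
// and observes zero offsets.
inline void legacy_caller_example()
{
    AsmGemmInfoSketch info;
    const DequantOffsets off = offsets_for(info);
    assert(off.a == 0 && off.b == 0);
}
```

A caller that needs non-zero zero-points would set the two fields explicitly before handing the struct over; callers that do not are unaffected.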
Two gaps in `CpuGemmAssemblyDispatch` blocked `QASYMM8_SIGNED` input from validating with an `F32` output tensor, even though the underlying `arm_gemm` kernel `GemmInterleaved<int8_t, int8_t, float, DequantizeFloat>` already supports this combination on AArch64.

Gap 1 - `has_opt_impl`: The `QASYMM8_SIGNED` branch only tested for `S32` and `S8` outputs. Passing `F32` fell through to the S8→S8 `Requantize32` check, which fails because `has_opt_gemm<int8_t, int8_t, int8_t, Requantize32>` and `has_opt_gemm<int8_t, int8_t, float, DequantizeFloat>` are different instantiations.

Gap 2 - `validate`: There was no output-type guard for `QASYMM8_SIGNED` input at all. The equivalent guard for `QASYMM8` explicitly allowed `QASYMM8`/`S32`/`F32`; `QASYMM8_SIGNED` had no such allowance, so `F32` output reached downstream checks with no clear error.
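The two gaps described above can be modelled in a few lines. This is a simplified sketch, not the actual dispatch code: the `DataType` and `Kernel` enums and the three functions are hypothetical names chosen for illustration, and the real `has_opt_impl` and `validate` take tensor infos and perform many more checks.

```cpp
#include <cassert>

// Hypothetical, reduced set of data types relevant to this change.
enum class DataType { QASYMM8, QASYMM8_SIGNED, S8, S32, F32 };

// Hypothetical labels for the kernel family probed for signed 8-bit input.
enum class Kernel { Int8ToInt32, Int8ToFloatDequant, Int8ToInt8Requant };

// Gap 1, before the fix: only S32 is special-cased, so an F32 output
// falls through to the S8 -> S8 requantize probe.
constexpr Kernel probe_signed_before(DataType out)
{
    return (out == DataType::S32) ? Kernel::Int8ToInt32
                                  : Kernel::Int8ToInt8Requant;
}

// Gap 1, after the fix: F32 output gets its own dequantize branch,
// mirroring the S32 branch.
constexpr Kernel probe_signed_after(DataType out)
{
    return (out == DataType::S32) ? Kernel::Int8ToInt32
         : (out == DataType::F32) ? Kernel::Int8ToFloatDequant
                                  : Kernel::Int8ToInt8Requant;
}

// Gap 2: output-type guard for QASYMM8_SIGNED input, matching the shape of
// the existing QASYMM8 guard (same input type, S32, or F32).
constexpr bool signed_output_allowed(DataType out)
{
    return out == DataType::QASYMM8_SIGNED || out == DataType::S32 ||
           out == DataType::F32;
}

// Small usage check of the behaviour described in the text.
inline void dispatch_sketch_example()
{
    assert(probe_signed_before(DataType::F32) == Kernel::Int8ToInt8Requant);
    assert(probe_signed_after(DataType::F32) == Kernel::Int8ToFloatDequant);
    assert(signed_output_allowed(DataType::F32));
}
```

In this model the pre-fix probe asks for the wrong kernel instantiation when the output is `F32`, which corresponds to the spurious kernel-not-found failure; the post-fix probe selects the dequantize variant, and the guard states the permitted outputs explicitly.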