
Supports vnni-256 for GPTQ INT4 #1926

Merged

yyj6666667 merged 4 commits into kvcache-ai:main from callmegaga:main on Apr 13, 2026

@callmegaga (Contributor)

What does this PR do?

Adds an AVX-VNNI-256 backend for GPTQ INT4. Prefill performance improves significantly. The accuracy loss is around 0.013, which is larger than the AVX2 backend's 0.003.
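For context: AVX-VNNI's `VPDPBUSD` multiplies groups of four unsigned 8-bit values by four signed 8-bit values and adds each group's sum into a 32-bit accumulator in a single instruction, eight lanes at a time on a 256-bit register. The pure-Python sketch below emulates that behavior and uses it for one group-quantized INT4 dot product. It is a hypothetical illustration, not this PR's kernel: the low-nibble-first packing, the per-group `zero` and `w_scale`, and quantizing activations to int8 with `x_scale` are all assumptions, and every function name is made up for the example.

```python
# Hypothetical illustration only: emulates AVX-VNNI VPDPBUSD semantics in pure
# Python. Nibble order, zero point, and the int8 activation scheme are assumptions.


def unpack_int4(packed: list[int]) -> list[int]:
    """Split each byte into two unsigned 4-bit values (assumed: low nibble first)."""
    out: list[int] = []
    for b in packed:
        out.append(b & 0x0F)
        out.append((b >> 4) & 0x0F)
    return out


def dpbusd_lane(acc: int, a_u8: list[int], b_s8: list[int]) -> int:
    """One 32-bit lane: acc + four u8*s8 products, wrapped to int32 (no saturation)."""
    total = acc + sum(a * b for a, b in zip(a_u8, b_s8))
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


def dpbusd_256(acc: list[int], a_u8: list[int], b_s8: list[int]) -> list[int]:
    """A 256-bit register holds 8 int32 lanes, each consuming 4 bytes of a and b."""
    return [dpbusd_lane(acc[i], a_u8[4 * i:4 * i + 4], b_s8[4 * i:4 * i + 4])
            for i in range(8)]


def int4_group_dot(packed_w: list[int], zero: int, w_scale: float,
                   x_q: list[int], x_scale: float) -> float:
    """Dot product of 32 INT4 weights (16 packed bytes) with 32 int8 activations.

    With w = w_scale * (q - zero) and x = x_scale * x_q, the integer part is
    sum(q * x_q) - zero * sum(x_q), so the zero point is applied once per group.
    """
    q = unpack_int4(packed_w)            # unsigned 0..15 fits the u8 operand
    lanes = dpbusd_256([0] * 8, q, x_q)  # one emulated VPDPBUSD
    raw = sum(lanes)                     # horizontal reduction of the 8 lanes
    return w_scale * x_scale * (raw - zero * sum(x_q))


if __name__ == "__main__":
    packed = list(range(0, 256, 16))     # 16 bytes -> 32 nibbles
    x_q = [i - 16 for i in range(32)]    # int8 activations in [-16, 15]
    print(int4_group_dot(packed, 8, 0.5, x_q, 0.25))  # prints 101.0
```

In this sketch, unsigned GPTQ nibbles map directly onto the instruction's unsigned operand, and folding the zero point into one correction term per group avoids subtracting it element by element.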


ErvinXie requested review from mrhaoxx and ouqingliang, then removed the request for ouqingliang (April 10, 2026 09:30)
@yyj6666667 (Collaborator)

Here is a prefill speed comparison:

With VNNI: [screenshot: prefill benchmark results]
Without VNNI: [screenshot: prefill benchmark results]

"PTok" is the number of prompt tokens processed during the prefill stage. VNNI delivers a consistent ~5x speedup in the prefill stage, particularly at longer prompt lengths.

yyj6666667 merged commit a9411f1 into kvcache-ai:main on Apr 13, 2026

2 participants