Supports vnni-256 for GPTQ INT4 by callmegaga · Pull Request #1926 · kvcache-ai/ktransformers

callmegaga · 2026-04-09T14:09:46Z

What does this PR do?

Add an avx-vnni-256 backend for GPTQ INT4. Prefill performance improves significantly. The accuracy loss is around 0.013, which is larger than AVX2's 0.003.
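For readers unfamiliar with VNNI, here is a minimal Python sketch of the arithmetic such a backend relies on. `VPDPBUSD`, the core AVX-VNNI instruction, multiplies four unsigned bytes by four signed bytes and adds the sum to a 32-bit accumulator lane. Everything else in the sketch is an assumption made for illustration and is not taken from this PR: the nibble order (low nibble first), the single per-group scale and zero point, the per-vector int8 quantization of activations, and all function names. Quantizing activations to int8 is one plausible source of extra accuracy loss compared with a path that keeps activations in floating point, but the PR does not state the cause.

```python
def unpack_int4(packed: bytes) -> list[int]:
    """Split each byte into two unsigned 4-bit values (assumed low nibble first)."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out


def dpbusd_lane(acc: int, u8: list[int], s8: list[int]) -> int:
    """Emulate one 32-bit lane of VPDPBUSD: acc += sum of four u8 * s8 products."""
    assert len(u8) == 4 and len(s8) == 4
    assert all(0 <= u <= 255 for u in u8)
    assert all(-128 <= s <= 127 for s in s8)
    total = acc + sum(u * s for u, s in zip(u8, s8))
    # The non-saturating form wraps around like a signed 32-bit integer.
    return (total + 2**31) % 2**32 - 2**31


def quantize_int8(xs: list[float]) -> tuple[list[int], float]:
    """Symmetric per-vector int8 quantization (hypothetical activation scheme)."""
    scale = max(abs(x) for x in xs) / 127 or 1.0
    return [round(x / scale) for x in xs], scale


def int4_dot(packed_w: bytes, w_scale: float, zero_point: int,
             activations: list[float]) -> float:
    """Approximate dot(dequant(w), x) using only integer multiply-accumulates."""
    q_w = unpack_int4(packed_w)  # unsigned 0..15, fits the u8 operand
    q_x, x_scale = quantize_int8(activations)  # signed, fits the s8 operand
    assert len(q_w) == len(q_x) and len(q_w) % 4 == 0
    acc = 0
    for i in range(0, len(q_w), 4):
        acc = dpbusd_lane(acc, q_w[i:i + 4], q_x[i:i + 4])
    # w = w_scale * (q_w - zero_point), so the zero point is removed afterwards.
    return w_scale * x_scale * (acc - zero_point * sum(q_x))


if __name__ == "__main__":
    approx = int4_dot(bytes([0x21, 0x43]), 0.1, 8, [2.0, -1.0, 0.5, 0.25])
    print(approx)
```

The weights stay as unsigned nibbles so they can feed the unsigned operand directly, and the zero-point correction is applied once per group after accumulation rather than per element.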

gemini-code-assist · 2026-04-09T14:15:02Z

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

yyj6666667 · 2026-04-13T09:58:42Z

Here is a prefill speed comparison:

With vnni: (screenshot not included)

Without vnni: (screenshot not included)

"PTok" is the number of prompt tokens processed during the prefill stage. The vnni backend gives a stable ~5x speedup in the prefill stage, especially as the prompt length increases.

callmegaga added 4 commits April 5, 2026 14:27

[feat](kt-kernel): support avx-vnni-256 for gptq int4

4606bf1

Merge branch 'kvcache-ai:main' into main

2877b3a

Merge branch 'kvcache-ai:main' into main

8d9b6ba

[refactor](kt-kernel): Optimize the issues raised in the review

44d9df9

ErvinXie requested review from mrhaoxx and ouqingliang and removed request for ouqingliang April 10, 2026 09:30

yyj6666667 merged commit a9411f1 into kvcache-ai:main Apr 13, 2026

yyj6666667 merged 4 commits into kvcache-ai:main from callmegaga:main

2 participants