Unified KV Cache Compression Methods for Auto-Regressive Models
LLM KV cache compression made easy
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
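Query-agnostic eviction like the entry above drops low-importance cache entries without knowing future queries. A minimal sketch of the general idea, keeping the top-scoring tokens under a fixed budget; the function and scoring here are illustrative assumptions, not that repo's actual method:

```python
import numpy as np

def evict_kv(keys, values, scores, budget):
    """Keep only the `budget` cache entries with the highest
    accumulated importance scores (e.g. past attention mass).

    Illustrative sketch of query-agnostic KV eviction; real methods
    use more careful scoring and per-layer budgets.
    """
    keep = np.argsort(scores)[-budget:]  # indices of top-`budget` tokens
    keep.sort()                          # preserve original token order
    return keys[keep], values[keep]

# Hypothetical cache of 100 tokens, evicted down to 25 (4x reduction).
keys = np.random.randn(100, 64)
values = np.random.randn(100, 64)
scores = np.random.rand(100)
small_k, small_v = evict_kv(keys, values, scores, budget=25)
```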
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
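Low-rank projection methods like the one above store a factored KV cache instead of the full matrix. A minimal truncated-SVD sketch of the storage trade-off; the function names and shapes are assumptions for illustration, not Palu's API:

```python
import numpy as np

def low_rank_compress(kv, rank):
    """Compress a (seq_len, head_dim) KV matrix via truncated SVD.

    Sketch of low-rank KV-cache compression: store two thin factors
    instead of the full matrix.
    """
    U, S, Vt = np.linalg.svd(kv, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # (seq_len, rank)
    B = Vt[:rank, :]            # (rank, head_dim)
    return A, B                 # cache these instead of kv

def low_rank_decompress(A, B):
    return A @ B

kv = np.random.randn(128, 64)
A, B = low_rank_compress(kv, rank=16)
# Storage drops from 128*64 floats to 128*16 + 16*64 (~2.7x smaller).
approx = low_rank_decompress(A, B)
```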
PyTorch implementation of "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
First open-source implementation of Google TurboQuant (ICLR 2026) -- near-optimal KV cache compression for LLM inference. 5x compression with near-zero quality loss.
xKV: Cross-Layer SVD for KV-Cache Compression
(ACL2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation
Accurate and fast KV cache compression with a gating mechanism
First open-source KVTC implementation (NVIDIA, ICLR 2026) -- 8-32x KV cache compression via PCA + adaptive quantization + entropy coding
LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
Native Windows build of vLLM 0.19.0 — no WSL, no Docker. Pre-built wheels + 33-file Windows patch + Multi-TurboQuant KV cache compression (6 methods, 2x cache capacity). PyTorch 2.10 + CUDA 12.6 + Triton + Flash-Attention 2.
AI agent skill implementing Google's TurboQuant compression algorithm (ICLR 2026) — 6x KV cache memory reduction, 8x speedup, zero accuracy loss. Compatible with Claude Code, Codex CLI, and all Agent Skills-compatible tools.
Near-optimal vector quantization for LLM KV cache compression. Python implementation of TurboQuant (ICLR 2026) — PolarQuant + QJL for 3-bit quantization with minimal accuracy loss and up to 8x memory reduction.
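To make the 3-bit storage concrete, here is a minimal per-row uniform quantizer for a KV tensor. This is plain asymmetric uniform quantization, not PolarQuant or QJL; all names are illustrative:

```python
import numpy as np

def quantize_kv(x, bits=3):
    """Per-row asymmetric uniform quantization of a KV tensor.

    Sketch only: maps each row's range onto 2**bits - 1 levels and
    stores integer codes plus per-row (scale, zero-point).
    """
    levels = (1 << bits) - 1                        # 7 levels for 3-bit
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0)
    q = np.round((x - lo) / scale).astype(np.uint8)  # codes in [0, levels]
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    return q * scale + lo

x = np.random.randn(4, 64)
q, scale, lo = quantize_kv(x, bits=3)
x_hat = dequantize_kv(q, scale, lo)
# Reconstruction error per element is bounded by half a quantization step.
```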
Repository for the paper: https://arxiv.org/abs/2510.00231
Drop-in KV cache compression for MLX on Apple Silicon. Brings PolarQuant (Google, ICLR 2026) to mlx-lm with first-class Gemma 4 support: MatFormer, dual head_dim, hybrid sliding/global attention, cross-layer KV sharing. 3-bit → 4.8× smaller cache, 0.995 logit cosine @ 4-bit.
Test Google's new TurboQuant KV-cache compression (ICLR 2026) on your local machine — measure real speed, memory, and accuracy differences across compression modes.