RAMP-Quant

RAMP: RL-guided Adaptive Mixed-Precision quantization for GGUF models. Produces hardware-optimized quantized models for consumer GPUs.

RAMP uses data-free sensitivity analysis (no calibration data needed), evolutionary search, and per-tensor type optimization to find the best mixed-precision configuration for your specific hardware and VRAM budget.

Key result

RAMP v2 produced a 15.2 GB Qwen3.5-35B-A3B GGUF (3.78 BPW) that runs at 90 tok/s on an RTX 5060 Ti 16GB and scores 30/30 on functional benchmarks. The configuration combines a custom imatrix with per-tensor overrides: an IQ3_S base, with Q8_0/Q6_K/Q5_K on the SSM and attention critical paths.
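
The bits-per-weight (BPW) figure links file size to weight count. The sketch below works through that arithmetic for the numbers quoted above. It assumes metadata overhead is negligible, and because the README does not say whether "GB" means 10^9 or 2^30 bytes, it tries both. The function names are illustrative and are not part of this repo.

```python
def bits_per_weight(size_bytes: float, n_weights: float) -> float:
    """Average bits stored per weight, ignoring GGUF metadata overhead."""
    return size_bytes * 8 / n_weights


def implied_weights(size_bytes: float, bpw: float) -> float:
    """Weight count implied by a file size and an average BPW."""
    return size_bytes * 8 / bpw


if __name__ == "__main__":
    # Try both readings of "GB", since the unit is not specified.
    for label, unit in (("GB (10^9 bytes)", 10**9), ("GiB (2^30 bytes)", 2**30)):
        n = implied_weights(15.2 * unit, 3.78)
        print(f"15.2 {label} at 3.78 BPW -> about {n / 1e9:.1f}B weights")
```

On the binary reading the implied count lands near 34.5B, close to the model's nominal 35B, which may indicate that the size is reported in GiB.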

How it works

RAMP runs a 6-phase pipeline:

1. GGUF Analysis: parse tensor structure, identify layer types (GDN, attention, MoE experts, norms)
2. NSDS Sensitivity: data-free sensitivity scoring per tensor (no calibration data required)
3. Proxy Model: build quantization error database, estimate proxy loss for each configuration
4. Search: greedy + evolutionary search over mixed-precision configurations within VRAM budget
5. Validate: verify the configuration fits hardware constraints
6. Build: generate llama-quantize command with --custom-q tensor overrides
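
To make the Search and Validate phases concrete, here is a minimal sketch of a greedy, budget-constrained allocation. It is not the code in search_evo.py and it omits the evolutionary part. The tensor names, sensitivity scores, and the proxy loss (quantization error assumed to shrink about 4x per extra bit) are all assumptions made for illustration. The BPW values are the nominal ones for the ggml types named in this README.

```python
from dataclasses import dataclass

# Nominal bits per weight of the ggml types named in this README.
BPW = {"IQ3_S": 3.4375, "Q5_K": 5.5, "Q6_K": 6.5625, "Q8_0": 8.5}
LADDER = ["IQ3_S", "Q5_K", "Q6_K", "Q8_0"]  # upgrade order, lowest precision first


@dataclass
class Tensor:
    name: str
    n_weights: int
    sensitivity: float  # hypothetical score; higher means more fragile


def size_bytes(t: Tensor, qtype: str) -> float:
    return t.n_weights * BPW[qtype] / 8


def proxy_loss(t: Tensor, qtype: str) -> float:
    # Assumption: quantization MSE shrinks about 4x per extra bit.
    return t.sensitivity * 4.0 ** (-BPW[qtype])


def greedy_allocate(tensors, budget_bytes):
    """Start at the base type, then repeatedly apply the one-step upgrade
    with the best loss reduction per extra byte that still fits the budget."""
    config = {t.name: LADDER[0] for t in tensors}
    used = sum(size_bytes(t, LADDER[0]) for t in tensors)
    if used > budget_bytes:
        raise ValueError("budget too small even for the base type")
    while True:
        best = None  # (gain per byte, tensor, next type, extra bytes)
        for t in tensors:
            level = LADDER.index(config[t.name])
            if level + 1 == len(LADDER):
                continue  # already at the highest precision
            nxt = LADDER[level + 1]
            extra = size_bytes(t, nxt) - size_bytes(t, config[t.name])
            if used + extra > budget_bytes:
                continue  # this upgrade would break the size budget
            gain = proxy_loss(t, config[t.name]) - proxy_loss(t, nxt)
            if best is None or gain / extra > best[0]:
                best = (gain / extra, t, nxt, extra)
        if best is None:
            return config, used  # no affordable upgrade left
        _, t, nxt, extra = best
        config[t.name] = nxt
        used += extra


if __name__ == "__main__":
    # Hypothetical tensors: two small sensitive ones and one large expert block.
    tensors = [
        Tensor("blk.0.attn_qkv.weight", 10_000_000, 9.0),
        Tensor("blk.0.ssm_out.weight", 8_000_000, 6.0),
        Tensor("blk.0.ffn_down_exps.weight", 400_000_000, 1.0),
    ]
    config, used = greedy_allocate(tensors, budget_bytes=200e6)
    for name, qtype in config.items():
        print(f"{name}: {qtype}")
    print(f"total: {used / 1e6:.1f} MB")
```

In this toy setup the small, sensitive tensors climb to Q8_0 while the large expert block stays at the IQ3_S base, because upgrading it would not fit the budget. That matches the shape of the configuration described under Key result, though the real scoring and search are likely more involved.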

What's in this repo

ramp_local.py           Main orchestrator (6-phase pipeline)
gguf_analyzer.py        GGUF structure analysis
nsds_sensitivity.py     Data-free per-tensor sensitivity (NSDS)
proxy_model.py          Quantization error DB + proxy loss
gguf_builder.py         GGUF construction from config
search_evo.py           Evolutionary + greedy search
quant_error.py          Quantization error estimation
validate.py             Result validation
ramp_quantize.c         C implementation for custom quantization

pipeline/               Extended pipeline modules
  run_pipeline.py       Full pipeline with monitoring
  allocator.py          Bit allocation logic
  sensitivity_analyzer.py  Layer sensitivity analysis
  benchmark.py          Performance/quality benchmark
  optrot_selective.py   Selective optimal rotation
  monitor.py            Pipeline monitoring

tools/                  Quantization utilities
  benchmark_gguf.py     Perplexity + functional domain tests
  kurtboost_bf16.py     Kurtosis-based per-tensor analysis
  build_mtp_gguf.py     Build GGUF with MTP tensors
  build_mtp_gguf_v3.py  Build GGUF with MTP tensors (v3, shard injection)
  inject_mtp_tensors.py Inject MTP tensors into existing GGUF
  patch_gguf_mtp.py     Patch GGUF for MTP support

Usage

# Run full RAMP pipeline
python ramp_local.py \
  --model path/to/model-BF16.gguf \
  --target-size 15G \
  --gpu-vram 16G

# KurtBoost sensitivity analysis
python tools/kurtboost_bf16.py path/to/model-BF16/ --output overrides.json

# Benchmark a GGUF model
python tools/benchmark_gguf.py path/to/model.gguf --test-functional --test-ppl
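
tools/kurtboost_bf16.py is described as kurtosis-based per-tensor analysis. The sketch below shows the underlying statistic only. It assumes excess kurtosis is used as a heavy-tail indicator, on the reasoning that tensors with strong outliers tend to lose more under low-bit quantization. The threshold and the type mapping in suggest_type are hypothetical and do not reflect the tool's actual rules or its overrides.json format.

```python
import statistics


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (about 0 for a Gaussian)."""
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    m2 = statistics.fmean([c * c for c in centered])
    if m2 == 0:
        return 0.0  # constant tensor: no spread, so no tails to measure
    m4 = statistics.fmean([c ** 4 for c in centered])
    return m4 / (m2 * m2) - 3.0


def suggest_type(kurtosis, threshold=10.0):
    # Hypothetical rule: heavy-tailed tensors get a higher-precision type.
    return "Q6_K" if kurtosis > threshold else "IQ3_S"


if __name__ == "__main__":
    flat = [-1.0, 1.0] * 50                 # no outliers at all
    spiky = [0.0] * 98 + [10.0, -10.0]      # two large outliers
    for name, data in (("flat", flat), ("spiky", spiky)):
        k = excess_kurtosis(data)
        print(f"{name}: excess kurtosis {k:.1f} -> {suggest_type(k)}")
```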

The RAMP v2 model

The optimized Qwen3.5-35B-A3B-RAMP-v2-15G GGUF is available on Hugging Face: Kevletesteur/Qwen3.5-35B-A3B-RAMP-v2-15G.

Related repos

chimere — Rust inference runtime that uses RAMP-quantized models
chimere-odo — Inference orchestrator

License

Apache 2.0 — see LICENSE.

Author

Kevin Remondiere — Independent ML researcher, Bayonne, France
