Name and Version
(This is a cherrypick of the triattention branch onto a more recent version of feature/turboquant-kv-cache)
$ llama-cli --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 11911 MiB):
Device 0: NVIDIA RTX 4000 Ada Generation Laptop GPU, compute capability 8.9, VMM: yes, VRAM: 11911 MiB
version: 9078 (4e20d9f84)
built with GNU 11.5.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA RTX 4000 Ada
Models
Qwen3.6-35B-A3B-Q4_K_M
Problem description & steps to reproduce
On the head of the triattention branch there is a reference to a flag, --triattention-calibrate, but no such option exists. The documentation also refers to scripts/calibrate-triattention.py and scripts/validate-calibration.py, but both files are missing.
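To make the report easier to reproduce, here is a minimal sketch of a check for the items named above. It assumes that `llama-cli --help` lists every supported option and that the script paths are relative to the repository root; the helper names `missing_items` and `cli_help` are hypothetical and not part of the project.

```python
import subprocess
from pathlib import Path

# Names taken verbatim from the report above.
FLAG = "--triattention-calibrate"
SCRIPTS = [
    "scripts/calibrate-triattention.py",
    "scripts/validate-calibration.py",
]


def missing_items(help_text: str, repo_root: Path) -> list[str]:
    """Return the flag and script paths that cannot be found."""
    missing = []
    # Assumption: an option that exists is mentioned in the help text.
    if FLAG not in help_text:
        missing.append(FLAG)
    for rel in SCRIPTS:
        if not (repo_root / rel).is_file():
            missing.append(rel)
    return missing


def cli_help(binary: str = "llama-cli") -> str:
    """Capture the help output; assumes the binary is on PATH."""
    result = subprocess.run(
        [binary, "--help"], capture_output=True, text=True, check=False
    )
    # Some builds print help to stderr, so both streams are combined.
    return result.stdout + result.stderr
```

Running `missing_items(cli_help(), Path("."))` from the repository root should return an empty list once the flag and both scripts are present.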
First Bad Commit
No response
Relevant log output
Logs