forked from TheTom/llama-cpp-turboquant
Pull requests: AtomicBot-ai/atomic-llama-cpp-turboquant
Pull requests list
fix: add missing prototype for turbo_cpu_fwht_inverse to resolve -Wmissing-prototypes CI error
ggml
#12
opened May 13, 2026 by
sujitvasanth
feat: one-sided target probability acceptance for MTP drafts increases acceptance rate and throughput compared to argmax alone
examples
server
#8
opened May 11, 2026 by
sujitvasanth
Enhance CUDA flash attention kernel selection for DKQ=512 with low gq…
ggml
Nvidia GPU
#6
opened May 8, 2026 by
Ooooze
Repro: MTP path on CUDA aborts at fattn.cu:109 (DKQ=512) for Gemma 4 — Blackwell sm_120 + Ampere sm_86
documentation
#5
opened May 8, 2026 by
jameseiten
•
Draft