Conversation
What improved
Bad variants
|
|
Threadgroup codebook in the arg could hurt performance on non-codebook quants, need to benchmarks on different machine for it, if it is true, need to separate to two kernels maybe |
|
previous attempts in #394 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 69f4ef5c8d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| QuantizationMethod::Codebook => { | ||
| let codebook_value = | ||
| (*codebook.expect("Codebook quantized QMV requires a codebook").add(val_q)).to_f32(); | ||
| scale * codebook_value |
There was a problem hiding this comment.
Reject non-4-bit codebook QMV combinations
QuantizedMatmulQmvFast now accepts quant_method == Codebook for both BITS=4 and BITS=8, but the CPU path indexes the codebook with the unpacked quantized value (val_q) without any bounds check. With BITS=8, val_q can be 0..255, so callers that provide the expected 16-entry NF4 codebook will trigger out-of-bounds reads in qmv and produce undefined behavior. Please fail fast for Codebook unless BITS == 4 (or validate/require a 256-entry codebook).
Useful? React with 👍 / 👎.
Summary