The quantization support I've added through `--low-prec-bytes-per-val` is a bit barebones. It'd be nice to add enough flexibility to handle per-block quantization (e.g. some schemes only quantize the linear layers to int4) and some of the newer formats whose width isn't a whole number of bytes (e.g. int4, fp6, etc.).
Relevant: #36
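Here is a minimal sketch of one possible approach. It assumes the size accounting switches from bytes per value to bits per value, and that per-block overrides are keyed by a block name. `BITS_PER_VAL`, `QuantConfig`, `block_bytes`, `model_bytes` and the block names are all hypothetical, not part of the existing code. The sketch also ignores the per-group scales and zero-points that real int4/fp6 schemes store alongside the packed values.

```python
import math
from dataclasses import dataclass, field

# Hypothetical format table: width in bits (not bytes), so sub-byte formats fit.
BITS_PER_VAL = {
    "fp32": 32,
    "fp16": 16,
    "bf16": 16,
    "int8": 8,
    "fp6": 6,
    "int4": 4,
}


@dataclass
class QuantConfig:
    # Format used for any block without an explicit override.
    default: str = "fp16"
    # Hypothetical per-block overrides, e.g. {"linear": "int4"}.
    overrides: dict = field(default_factory=dict)

    def fmt_for(self, block: str) -> str:
        return self.overrides.get(block, self.default)


def block_bytes(num_params: int, fmt: str) -> int:
    """Bytes needed to store num_params values in fmt, rounded up to a whole byte."""
    total_bits = num_params * BITS_PER_VAL[fmt]
    return math.ceil(total_bits / 8)


def model_bytes(blocks: dict, cfg: QuantConfig) -> int:
    """Total storage for {block_name: num_params}, using each block's format."""
    return sum(block_bytes(n, cfg.fmt_for(name)) for name, n in blocks.items())
```

For example, with only the linears in int4 and everything else in fp16, `model_bytes({"linear": 1_000_000, "embedding": 100_000, "norm": 1_000}, QuantConfig(overrides={"linear": "int4"}))` gives `702000` (500,000 + 200,000 + 2,000). Counting in bits and rounding up only once per block keeps formats like fp6 exact, where 3 values take 18 bits and therefore 3 bytes.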