Improve Quantization

The quantization support I've added through `--low-prec-bytes-per-val` is a bit barebones. It'd be nice to add enough flexibility to handle per-block quantization (e.g. some setups only quantize the linears to int4) and some of the newer formats whose width isn't a whole number of bytes (e.g. int4, fp6).
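To make the ask concrete, here's a rough sketch of the arithmetic I have in mind: track widths in bits rather than bytes, and let each block type carry its own width. Everything here is hypothetical (the `weight_bytes` helper, the block names, and the parameter counts are made up for illustration and aren't part of the existing script):

```python
def weight_bytes(param_counts, bits_per_param, default_bits=16):
    """Total weight memory in bytes for per-block bit widths.

    param_counts:   {block_name: number of parameters}  (hypothetical layout)
    bits_per_param: {block_name: bits per value}; blocks not listed
                    fall back to default_bits.
    """
    total_bits = 0
    for block, n_params in param_counts.items():
        total_bits += n_params * bits_per_param.get(block, default_bits)
    # Round up to whole bytes using integer math, so sub-byte formats
    # such as int4 or fp6 are handled without float error.
    return (total_bits + 7) // 8


# Illustrative numbers only: linears quantized to int4, the rest left at 16-bit.
params = {"linear": 6_000_000_000, "embedding": 500_000_000, "layernorm": 1_000_000}
bits = {"linear": 4}
print(weight_bytes(params, bits))
```

Working in bits also means the existing bytes-per-value option could be kept as a special case (bits = 8 * bytes), though how that maps onto the CLI is an open question.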

Relevant: https://github.com/EleutherAI/cookbook/issues/36