We should benchmark against https://github.com/huggingface/tokenizers. I don't expect us to win, but it gives us a baseline to target.
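
A rough sketch of what the comparison harness could look like, assuming we measure encode throughput in bytes per second over a fixed list of strings. `measure_throughput` and `whitespace_encode` are hypothetical names used for illustration, not existing code in either project, and the whitespace splitter only stands in for a real tokenizer.

```python
import time
from typing import Callable, List


def measure_throughput(
    encode: Callable[[str], object],
    texts: List[str],
    repeats: int = 5,
) -> float:
    """Return the best-of-`repeats` encode throughput in UTF-8 bytes per second."""
    total_bytes = sum(len(t.encode("utf-8")) for t in texts)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for t in texts:
            encode(t)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading on very small inputs.
    return total_bytes / max(best, 1e-12)


def whitespace_encode(text: str) -> List[str]:
    """Placeholder tokenizer (assumption): swap in the implementation under test."""
    return text.split()


if __name__ == "__main__":
    corpus = ["the quick brown fox jumps over the lazy dog"] * 10_000
    rate = measure_throughput(whitespace_encode, corpus)
    print(f"whitespace baseline: {rate / 1e6:.1f} MB/s")
```

To get the huggingface number, the same function could be called with something like `lambda s: tok.encode(s)`, where `tok` is a `tokenizers.Tokenizer` loaded via `Tokenizer.from_file(...)`. Both sides should run on the same corpus and the same vocabulary so the comparison is fair.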