good work!
I'm training codec-bpe on 10M audio codec sequences with vocab_size=30k, num_codebooks=4, codebook_size=1024, but after training I find the tokenizer's compression rate is nearly 1. For example, for 4 s of audio at a 25 Hz frame rate I get 400 flattened codec tokens, and codec-bpe only reduces this to a bit more than 350 tokens, a ratio of barely above 1. That is far from the README's claim that "this can yield savings of 2-5x in sequence length compared to directly modeling the flattened codebooks". I followed the steps in the README, so could you share your codec-bpe tokenizer based on EnCodec or some other codec?
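For reference, here is the arithmetic behind my numbers, as a quick sketch (the helper functions are my own, not part of codec-bpe):

```python
# Back-of-envelope check of the token counts above.
def flattened_token_count(duration_s: float, frame_rate_hz: float, num_codebooks: int) -> int:
    """Tokens produced by flattening all codebooks frame by frame."""
    return int(duration_s * frame_rate_hz * num_codebooks)

def compression_ratio(raw_tokens: int, bpe_tokens: int) -> float:
    """Average number of raw codec tokens replaced by one BPE token."""
    return raw_tokens / bpe_tokens

raw = flattened_token_count(4, 25, 4)   # 400 tokens for the 4 s clip
ratio = compression_ratio(raw, 350)     # ~1.14, far from the claimed 2-5x
print(raw, round(ratio, 2))
```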