In retnet-3b/config.json, decoder_ffn_embed_dim and decoder_value_embed_dim are set to twice decoder_embed_dim, following the experimental settings of the paper https://arxiv.org/pdf/2307.08621.pdf . With these settings, and counting the parameters in nn.Embedding, the model has 3.2B parameters, not 2.7B. Even after subtracting the nn.Embedding parameters, the count is about 3B, which still does not match the 2.7B reported in the paper.
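To make the discrepancy easier to check, here is a rough parameter-count sketch. It assumes the retention layer shapes described in the paper (W_Q and W_K of size d×d, W_V and the gate W_G of size d×2d, W_O of size 2d×d) and ignores biases and normalization weights. Whether the FFN is a plain two-matrix MLP or a gated (GLU) three-matrix one, the vocabulary size, and whether the output projection is tied to the embedding are all assumptions here, exposed as hypothetical arguments of a hypothetical helper `retnet_param_count`; they should be checked against the actual config and model code.

```python
def retnet_param_count(d_model, n_layers, vocab_size,
                       value_factor=2, ffn_factor=2,
                       glu_ffn=True, tied_output=False):
    """Approximate RetNet parameter count (hypothetical helper).

    Assumes the paper's retention shapes; ignores biases and norm weights.
    """
    d = d_model
    d_v = value_factor * d   # corresponds to decoder_value_embed_dim
    d_ffn = ffn_factor * d   # corresponds to decoder_ffn_embed_dim

    # Multi-scale retention: W_Q, W_K are d x d; W_V, W_G are d x d_v; W_O is d_v x d.
    retention = 2 * d * d + 2 * d * d_v + d_v * d

    # FFN: up + down projections, plus an extra gate matrix if a GLU is used (assumption).
    n_ffn_mats = 3 if glu_ffn else 2
    ffn = n_ffn_mats * d * d_ffn

    per_layer = retention + ffn
    body = n_layers * per_layer            # non-embedding parameters
    embedding = vocab_size * d             # nn.Embedding weight
    output_proj = 0 if tied_output else vocab_size * d

    return {
        "per_layer": per_layer,
        "body": body,
        "embedding": embedding,
        "output_proj": output_proj,
        "total": body + embedding + output_proj,
    }


if __name__ == "__main__":
    # Hypothetical values: d_model=2560, 32 layers, vocab 50257; replace with config.json values.
    for glu in (True, False):
        counts = retnet_param_count(2560, 32, 50257, glu_ffn=glu)
        print(f"glu_ffn={glu}: non-embedding {counts['body'] / 1e9:.2f}B, "
              f"total {counts['total'] / 1e9:.2f}B")
```

With these hypothetical inputs the non-embedding count is 2.94B with a GLU FFN and 2.52B with a two-matrix FFN, so the FFN variant alone shifts the total by roughly 0.4B when comparing against the paper's 2.7B figure.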