Hello,
First of all, thank you for your incredible work and dedication to this project!
I have a question regarding the perplexity of the Llama-3.1 8B model validated on WikiText2. Specifically, the reported perplexity for Llama-3.1 8B is 5.8883, which appears to be higher than that of Llama-2 7B. This result seems unusual, and I haven't been able to find additional results or discussions on this topic.
Additionally, while running validation on Llama-3.1 8B using the framework, I noticed the following log messages:
```
Token indices sequence length is longer than the specified maximum sequence length for this model (289076 > 131072). Running this sequence through the model will result in indexing errors
```
This raises a couple of questions:
- Could this issue with the input sequence length be related to the perplexity discrepancy?
- Is this behavior possibly due to a version mismatch in the transformers or tokenizers library?
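For reference, my understanding is that typical WikiText2 perplexity scripts tokenize the entire test split in a single call (which is what triggers the warning above) and then slice the resulting token sequence into fixed-length windows before any forward pass, so the warning may be harmless. A minimal sketch of that slicing step, with the window size (`seqlen=2048`) assumed rather than taken from this repo:

```python
def split_into_windows(input_ids, seqlen=2048):
    """Slice one long token sequence into fixed-length windows.

    The tokenizer warning about the sequence being "longer than the
    specified maximum sequence length" fires when the whole corpus is
    tokenized at once; if the model only ever sees `seqlen` tokens per
    forward pass, no indexing error actually occurs.
    """
    n_windows = len(input_ids) // seqlen  # trailing remainder is dropped
    return [input_ids[i * seqlen:(i + 1) * seqlen] for i in range(n_windows)]

# Example with 289,076 dummy token IDs, matching the count in the warning:
ids = list(range(289076))
windows = split_into_windows(ids, seqlen=2048)
print(len(windows))     # 141 full windows
print(len(windows[0]))  # 2048
```

If the framework's evaluation loop follows this pattern, the warning would not by itself explain the perplexity numbers, but I wanted to confirm that assumption.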
I plan to validate other models, such as Llama-3 and Llama-3.2 of similar parameter sizes, to compare results. However, I would appreciate it if you could provide insights or clarification on these points in the meantime.
- Also, is this related to the result (from section 1 in the README.md) showing that the 2:4-pruned Llama-2 7B performs better than Llama-3 or Llama-3.1 8B?
Thank you so much for your time and help!