Hello,
First of all, thank you for your incredible work and dedication to this project!
I have a question regarding the perplexity of the Llama-3.1 8B model validated on WikiText2. Specifically, the reported perplexity for Llama-3.1 8B is 5.8883, which appears to be higher than that of Llama-2 7B. This result seems unusual, and I haven't been able to find additional results or discussions on this topic.
Additionally, while running validation on Llama-3.1 8B using the framework, I noticed the following log messages:
```
Token indices sequence length is longer than the specified maximum sequence length for this model (289076 > 131072). Running this sequence through the model will result in indexing errors
```
This raises a couple of questions:
- Could this issue with the input sequence length be related to the perplexity discrepancy?
- Is this behavior possibly due to a version mismatch in the transformers or tokenizers library?
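For reference, my understanding is that typical WikiText2 perplexity scripts tokenize the entire test split in a single call (which is what triggers the warning above) and then slice the resulting token sequence into fixed-length windows before any forward pass, so the warning may be harmless. A minimal sketch of that slicing step, with the window size (`seqlen=2048`) assumed rather than taken from this repo:

```python
def split_into_windows(input_ids, seqlen=2048):
    """Slice one long token sequence into fixed-length windows.

    The tokenizer warning about the sequence being "longer than the
    specified maximum sequence length" fires when the whole corpus is
    tokenized at once; if the model only ever sees `seqlen` tokens per
    forward pass, no indexing error actually occurs.
    """
    n_windows = len(input_ids) // seqlen  # trailing remainder is dropped
    return [input_ids[i * seqlen:(i + 1) * seqlen] for i in range(n_windows)]

# Example with 289,076 dummy token IDs, matching the count in the warning:
ids = list(range(289076))
windows = split_into_windows(ids, seqlen=2048)
print(len(windows))     # 141 full windows
print(len(windows[0]))  # 2048
```

If the framework's evaluation loop follows this pattern, the warning would not by itself explain the perplexity numbers, but I wanted to confirm that assumption.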
I plan to validate other models, such as Llama-3 and Llama-3.2 of similar parameter sizes, to compare results. However, I would appreciate it if you could provide insights or clarification on these points in the meantime.
- Also, is this related to the result (from section 1 in the README.md) showing that the 2:4-pruned Llama-2 7B performs better than Llama-3 or Llama-3.1 8B?
Thank you so much for your time and help!