Question about Llama-3.1 8B perplexity issue #7

@quaternior

Description

@quaternior

Hello,

First of all, thank you for your incredible work and dedication to this project!

I have a question regarding the perplexity of the Llama-3.1 8B model validated on WikiText2. Specifically, the reported perplexity for Llama-3.1 8B is 5.8883, which appears to be higher than that of Llama-2 7B. This result seems unusual, and I haven't been able to find additional results or discussions on this topic.

Additionally, while running validation on Llama-3.1 8B using the framework, I noticed the following log messages:

```
Token indices sequence length is longer than the specified maximum sequence length for this model (289076 > 131072). Running this sequence through the model will result in indexing errors
```
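For context, this warning is typically emitted by the tokenizer when the entire WikiText2 test split is tokenized as one stream; common perplexity protocols then slice that stream into model-sized windows before any forward pass, so the model itself never sees the full 289,076 tokens. A minimal sketch of that windowing arithmetic, assuming the usual concatenate-then-slice evaluation (some codebases use a smaller fixed window such as 2048 tokens rather than the full context):

```python
# Sketch of slicing a long token stream into non-overlapping evaluation
# windows. The tokenizer warning above refers to the full concatenated
# stream; evaluation only ever feeds windows of at most `max_len` tokens.

def make_eval_windows(n_tokens: int, max_len: int):
    """Return (start, end) index pairs covering the token stream."""
    return [(start, min(start + max_len, n_tokens))
            for start in range(0, n_tokens, max_len)]

# Llama-3.1's context is 131072 tokens; WikiText2 tokenizes to ~289k tokens
# here, so the tokenizer warns, but evaluation runs over 3 windows.
windows = make_eval_windows(289076, 131072)
print(len(windows))   # number of forward passes needed
print(windows[-1])    # the final (shorter) window
```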

This raises a couple of questions:

  1. Could this issue with the input sequence length be related to the perplexity discrepancy?
  2. Is this behavior possibly due to a version mismatch in the transformers or tokenizers library?
  3. Is this related to the result (from Section 1 of the README.md) showing that 2:4 pruning of Llama-2 7B performs better than Llama-3 or Llama-3.1 8B?

I plan to validate other models, such as Llama-3 and Llama-3.2 at similar parameter sizes, to compare results. In the meantime, I would appreciate any insights or clarification on these points.

Thank you so much for your time and help!
