Discrepancy in GSM8K evaluation results compared to Table 1 #63

Hi,

I cloned the repository and ran the `../Fast-dLLM/llada/eval_gsm8k.sh` script, but I could not reproduce the results reported in Table 1 of the paper.

Specifically, the accuracy I obtained is about 2% lower than the reported value.

I have double-checked the script several times, and its default parameters appear to match the Table 1 experiment exactly. Could the discrepancy be caused by hardware differences (e.g., GPU model), or is there another setting I might have missed?
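
In case it helps narrow this down, below is a minimal sketch of how I could collect my environment details for comparison with the setup used for Table 1. It is not part of the repository: `collect_env_info` is a hypothetical helper name, and it assumes PyTorch is the relevant framework (it falls back to `None` values if `torch` is not installed).

```python
import platform
import sys


def collect_env_info():
    """Gather basic environment details that can affect reproducibility.

    Hypothetical helper for this report; not part of the Fast-dLLM code.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": None,
        "cuda": None,
        "gpu": None,
    }
    try:
        import torch  # optional: only reported when installed
    except ImportError:
        return info

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None on CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

I can post the output of this script if it would be useful.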

