Hello, I have downloaded the PersonaChat dataset from the GLGE website and used the easy version (full data). When testing my own BART-base code with basic seq2seq settings, the BLEU scores on PersonaChat are far below the BART-large results reported in your paper:
BLEU-1: 12.79, BLEU-2: 7.81
When I use the same code to evaluate on the GigaWord dataset, the results are as expected:
ROUGE-1: 37.98, ROUGE-2: 19.18
So I am confused about the evaluation of PersonaChat; I understand that it is a multi-turn dialogue dataset. I would like to know whether there are any differences in the training or evaluation settings.
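For reference, this is roughly how I compute corpus-level BLEU-n, in case the discrepancy comes from a different metric definition. This is a minimal pure-Python sketch (function names are my own, not from the GLGE code): the geometric mean of 1- to n-gram precisions with a brevity penalty, with both sides whitespace-tokenized.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu_n(hyps, refs, n):
    """Corpus-level BLEU-n over parallel lists of hypothesis/reference strings.

    Geometric mean of clipped 1..n-gram precisions, times a brevity penalty.
    """
    match = [0] * n   # clipped n-gram matches, per order
    total = [0] * n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for k in range(1, n + 1):
            hc = Counter(ngrams(h, k))
            rc = Counter(ngrams(r, k))
            # Clip each hypothesis n-gram count by its reference count.
            match[k - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[k - 1] += sum(hc.values())
    if min(total) == 0 or min(match) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_prec)
```

For example, `corpus_bleu_n(["the cat sat"], ["the cat sat on the mat"], 1)` gives exp(-1) ≈ 0.368: unigram precision is 3/3, but the brevity penalty for a 3-token hypothesis against a 6-token reference is exp(1 - 6/3). If the official evaluation instead uses multiple references per turn, a different tokenizer, or smoothing, the numbers would not be comparable.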