Hello, I have downloaded the PersonaChat dataset from the GLGE website and used the easy version (full data). When testing my own BART-base code with basic seq2seq settings, the BLEU scores on PersonaChat are far below the BART-large results reported in your paper:
BLEU-1: 12.79, BLEU-2: 7.81
When I use the same code to evaluate on the GigaWord dataset, the results are as expected:
ROUGE-1: 37.98, ROUGE-2: 19.18
So I am confused about the evaluation of PersonaChat; I understand that it is a multi-turn dialogue dataset. I would like to know whether there are any differences in the training or evaluation settings.
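For reference, this is roughly how I compute corpus-level BLEU-n, in case the discrepancy comes from a different metric definition. This is a minimal pure-Python sketch (function names are my own, not from the GLGE code): the geometric mean of 1- to n-gram precisions with a brevity penalty, with both sides whitespace-tokenized.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu_n(hyps, refs, n):
    """Corpus-level BLEU-n over parallel lists of hypothesis/reference strings.

    Geometric mean of clipped 1..n-gram precisions, times a brevity penalty.
    """
    match = [0] * n   # clipped n-gram matches, per order
    total = [0] * n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for k in range(1, n + 1):
            hc = Counter(ngrams(h, k))
            rc = Counter(ngrams(r, k))
            # Clip each hypothesis n-gram count by its reference count.
            match[k - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[k - 1] += sum(hc.values())
    if min(total) == 0 or min(match) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(log_prec)
```

For example, `corpus_bleu_n(["the cat sat"], ["the cat sat on the mat"], 1)` gives exp(-1) ≈ 0.368: unigram precision is 3/3, but the brevity penalty for a 3-token hypothesis against a 6-token reference is exp(1 - 6/3). If the official evaluation instead uses multiple references per turn, a different tokenizer, or smoothing, the numbers would not be comparable.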