Adding MediaSum dataset#305

Merged
seanzhangkx8 merged 4 commits into CornellNLP:master from AnnaWegmann:patch-1
Feb 3, 2026

Conversation

@AnnaWegmann
Contributor

Description

This adds the mediasum.rst file for documentation and the convert_mediasum-corpus.ipynb notebook, the script that was used to convert the MediaSum dataset to a ConvoKit corpus object. The zipped dataset, to be added to your servers, is here: https://drive.google.com/file/d/1cCaSuVUKN0B3s-GxnWg1gtWwOLNF66n0/view?usp=sharing
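
For readers who have not opened the notebook, the following is a minimal sketch of what such a conversion step could look like. It is not the notebook's code. It assumes each MediaSum record carries an `id` plus parallel `speaker` and `utt` lists, and it flattens one record into plain dictionaries whose keys mirror the fields of a ConvoKit utterance. The function name `mediasum_record_to_utterances`, the `<id>_<index>` utterance ID scheme, and the linear reply chain are all hypothetical choices made for illustration.

```python
from typing import Any, Dict, List


def mediasum_record_to_utterances(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one MediaSum-style record into utterance dictionaries.

    Assumed input fields (not confirmed by this PR): "id", "speaker", "utt".
    """
    conversation_id = str(record["id"])
    speakers = record["speaker"]
    texts = record["utt"]
    if len(speakers) != len(texts):
        raise ValueError("speaker and utt lists must have the same length")

    utterances = []
    previous_id = None  # the first utterance replies to nothing
    for index, (speaker, text) in enumerate(zip(speakers, texts)):
        utterance_id = f"{conversation_id}_{index}"  # hypothetical ID scheme
        utterances.append(
            {
                "id": utterance_id,
                "conversation_id": conversation_id,
                "speaker": speaker,
                "reply_to": previous_id,  # assumption: each turn replies to the previous one
                "text": text,
            }
        )
        previous_id = utterance_id
    return utterances


# Made-up record, for illustration only.
example_record = {
    "id": "DEMO-1",
    "speaker": ["HOST", "GUEST", "HOST"],
    "utt": ["Welcome to the show.", "Thanks for having me.", "Let's get started."],
}
rows = mediasum_record_to_utterances(example_record)
for row in rows:
    print(row["id"], row["speaker"], row["reply_to"])
```

Dictionaries of this shape could then be turned into ConvoKit `Utterance` objects and collected into a `Corpus`; the notebook remains the reference for how the released corpus was actually built.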

Motivation and Context

This adds a dataset; see the .rst file for details. It is based on https://aclanthology.org/2021.naacl-main.474.pdf and https://aclanthology.org/2024.emnlp-main.52/

How has this been tested?

See convert_mediasum-corpus.ipynb for the creation and testing outputs.

Other information

The corpus still needs to be added to your servers: https://drive.google.com/file/d/1cCaSuVUKN0B3s-GxnWg1gtWwOLNF66n0/view?usp=sharing

@cristiandnm added the dataset label (Use this tag when providing a new dataset for inclusion in ConvoKit.) on Sep 11, 2025
@seanzhangkx8
Collaborator

Hi Anna, thank you so much for your contribution to ConvoKit. It looks great. I will just add some configuration to support downloading the corpus from ConvoKit directly. After that I will merge the PR into our main branch. Thanks again for your work!

@seanzhangkx8 merged commit b243967 into CornellNLP:master on Feb 3, 2026
4 checks passed