AutoMin2023

Datasets

Minuting datasets

ELITR-minuting-corpus (small)
Additional data generated by ChatGPT from transcripts?

Relevant summarization datasets

HuggingFace: https://huggingface.co/datasets?task_categories=task_categories:summarization
CNN/Daily Mail
- on HuggingFace
- CNN and Daily Mail news articles with summaries
AMI Meeting Corpus
- on HuggingFace
- product meetings transcripts and summaries
ICSI Meeting Corpus
- not on HuggingFace
- academic meetings transcripts and summaries
- code to process: https://github.com/xcfcode/meeting_summarization_dataset
Spotify Podcast Dataset
- not on HuggingFace
SAMSum Corpus
- on HuggingFace
- messenger-like conversations with summaries
DialogSum
- on HuggingFace
- dialogues and summaries
XSum Dataset
- on HuggingFace
- news articles and summaries
MediaSum
- on HuggingFace
- media interview transcripts and summaries
OAGK/OAGKX
- on Lindat
- scientific articles with abstracts
QMSum
- not on HuggingFace
- meeting transcripts with query-based summaries of the topics discussed
Info:
- AMI and ICSI are the best fit for minuting, but both are rather small

ELITR Minuting Corpus

Minutes

Original
- Taken in real time during the meeting, so some of the meeting content might be missing
Generated
- Written later by an independent annotator who was not present at the meeting
Both kinds can be used, but both have some problems

Alignments

Transcripts and minutes are aligned manually
The alignments can also be used for training: extract smaller pieces of transcripts and minutes that are properly aligned
We can talk to Marie Hledikova, email or Wednesday at 10:00
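
The idea of extracting aligned pieces for training can be sketched as below. This is only a minimal illustration: it assumes the alignment is available as a list of (transcript line index, minute line index) pairs, which is a hypothetical format and not necessarily how the ELITR Minuting Corpus stores it. The function name `build_aligned_pairs` and the sample data are made up for the example.

```python
from collections import defaultdict


def build_aligned_pairs(transcript_lines, minute_lines, alignment):
    """Group transcript lines by the minute line they are aligned to.

    `alignment` is assumed to be a list of (transcript_idx, minute_idx)
    tuples (hypothetical format). Returns a list of (source, target)
    training pairs, one per aligned minute line.
    """
    grouped = defaultdict(list)
    for t_idx, m_idx in alignment:
        grouped[m_idx].append(t_idx)

    pairs = []
    for m_idx in sorted(grouped):
        # Keep transcript lines in their original order.
        t_indices = sorted(grouped[m_idx])
        source = " ".join(transcript_lines[i] for i in t_indices)
        pairs.append((source, minute_lines[m_idx]))
    return pairs


# Toy data, invented for illustration only.
transcript = [
    "PERSON1: Let's start with the budget.",
    "PERSON2: We are ten percent over.",
    "PERSON1: Next, the deadline.",
    "PERSON2: It moves to Friday.",
]
minutes = ["Budget is ten percent over.", "Deadline moved to Friday."]
alignment = [(0, 0), (1, 0), (2, 1), (3, 1)]

pairs = build_aligned_pairs(transcript, minutes, alignment)
for source, target in pairs:
    print(target, "<-", source)
```

Each resulting pair is a short transcript segment with its minute line, which is much shorter than a full meeting and therefore easier to fit into a summarization model's input.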

Models

Pretrained summarization models

https://huggingface.co/models?pipeline_tag=summarization&sort=downloads
Relevant:
DialogLM (model for long dialogue understanding) https://github.com/microsoft/DialogLM

Implementations

Minuting Baseline Experiments

Much work has been done, but it is not properly documented
Some documentation: https://elitr.eu/deliverables/

Additional Information related to the topic

Computing

Running jupyter notebook on UFAL servers

TODO

Try some pretrained HuggingFace summarization models and fine-tune them on the minuting dataset
Download and analyze the various datasets
Preprocess the Europarl dataset
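
For the dataset analysis item, a first pass could compare source and summary lengths across datasets. The sketch below assumes each dataset has already been loaded into a list of (source text, summary text) pairs. The function name `length_stats`, the whitespace tokenization, and the sample data are illustrative assumptions, not part of this repository.

```python
from statistics import mean


def length_stats(pairs):
    """Compute simple length statistics for (source, summary) pairs.

    Lengths are counted in whitespace-separated tokens, which is a rough
    approximation and not a model tokenizer. Pairs with an empty source
    are skipped. Assumes at least one pair has a non-empty source.
    """
    lengths = [
        (len(source.split()), len(summary.split()))
        for source, summary in pairs
        if source.split()
    ]
    return {
        "mean_source_words": mean(s for s, _ in lengths),
        "mean_summary_words": mean(t for _, t in lengths),
        # Summary length divided by source length, averaged over pairs.
        "mean_compression": mean(t / s for s, t in lengths),
    }


# Toy data, invented for illustration only.
sample = [
    ("a b c d e f g h", "a b"),
    ("a b c d", "a b"),
]
stats = length_stats(sample)
print(stats)
```

Comparing these numbers between, for example, a news dataset and a meeting dataset gives a quick sense of how much longer meeting transcripts are relative to their summaries.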

MichelleElizabethK/AutoMin2023

Folders and files

Latest commit

History

Repository files navigation

AutoMin2023

Datasets

Minuting datasets

Relevant summarization datasets

ELITR Minuting Corpus

Minutes

Alignments

Models

Pretrained summarization models

Implementations

Minuting Baseline Experiments

Additional Information related to the topic

Computing

Running jupyter notebook on UFAL servers

TODO

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages