Commit cae85a2

Authored and committed by The TensorFlow Datasets Authors
Fix download links for the multi_news dataset
PiperOrigin-RevId: 796933010
1 parent cab6201 commit cae85a2

4 files changed: 28 additions, 16 deletions

Lines changed: 13 additions & 8 deletions
@@ -1,8 +1,13 @@
-Multi-News, consists of news articles and human-written summaries
-of these articles from the site newser.com.
-Each summary is professionally written by editors and
-includes links to the original articles cited.
-
-There are two features:
-- document: text of news articles seperated by special token "|||||".
-- summary: news summary.
+# Multi-News Dataset
+
+Multi-News consists of news articles and human-written summaries of these
+articles from the news site `newser.com`. Each summary is professionally written
+by editors and includes links to the original articles cited.
+
+This is the first large-scale dataset for multi-document summarization on news
+articles.
+
+Each record has two features:
+
+* `document`: Texts of news articles, separated by special token "|||||".
+* `summary`: Summary of the news.
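The updated description says each `document` holds several news articles joined by the special token `|||||`. A minimal sketch of recovering the individual articles from one record, assuming the field is already a Python `str`; the helper name `split_document` is hypothetical and not part of TFDS:

```python
# Token that separates the source articles inside one `document` value,
# per the dataset description in this commit.
SEPARATOR = "|||||"


def split_document(document: str) -> list[str]:
    """Split one Multi-News `document` string into its source articles.

    Hypothetical helper for illustration (not a TFDS API). Fragments are
    stripped of surrounding whitespace, and empty fragments, such as the
    one left by a trailing separator, are dropped.
    """
    return [part.strip() for part in document.split(SEPARATOR) if part.strip()]


if __name__ == "__main__":
    record = "First article text. ||||| Second article text. |||||"
    for i, article in enumerate(split_document(record), start=1):
        print(i, article)
```

If your input pipeline yields the field as `bytes` (as NumPy-converted TensorFlow string tensors typically do), decode it with `.decode("utf-8")` before splitting.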
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+content.data-type.text # Contains text data.
+content.subject.news # Relates to news.
+content.language.en # Contains text in language English / en.
+ml.task.abstractive-text-summarization # Relates to Abstractive Text Summarization, a machine learning task.
+ml.task.natural-language-understanding # Relates to Natural Language Understanding, a machine learning task.
+ml.task.text-summarization # Relates to Text Summarization, a machine learning task.
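Each line of the new tag file pairs a dotted tag with a `#` comment describing it. A small parser sketch, assuming exactly that `<tag> # <description>` layout as shown in the diff; `parse_tags` is a hypothetical helper, not a TFDS API:

```python
def parse_tags(text: str) -> dict[str, str]:
    """Map each tag to the description that follows its `#`.

    Assumes one tag per line in the form `<tag> # <description>`
    (hypothetical helper for illustration). Blank lines are skipped;
    a line without `#` maps its tag to an empty description.
    """
    tags: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        tag, _, description = line.partition("#")
        tags[tag.strip()] = description.strip()
    return tags


if __name__ == "__main__":
    sample = (
        "content.data-type.text # Contains text data.\n"
        "ml.task.text-summarization # Relates to Text Summarization, a machine learning task.\n"
    )
    tags = parse_tags(sample)
    # Select only the machine-learning task tags.
    print([t for t in tags if t.startswith("ml.task.")])
```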
Lines changed: 6 additions & 6 deletions
@@ -1,6 +1,6 @@
-https://huggingface.co/datasets/alexfabbri/multi_news/raw/main/data/test.src.cleaned 133 d04c4581d52321a30c246d2caa72853ee7f28c6b7a3985ee436f54c4bc264315 test.src.cleaned
-https://huggingface.co/datasets/alexfabbri/multi_news/raw/main/data/test.tgt 132 afba4aa26d95bb557c0eaa0cb8f7495af2104f1e43f4b5f9ef429b8752477abd test.tgt
-https://huggingface.co/datasets/alexfabbri/multi_news/raw/main/data/train.src.cleaned 134 75f87b786ff1982bf1bd5803c6a7377d1834b81956ac680a6955789ba047cc0b train.src.cleaned
-https://huggingface.co/datasets/alexfabbri/multi_news/raw/main/data/train.tgt 133 9f1e9b290a6aae1aa67bd5b361c934ee9db32486e5cd97d83184c097ef8b27e5 train.tgt
-https://huggingface.co/datasets/alexfabbri/multi_news/raw/main/data/val.src.cleaned 133 8df3ef6bd1882094de8120fa635c3abf758e10427f81f306aaa4786df7b57861 val.src.cleaned
-https://huggingface.co/datasets/alexfabbri/multi_news/raw/main/data/val.tgt 132 9c0377a443ea92b17449f7df17f1cdfa7c7ebbfe3a45f2f8cd7b3e0ffb47b1df val.tgt
+https://huggingface.co/datasets/alexfabbri/multi_news/resolve/main/data/test.src.cleaned 68999509 138d3ac2dc899cbcd2e3745aaa94d1c1db55fb7058d9df4ba3ef2dac05a3a186 test.src.cleaned
+https://huggingface.co/datasets/alexfabbri/multi_news/resolve/main/data/test.tgt 7309099 fa97cf91a62ae82a0af6da88f2ddf8e06eb4e3b90f7971d8e0c516436518fae3 test.tgt
+https://huggingface.co/datasets/alexfabbri/multi_news/resolve/main/data/train.src.cleaned 547512283 627781c8ce55d528fcdacd495db45583a915e2d24b7983b0a5a6693ede933bb1 train.src.cleaned
+https://huggingface.co/datasets/alexfabbri/multi_news/resolve/main/data/train.tgt 58793912 e9e82b8f413b0f1ed4eb7c883f93bb744f829c218c1608b6ba7615d687d07121 train.tgt
+https://huggingface.co/datasets/alexfabbri/multi_news/resolve/main/data/val.src.cleaned 66875522 f0a43902da366eea2b882e39ddd4c0975ad44aba6b61095a2ea90362e9e2bb65 val.src.cleaned
+https://huggingface.co/datasets/alexfabbri/multi_news/resolve/main/data/val.tgt 7295302 bb08a078e0cb2b8ca9cc0fe3bfbe9d4098dee706bd00eb97449155e41b880157 val.tgt
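Each checksum line records four whitespace-separated fields: the download URL, the file size in bytes, its SHA-256 digest, and the local file name. The sketch below re-checks a downloaded file against one such line; `parse_checksum_line`, `verify_file`, and `ChecksumEntry` are hypothetical helpers that assume exactly that four-field layout:

```python
import hashlib
from pathlib import Path
from typing import NamedTuple


class ChecksumEntry(NamedTuple):
    url: str
    size: int      # expected file size in bytes
    sha256: str    # expected hex-encoded SHA-256 digest
    filename: str


def parse_checksum_line(line: str) -> ChecksumEntry:
    # Assumes four whitespace-separated fields: URL, size, SHA-256, file name.
    url, size, sha256, filename = line.split()
    return ChecksumEntry(url, int(size), sha256, filename)


def verify_file(path: Path, entry: ChecksumEntry, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the recorded size and SHA-256."""
    # Cheap size check first, so obviously wrong files are rejected without hashing.
    if path.stat().st_size != entry.size:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in fixed-size chunks so multi-hundred-MB files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == entry.sha256
```

TFDS's download manager normally performs this verification itself; the sketch only shows what the recorded fields mean. Note how the recorded sizes change in this commit: the old `raw/` entries are 132 to 134 bytes each, while the `resolve/` entries range from about 7.3 MB to 547.5 MB.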

tensorflow_datasets/datasets/multi_news/multi_news_dataset_builder.py

Lines changed: 3 additions & 2 deletions
@@ -19,7 +19,7 @@
 import tensorflow_datasets.public_api as tfds
 
 _URL_PATH = (
-    "https://huggingface.co/datasets/alexfabbri/multi_news/raw/main/data/"
+    "https://huggingface.co/datasets/alexfabbri/multi_news/resolve/main/data/"
 )
 _LICENSE = "For non-commercial research and educational purposes only"
 
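The only change to `_URL_PATH` is the path segment `raw` becoming `resolve`. A plausible explanation, not stated in the commit, is that these files are stored with Git LFS on the Hugging Face Hub: `raw/` then serves the small LFS pointer file (consistent with the 132 to 134 byte sizes previously recorded in the checksums), while `resolve/` serves the actual file content. The sketch below uses hypothetical helpers `build_urls` and `looks_like_lfs_pointer` to join the new prefix with the six file names from the checksums, and to flag a download that looks like a pointer file by its first line:

```python
NEW_URL_PATH = "https://huggingface.co/datasets/alexfabbri/multi_news/resolve/main/data/"

# File names taken from the checksum entries in this commit.
FILENAMES = [
    "train.src.cleaned", "train.tgt",
    "val.src.cleaned", "val.tgt",
    "test.src.cleaned", "test.tgt",
]

# Git LFS pointer files begin with this spec-version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def build_urls(url_path: str, filenames: list[str]) -> dict[str, str]:
    """Map each file name to its download URL (hypothetical helper)."""
    return {name: url_path + name for name in filenames}


def looks_like_lfs_pointer(data: bytes, max_size: int = 1024) -> bool:
    """Heuristic: a tiny payload that starts with the LFS spec line.

    `max_size` is an arbitrary bound chosen for this sketch.
    """
    return len(data) <= max_size and data.startswith(LFS_POINTER_PREFIX)


if __name__ == "__main__":
    for name, url in build_urls(NEW_URL_PATH, FILENAMES).items():
        print(name, url)
```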
@@ -31,10 +31,11 @@
 class Builder(tfds.core.GeneratorBasedBuilder):
   """DatasetBuilder for multi_news dataset."""
 
-  VERSION = tfds.core.Version("2.0.0")
+  VERSION = tfds.core.Version("2.1.0")
   RELEASE_NOTES = {
       "1.0.0": "Initial release.",
       "2.0.0": "Update the dataset with valid URLs.",
+      "2.1.0": "Update the dataset with cleaned URLs.",
   }
 
   def _info(self) -> tfds.core.DatasetInfo:
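The version moves from 2.0.0 to 2.1.0, and a matching entry is added to `RELEASE_NOTES`. A hypothetical consistency check, written in plain Python so it runs without TFDS (here `VERSION` is a string rather than a `tfds.core.Version`), confirming that the current version has a release note and is the newest one listed:

```python
# Values mirrored from the diff; in the real builder these are class
# attributes of `Builder`, and VERSION is a tfds.core.Version.
VERSION = "2.1.0"
RELEASE_NOTES = {
    "1.0.0": "Initial release.",
    "2.0.0": "Update the dataset with valid URLs.",
    "2.1.0": "Update the dataset with cleaned URLs.",
}


def parse_version(v: str) -> tuple[int, int, int]:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    major, minor, patch = (int(x) for x in v.split("."))
    return major, minor, patch


def check_release_notes(version: str, notes: dict[str, str]) -> None:
    """Raise ValueError unless `version` has a note and is the newest one."""
    if version not in notes:
        raise ValueError(f"No release note for version {version}")
    latest = max(notes, key=parse_version)
    if latest != version:
        raise ValueError(f"VERSION {version} is older than release note {latest}")


if __name__ == "__main__":
    check_release_notes(VERSION, RELEASE_NOTES)
    print("release notes consistent with", VERSION)
```

To pin a version when loading, `tfds.load` accepts a `name:version` string such as `"multi_news:2.1.0"`.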
