Commit d92e86e

Author: The TensorFlow Datasets Authors
Automated documentation update.
PiperOrigin-RevId: 798159149
1 parent d81a4e3 commit d92e86e

3 files changed, +49 -14 lines changed
docs/catalog/_toc.yaml

Lines changed: 12 additions & 2 deletions
@@ -22,6 +22,8 @@ toc:
     title: billsum
   - path: /datasets/catalog/booksum
     title: booksum (manual)
+  - path: /datasets/catalog/multi_news
+    title: multi_news
   - path: /datasets/catalog/newsroom
     title: newsroom (manual)
   - path: /datasets/catalog/reddit
@@ -815,6 +817,8 @@ toc:
     title: math_qa
   - path: /datasets/catalog/mlqa
     title: mlqa
+  - path: /datasets/catalog/multi_news
+    title: multi_news
   - path: /datasets/catalog/natural_instructions
     title: natural_instructions
   - path: /datasets/catalog/natural_questions
@@ -862,6 +866,10 @@ toc:
   - path: /datasets/catalog/glove100_angular
     title: glove100_angular
   title: Nearest neighbors
+- section:
+  - path: /datasets/catalog/multi_news
+    title: multi_news
+  title: News
 - section:
   - path: /datasets/catalog/coco
     title: coco
@@ -1304,8 +1312,6 @@ toc:
     title: gigaword
   - path: /datasets/catalog/gov_report
     title: gov_report
-  - path: /datasets/catalog/multi_news
-    title: multi_news
   - path: /datasets/catalog/wikihow
     title: wikihow (manual)
   - path: /datasets/catalog/xsum
@@ -1475,6 +1481,8 @@ toc:
     title: movie_rationales
   - path: /datasets/catalog/mrqa
     title: mrqa
+  - path: /datasets/catalog/multi_news
+    title: multi_news
   - path: /datasets/catalog/multi_nli
     title: multi_nli
   - path: /datasets/catalog/multi_nli_mismatch
@@ -1747,6 +1755,8 @@ toc:
     title: booksum (manual)
   - path: /datasets/catalog/databricks_dolly
     title: databricks_dolly
+  - path: /datasets/catalog/multi_news
+    title: multi_news
   - path: /datasets/catalog/newsroom
     title: newsroom (manual)
   - path: /datasets/catalog/reddit

docs/catalog/multi_news.md

Lines changed: 29 additions & 11 deletions
@@ -3,7 +3,7 @@
 <meta itemprop="name" content="TensorFlow Datasets" />
 </div>
 <meta itemprop="name" content="multi_news" />
-<meta itemprop="description" content="Multi-News, consists of news articles and human-written summaries&#10;of these articles from the site newser.com.&#10;Each summary is professionally written by editors and&#10;includes links to the original articles cited.&#10;&#10;There are two features:&#10; - document: text of news articles seperated by special token &quot;|||||&quot;.&#10; - summary: news summary.&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;multi_news&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
+<meta itemprop="description" content="# Multi-News Dataset&#10;&#10;Multi-News consists of news articles and human-written summaries of these&#10;articles from the news site `newser.com`. Each summary is professionally written&#10;by editors and includes links to the original articles cited.&#10;&#10;This is the first large-scale dataset for multi-document summarization on news&#10;articles.&#10;&#10;Each record has two features:&#10;&#10;* `document`: Texts of news articles, separated by special token &quot;|||||&quot;.&#10;* `summary`: Summary of the news.&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;multi_news&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;informations on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
 <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/multi_news" />
 <meta itemprop="sameAs" content="https://github.com/Alex-Fabbri/Multi-News" />
 <meta itemprop="citation" content="@misc{alex2019multinews,&#10; title={Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model},&#10; author={Alexander R. Fabbri and Irene Li and Tianwei She and Suyi Li and Dragomir R. Radev},&#10; year={2019},&#10; eprint={1906.01749},&#10; archivePrefix={arXiv},&#10; primaryClass={cs.CL}&#10;}" />
@@ -12,14 +12,26 @@
 # `multi_news`
 
 
+Note: This dataset has been updated since the last stable release. The new
+versions and config marked with
+<span class="material-icons" title="Available only in the tfds-nightly package">nights_stay</span>
+are only available in the `tfds-nightly` package.
+
 * **Description**:
 
-Multi-News, consists of news articles and human-written summaries of these
-articles from the site newser.com. Each summary is professionally written by
-editors and includes links to the original articles cited.
+# Multi-News Dataset
+
+Multi-News consists of news articles and human-written summaries of these
+articles from the news site `newser.com`. Each summary is professionally written
+by editors and includes links to the original articles cited.
+
+This is the first large-scale dataset for multi-document summarization on news
+articles.
+
+Each record has two features:
 
-There are two features: - document: text of news articles seperated by special
-token "|||||". - summary: news summary.
+* `document`: Texts of news articles, separated by special token "|||||".
+* `summary`: Summary of the news.
 
 * **Additional Documentation**:
 <a class="button button-with-icon" href="https://paperswithcode.com/dataset/multi-news">
@@ -31,15 +43,21 @@ token "|||||". - summary: news summary.
 [https://github.com/Alex-Fabbri/Multi-News](https://github.com/Alex-Fabbri/Multi-News)
 
 * **Source code**:
-[`tfds.summarization.MultiNews`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/summarization/multi_news.py)
+[`tfds.datasets.multi_news.Builder`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/multi_news/multi_news_dataset_builder.py)
 
 * **Versions**:
 
-* **`1.0.0`** (default): No release notes.
+* `1.0.0`: Initial release.
+* `2.0.0`: [Do not use] Update the dataset with valid URLs.
+* **`2.1.0`** (default)
+<span class="material-icons" title="Available only in the tfds-nightly package">nights_stay</span>:
+Update the dataset with the correct URLs. The URLs in this version come
+from HuggingFace's dataset repo, which is curated by the same author:
+https://huggingface.co/datasets/alexfabbri/multi_news.
 
-* **Download size**: `245.06 MiB`
+* **Download size**: `721.73 MiB`
 
-* **Dataset size**: `669.80 MiB`
+* **Dataset size**: `666.50 MiB`
 
 * **Auto-cached**
 ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
@@ -88,7 +106,7 @@ summary | Text | | string |
 <button id="displaydataframe">Display examples...</button>
 <div id="dataframecontent" style="overflow-x:auto"></div>
 <script>
-const url = "https://storage.googleapis.com/tfds-data/visualization/dataframe/multi_news-1.0.0.html";
+const url = "https://storage.googleapis.com/tfds-data/visualization/dataframe/multi_news-2.1.0.html";
 const dataButton = document.getElementById('displaydataframe');
 dataButton.addEventListener('click', async () => {
 // Disable the button after clicking (dataframe loaded only once).
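
Note on the `document` feature: the updated description in `multi_news.md` says each `document` holds several news articles joined by the special token "|||||". A minimal sketch of splitting one record back into its articles, assuming the field has already been decoded to a Python `str` (TFDS text features are typically yielded as byte strings). The helper name `split_articles` and the sample text are hypothetical and not part of TFDS or the dataset:

```python
from typing import List

# Separator named in the dataset description in this commit.
ARTICLE_SEPARATOR = "|||||"


def split_articles(document: str) -> List[str]:
    """Split a Multi-News `document` string into individual article texts."""
    parts = document.split(ARTICLE_SEPARATOR)
    # Strip surrounding whitespace and drop empty pieces,
    # such as the one left behind by a trailing separator.
    return [part.strip() for part in parts if part.strip()]


# Hypothetical record, made up for illustration (not taken from the dataset).
example = "First article text. ||||| Second article text. |||||"
print(split_articles(example))
```

If the value comes straight from a `tf.data` pipeline, it would first need to be converted, for example with `.numpy().decode("utf-8")`, before being passed to a helper like this.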

docs/catalog/overview.md

Lines changed: 8 additions & 1 deletion
@@ -40,6 +40,7 @@ for ex in tfds.load('cifar10', split='train'):
 * [`aeslc`](aeslc.md)
 * [`billsum`](billsum.md)
 * [`booksum`](booksum.md)
+* [`multi_news`](multi_news.md)
 * [`newsroom`](newsroom.md)
 * [`reddit`](reddit.md)
 * [`reddit_tifu`](reddit_tifu.md)
@@ -515,6 +516,7 @@ for ex in tfds.load('cifar10', split='train'):
 * [`math_dataset`](math_dataset.md)
 * [`math_qa`](math_qa.md)
 * [`mlqa`](mlqa.md)
+* [`multi_news`](multi_news.md)
 * [`natural_instructions`](natural_instructions.md)
 * [`natural_questions`](natural_questions.md)
 * [`natural_questions_open`](natural_questions_open.md)
@@ -541,6 +543,10 @@ for ex in tfds.load('cifar10', split='train'):
 * [`deep1b`](deep1b.md)
 * [`glove100_angular`](glove100_angular.md)
 
+### `News`
+
+* [`multi_news`](multi_news.md)
+
 ### `Object detection`
 
 * [`coco`](coco.md)
@@ -808,7 +814,6 @@ for ex in tfds.load('cifar10', split='train'):
 * [`covid19sum`](covid19sum.md)
 * [`gigaword`](gigaword.md)
 * [`gov_report`](gov_report.md)
-* [`multi_news`](multi_news.md)
 * [`wikihow`](wikihow.md)
 * [`xsum`](xsum.md)
 
@@ -900,6 +905,7 @@ for ex in tfds.load('cifar10', split='train'):
 * [`mlqa`](mlqa.md)
 * [`movie_rationales`](movie_rationales.md)
 * [`mrqa`](mrqa.md)
+* [`multi_news`](multi_news.md)
 * [`multi_nli`](multi_nli.md)
 * [`multi_nli_mismatch`](multi_nli_mismatch.md)
 * [`natural_instructions`](natural_instructions.md)
@@ -1046,6 +1052,7 @@ for ex in tfds.load('cifar10', split='train'):
 * [`billsum`](billsum.md)
 * [`booksum`](booksum.md)
 * [`databricks_dolly`](databricks_dolly.md)
+* [`multi_news`](multi_news.md)
 * [`newsroom`](newsroom.md)
 * [`reddit`](reddit.md)
 * [`reddit_tifu`](reddit_tifu.md)
