Twitter dataset for the 2020 election in the USA

This dataset consists of roughly 1.2 million tweets about the 2020 presidential candidates, Joe Biden and Donald Trump. About 521k of these tweets address Biden and about 680k address Trump. The dataset was mined from here.

All data except the tweet text and the time of publishing was discarded, and only tweets in English were included in the dataset. #'s and @'s, as well as all other non-alphanumeric characters apart from punctuation, were removed from the tweets.
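The cleaning rule described above could look roughly like the following sketch. It is not the code from filter.py: the function name clean_tweet is hypothetical, and it assumes that "punctuation" means ASCII punctuation, that only the # and @ symbols are dropped (the word after them is kept), and that runs of whitespace are collapsed.

```python
import string

# Assumption: keep ASCII letters, digits, whitespace, and ASCII punctuation,
# except '#' and '@', which the README says are removed.
KEPT_CHARS = set(
    string.ascii_letters + string.digits + string.whitespace + string.punctuation
) - set("#@")


def clean_tweet(text: str) -> str:
    """Drop '#', '@', and every character outside KEPT_CHARS (hypothetical helper)."""
    kept = "".join(ch for ch in text if ch in KEPT_CHARS)
    # Collapse whitespace left behind by removed characters (an assumption,
    # not something the README states).
    return " ".join(kept.split())


if __name__ == "__main__":
    print(clean_tweet("@JoeBiden wins! #Election2020 🎉"))
```

Note that this sketch also drops accented letters, since it only keeps ASCII; the real script may treat them differently.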

How to generate the dataset

Run get_election_tweets.sh in data/twitter/
This downloads the two .csv files of a Kaggle dataset (https://www.kaggle.com/manchunhui/us-election-2020-tweets).
Run the filter.py script in the data/twitter/ folder.
This extracts the relevant data and writes it to the election_dataset.pickle file. Because the process is not parallelized and detecting the language of each tweet is computationally expensive, this may take a while.
Delete the .csv files (optional)
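As a rough illustration of what the filtering step does, the sketch below reads .csv rows, keeps the text and publishing time of English tweets, and pickles the result. It is not the actual filter.py: the function names, the column names "tweet" and "created_at", and the list-of-dicts output layout are assumptions, and the language detector is passed in as a callable because the README does not name the library used.

```python
import csv
import pickle
from typing import Callable, Dict, Iterable, Iterator, List


def filter_rows(
    rows: Iterable[Dict[str, str]],
    is_english: Callable[[str], bool],
    text_col: str = "tweet",        # assumed column name
    time_col: str = "created_at",   # assumed column name
) -> Iterator[Dict[str, str]]:
    """Yield only the text and time of rows whose text passes is_english."""
    for row in rows:
        text = row.get(text_col) or ""
        if text and is_english(text):
            yield {"text": text, "time": row.get(time_col)}


def build_dataset(
    csv_paths: List[str], out_path: str, is_english: Callable[[str], bool]
) -> int:
    """Filter every .csv file sequentially and pickle the kept records."""
    records = []
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            records.extend(filter_rows(csv.DictReader(handle), is_english))
    with open(out_path, "wb") as handle:
        pickle.dump(records, handle)
    return len(records)
```

Passing the detector in as a parameter keeps the sketch independent of any particular language-detection package; every tweet is still checked one at a time, which matches the slow, sequential behavior described above.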

Alternatively:
Download the computed election_dataset.pickle file from TODO.
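Either way, the resulting file can be read back with Python's standard pickle module. The helper below is a minimal sketch; the name load_dataset is hypothetical, and the structure of the loaded object is not documented here, so inspect it after loading.

```python
import pickle


def load_dataset(path: str = "election_dataset.pickle"):
    """Load the pickled dataset; the object's layout is not specified by this README."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Unpickling can execute arbitrary code, so only load a pickle file obtained from a source you trust.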
