Twitter Media Scraper

Scrape tweets and extract possible Personally Identifiable Information (PII) from Twitter accounts.

What Does This Tool Do?

This tool takes a Twitter username and extracts possible PII. It aims to extract the following from tweets:

Emails
Mentions
Hashtags
URLs
IPs
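
The extraction step can be pictured as a set of regular expressions applied to each tweet's text. The following is a minimal sketch, not the tool's actual code: the names `PATTERNS` and `extract_pii` are hypothetical, and the patterns are simplified assumptions that the real tool may not share.

```python
import re

# Illustrative patterns only (assumptions); the tool's own expressions may differ.
PATTERNS = {
    # Full email addresses such as alice@example.com
    "emails": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # @handles of up to 15 characters, skipping the '@' inside email addresses
    "mentions": re.compile(r"(?<![A-Za-z0-9_@])@([A-Za-z0-9_]{1,15})\b"),
    # #hashtags that do not start in the middle of a word
    "hashtags": re.compile(r"(?<!\w)#(\w+)"),
    # http and https URLs up to the next whitespace
    "urls": re.compile(r"https?://[^\s]+"),
    # Dotted-quad IPv4 addresses with each octet in the range 0-255
    "ips": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}


def extract_pii(text):
    """Return a dict mapping each category to the sorted unique matches in text."""
    return {name: sorted(set(p.findall(text))) for name, p in PATTERNS.items()}


if __name__ == "__main__":
    sample = "Mail alice@example.com or ping @bob_dev #infosec https://example.com/page"
    for category, matches in extract_pii(sample).items():
        print(category, matches)
```

Patterns this simple will produce false positives and misses, which is why improving the regex is listed as a goal for further development.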

By default, the tool generates a LaTeX report named report.pdf. The report name can be changed, or the results can be sent to stdout instead.

Further development should focus on increasing the types of information extracted, as well as improving the regex.

Installation

$ git clone https://github.com/cjharris18/Twitter-Media-Scraper.git

From here, enter the repository like so:

$ cd Twitter-Media-Scraper/

Next, install the project dependencies by running the following:

$ pip3 install -r requirements.txt

Running the tool can be done as follows:

$ python3 twitter-media-scraper.py

Usage

As shown above, the most basic usage is:

$ python3 twitter-media-scraper.py

With the above command, the tool prompts for every field it requires. These can also be specified on the command line:

$ python3 twitter-media-scraper.py -h
usage: twitter-media-scraper.py [-h] [-t TWITTER] [-o OUTPUT] [-e] [-r REPORT] [-s]

Extract and Analyse Tweets for potential PII.

optional arguments:
  -h, --help            show this help message and exit
  -t TWITTER, --twitter TWITTER
                        Specify Twitter Username at the Command Line.
  -o OUTPUT, --output OUTPUT
                        Specify a filename to store the JSON tweet data.
  -e, --env             Do not prompt for Enviroment Variables.
  -r REPORT, --report REPORT
                        Specify a filename for the report.
  -s, --stdout          Output the information to stdout, not as a report (the default).
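
For readers who want to see how an interface like this is typically declared, here is a rough sketch using Python's standard `argparse` module. It is a reconstruction from the help text above, not the tool's source: the function name `build_parser` is hypothetical, and using `report.pdf` as the parser-level default for `--report` is an assumption based on the default report name mentioned earlier.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="twitter-media-scraper.py",
        description="Extract and Analyse Tweets for potential PII.",
    )
    parser.add_argument("-t", "--twitter",
                        help="Specify Twitter Username at the Command Line.")
    parser.add_argument("-o", "--output",
                        help="Specify a filename to store the JSON tweet data.")
    parser.add_argument("-e", "--env", action="store_true",
                        help="Do not prompt for Environment Variables.")
    # Assumption: the default report name is applied at the parser level.
    parser.add_argument("-r", "--report", default="report.pdf",
                        help="Specify a filename for the report.")
    parser.add_argument("-s", "--stdout", action="store_true",
                        help="Output the information to stdout, not as a report.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-t", "example_user", "-s"])
print(args.twitter, args.report, args.stdout)
```

In this sketch, a missing `-t` leaves `args.twitter` as `None`, which is the point at which a tool like this could fall back to prompting the user.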

To obtain the required Twitter keys and tokens, you will need a Twitter Developer account. See the Twitter Developer documentation for more. The tool requires the following:

Twitter Access Token (ACCESS_TOKEN)
Twitter Access Secret (ACCESS_SECRET)
Twitter Consumer Key (CONSUMER_KEY)
Twitter Consumer Secret (CONSUMER_SECRET)
Twitter Bearer Token (BEARER_TOKEN)
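
When running with `-e`, it can help to confirm beforehand that all five values are set. The sketch below assumes the tool reads them from environment variables with exactly the names listed above; the helper `load_credentials` is hypothetical and not part of the tool.

```python
import os

# Variable names taken from the list above; reading them from the process
# environment is an assumption about how the tool consumes them.
REQUIRED_VARS = (
    "ACCESS_TOKEN",
    "ACCESS_SECRET",
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "BEARER_TOKEN",
)


def load_credentials(environ=None):
    """Return the credentials as a dict, or raise if any are missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    try:
        load_credentials()
        print("All Twitter credentials are set.")
    except RuntimeError as err:
        print(err)
```

Passing a plain dict as `environ` makes the check easy to exercise without touching the real environment.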

License

This tool is free and open-source, licensed under the MIT License.
