GitHub Analysis (gha)

A Python tool for scraping a set of GitHub repositories into a MongoDB database.

Using the Data

To use the data already collected by gha, you do not need to follow this README or run the tool yourself; see the project wiki page on using the data instead. You may still want to run gha if you need a small local dataset for testing.

Installing and Running

Prerequisites

  • Docker
  • Python 3.7 or greater
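
The Python requirement can be checked before creating the virtual environment. The following is a minimal sketch, not part of gha itself; the function name `meets_minimum` is hypothetical and only the 3.7 minimum comes from the list above.

```python
import sys

# Minimum version taken from the prerequisites list above.
MINIMUM_PYTHON = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report on the interpreter that is running this snippet.
if meets_minimum():
    print("Python version is sufficient for gha")
else:
    print("Python 3.7 or greater is required")
```

Run it with the same `python3` that will be used to create the virtual environment, since that is the interpreter the environment inherits.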

Install

  1. Clone this repository and cd into the cloned directory
  2. Create and activate a virtual environment
  3. Install this package (gha) into the virtual environment

git clone https://github.com/Southampton-RSG/github-analysis.git
cd github-analysis
python3 -m venv venv
source venv/bin/activate
pip install .

Configuration

  1. Create a GitHub personal access token at https://github.com/settings/tokens
    • The token does not need any permissions
  2. Copy .env.template to .env and fill in its values
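
The second step can be done by hand with a file copy. As a sketch only, the helper below (a hypothetical `create_env_file`, not something gha provides) copies the template without overwriting an existing .env, which may already contain a token. The values to fill in are whatever .env.template lists; they are not reproduced here.

```python
import shutil
from pathlib import Path


def create_env_file(template=".env.template", target=".env"):
    """Copy the template to the target path unless the target already exists.

    Returns True if a new file was written, False if an existing one was kept.
    """
    template_path = Path(template)
    target_path = Path(target)
    if not template_path.is_file():
        raise FileNotFoundError(f"Template not found: {template_path}")
    if target_path.exists():
        # Never overwrite an existing .env: it may already hold a token.
        return False
    shutil.copyfile(template_path, target_path)
    return True
```

Calling `create_env_file()` from the repository root produces a .env that still has to be edited to add the token created in step 1.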

Running

  1. Start the MongoDB database containers
    • docker-compose can be installed with pip if necessary
  2. Start the gha scraper with a repository list file
    • The virtual environment created above must still be active

docker-compose up -d
gha fetch -f tests/data/UKRI_10.txt

The database web console can be accessed at http://localhost:8081/db/github/.
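
Because `docker-compose up -d` runs the containers in the background, the console may take a short while to start answering. The sketch below polls the URL given above using only the standard library; the function names, retry count, and delay are illustrative assumptions rather than part of gha.

```python
import time
import urllib.error
import urllib.request

# URL taken from this README.
CONSOLE_URL = "http://localhost:8081/db/github/"


def console_is_up(url=CONSOLE_URL, timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status (e.g. login required).
        return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


def wait_for_console(url=CONSOLE_URL, attempts=10, delay=3.0):
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if console_is_up(url):
            return True
        time.sleep(delay)
    return False
```

`wait_for_console()` returns True once the console responds. If it returns False, `docker-compose ps` shows whether the containers are running.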