VulCode-Scrape

VulCode-Scrape is a tool that automatically collects source code containing software vulnerabilities from Bitbucket, GitHub, and GitLab. The vulnerability information is sourced from the Common Vulnerabilities and Exposures (CVE) records in the public U.S. National Vulnerability Database (NVD). This tool is developed within the Horizon Europe Project LAZARUS.
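As an illustration of the idea, the sketch below filters the reference URLs of a CVE record down to commit links on the three supported platforms. It is not taken from the VulCode-Scrape source: the function names, the host table, and the rule "the path contains a `commit` or `commits` segment" are assumptions made for this example, and self-hosted GitLab or Bitbucket instances are deliberately ignored.

```python
from urllib.parse import urlparse

# Hypothetical host table; the real tool may recognise more hosts.
SUPPORTED_HOSTS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "bitbucket.org": "Bitbucket",
}


def classify_reference(url):
    """Return the platform name for a URL, or None if the host is not supported."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return SUPPORTED_HOSTS.get(host)


def commit_references(urls):
    """Keep (platform, url) pairs whose path contains a commit segment."""
    selected = []
    for url in urls:
        platform = classify_reference(url)
        path_parts = urlparse(url).path.split("/")
        # GitHub and GitLab use ".../commit/<sha>", Bitbucket ".../commits/<sha>".
        if platform and ("commit" in path_parts or "commits" in path_parts):
            selected.append((platform, url))
    return selected
```

Matching on whole path segments rather than substrings avoids picking up unrelated paths that merely contain the word "commit".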

⚙️ Prerequisites

This project requires two environment variables to be set before running the code:

| Variable | Description |
| --- | --- |
| GITHUB_USER | Your GitHub username. |
| GITHUB_TOKEN | Your GitHub Personal Access Token (PAT) with appropriate permissions. |

🧩 Setting Environment Variables

You can set environment variables in several ways depending on your operating system or workflow.

Option 1: Shell Profile (Linux/macOS)

Add the following lines to your shell configuration file (~/.bashrc, ~/.zshrc, or ~/.profile):

export GITHUB_USER=your_github_username
export GITHUB_TOKEN=your_personal_access_token

Then reload your shell:

source ~/.bashrc  # or ~/.zshrc

Option 2: System/User Environment Variables (Windows)

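Set environment variables via System Properties → Environment Variables, or run setx from PowerShell or Command Prompt. setx stores the values for future sessions only, so open a new terminal afterwards: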

setx GITHUB_USER "your_github_username"
setx GITHUB_TOKEN "your_personal_access_token"

Option 3: .env File (Recommended for Development)

Create a .env file in the project root with the following content:

GITHUB_USER=your_github_username
GITHUB_TOKEN=your_personal_access_token
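This README does not show how the project reads the .env file, so the following is only a minimal sketch using the standard library. The helper names `load_env_file` and `missing_vars` are hypothetical, and the parser handles plain `KEY=VALUE` lines only (no multi-line values or variable expansion).

```python
import os
from pathlib import Path

REQUIRED_VARS = ("GITHUB_USER", "GITHUB_TOKEN")


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Variables already set in the environment take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def missing_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

A script could call `load_env_file()` at start-up and stop with a clear message if `missing_vars()` returns a non-empty list, rather than failing later on an unauthenticated request.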

🚨 Security Note

Never commit your .env file, tokens, or any credentials to version control.
Use a .gitignore file to exclude .env and keep your secrets secure.

Example .gitignore entry:

.env

🔎 Usage

Run the following command to create a vulnerability database:

python main.py \
    --database_dir <directory_path> \
    --ini_year <start_year> \
    --end_year <end_year>

🔥 Example. The following command collects vulnerable code for CVEs recorded from 2021 to 2025:

python main.py --database_dir Data --ini_year 2021 --end_year 2025
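For readers who want to see how flags like these are typically handled, here is a hedged sketch built on `argparse`. It is not the project's actual main.py: marking all three flags as required, rejecting a start year later than the end year, and treating the range as inclusive are assumptions made for this example.

```python
import argparse


def parse_args(argv=None):
    """Parse the three flags shown above (illustrative, not the real main.py)."""
    parser = argparse.ArgumentParser(
        description="Collect vulnerable code for CVEs in a range of years."
    )
    parser.add_argument("--database_dir", required=True,
                        help="directory where the database is written")
    parser.add_argument("--ini_year", type=int, required=True,
                        help="first CVE year to collect")
    parser.add_argument("--end_year", type=int, required=True,
                        help="last CVE year to collect")
    args = parser.parse_args(argv)
    # Assumption: a start year after the end year is treated as a usage error.
    if args.ini_year > args.end_year:
        parser.error("--ini_year must not be later than --end_year")
    return args


def years_to_collect(args):
    """Return the years covered, assuming both bounds are inclusive."""
    return list(range(args.ini_year, args.end_year + 1))
```

Under these assumptions, the example command above would cover the five years 2021, 2022, 2023, 2024, and 2025.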

✨ Relevant Projects

CVEfixes: our project is inspired by its structure.

NVD data feeds: where we get the CVE records.

CWE: where we obtain the CWE information.


🌲 Citation



@misc{vulcode2025,
  author={Yuejun Guo},
  title={VulCode-Scrape: An open-source tool for scraping vulnerable code from GitHub},
  howpublished={\url{https://github.com/LIST-LUXEMBOURG/vulcode-scrape}},
  year={2025},
}

© 2025 - Luxembourg Institute of Science and Technology. All Rights Reserved

This software is licensed under the GPL v3.0 licence.
