VulCode-Scrape is a tool that automatically collects source code containing software vulnerabilities from Bitbucket, GitHub, and GitLab. The vulnerability information is sourced from the Common Vulnerabilities and Exposures (CVE) records in the public U.S. National Vulnerability Database (NVD). This tool is developed within the Horizon Europe Project LAZARUS.
This project requires two environment variables to be set before running the code:
| Variable | Description |
|---|---|
| `GITHUB_USER` | Your GitHub username. |
| `GITHUB_TOKEN` | Your GitHub Personal Access Token (PAT) with appropriate permissions. |
You can set environment variables in several ways depending on your operating system or workflow.
On Linux or macOS, add the following lines to your shell configuration file (`~/.bashrc`, `~/.zshrc`, or `~/.profile`):

    export GITHUB_USER=your_github_username
    export GITHUB_TOKEN=your_personal_access_token

Then reload your shell:

    source ~/.bashrc  # or ~/.zshrc

On Windows, set the environment variables via System Properties → Environment Variables, or use PowerShell:

    setx GITHUB_USER "your_github_username"
    setx GITHUB_TOKEN "your_personal_access_token"

Open a new terminal afterwards, since `setx` does not change the current session.

Alternatively, create a `.env` file in the project root with the following content:

    GITHUB_USER=your_github_username
    GITHUB_TOKEN=your_personal_access_token

Never commit your `.env` file, tokens, or any other credentials to version control. Use a `.gitignore` file to exclude `.env` and keep your secrets secure.

Example `.gitignore` entry:

    .env
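Before starting a long scraping run, it can help to confirm that both variables are actually visible to Python. The sketch below is only an illustration of such a check. The helper names `parse_env_file` and `load_github_credentials` are hypothetical and are not part of VulCode-Scrape, and how the tool itself reads the variables or the `.env` file is not shown in this README.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

REQUIRED_VARS = ("GITHUB_USER", "GITHUB_TOKEN")


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Hypothetical helper: a minimal reader for the .env format shown above.
    """
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_github_credentials(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (user, token), or raise if either variable is missing or empty.

    Hypothetical helper: reads from os.environ unless a mapping is passed in.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variable(s): " + ", ".join(missing)
        )
    return env["GITHUB_USER"], env["GITHUB_TOKEN"]
```

Calling `load_github_credentials()` with no arguments checks the current process environment. Passing the result of `parse_env_file(...)` checks the contents of a `.env` file instead. The error message lists only the names of missing variables, so the token value is never printed.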
Run the following command to create a vulnerability database:

    python main.py \
        --database_dir <directory_path> \
        --ini_year <start_year> \
        --end_year <end_year>

For example:

    python main.py --database_dir Data --ini_year 2021 --end_year 2025

- CVEfixes: our project is inspired by its structure.
- NVD data feeds: where we get the CVE records.
- CWE: where we obtain the CWE information.
@misc{vulcode2025,
author={Yuejun Guo},
title={VulCode-Scrape: An open-source tool for scrapping vulnerable code from GitHub},
howpublished={\url{https://github.com/LIST-LUXEMBOURG/vulcode-scrape}},
year={2025},
}
© 2025 - Luxembourg Institute of Science and Technology. All Rights Reserved
This software is licensed under the GPL v3.0 license.