GSoC 2026 Proposal PoC | Project: ScanCode.io: migrate to fetching advisories data from vulnerablecode
This repository is a working Proof-of-Concept demonstrating the core technical work required for the GSoC migration project. It shows how ScanCode.io's API consumption layer needs to change to work with VulnerableCode's new advisory-first model introduced in v37.0.0.
VulnerableCode completed a major architectural shift in PR #1866 — moving from a vulnerability-first model to an advisory-first model. ScanCode.io's find_vulnerabilities pipeline has not yet been updated to use this new model.
Query: GET /api/packages?purl=pkg:pypi/django@2.2.0
│
▼
Response: {
  "purl": "pkg:pypi/django@2.2.0",  ← Package is the center
  "affected_by_vulnerabilities": [  ← Vulnerabilities are nested inside
    {
      "vulnerability_id": "VCID-xxx",
      "risk_score": 4.9,
      "aliases": ["CVE-2021-33203"],
      "summary": "..."
    }
  ]
}
Problem: The package is the primary object. Vulnerability data is denormalized and duplicated across packages.
Query: GET /api/v2/advisories/?affected_packages=pkg:pypi/django@2.2.0
│
▼
Response: {
  "advisory_id": "VCID-xxx",  ← Advisory is the center
  "aliases": ["CVE-2021-33203"],
  "summary": "...",
  "date_published": "2021-06-02",
  "severities": [  ← Structured severity (not flat risk_score)
    { "system": "cvssv3", "value": "4.9", "scoring_elements": "CVSS:3.1/..." }
  ],
  "affected_packages": [  ← Packages are nested inside advisory
    {
      "purl": "pkg:pypi/django@2.2.0",
      "affected_version_range": "vers:pypi/>=2.2.0,<2.2.24",
      "fixed_version": "2.2.24"
    }
  ]
}
Benefit: The advisory is the single source of truth. All affected packages, fix versions, and severity data live in one place.
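Because the per-package fix data sits inside the advisory, a client has to pick out the entry for the package it asked about. The sketch below shows one way to do that against the response shape shown above. The helper names `get_version_range` and `get_fixed_version` are the ones this README uses later, but their bodies here are assumptions, and `find_affected_entry` is a hypothetical helper. Matching on the exact purl string is a simplification.

```python
def find_affected_entry(advisory, purl):
    """Return the affected_packages entry whose purl equals `purl`, or None."""
    for entry in advisory.get("affected_packages", []):
        if entry.get("purl") == purl:
            return entry
    return None


def get_version_range(advisory, purl):
    """Return the affected version range for `purl`, or None if not listed."""
    entry = find_affected_entry(advisory, purl)
    return entry.get("affected_version_range") if entry else None


def get_fixed_version(advisory, purl):
    """Return the fixed version for `purl`, or None if not listed."""
    entry = find_affected_entry(advisory, purl)
    return entry.get("fixed_version") if entry else None


# Sample advisory copied from the V2 response shape shown above.
sample_advisory = {
    "advisory_id": "VCID-xxx",
    "aliases": ["CVE-2021-33203"],
    "affected_packages": [
        {
            "purl": "pkg:pypi/django@2.2.0",
            "affected_version_range": "vers:pypi/>=2.2.0,<2.2.24",
            "fixed_version": "2.2.24",
        }
    ],
}

print(get_fixed_version(sample_advisory, "pkg:pypi/django@2.2.0"))
```

A package that the advisory does not list yields `None` from both helpers, so callers can tell "not affected by this advisory" apart from "affected, fix known".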
| Old V1 Field | New V2 Field | Notes |
|---|---|---|
| vulnerability_id (VCID) | advisory_id | Primary key has changed |
| aliases | aliases | Same: CVE, GHSA, RHSA, etc. |
| risk_score (float) | severities[].value | Now structured per scoring system |
| summary | summary | Same |
| (not present) | date_published | New field in V2 |
| (not present) | affected_version_range | New: per-package version range |
| (not present) | fixed_version | New: per-package fix version |
| Package embeds vulns | Advisory embeds packages | Core structural inversion |
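Two rows of this mapping need actual logic rather than a rename: reducing the structured `severities` list to a single score, and picking a CVE out of `aliases`. The following is a minimal sketch under stated assumptions, not the mapper's actual implementation. The function names match the ones used later in this README, but the bodies are illustrative. Taking the maximum across scoring systems is a crude simplification, since scores from different systems are not strictly comparable.

```python
def get_max_severity_score(severities):
    """Return the highest numeric severity value, or None if none can be parsed."""
    scores = []
    for severity in severities:
        try:
            scores.append(float(severity.get("value")))
        except (TypeError, ValueError):
            # Missing or non-numeric values (for example textual ratings) are skipped.
            continue
    return max(scores) if scores else None


def get_primary_cve(aliases):
    """Return the first alias that looks like a CVE identifier, or None."""
    for alias in aliases:
        if alias.upper().startswith("CVE-"):
            return alias
    return None


severities = [
    {"system": "cvssv3", "value": "4.9", "scoring_elements": "CVSS:3.1/..."},
]
print(get_max_severity_score(severities))
print(get_primary_cve(["CVE-2021-33203"]))
```

Returning `None` instead of `0.0` when no score can be parsed keeps "no severity data" distinguishable from "severity is zero".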
fetcher.py queries the live public.vulnerablecode.io API. It shows the old V1 endpoint and the new V2 advisory endpoint side by side so you can see the structural difference in real responses.
python fetcher.py # anonymous (rate limited)
python fetcher.py --token <token> # with API token
python fetcher.py --no-bulk # skip bulk advisory demo call
Saves raw API responses to testcase/live_api_comparison.json for offline development.
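As a rough illustration of what the anonymous and token modes differ by, the sketch below builds (but does not send) a GET request for either endpoint. It is not taken from fetcher.py: `build_request` is a hypothetical helper, and the `Authorization: Token <token>` header format is an assumption about how the API token is passed.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://public.vulnerablecode.io"


def build_request(endpoint, params, token=None):
    """Build (without sending) a GET request for a VulnerableCode endpoint."""
    url = f"{BASE_URL}{endpoint}?{urlencode(params)}"
    headers = {"Accept": "application/json"}
    if token:
        # Assumption: the token is sent as "Authorization: Token <token>".
        headers["Authorization"] = f"Token {token}"
    return Request(url, headers=headers, method="GET")


purl = "pkg:pypi/django@2.2.0"
v1_request = build_request("/api/packages/", {"purl": purl})
v2_request = build_request("/api/v2/advisories/", {"affected_packages": purl}, token="<token>")
print(v1_request.full_url)
print(v2_request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call. Keeping construction separate from sending makes the two query shapes easy to compare without network access.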
mapper.py is the most important file. It takes a raw V2 advisory JSON response and maps it into the data structure that ScanCode.io uses when saving vulnerability data to its database.
Works fully offline using embedded sample data — no API call needed.
python mapper.py
The live_api_comparison.json file contains real responses captured from public.vulnerablecode.io on 27 March 2026.
| Package | V1 /api/packages/ | V2 /api/v2/advisories/ |
|---|---|---|
| pkg:pypi/django@2.2.0 | 37,000+ lines, full vulnerability data | {} (empty) |
| pkg:npm/lodash@4.17.4 | Full data with 4 vulnerabilities | {} (empty) |
The V2 advisory endpoint returned empty results for both test packages. This is not a bug in this PoC. The most likely explanation is that VulnerableCode's internal migration to the advisory-first model is still in progress: the package data exists in the V1 model but has not yet been fully re-indexed into the V2 advisory model.
This is precisely why the GSoC project is needed: scancode.io must be migrated in parallel with VulnerableCode completing its advisory indexing.
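One way a client could cope with this transition period is to prefer V2 and fall back to V1 when V2 returns nothing for a package. The sketch below is only an illustration of that idea, not part of this PoC or of ScanCode.io. `fetch_with_fallback` and the two fetcher callables are hypothetical, and they are injected so the logic can be exercised without network access.

```python
def fetch_with_fallback(purl, fetch_v2, fetch_v1):
    """Prefer V2 advisories; fall back to V1 data when V2 is empty for `purl`.

    `fetch_v2` and `fetch_v1` are hypothetical callables that take a purl
    and return a list of results (possibly empty).
    """
    advisories = fetch_v2(purl)
    if advisories:
        return {"source": "v2", "results": advisories}
    return {"source": "v1", "results": fetch_v1(purl)}


# Stub fetchers standing in for real API calls.
def empty_v2(purl):
    return []


def stub_v1(purl):
    return [{"purl": purl, "affected_by_vulnerabilities": []}]


outcome = fetch_with_fallback("pkg:pypi/django@2.2.0", empty_v2, stub_v1)
print(outcome["source"])
```

Recording which source answered makes it possible to track how often the fallback is still needed as V2 indexing progresses.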
The main file to refactor is scanpipe/pipes/vulnerablecode.py.
# CURRENT (line ~30 in vulnerablecode.py)
VULNERABLECODE_API_URL = f"{base_url}/api/packages/"
# AFTER MIGRATION
VULNERABLECODE_ADVISORY_V2_URL = f"{base_url}/api/v2/advisories/"

# CURRENT: queries by purl, gets package-centric response
def get_vulnerabilities_for_purl(purl, base_url, api_key=None):
    response = session.get(url, params={"purl": purl})
    packages = response.json().get("results", [])
    for package in packages:
        vulns = package.get("affected_by_vulnerabilities", [])  # old nested structure

# AFTER MIGRATION: queries by affected_packages, gets advisory-centric response
def get_advisories_for_purl(purl, base_url, api_key=None):
    response = session.get(url, params={"affected_packages": purl})
    advisories = response.json().get("results", [])
    for advisory in advisories:
        affected = advisory.get("affected_packages", [])  # new nested structure

# CURRENT: extracts risk_score as a flat float
vulnerability_data = {
    "vulnerability_id": vuln.get("vulnerability_id"),
    "risk_score": vuln.get("risk_score"),  # flat float
    "summary": vuln.get("summary"),
    "aliases": vuln.get("aliases", []),
}

# AFTER MIGRATION: extracts structured severities
advisory_data = {
    "advisory_id": advisory.get("advisory_id"),
    "vulnerability_id": get_primary_cve(advisory.get("aliases", [])),
    "summary": advisory.get("summary"),
    "aliases": advisory.get("aliases", []),
    "severities": advisory.get("severities", []),  # structured list
    "risk_score": get_max_severity_score(advisory.get("severities", [])),
    "affected_version_range": get_version_range(advisory, purl),
    "fixed_version": get_fixed_version(advisory, purl),
}

The find_vulnerabilities pipeline step loops over discovered packages and calls the pipe function. After migration, it will need to handle the inverted structure: one advisory can affect multiple packages, so the loop logic changes from "per package, find vulnerabilities" to "per advisory, annotate all affected packages."
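The inverted loop can be sketched as a single pass over advisories that builds a per-package index, which the pipeline could then use to annotate each discovered package. This is a sketch under stated assumptions, not the pipeline's actual code: `index_advisories_by_purl` is a hypothetical helper, and it matches packages by exact purl string.

```python
from collections import defaultdict


def index_advisories_by_purl(advisories, discovered_purls):
    """Map each discovered purl to the advisories that list it as affected."""
    discovered = set(discovered_purls)
    by_purl = defaultdict(list)
    for advisory in advisories:
        # One advisory may list many packages; record it under each match.
        for entry in advisory.get("affected_packages", []):
            purl = entry.get("purl")
            if purl in discovered:
                by_purl[purl].append(
                    {
                        "advisory_id": advisory.get("advisory_id"),
                        "fixed_version": entry.get("fixed_version"),
                    }
                )
    return dict(by_purl)


advisories = [
    {
        "advisory_id": "VCID-xxx",
        "affected_packages": [
            {"purl": "pkg:pypi/django@2.2.0", "fixed_version": "2.2.24"},
        ],
    },
]
index = index_advisories_by_purl(advisories, ["pkg:pypi/django@2.2.0"])
print(index)
```

Discovered packages that no advisory mentions are simply absent from the index, so the annotation step can treat a missing key as "no known advisories".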
# Clone the repo
git clone https://github.com/<your-username>/aboutcode-api-migration-poc
cd aboutcode-api-migration-poc
# Install dependencies
pip install -r requirements.txt
# Run the API fetcher (hits the live public API)
python fetcher.py
# Run the mapper demo (fully offline, no API needed)
python mapper.py
# Run unit tests for mapper behavior and edge cases
python -m unittest discover -s tests -v
- PR #1866 (Add advisory v2): the PR that introduced the V2Advisory model
- Issue #1882: advisory-first migration tracking
- PR #1966 (Migrate advisory todo to v2): internal VulnerableCode migration reference
- scanpipe/pipes/vulnerablecode.py: the main file to be refactored
- scanpipe/pipelines/find_vulnerabilities.py: the pipeline to be updated
This PoC was built as part of a GSoC 2026 proposal for the AboutCode organization.