This project was built for a cybersecurity internship assessment. The goal was to take a large XML dataset from NIST's official CPE dictionary, extract the relevant details, and make them accessible in a more usable form. I wrote a Python script to parse the XML, created a schema in SQLite to store the data efficiently, and then built a REST API using Flask to interact with the dataset. The result is a simple local API that makes it easy to search and page through over 1.4 million CPE entries based on titles, URIs, and deprecation status.
| File | Purpose |
|---|---|
read_cpe.py |
Parses the XML and populates the database |
setup.sql |
SQL file that defines the schema and indexes |
cpe_data.db |
The SQLite database (generated after running the script) |
app.py |
Main Flask API backend |
official-cpe-dictionary_v2.3.xml |
Source XML file from NIST |
- Wrote a Python script using
ElementTreeto parse over 1.4 million XML records. - Created a SQLite schema and used batch inserts for performance.
- Designed and built REST API routes with Flask to support both pagination and search.
- Tested everything locally with different parameters to make sure it responds fast and accurately.
- Make sure Python 3.13+ is installed.
- This is a local project. No deployment or external database required.
Open your terminal and install Flask:
pip install flaskThis step parses the XML file and loads the data into a local SQLite database:
python read_cpe.pyThis will:
- Create the
cpe_data.dbfile - Create the schema and indexes (if not already there)
- Insert all 1.4 million+ CPE entries from
official-cpe-dictionary_v2.3.xml
Start the Flask server:
python app.pyOnce the server is running, visit this URL in your browser:
http://127.0.0.1:5000/
You should see:
"CPE API is up and running."
Returns all CPEs with pagination support.
Query Parameters:
page: Page number (default: 1)limit: Items per page (default: 10)
Example:
http://127.0.0.1:5000/api/cpes?page=2&limit=15
Filters the CPEs based on query parameters.
Query Parameters:
cpe_title: partial or full match on the titlecpe_22_uri: partial match on CPE 2.2 URIcpe_23_uri: partial match on CPE 2.3 URIdeprecation_date: filters all entries deprecated before this date. Applies to both v2.2 and v2.3 deprecation fields.
Example:
http://127.0.0.1:5000/api/cpes/search?cpe_title=windows&deprecation_date=2024-01-01
This XML file is huge and publicly available, commonly used in vulnerability analysis and asset inventories.
Visiting this:
http://127.0.0.1:5000/api/cpes?page=1&limit=5
returns JSON like:
{
"page": 1,
"limit": 5,
"total": 1413024,
"data": [
{
"id": 1,
"cpe_title": "Example CPE Title",
"cpe_22_uri": "cpe:/a:example:22",
"cpe_23_uri": "cpe:/a:example:23",
"reference_links": ["link1", "link2"],
"cpe_22_deprecation_date": null,
"cpe_23_deprecation_date": null
},
...
]
}This project demonstrates working with real-world datasets, backend API development, and designing for performance and usability all in one place.