1.5 billion rows of US banking regulatory data from 20 sources spanning 1863–2026, unified into 34 queryable tables covering 217,210 institutions.
Empirical banking research requires combining data from many regulatory sources — call reports, holding company filings, failure records, stress test results, deposit data — each in different formats with different identifiers. FreeNIC harmonizes these into a single relational schema with Python, R, and AI agent interfaces.
All data comes from public regulatory filings (FFIEC, FDIC, Federal Reserve, OCC) and freely available academic databases. No proprietary data, no paywalls.
```bash
git clone https://github.com/andenick/FreeNIC.git
cd FreeNIC
# Python interface
cd Outputs/freenic_py
pip install -e .
```

```python
import freenic
freenic.set_data_dir("Outputs/parquet")
freenic.list_tables()
freenic.lookup_institution("jpmorgan")
freenic.get_financials(1039502, "BHCK2170")  # JPMorgan total assets
freenic.get_failures(start_year=2008, end_year=2010)
```

The Parquet data files (~5 GB) are not included in the repo and must be built from source using the ingestion pipeline. See the Ingestion Pipeline section.
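The identifier `BHCK2170` passed to `get_financials` is an MDRM code: a four-letter series mnemonic (`BHCK`) followed by a four-character item number (`2170`, total assets). A minimal sketch of splitting and sanity-checking such codes before querying follows. `split_mdrm` is a hypothetical helper, not part of the `freenic` package, and the regular expression is an assumption about the code format.

```python
import re

# Assumed format: 4-letter mnemonic + 4-character alphanumeric item number.
_MDRM_RE = re.compile(r"^([A-Z]{4})([A-Z0-9]{4})$")


def split_mdrm(code: str) -> tuple[str, str]:
    """Return (mnemonic, item_number) for an MDRM code such as 'BHCK2170'.

    Hypothetical helper for illustration; raises ValueError on malformed input.
    """
    match = _MDRM_RE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a well-formed MDRM code: {code!r}")
    return match.group(1), match.group(2)


mnemonic, item = split_mdrm("BHCK2170")
print(mnemonic, item)
```

A format check like this only catches typos; whether a code actually exists still has to be confirmed against the MDRM variable dictionary.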
| Metric | Value |
|---|---|
| Total observations | 1.5B+ |
| Institutions | 217,210 |
| Variables (MDRM) | 87,000+ |
| Tables | 34 |
| Time span | 1863–2026 |
| Data sources | 20 |
```bash
pip install -e Outputs/freenic_py
```

```python
import freenic
freenic.set_data_dir("/path/to/parquet")
freenic.list_tables()
freenic.lookup_institution("jpmorgan")
freenic.get_financials(1039502, "BHCK2170")
freenic.get_failures(start_year=2008, end_year=2010)
freenic.query("SELECT COUNT(*) FROM institutions")
freenic.get_hierarchy(1039502, direction="down")
```

```r
install.packages("Outputs/freenic_r", repos = NULL, type = "source")
library(freenic)
freenic_set_data_dir("/path/to/parquet")
df <- read_institutions()
df <- read_bank_failures()
freenic_query("SELECT COUNT(*) AS n FROM institutions")
```

```bash
pip install -r Technical/freenic_mcp/requirements.txt
export FREENIC_DATA_DIR="/path/to/parquet"
python Technical/freenic_mcp/server.py
```

The MCP server exposes 15 tools: query_freenic, lookup_institution, get_financials, search_variables, get_hierarchy, describe_database, describe_table, get_failures, get_fred_series, lookup_rssd, lookup_column_id, show_source_descriptions, show_regulatory_groups, verify_mdrm_codes, verify_rssds.
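Because the MCP server takes its data location from `FREENIC_DATA_DIR`, a pre-flight check can catch a missing or empty directory before the server starts. The sketch below is not taken from `server.py`. The helper names are hypothetical, and the one-file-per-table layout (`<table>.parquet`) is an assumption.

```python
import os
from pathlib import Path


def resolve_data_dir(env_var: str = "FREENIC_DATA_DIR") -> Path:
    """Return the directory named by env_var, or raise if it is unusable."""
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"{env_var} points to a missing directory: {path}")
    return path


def list_parquet_tables(data_dir: Path) -> list[str]:
    """List table names, assuming one <table>.parquet file per table."""
    return sorted(p.stem for p in data_dir.glob("*.parquet"))
```

Calling `list_parquet_tables(resolve_data_dir())` and comparing the length against the expected 34 tables gives a quick signal that the ingestion pipeline has completed.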
| # | Source | Rows | Coverage | Access |
|---|---|---|---|---|
| 1 | Chicago Fed Call Reports | 896M | 1976–2002 | chicagofed.org |
| 2 | Luck Historical Database | 312M | 1959–2025 | Academic request |
| 3 | BHCF Y-9C Filings | 208M | 1986–2025 | FFIEC CDR |
| 4 | FDIC SDI Financials | 69M | 1984–2025 | FDIC BankFind |
| 5 | OCC Historical | 9.8M | 1867–1904 | OCC Annual Reports |
| 6 | Robin Failing Banks Panel | 2.9M | 1863–2024 | Included (from Robin) |
| 7 | FDIC Summary of Deposits | 2.7M | 1994–2025 | FDIC SOD |
| 8 | FDIC Bank Failures | 4K | 1934–2025 | FDIC BankFind API |
| 9 | FDIC Historical | 500K | 1934–2025 | FDIC BankFind |
| 10 | NIC Structure Data | 36K | Current | FFIEC NIC |
| 11 | CRSP-FRB Link | 14K | 1971–2024 | NY Fed |
| 12 | MDRM Variable Dictionary | 87K | Current | Fed MDRM |
| 13 | DFAST Stress Test Results | 500 | 2013–2025 | Fed DFAST |
| 14 | Pillar 3 G-SIB Disclosure | 20K | 2020–2025 | Bank websites |
| 15 | Fed H.8 | 75K | 1973–2025 | FRED |
| 16 | FRED Banking Series | 75K | 1954–2025 | FRED API |
| 17 | Stress Test Scenarios | 1K | 2024–2026 | Fed Scenarios |
| 18 | Bank Identifier Crosswalk | 14K | Current | Derived (Robin ↔ RSSD ↔ FDIC cert) |
| 19 | BHC Hierarchy | 37K | Current | Derived (NIC structure + ownership) |
| 20 | Sector Groupings | 17K | Current | Derived (CIK → SIC → sector) |
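The identifier crosswalk (source 18) is what lets records keyed by FDIC certificate number join to records keyed by RSSD ID. A minimal sketch of that join pattern follows, using an in-memory SQLite database so it runs without the Parquet files. The table names, column names, and values are all hypothetical and do not reflect the actual FreeNIC schema.

```python
import sqlite3

# Synthetic, illustrative data only: none of these IDs or values are real.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE crosswalk  (rssd_id INTEGER, fdic_cert INTEGER);
CREATE TABLE failures   (fdic_cert INTEGER, fail_year INTEGER);
CREATE TABLE financials (rssd_id INTEGER, total_assets REAL);
INSERT INTO crosswalk  VALUES (1001, 501), (1002, 502);
INSERT INTO failures   VALUES (501, 2009), (503, 2010);
INSERT INTO financials VALUES (1001, 250.0), (1002, 900.0);
""")

# LEFT JOINs keep failures that have no crosswalk match, so gaps stay visible
# as NULLs instead of silently dropping rows.
rows = con.execute("""
SELECT f.fdic_cert, c.rssd_id, f.fail_year, fin.total_assets
FROM failures AS f
LEFT JOIN crosswalk  AS c   ON c.fdic_cert = f.fdic_cert
LEFT JOIN financials AS fin ON fin.rssd_id = c.rssd_id
ORDER BY f.fdic_cert
""").fetchall()

for row in rows:
    print(row)
```

Using left joins rather than inner joins is a deliberate choice here: unmatched identifiers are a known hazard when combining sources, and keeping them in the result makes coverage gaps easy to count.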
FreeNIC/
├── README.md
├── Inputs/ Source data files (gitignored, re-downloadable)
├── Outputs/
│ ├── parquet/ 34 Parquet tables (gitignored, ~5 GB)
│ ├── freenic_py/ Python package (pip install -e .)
│ ├── freenic_r/ R package (install.packages from source)
│ ├── QUICK_START.md Connection examples, common queries
│ ├── DATA_DICTIONARY.md Full schema reference for all 34 tables
│ ├── DATA_SOURCE_INVENTORY.md Source provenance and ingestion details
│ └── COVERAGE_GAPS.md Known limitations and missing data
├── Technical/
│ ├── freenic_mcp/ MCP server for AI agents
│ ├── freenic_ingestion/ 30-script ingestion pipeline
│ │ ├── scripts/00-30 Numbered ingestion scripts
│ │ └── tests/ 7 test suites (integrity, schema, referential, etc.)
│ └── Knowledge_Base/ HDARP-processed regulatory filing instructions
└── .gitignore
The ingestion pipeline transforms raw regulatory data into the unified Parquet schema. Scripts are numbered and run in order:
| Script | What it does |
|---|---|
| 00 | Setup: create database schema |
| 01 | MDRM variable dictionary |
| 02 | NIC institution attributes |
| 03 | CRSP-FRB bank identifier link |
| 04–05 | BHCF Y-9C filings (TXT + CSV formats) |
| 06 | Y-9C schema verification |
| 07 | Chicago Fed call reports (1976–2002) |
| 08 | Luck Historical Database |
| 09 | OCC historical (1867–1904) |
| 10 | Build institution catalog |
| 11 | Build convenience views |
| 12 | Export to Parquet |
| 13 | Validate all tables |
| 16–19 | FDIC (failures, financials, SOD) |
| 20 | Cross-source identifier crosswalks |
| 23–25 | DFAST, Pillar 3, FDIC history |
| 27 | Fed H.8 aggregate series |
| 28–30 | Robin panel, Volcker catalogs, stress scenarios |
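Since the scripts run in numeric order, a small driver can sort them by filename prefix and execute them one at a time. This is a sketch under assumptions: the repository's actual runner, if any, is not shown here, and the example filenames are hypothetical.

```python
import re
import subprocess
import sys
from pathlib import Path

_PREFIX_RE = re.compile(r"^(\d+)")


def ordered_scripts(paths):
    """Sort scripts by numeric filename prefix; skip files without one."""
    numbered = []
    for path in map(Path, paths):
        match = _PREFIX_RE.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path.name, path))
    # Sort on (number, name) so ties between same-numbered scripts are stable.
    return [entry[2] for entry in sorted(numbered, key=lambda e: e[:2])]


def run_pipeline(script_dir):
    """Run each numbered script in order, stopping at the first failure."""
    for script in ordered_scripts(Path(script_dir).glob("*.py")):
        subprocess.run([sys.executable, str(script)], check=True)


# Hypothetical filenames, for illustration only.
print([p.name for p in ordered_scripts(["10_catalog.py", "02_nic.py", "00_setup.py"])])
```

Sorting on the parsed integer rather than the raw string keeps the order correct even if a prefix is not zero-padded.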
Seven test suites verify data integrity:
| Test | What it checks |
|---|---|
| `test_schema.py` | Column types, table existence, expected schemas |
| `test_integrity.py` | Row counts, null rates, value ranges |
| `test_referential.py` | Foreign key relationships across tables |
| `test_cross_source.py` | Same institution across different sources matches |
| `test_freshness.py` | Data covers expected date ranges |
| `test_regression.py` | Key aggregates haven't changed unexpectedly |
| `test_mcp.py` | MCP server tool responses |
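The contents of the test suites are not reproduced here. As a rough illustration of the kinds of checks described for `test_integrity.py` (null rates) and `test_referential.py` (foreign keys), the following sketch operates on plain lists of dictionaries. The function names, column names, and sample rows are hypothetical.

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is missing or None (0.0 if no rows)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def orphan_keys(child_rows, parent_rows, key):
    """Non-null key values in child_rows that have no match in parent_rows."""
    parent_keys = {row[key] for row in parent_rows}
    child_keys = {row[key] for row in child_rows if row[key] is not None}
    return sorted(child_keys - parent_keys)


# Synthetic rows, for illustration only.
institutions = [{"rssd_id": 1001}, {"rssd_id": 1002}]
financials = [
    {"rssd_id": 1001, "total_assets": 250.0},
    {"rssd_id": 1002, "total_assets": None},
    {"rssd_id": 1003, "total_assets": 75.0},
]

print(null_rate(financials, "total_assets"))
print(orphan_keys(financials, institutions, "rssd_id"))
```

In a real suite these would typically be expressed as SQL against the Parquet tables, with a tolerance on the null rate rather than an exact value.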
- Python 3.11+: ingestion, querying, MCP server
- R 4.x (optional): R package only
- DuckDB: `pip install duckdb` (included as dependency)
- Disk: ~5 GB for Parquet files, ~20 GB for raw inputs during ingestion
- APIs: None required for querying; ingestion scripts download from public URLs
```bibtex
@software{freenic2026,
  title = {FreeNIC: Free National Information Center — Open Banking Data Platform},
  author = {Anderson, Nicholas},
  year = {2026},
  url = {https://github.com/andenick/FreeNIC}
}
```

MIT