A Python library providing unified access to 21 epidemiological data sources from around the world, with a plugin registry, CLI, and optional extras for specialized data.
- Overview
- Installation
- Quick Start
- Repository Structure
- Available Datasets
- CLI Usage
- Usage Examples
- Available Sources
- FAQ
- Contributing
- Related Projects
- Citation
- License
epidatasets provides:
- Unified interface: a single `get_source()` API to access 21 data sources worldwide
- Plugin registry: sources are discovered at runtime via `entry_points`, making it easy to extend
- Optional extras: install only the dependencies you need (`pip install epidatasets[who,brazil]`)
- CLI: command-line tool for listing sources, inspecting metadata, and querying countries
- Caching & rate limiting: built-in utilities for responsible API usage
- Reproducible research: standardized access to heterogeneous epidemiological datasets
```bash
pip install epidatasets
```

```bash
# WHO Global Health Observatory data
pip install epidatasets[who]
# Brazilian DATASUS/SINAN data via PySUS
pip install epidatasets[brazil]
# Eurostat EU health statistics
pip install epidatasets[eurostat]
# Climate/environmental data (Copernicus CDS)
pip install epidatasets[climate]
# Geospatial visualization
pip install epidatasets[geo]
# Plotting & visualization
pip install epidatasets[viz]
# Genomic data (Pathoplexus)
pip install epidatasets[genomics]
# CLI support
pip install epidatasets[cli]
# World Bank indicators
pip install epidatasets[worldbank]
# Install everything
pip install epidatasets[all]
```

```bash
git clone https://github.com/fccoelho/epidemiological-datasets.git
cd epidemiological-datasets
pip install -e ".[dev,docs]"
```

```python
from epidatasets import get_source, list_sources

# Discover available sources
sources = list_sources()
for name, meta in sorted(sources.items()):
    print(f"{name}: {meta['description']}")

# Get a specific source
paho = get_source("paho")
countries = paho.list_countries()
print(f"PAHO covers {len(countries)} countries")

# Get WHO data (requires: pip install epidatasets[who])
who = get_source("who")
malaria = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2020, 2021, 2022],
    countries=["BRA", "IND", "NGA"]
)

# Get OWID COVID-19 data
owid = get_source("owid")
covid = owid.get_covid_data(
    countries=["BRA", "USA", "IND"],
    metrics=["cases", "deaths"]
)
```

```
epidemiological-datasets/
├── src/epidatasets/          # Main Python package
│   ├── __init__.py           # Public API (get_source, list_sources)
│   ├── _base.py              # BaseAccessor ABC
│   ├── _registry.py          # Plugin registry (entry_points)
│   ├── cli.py                # CLI (typer)
│   ├── sources/              # 21 data source accessors
│   │   ├── __init__.py
│   │   ├── africa_cdc.py
│   │   ├── cdc_opendata.py
│   │   ├── china_cdc.py
│   │   ├── colombia_ins.py
│   │   ├── copernicus_cds.py
│   │   ├── datasus_pysus.py
│   │   ├── ecdc_opendata.py
│   │   ├── epipulse.py
│   │   ├── eurostat.py
│   │   ├── global_health.py
│   │   ├── healthdata_gov.py
│   │   ├── india_idsp.py
│   │   ├── infodengue_api.py
│   │   ├── malaria_atlas.py
│   │   ├── owid.py
│   │   ├── paho.py
│   │   ├── pathoplexus.py
│   │   ├── respicast.py
│   │   ├── rki_germany.py
│   │   ├── ukhsa.py
│   │   └── who_ghoclient.py
│   └── utils/                # Utilities
│       ├── cache.py          # Caching layer
│       ├── rate_limit.py     # API rate limiting
│       ├── geo.py            # Geospatial helpers
│       ├── validation.py     # Data validation
│       └── io.py             # I/O utilities
├── tests/                    # Test suite
│   ├── sources/
│   ├── utils/
│   ├── conftest.py
│   └── ...
├── docs/                     # MkDocs documentation
│   ├── mkdocs.yml
│   └── docs/
│       ├── index.md
│       ├── installation.md
│       ├── quickstart.md
│       ├── sources/          # Per-source API docs (21 pages)
│       ├── api/              # API reference
│       │   ├── base.md
│       │   ├── registry.md
│       │   ├── cli.md
│       │   └── utils.md
│       └── examples/         # Jupyter notebooks
├── mkdocs.yml                # Docs config
├── .readthedocs.yaml         # ReadTheDocs config
├── pyproject.toml            # Package configuration
└── README.md
```
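The tree lists `utils/cache.py` and `utils/rate_limit.py` without showing their contents. As a rough illustration of what caching combined with rate limiting can look like, the sketch below uses only the standard library. The names `MinIntervalLimiter` and `fetch` are hypothetical and are not taken from the package; the real utilities may work quite differently.

```python
import time
from functools import lru_cache


class MinIntervalLimiter:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = MinIntervalLimiter(0.05)


@lru_cache(maxsize=128)
def fetch(key):
    """Stand-in for a remote request; only cache misses reach the limiter."""
    limiter.wait()
    return f"payload for {key}"
```

Because the cache sits in front of the limiter, repeated requests for the same key return immediately and never count against the remote service's rate budget.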
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| WHO Global Health Observatory | Health indicators by country | Annual | Open | epidatasets.sources.who_ghoclient |
| Our World in Data - Health | COVID-19, vaccination, excess mortality | Daily/Weekly | Open | epidatasets.sources.owid |
| Global Health Data Exchange (GHDx) | Catalog of health datasets | Varies | Varies | Catalog only |
| HDX (Humanitarian Data Exchange) | Health in crisis contexts | Real-time | Open | Planned |
| Global.health | Pandemic linelist data | Varies | Open | epidatasets.sources.global_health |
| Malaria Atlas Project | Malaria prevalence & vector data | Annual | Open | epidatasets.sources.malaria_atlas |
| Copernicus Climate Data Store | Environmental & climate data | Varies | Open | epidatasets.sources.copernicus_cds |
| Pathoplexus | Pathogen genomic data | Continuous | Open | epidatasets.sources.pathoplexus |
| InfoDengue | Dengue surveillance (Brazil) | Weekly | Open | epidatasets.sources.infodengue_api |
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| CDC Open Data | CDC datasets portal (COVID-19, Influenza, NNDSS, CDI) | Varies | Open | epidatasets.sources.cdc_opendata |
| HealthData.gov | US health system data | Weekly | Open | epidatasets.sources.healthdata_gov |
| Statistics Canada - Health | Canadian health data | Quarterly | Open | Planned |
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| SINAN / DATASUS - Brazil | Brazilian notifiable diseases & health system data | Weekly | Open* | epidatasets.sources.datasus_pysus |
| PAHO/WHO Regional Data | Pan-American health data | Monthly | Open | epidatasets.sources.paho |
| Chile DEIS | Chilean health statistics | Monthly | Open | Planned |
| Colombia INS | Colombian public health data (SIVIGILA) | Weekly | Open | epidatasets.sources.colombia_ins |
*Note: DATASUS access requires `pip install epidatasets[brazil]` (installs PySUS).
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| ECDC EpiPulse | European surveillance portal (53 countries, 50+ diseases) | Daily/Weekly | Registration | epidatasets.sources.epipulse |
| ECDC Open Data | Infectious disease surveillance (50+ diseases, 30 countries) | Weekly | Open | epidatasets.sources.ecdc_opendata |
| ECDC RespiCast | Respiratory disease forecasting hub | Weekly | Open | epidatasets.sources.respicast |
| Eurostat Health | EU health statistics | Annual | Open | epidatasets.sources.eurostat |
| UK Health Security Agency | UK health data | Weekly | Open | epidatasets.sources.ukhsa |
| Robert Koch Institute | German surveillance data | Weekly | Open | epidatasets.sources.rki_germany |
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| WHO Afro Health Observatory | African region health data | Annual | Open | epidatasets.sources.who_ghoclient |
| Africa CDC | African public health data (55 AU member states) | Weekly | Open | epidatasets.sources.africa_cdc |
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| China CDC Weekly | Chinese surveillance data | Weekly | Open | epidatasets.sources.china_cdc |
| IDSP India | Indian disease surveillance | Weekly | Open* | epidatasets.sources.india_idsp |
| NIID Japan | Japanese infectious disease data | Weekly | Open | Planned |
| Korea CDC | Korean disease control data | Weekly | Open | Planned |
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| Australian Institute of Health and Welfare | Australian health data | Annual | Open | Planned |
| NZ Ministry of Health | New Zealand health statistics | Annual | Open | Planned |
The `epidatasets` CLI provides quick access from the terminal (requires `pip install epidatasets[cli]`):

```bash
# List all available data sources
epidatasets sources
# Show detailed info about a source
epidatasets info who
# List countries covered by a source
epidatasets countries paho
```

```python
from epidatasets import get_source

who = get_source("who")

# Get malaria incidence data
data = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2020, 2021, 2022],
    countries=["BRA", "IND", "NGA"]
)
print(data.head())
```

```python
from epidatasets import get_source

paho = get_source("paho")

# List member countries
countries = paho.list_countries()
print(f"Total countries: {len(countries)}")

# Get immunization coverage
coverage = paho.get_immunization_coverage(
    vaccines=['DTP3', 'MCV1'],
    subregion='Southern Cone',
    years=[2020, 2021, 2022]
)

# Compare health indicators
comparison = paho.compare_countries(
    indicator='LIFE_EXPECTANCY',
    countries=['BRA', 'MEX', 'ARG', 'COL'],
    years=[2019, 2020, 2021]
)
```

```python
from epidatasets import get_source

eurostat = get_source("eurostat")

# Healthcare expenditure
expenditure = eurostat.get_healthcare_expenditure(
    countries=['DEU', 'FRA', 'ITA'],
    years=list(range(2015, 2024))
)

# Mortality data by cause
mortality = eurostat.get_mortality_data(
    cause_code='COVID-19',
    countries=['DEU', 'FRA', 'ITA'],
    years=[2020, 2021, 2022]
)

# Life expectancy comparison
life_exp = eurostat.get_life_expectancy(
    countries=['DEU', 'FRA', 'ITA', 'ESP'],
    years=[2019, 2020, 2021]
)
```

```python
from epidatasets import get_source

owid = get_source("owid")

# COVID-19 data for specific countries
covid = owid.get_covid_data(
    countries=['BRA', 'USA', 'IND'],
    metrics=['cases', 'deaths', 'hospitalizations'],
    start_date='2021-01-01',
    end_date='2021-12-31'
)

# Excess mortality estimates
excess = owid.get_excess_mortality(
    countries=['GBR', 'ITA', 'USA'],
    start_date='2020-03-01'
)

# Global summary
summary = owid.get_global_summary()
```

```python
from epidatasets import get_source

datasus = get_source("datasus")

# Access Brazilian notifiable disease data
dengue = datasus.download(
    disease="Dengue",
    years=[2022, 2023],
    states=["RJ", "SP", "MG"]
)
```

```python
from epidatasets import get_source

africa_cdc = get_source("africa_cdc")

# List all 55 African Union member states
countries = africa_cdc.list_countries()

# Get disease outbreaks
ebola = africa_cdc.get_disease_outbreaks(
    disease='EBOLA',
    countries=['CD', 'UG', 'GN']
)

# Vaccination coverage
vax = africa_cdc.get_vaccination_coverage(
    countries=['NG', 'ET', 'ZA'],
    vaccines=['COVID-19', 'Measles']
)
```

```python
from epidatasets import get_source

rki = get_source("rki")

# COVID-19 nowcasting with R estimates
nowcast = rki.get_covid_nowcast(
    date_range=('2022-01-01', '2022-06-30')
)

# Influenza surveillance
flu = rki.get_influenza_data(seasons=['2022/23', '2023/24'])
```

```python
from epidatasets import get_source, list_sources

# See all available sources
print(list_sources().keys())

# Compare data across sources
who = get_source("who")
owid = get_source("owid")

who_malaria = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2022],
    countries=["BRA"]
)
owid_covid = owid.get_covid_data(
    countries=["BRA"],
    metrics=["cases", "deaths"],
    start_date='2022-01-01',
    end_date='2022-12-31'
)
```

| Source Name | Class | Extra | Description |
|---|---|---|---|
| `africa_cdc` | `AfricaCDCAccessor` | - | Africa CDC public health data (55 AU states) |
| `cdc_opendata` | `CDCOpenDataAccessor` | - | US CDC Open Data portal |
| `china_cdc` | `ChinaCDCAccessor` | - | China CDC Weekly surveillance |
| `colombia_ins` | `ColombiaINSAccessor` | - | Colombia INS/SIVIGILA surveillance |
| `copernicus_cds` | `CopernicusCDSAccessor` | `[climate]` | Copernicus Climate Data Store |
| `datasus` | `DataSUSAccessor` | `[brazil]` | Brazilian DATASUS/SINAN (via PySUS) |
| `ecdc` | `ECDCOpenDataAccessor` | - | ECDC infectious disease data |
| `epipulse` | `EpiPulseAccessor` | - | ECDC EpiPulse surveillance portal |
| `eurostat` | `EurostatAccessor` | `[eurostat]` | EU health statistics |
| `global_health` | `GlobalHealthAccessor` | - | Global.health pandemic linelist data |
| `healthdata_gov` | `HealthDataGovAccessor` | - | US HealthData.gov |
| `india_idsp` | `IndiaIDSPAccessor` | - | India IDSP disease surveillance |
| `infodengue` | `InfoDengueAPI` | - | InfoDengue dengue surveillance (Brazil) |
| `malaria_atlas` | `MalariaAtlasAccessor` | - | Malaria Atlas Project data |
| `owid` | `OWIDAccessor` | - | Our World in Data (COVID-19, vaccination) |
| `paho` | `PAHOAccessor` | - | PAHO Pan-American health data |
| `pathoplexus` | `PathoplexusAccessor` | `[genomics]` | Pathoplexus pathogen genomic data |
| `respicast` | `RespiCastAccessor` | - | ECDC respiratory disease forecasting |
| `rki` | `RKIGermanyAccessor` | - | Robert Koch Institute (Germany) |
| `ukhsa` | `UKHSAAccessor` | - | UK Health Security Agency |
| `who` | `WHOAccessor` | `[who]` | WHO Global Health Observatory |
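When combining results from several of these sources, note that the usage examples pass three-letter ISO 3166-1 codes (`"BRA"`, `"NGA"`) to most accessors but two-letter codes (`'NG'`, `'ZA'`) to `africa_cdc`. The helper below is a small sketch of normalizing to three-letter codes before joining results. `to_iso3` and its lookup table are hypothetical and cover only the two-letter codes that appear in the examples; whether the package ships its own normalization is not stated here.

```python
# Two-letter to three-letter ISO 3166-1 codes for the countries used in the examples.
ISO2_TO_ISO3 = {
    "CD": "COD",  # Democratic Republic of the Congo
    "UG": "UGA",  # Uganda
    "GN": "GIN",  # Guinea
    "NG": "NGA",  # Nigeria
    "ET": "ETH",  # Ethiopia
    "ZA": "ZAF",  # South Africa
}


def to_iso3(code: str) -> str:
    """Return a three-letter code; three-letter input is passed through unchanged."""
    code = code.strip().upper()
    if len(code) == 3:
        return code
    try:
        return ISO2_TO_ISO3[code]
    except KeyError:
        raise ValueError(f"no three-letter mapping known for {code!r}") from None


# Example: align a mixed list of codes before merging two result sets.
aligned = [to_iso3(c) for c in ["NG", "bra", "ZA"]]
```

A real project would more likely rely on a complete country-code table than on a hand-written dictionary; the point is only that codes need to agree before rows from different sources are matched.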
epidatasets is a Python library providing a unified interface to 21 epidemiological data sources worldwide, installable via `pip install epidatasets`.
You do not need to install every extra. The base install covers most sources; install extras only for the sources that need them (e.g., `pip install epidatasets[who]` for WHO GHO data, `pip install epidatasets[brazil]` for DATASUS).
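One way to check whether the dependency behind an extra is importable in the current environment is `importlib.util.find_spec`, sketched below. The mapping from extra names to import names (`pysus`, `ghoclient`) is an assumption based on the related projects listed in this README, not something read from the package metadata.

```python
from importlib.util import find_spec

# Assumed extra -> top-level import name mapping, for illustration only.
EXTRA_MODULES = {
    "brazil": "pysus",
    "who": "ghoclient",
}


def extra_available(extra: str, modules: dict = EXTRA_MODULES) -> bool:
    """Return True if the module assumed to back `extra` can be imported."""
    module = modules.get(extra)
    # find_spec returns None for a top-level module that is not installed.
    return module is not None and find_spec(module) is not None


for name in sorted(EXTRA_MODULES):
    status = "installed" if extra_available(name) else "missing"
    print(f"{name}: {status}")
```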
To list the available sources:

```python
from epidatasets import list_sources

print(list_sources())
```

Or from the CLI: `epidatasets sources`
Most accessors provide working data retrieval. Some are structured placeholders for sources that require registration or have limited public APIs. Check the documentation for each source's status.
You can add your own data source: sources are registered via `entry_points` in `pyproject.toml`. See CONTRIBUTING.md for guidelines on adding new accessors.
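The general shape of an accessor plus a name-based lookup can be sketched as follows. Everything here is a self-contained stand-in: this `BaseAccessor`, the `register` decorator, the in-memory `_REGISTRY`, and `lookup_source` are hypothetical, and the real package registers sources through `entry_points` rather than a module-level dictionary. CONTRIBUTING.md remains the reference for the actual interface.

```python
from abc import ABC, abstractmethod


class BaseAccessor(ABC):
    """Hypothetical stand-in for the package's BaseAccessor ABC."""

    source_name = ""
    description = ""

    @abstractmethod
    def list_countries(self) -> list:
        """Return the country codes this source covers."""


_REGISTRY = {}  # in-memory substitute for entry-point discovery


def register(cls):
    """Class decorator that files an accessor under its source_name."""
    _REGISTRY[cls.source_name] = cls
    return cls


@register
class ToyAccessor(BaseAccessor):
    source_name = "toy"
    description = "Toy in-memory source"

    def list_countries(self):
        return ["BRA", "IND", "NGA"]


def lookup_source(name: str) -> BaseAccessor:
    """Instantiate the accessor registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise KeyError(f"unknown source {name!r}; known: {known}") from None
```

Keeping the abstract base small means a new source only has to implement the required methods to become usable through the same lookup path as every other source.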
We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.
- Contributing Guide - How to get started
- Report a Bug
- Request a Feature
- Request a Data Source
- GitHub Discussions - Ask questions, share ideas
- New data source accessors - Especially from underrepresented regions
- Example notebooks - Jupyter notebooks demonstrating data analysis
- Documentation - Translations, improvements, and API docs
- Bug fixes - Check the issue tracker
| Project | Description | Repository |
|---|---|---|
| PySUS | Brazilian health data (DATASUS) | AlertaDengue/PySUS |
| ghoclient | WHO Global Health Observatory | fccoelho/ghoclient |
| epigrass | Epidemic simulation | EpiGrass/epigrass |
| epimodels | Mathematical epidemiology | fccoelho/epimodels |
- Data sources: 21 registered (via plugin registry)
- Countries covered: 100+
- Optional extras: 10 (`who`, `brazil`, `eurostat`, `climate`, `geo`, `viz`, `genomics`, `cli`, `worldbank`, `search`)
- Example notebooks: 20+
- Documentation: epidatasets.readthedocs.io
If you use this package in your research, please cite:
```bibtex
@misc{fccoelho_epidatasets,
  author = {Coelho, Flávio Codeço},
  title = {Epidatasets: Python Access to Epidemiological Datasets Worldwide},
  year = {2026},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/fccoelho/epidemiological-datasets}}
}
```

For PySUS:

```bibtex
@software{pysus,
  author = {AlertaDengue Team},
  title = {PySUS: Tools for Brazilian Public Health Data},
  url = {https://github.com/AlertaDengue/PySUS}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
This project is sponsored by
Kwar-AI: Intelligence for Epidemiology
AI-powered solutions for disease surveillance and outbreak prediction
- PySUS Contributors - For making Brazilian health data accessible
- WHO - For maintaining the Global Health Observatory
- All data providers who make epidemiological data openly accessible
- Global public health community
- Author: Flávio Codeço Coelho (@fccoelho)
- Website: https://fccoelho.github.io/
- Documentation: https://epidatasets.readthedocs.io
Made with ❤️ for the epidemiological research community
Disclaimer: This repository is a community effort to catalog open data sources. Please always refer to the original data providers for official statistics and verify data usage terms. The maintainers are not responsible for data quality or availability.