
fccoelho/epidemiological-datasets


🌍 Epidatasets

Python 3.10+ PyPI License: MIT Code style: Black


Sponsored by
Kwar-AI
AI-powered epidemiological intelligence


A Python library providing unified access to 21 epidemiological data sources from around the world, with a plugin registry, CLI, and optional extras for specialized data.


🎯 Overview

epidatasets provides:

  • Unified interface: a single get_source() API to access 21 data sources worldwide
  • Plugin registry: sources are discovered at runtime via entry_points, making it easy to extend
  • Optional extras: install only the dependencies you need (pip install "epidatasets[who,brazil]")
  • CLI: command-line tool for listing sources, inspecting metadata, and querying countries
  • Caching & rate limiting: built-in utilities for responsible API usage
  • Reproducible research: standardized access to heterogeneous epidemiological datasets

📦 Installation

From PyPI

pip install epidatasets

With optional extras

# Quote the extras: shells such as zsh treat unquoted [...] as a glob pattern

# WHO Global Health Observatory data
pip install "epidatasets[who]"

# Brazilian DATASUS/SINAN data via PySUS
pip install "epidatasets[brazil]"

# Eurostat EU health statistics
pip install "epidatasets[eurostat]"

# Climate/environmental data (Copernicus CDS)
pip install "epidatasets[climate]"

# Geospatial visualization
pip install "epidatasets[geo]"

# Plotting & visualization
pip install "epidatasets[viz]"

# Genomic data (Pathoplexus)
pip install "epidatasets[genomics]"

# CLI support
pip install "epidatasets[cli]"

# World Bank indicators
pip install "epidatasets[worldbank]"

# Install everything
pip install "epidatasets[all]"

Development installation

git clone https://github.com/fccoelho/epidemiological-datasets.git
cd epidemiological-datasets
pip install -e ".[dev,docs]"

🚀 Quick Start

from epidatasets import get_source, list_sources

# Discover available sources
sources = list_sources()
for name, meta in sorted(sources.items()):
    print(f"{name}: {meta['description']}")

# Get a specific source
paho = get_source("paho")
countries = paho.list_countries()
print(f"PAHO covers {len(countries)} countries")

# Get WHO data (requires: pip install epidatasets[who])
who = get_source("who")
malaria = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2020, 2021, 2022],
    countries=["BRA", "IND", "NGA"]
)

# Get OWID COVID-19 data
owid = get_source("owid")
covid = owid.get_covid_data(
    countries=["BRA", "USA", "IND"],
    metrics=["cases", "deaths"]
)

πŸ“ Repository Structure

epidemiological-datasets/
├── src/epidatasets/           # Main Python package
│   ├── __init__.py            # Public API (get_source, list_sources)
│   ├── _base.py               # BaseAccessor ABC
│   ├── _registry.py           # Plugin registry (entry_points)
│   ├── cli.py                 # CLI (typer)
│   ├── sources/               # 21 data source accessors
│   │   ├── __init__.py
│   │   ├── africa_cdc.py
│   │   ├── cdc_opendata.py
│   │   ├── china_cdc.py
│   │   ├── colombia_ins.py
│   │   ├── copernicus_cds.py
│   │   ├── datasus_pysus.py
│   │   ├── ecdc_opendata.py
│   │   ├── epipulse.py
│   │   ├── eurostat.py
│   │   ├── global_health.py
│   │   ├── healthdata_gov.py
│   │   ├── india_idsp.py
│   │   ├── infodengue_api.py
│   │   ├── malaria_atlas.py
│   │   ├── owid.py
│   │   ├── paho.py
│   │   ├── pathoplexus.py
│   │   ├── respicast.py
│   │   ├── rki_germany.py
│   │   ├── ukhsa.py
│   │   └── who_ghoclient.py
│   └── utils/                 # Utilities
│       ├── cache.py           # Caching layer
│       ├── rate_limit.py      # API rate limiting
│       ├── geo.py             # Geospatial helpers
│       ├── validation.py      # Data validation
│       └── io.py              # I/O utilities
├── tests/                     # Test suite
│   ├── sources/
│   ├── utils/
│   ├── conftest.py
│   └── ...
├── docs/                      # MkDocs documentation
│   ├── mkdocs.yml
│   └── docs/
│       ├── index.md
│       ├── installation.md
│       ├── quickstart.md
│       ├── sources/           # Per-source API docs (21 pages)
│       ├── api/               # API reference
│       │   ├── base.md
│       │   ├── registry.md
│       │   ├── cli.md
│       │   └── utils.md
│       └── examples/          # Jupyter notebooks
├── mkdocs.yml                 # Docs config
├── .readthedocs.yaml          # ReadTheDocs config
├── pyproject.toml             # Package configuration
└── README.md
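The utils/ package lists a caching layer (cache.py) and API rate limiting (rate_limit.py). Their actual interfaces are not documented in this README, so the following is a hypothetical sketch of the two ideas: an in-memory cache with a time-to-live, and a limiter that enforces a minimum interval between outgoing requests, combined in one decorator.

```python
import functools
import threading
import time


class RateLimiter:
    """Allow at most one call every `min_interval` seconds."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = float("-inf")
        self._lock = threading.Lock()

    def wait(self) -> None:
        # Holding the lock while sleeping serializes concurrent callers.
        with self._lock:
            delay = self._last + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            self._last = time.monotonic()


def cached_and_limited(ttl: float, limiter: RateLimiter):
    """Cache results for `ttl` seconds; only cache misses hit the limiter.

    Arguments must be hashable (pass tuples rather than lists).
    """

    def decorator(func):
        cache: dict = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            hit = cache.get(key)
            if hit is not None and time.monotonic() - hit[0] < ttl:
                return hit[1]  # fresh cached value, no request made
            limiter.wait()  # throttle only real requests
            value = func(*args, **kwargs)
            cache[key] = (time.monotonic(), value)
            return value

        return wrapper

    return decorator
```

Checking the cache before the limiter means repeated notebook cells do not consume the request budget of the upstream API.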

🌐 Available Datasets

Global 🌍

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| WHO Global Health Observatory | Health indicators by country | Annual | Open | epidatasets.sources.who_ghoclient |
| Our World in Data - Health | COVID-19, vaccination, excess mortality | Daily/Weekly | Open | epidatasets.sources.owid |
| Global Health Data Exchange (GHDx) | Catalog of health datasets | Varies | Varies | Catalog only |
| HDX (Humanitarian Data Exchange) | Health in crisis contexts | Real-time | Open | Planned |
| Global.health | Pandemic linelist data | Varies | Open | epidatasets.sources.global_health |
| Malaria Atlas Project | Malaria prevalence & vector data | Annual | Open | epidatasets.sources.malaria_atlas |
| Copernicus Climate Data Store | Environmental & climate data | Varies | Open | epidatasets.sources.copernicus_cds |
| Pathoplexus | Pathogen genomic data | Continuous | Open | epidatasets.sources.pathoplexus |
| InfoDengue | Dengue surveillance (Brazil) | Weekly | Open | epidatasets.sources.infodengue_api |

North America 🇺🇸🇨🇦🇲🇽

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| CDC Open Data | CDC datasets portal (COVID-19, Influenza, NNDSS, CDI) | Varies | Open | epidatasets.sources.cdc_opendata |
| HealthData.gov | US health system data | Weekly | Open | epidatasets.sources.healthdata_gov |
| Statistics Canada - Health | Canadian health data | Quarterly | Open | Planned |

South America 🌎

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| SINAN / DATASUS - Brazil | Brazilian notifiable diseases & health system data | Weekly | Open* | epidatasets.sources.datasus_pysus |
| PAHO/WHO Regional Data | Pan-American health data | Monthly | Open | epidatasets.sources.paho |
| Chile DEIS | Chilean health statistics | Monthly | Open | Planned |
| Colombia INS | Colombian public health data (SIVIGILA) | Weekly | Open | epidatasets.sources.colombia_ins |

*Note: DATASUS access requires pip install epidatasets[brazil] (installs PySUS).

Europe 🇪🇺

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| ECDC EpiPulse | European surveillance portal (53 countries, 50+ diseases) | Daily/Weekly | Registration | epidatasets.sources.epipulse |
| ECDC Open Data | Infectious disease surveillance (50+ diseases, 30 countries) | Weekly | Open | epidatasets.sources.ecdc_opendata |
| ECDC RespiCast | Respiratory disease forecasting hub | Weekly | Open | epidatasets.sources.respicast |
| Eurostat Health | EU health statistics | Annual | Open | epidatasets.sources.eurostat |
| UK Health Security Agency | UK health data | Weekly | Open | epidatasets.sources.ukhsa |
| Robert Koch Institute | German surveillance data | Weekly | Open | epidatasets.sources.rki_germany |

Africa 🌍

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| WHO Afro Health Observatory | African region health data | Annual | Open | epidatasets.sources.who_ghoclient |
| Africa CDC | African public health data (55 AU member states) | Weekly | Open | epidatasets.sources.africa_cdc |

Asia 🌏

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| China CDC Weekly | Chinese surveillance data | Weekly | Open | epidatasets.sources.china_cdc |
| IDSP India | Indian disease surveillance | Weekly | Open* | epidatasets.sources.india_idsp |
| NIID Japan | Japanese infectious disease data | Weekly | Open | Planned |
| Korea CDC | Korean disease control data | Weekly | Open | Planned |

Oceania 🇦🇺🇳🇿

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| Australian Institute of Health and Welfare | Australian health data | Annual | Open | Planned |
| NZ Ministry of Health | New Zealand health statistics | Annual | Open | Planned |

💻 CLI Usage

The epidatasets CLI provides quick access from the terminal (requires pip install epidatasets[cli]):

# List all available data sources
epidatasets sources

# Show detailed info about a source
epidatasets info who

# List countries covered by a source
epidatasets countries paho

💡 Usage Examples

Example 1: WHO Global Health Data

from epidatasets import get_source

who = get_source("who")

# Get malaria incidence data
data = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2020, 2021, 2022],
    countries=["BRA", "IND", "NGA"]
)
print(data.head())
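The WHO Global Health Observatory also exposes a public OData API. Whether the who accessor calls it directly or through ghoclient is not stated here, so treat this as an independent, hypothetical sketch that only builds a request URL roughly equivalent to the query above. The base URL and the SpatialDim/TimeDim field names follow WHO's GHO OData API documentation and should be checked against the current docs before use.

```python
from urllib.parse import quote

# Assumed GHO OData endpoint; verify against WHO's current documentation.
GHO_BASE = "https://ghoapi.azureedge.net/api"


def gho_url(indicator: str, countries: list[str], years: list[int]) -> str:
    """Build a GHO OData query URL filtering by country and year."""
    country_clause = " or ".join(f"SpatialDim eq '{c}'" for c in countries)
    year_clause = " or ".join(f"TimeDim eq {y}" for y in years)
    odata_filter = f"({country_clause}) and ({year_clause})"
    # quote() percent-encodes spaces, quotes and parentheses in $filter.
    return f"{GHO_BASE}/{indicator}?$filter={quote(odata_filter)}"


url = gho_url("MALARIA_EST_INCIDENCE", ["BRA", "IND", "NGA"], [2020, 2021, 2022])
```

Fetching the URL (for example with urllib.request.urlopen) should return JSON whose records sit under a "value" key; comparing those raw records with the accessor's output is a quick way to check a result.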

Example 2: PAHO Pan-American Health Data

from epidatasets import get_source

paho = get_source("paho")

# List member countries
countries = paho.list_countries()
print(f"Total countries: {len(countries)}")

# Get immunization coverage
coverage = paho.get_immunization_coverage(
    vaccines=['DTP3', 'MCV1'],
    subregion='Southern Cone',
    years=[2020, 2021, 2022]
)

# Compare health indicators
comparison = paho.compare_countries(
    indicator='LIFE_EXPECTANCY',
    countries=['BRA', 'MEX', 'ARG', 'COL'],
    years=[2019, 2020, 2021]
)

Example 3: Eurostat EU Health Statistics

from epidatasets import get_source

eurostat = get_source("eurostat")

# Healthcare expenditure
expenditure = eurostat.get_healthcare_expenditure(
    countries=['DEU', 'FRA', 'ITA'],
    years=list(range(2015, 2024))
)

# Mortality data by cause
mortality = eurostat.get_mortality_data(
    cause_code='COVID-19',
    countries=['DEU', 'FRA', 'ITA'],
    years=[2020, 2021, 2022]
)

# Life expectancy comparison
life_exp = eurostat.get_life_expectancy(
    countries=['DEU', 'FRA', 'ITA', 'ESP'],
    years=[2019, 2020, 2021]
)
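The Eurostat examples pass ISO 3166-1 alpha-3 codes. Eurostat's own datasets key countries by two-letter geo codes that mostly follow ISO alpha-2, with exceptions such as EL for Greece and UK for the United Kingdom. Whether the eurostat accessor converts codes internally is not documented here; if you need to translate them yourself, a minimal sketch with a deliberately partial, illustrative mapping:

```python
# Partial ISO alpha-3 -> Eurostat geo code mapping (extend as needed).
ISO3_TO_EUROSTAT = {
    "DEU": "DE",
    "FRA": "FR",
    "ITA": "IT",
    "ESP": "ES",
    "GRC": "EL",  # Eurostat uses EL, not ISO's GR
    "GBR": "UK",  # Eurostat uses UK, not ISO's GB
}


def to_eurostat_geo(codes: list[str]) -> list[str]:
    """Translate ISO alpha-3 codes, failing loudly on unmapped codes."""
    missing = [c for c in codes if c not in ISO3_TO_EUROSTAT]
    if missing:
        raise KeyError(f"No Eurostat geo code mapped for: {missing}")
    return [ISO3_TO_EUROSTAT[c] for c in codes]
```

Raising on unmapped codes avoids silently dropping a country from a comparison.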

Example 4: Our World in Data

from epidatasets import get_source

owid = get_source("owid")

# COVID-19 data for specific countries
covid = owid.get_covid_data(
    countries=['BRA', 'USA', 'IND'],
    metrics=['cases', 'deaths', 'hospitalizations'],
    start_date='2021-01-01',
    end_date='2021-12-31'
)

# Excess mortality estimates
excess = owid.get_excess_mortality(
    countries=['GBR', 'ITA', 'USA'],
    start_date='2020-03-01'
)

# Global summary
summary = owid.get_global_summary()
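Daily case and death series are usually smoothed before comparison. Assuming get_covid_data returns a long-format pandas DataFrame with country, date, and new_cases columns (the actual column names may differ), a trailing 7-day average per country could be added like this:

```python
import pandas as pd


def add_rolling_mean(df: pd.DataFrame, value_col: str = "new_cases",
                     window: int = 7, group_col: str = "country",
                     date_col: str = "date") -> pd.DataFrame:
    """Return a copy of df with a per-group trailing rolling mean column."""
    out = df.sort_values([group_col, date_col]).copy()
    out[f"{value_col}_{window}d_avg"] = (
        out.groupby(group_col)[value_col]
        # min_periods=1 yields partial averages for the first window-1 rows
        .transform(lambda s: s.rolling(window, min_periods=1).mean())
    )
    return out
```

The window counts rows, so this assumes one row per country per day with no gaps; for irregular series, a time-based window such as rolling("7D") on a date index is the safer choice.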

Example 5: Brazil DATASUS via PySUS

from epidatasets import get_source

datasus = get_source("datasus")

# Access Brazilian notifiable disease data
dengue = datasus.download(
    disease="Dengue",
    years=[2022, 2023],
    states=["RJ", "SP", "MG"]
)

Example 6: Africa CDC Data

from epidatasets import get_source

africa_cdc = get_source("africa_cdc")

# List all 55 African Union member states
countries = africa_cdc.list_countries()

# Get disease outbreaks
ebola = africa_cdc.get_disease_outbreaks(
    disease='EBOLA',
    countries=['CD', 'UG', 'GN']
)

# Vaccination coverage
vax = africa_cdc.get_vaccination_coverage(
    countries=['NG', 'ET', 'ZA'],
    vaccines=['COVID-19', 'Measles']
)

Example 7: RKI Germany Surveillance

from epidatasets import get_source

rki = get_source("rki")

# COVID-19 nowcasting with R estimates
nowcast = rki.get_covid_nowcast(
    date_range=('2022-01-01', '2022-06-30')
)

# Influenza surveillance
flu = rki.get_influenza_data(seasons=['2022/23', '2023/24'])

Example 8: Multi-source Comparison

from epidatasets import get_source, list_sources

# See all available sources
print(list_sources().keys())

# Query two sources for the same country and year
who = get_source("who")
owid = get_source("owid")

who_malaria = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2022],
    countries=["BRA"]
)

owid_covid = owid.get_covid_data(
    countries=["BRA"],
    metrics=["cases", "deaths"],
    start_date='2022-01-01',
    end_date='2022-12-31'
)
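Results from different sources usually arrive with different granularities and column names. A minimal sketch of aligning them with pandas, assuming each result has first been reduced to country, year, and value columns (these names and the yearly aggregation are assumptions, not the accessors' documented output):

```python
import pandas as pd


def daily_to_yearly(df: pd.DataFrame, date_col: str = "date",
                    value_col: str = "value") -> pd.DataFrame:
    """Sum a daily series per country and calendar year."""
    years = pd.to_datetime(df[date_col]).dt.year.rename("year")
    return df.groupby(["country", years])[value_col].sum().reset_index()


def align_sources(left: pd.DataFrame, right: pd.DataFrame,
                  left_name: str, right_name: str,
                  keys: tuple[str, ...] = ("country", "year")) -> pd.DataFrame:
    """Outer-join two tidy frames on keys, naming each value column by source."""
    merged = pd.merge(
        left.rename(columns={"value": left_name}),
        right.rename(columns={"value": right_name}),
        on=list(keys),
        how="outer",
        indicator=True,  # '_merge' shows which source(s) contributed each row
    )
    return merged.sort_values(list(keys)).reset_index(drop=True)
```

An outer join with indicator=True makes coverage gaps visible instead of silently dropping countries that only one source reports.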

📊 Available Sources

| Source Name | Class | Extra | Description |
|---|---|---|---|
| africa_cdc | AfricaCDCAccessor | none | Africa CDC public health data (55 AU states) |
| cdc_opendata | CDCOpenDataAccessor | none | US CDC Open Data portal |
| china_cdc | ChinaCDCAccessor | none | China CDC Weekly surveillance |
| colombia_ins | ColombiaINSAccessor | none | Colombia INS/SIVIGILA surveillance |
| copernicus_cds | CopernicusCDSAccessor | [climate] | Copernicus Climate Data Store |
| datasus | DataSUSAccessor | [brazil] | Brazilian DATASUS/SINAN (via PySUS) |
| ecdc | ECDCOpenDataAccessor | none | ECDC infectious disease data |
| epipulse | EpiPulseAccessor | none | ECDC EpiPulse surveillance portal |
| eurostat | EurostatAccessor | [eurostat] | EU health statistics |
| global_health | GlobalHealthAccessor | none | Global.health pandemic linelist data |
| healthdata_gov | HealthDataGovAccessor | none | US HealthData.gov |
| india_idsp | IndiaIDSPAccessor | none | India IDSP disease surveillance |
| infodengue | InfoDengueAPI | none | InfoDengue dengue surveillance (Brazil) |
| malaria_atlas | MalariaAtlasAccessor | none | Malaria Atlas Project data |
| owid | OWIDAccessor | none | Our World in Data (COVID-19, vaccination) |
| paho | PAHOAccessor | none | PAHO Pan-American health data |
| pathoplexus | PathoplexusAccessor | [genomics] | Pathoplexus pathogen genomic data |
| respicast | RespiCastAccessor | none | ECDC respiratory disease forecasting |
| rki | RKIGermanyAccessor | none | Robert Koch Institute (Germany) |
| ukhsa | UKHSAAccessor | none | UK Health Security Agency |
| who | WHOAccessor | [who] | WHO Global Health Observatory |

❓ FAQ

What is epidatasets?

A Python library providing a unified interface to 21 epidemiological data sources worldwide, installable via pip install epidatasets.

Do I need to install all optional dependencies?

No. The base install covers most sources. Only install extras for sources that need them (e.g., pip install epidatasets[who] for WHO GHO data, pip install epidatasets[brazil] for DATASUS).

How do I discover available sources?

from epidatasets import list_sources
print(list_sources())

Or from the CLI: epidatasets sources

Are all dataset accessors fully implemented?

Most accessors provide working data retrieval. Some are structured placeholders for sources that require registration or have limited public APIs. Check the documentation for each source's status.

Can I contribute a new data source?

Yes! Sources are registered via entry_points in pyproject.toml. See CONTRIBUTING.md for guidelines on adding new accessors.

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.

Quick Links for Contributors

Start Contributing

Priority Contributions

  1. New data source accessors - Especially from underrepresented regions
  2. Example notebooks - Jupyter notebooks demonstrating data analysis
  3. Documentation - Translations, improvements, and API docs
  4. Bug fixes - Check the issue tracker


📚 Related Projects

| Project | Description | Repository |
|---|---|---|
| PySUS | Brazilian health data (DATASUS) | AlertaDengue/PySUS |
| ghoclient | WHO Global Health Observatory | fccoelho/ghoclient |
| epigrass | Epidemic simulation | EpiGrass/epigrass |
| epimodels | Mathematical epidemiology | fccoelho/epimodels |

📊 Statistics

  • Data sources: 21 registered (via plugin registry)
  • Countries covered: 100+
  • Optional extras: 10 (who, brazil, eurostat, climate, geo, viz, genomics, cli, worldbank, search)
  • Example notebooks: 20+
  • Documentation: epidatasets.readthedocs.io

📚 Citation

If you use this package in your research, please cite:

@misc{fccoelho_epidatasets,
  author = {Coelho, Flávio Codeço},
  title = {Epidatasets: Python Access to Epidemiological Datasets Worldwide},
  year = {2026},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/fccoelho/epidemiological-datasets}}
}

For PySUS:

@software{pysus,
  author = {AlertaDengue Team},
  title = {PySUS: Tools for Brazilian Public Health Data},
  url = {https://github.com/AlertaDengue/PySUS}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💜 Sponsor

This project is sponsored by

Kwar-AI

Kwar-AI: Intelligence for Epidemiology

AI-powered solutions for disease surveillance and outbreak prediction


πŸ™ Acknowledgments

  • PySUS Contributors - For making Brazilian health data accessible
  • WHO - For maintaining the Global Health Observatory
  • All data providers who make epidemiological data openly accessible
  • Global public health community

📞 Contact


Made with ❤️ for the epidemiological research community

πŸ› Report Bug β€’ πŸ’‘ Request Feature β€’ πŸ’¬ Discussions


Disclaimer: This repository is a community effort to catalog open data sources. Please always refer to the original data providers for official statistics and verify data usage terms. The maintainers are not responsible for data quality or availability.

About

A curated collection of openly accessible epidemiological datasets from around the world with Python access scripts
