This repository bundles a machine-readable export of the EUCAST MIC database plus the scripts required to regenerate it from the official source. EUCAST publishes a comprehensive MIC dataset, but only via interactive HTML tables, which hinders automated analysis. The tooling here downloads those pages, converts them into normalized CSVs, and documents every intermediate step. All scripts were built through agentic coding with GPT‑5.1‑Codex, using a multi-stage workflow (scrape → transform → validate) that first captured intermediate artifacts, then simplified the pipeline for ongoing use.
| Path | Description |
|---|---|
| `species_eucast_raw.csv` | Raw species list scraped from https://mic.eucast.org/search/ (EUCAST ID + display name). |
| `species_eucast_with_amr.csv` | Species list mapped to AMR `mo` codes, including automatically derived phenotypes. |
| `eucast_mic_html/` | All downloaded MIC HTML pages (one per species ID). |
| `mic_combinations.csv` | Final species/antibiotic metadata table (see format below). |
| `mic_values.csv` | Final MIC distribution table (one row per dilution). |
| `01_fetch_species_amr_mapping.py` | Scrapes the species dropdown, resolves AMR `mo` codes via the AMR R package (rpy2), and infers phenotypes from name differences. |
| `02_download_html.py` | Downloads every MIC page with a fixed 0.5 s throttle and logs progress. |
| `03_convert_html_to_csv.py` | Parses the HTML tables and emits the normalized CSV outputs. |
| `04_validate_outputs.py` | Checks structural integrity (all dilutions present, counts sum up, ECOFF annotations match distribution counts). |
| `06_export_amr_hierarchies.py` | Exports microorganism/antibiotic hierarchies (one row per `species_id` subtype for MOs). |
| `07_compute_ecoff_fractions.py` | Computes `ecoff_fraction` (share ≤ ECOFF) per combination. |
| `08_add_species_ids_to_combinations.py` | Adds `species_id` to `mic_combinations.csv` (join from `species_eucast_with_amr.csv`). |
| `visualization/` | Standalone hierarchy filter + ECOFF matrix (see below). |

`mic_combinations.csv` contains one row per species/antibiotic combination.

| Column | Description |
|---|---|
| `combo_id` | Technical primary key, referenced by `mic_values.csv`. |
| `species_amr_mo` | AMR `mo` code for the microorganism. |
| `species_name` | EUCAST display name. |
| `phenotype` | Automatically derived suffix (e.g. MRSA, beta-lactamase pos). |
| `antibiotic_name` | EUCAST antibiotic label. |
| `antibiotic_amr_code` | AMR `ab` code resolved from `antibiotics.csv` or `AMR::as_ab`. |
| `distribution_count` | Number of MIC distributions aggregated by EUCAST. |
| `observation_count` | Total isolates underlying the distribution. |
| `ecoff_value` | Parsed (T)ECOFF value (parentheses removed). |
| `ecoff_annotation` | Interpretation of the ECOFF marker (`value`, `tentative_ecoff`, `forced_ecoff`, `less_than_three`, `invalid`, `missing`). |
| `confidence_lower` / `confidence_upper` | Parsed bounds from the “Confidence interval” column. |
| `ecoff_fraction` | Share of isolates with MIC ≤ ECOFF (computed by `07_compute_ecoff_fractions.py`). |
| `species_id` | Dataset-specific organism ID (subtype) joined from `species_eucast_with_amr.csv`. |
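
The `ecoff_value` and `ecoff_annotation` columns are both derived from the raw (T)ECOFF cell on the EUCAST page. The following is a minimal sketch of how such a cell could be split into the two columns. The mapping from marker to annotation (`-` to `less_than_three`, single parentheses to `tentative_ecoff`, double parentheses to `forced_ecoff`) is an assumption inferred from the column descriptions, and `parse_ecoff` is a hypothetical helper; the actual rules in `03_convert_html_to_csv.py` may differ.

```python
import re

# Assumed marker-to-annotation mapping, inferred from the column descriptions;
# the real rules in 03_convert_html_to_csv.py may differ.
_NUMBER = r"\d+(?:\.\d+)?"


def parse_ecoff(marker):
    """Return (ecoff_value, ecoff_annotation) for a raw (T)ECOFF cell."""
    text = (marker or "").strip()
    if not text:
        return None, "missing"
    if text == "-":
        return None, "less_than_three"
    # Check double parentheses before single ones.
    forced = re.fullmatch(rf"\(\(\s*({_NUMBER})\s*\)\)", text)
    if forced:
        return float(forced.group(1)), "forced_ecoff"
    tentative = re.fullmatch(rf"\(\s*({_NUMBER})\s*\)", text)
    if tentative:
        return float(tentative.group(1)), "tentative_ecoff"
    if re.fullmatch(_NUMBER, text):
        return float(text), "value"
    # Anything else is kept out of ecoff_value and flagged for review.
    return None, "invalid"


print(parse_ecoff("(0.25)"))  # (0.25, 'tentative_ecoff')
```

Keeping the numeric value and its interpretation in separate columns lets downstream code filter on `ecoff_annotation` without re-parsing parentheses.
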

`mic_values.csv` contains one row per combination and dilution.

| Column | Description |
|---|---|
| `combo_id` | Foreign key to `mic_combinations.csv`. |
| `dilution_mg_l` | MIC dilution value as shown in the EUCAST table header. |
| `count` | Number of isolates observed at that dilution. |
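
The two tables are linked through `combo_id`, so a share such as `ecoff_fraction` can be recomputed by joining them. The sketch below shows one way to do that with the standard library only. It assumes that `dilution_mg_l` and `ecoff_value` parse as plain numbers and that the denominator is the sum of the per-dilution counts (not `observation_count`); `ecoff_fractions` is a hypothetical helper and not necessarily what `07_compute_ecoff_fractions.py` does. The inline tables are made-up stand-ins for the real files.

```python
import csv
import io
from collections import defaultdict


def ecoff_fractions(combinations_file, values_file):
    """Map combo_id -> share of isolates with MIC <= ecoff_value."""
    # Group (dilution, count) pairs by combo_id.
    distributions = defaultdict(list)
    for row in csv.DictReader(values_file):
        distributions[row["combo_id"]].append(
            (float(row["dilution_mg_l"]), int(row["count"]))
        )

    fractions = {}
    for combo in csv.DictReader(combinations_file):
        if not combo["ecoff_value"]:
            continue  # no ECOFF, so no fraction
        ecoff = float(combo["ecoff_value"])
        dist = distributions.get(combo["combo_id"], [])
        total = sum(count for _, count in dist)
        if total == 0:
            continue  # nothing observed for this combination
        at_or_below = sum(count for dilution, count in dist if dilution <= ecoff)
        fractions[combo["combo_id"]] = at_or_below / total
    return fractions


# Tiny made-up tables standing in for the real CSV files.
combos = io.StringIO("combo_id,ecoff_value\n1,0.5\n2,\n")
values = io.StringIO(
    "combo_id,dilution_mg_l,count\n"
    "1,0.25,60\n1,0.5,30\n1,1,10\n"
    "2,0.25,5\n"
)
print(ecoff_fractions(combos, values))  # {'1': 0.9}
```

With the real outputs, the two arguments would be file objects opened with `open("mic_combinations.csv", newline="")` and `open("mic_values.csv", newline="")`.
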

Run the scripts in order:

    python3 dataset-eucast-mic/01_fetch_species_amr_mapping.py
    python3 dataset-eucast-mic/02_download_html.py
    python3 dataset-eucast-mic/03_convert_html_to_csv.py
    python3 dataset-eucast-mic/04_validate_outputs.py
    # Optional downstream steps for visualization and stats
    python3 dataset-eucast-mic/06_export_amr_hierarchies.py
    python3 dataset-eucast-mic/07_compute_ecoff_fractions.py
    python3 dataset-eucast-mic/08_add_species_ids_to_combinations.py
    ./visualization/build_filter_html.py

Each script persists its output and can be re-run independently. The validator should report no new errors; known upstream discrepancies are listed below for awareness.

We ship a standalone HTML-based explorer under `visualization/`. The build step produces a single HTML file that embeds the microorganism/antibiotic hierarchies and ECOFF fractions, so you can filter by organism and antibiotic and inspect an ECOFF matrix (share of isolates with MIC ≤ ECOFF).

Steps:

- Generate the hierarchies (`amr_exports/*`) and ECOFF fractions:

      python3 06_export_amr_hierarchies.py
      python3 07_compute_ecoff_fractions.py
      python3 08_add_species_ids_to_combinations.py

- Build the HTML:

      ./visualization/build_filter_html.py

  This produces `visualization/filter.html` (uses `visualization/demo_filter.html` as template).

- Open `visualization/filter.html` in your browser. Use the organism/antibiotic trees to filter; the matrix shows percent ≤ ECOFF (color-coded: red <95%, yellow 95–98%, green ≥99%).

Notes:

- Organism filtering operates on dataset `species_id` (subtypes). Multiple subtypes with the same AMR code remain distinct; ECOFF fractions come from `mic_combinations.csv` (`ecoff_fraction` column).
- The default percentiles are 10% (organisms) and 20% (antibiotics); adjust with the sliders as needed.
- The scripts download live content from `mic.eucast.org`; the fixed 0.5 s delay reflects good citizenship, but you should still follow EUCAST’s terms of use.
- Species/antibiotic mappings rely on the AMR R package (`as_mo`, `as_ab`) and may require occasional manual review when the website changes.
- ECOFF semantics (`-`, `( )`, `(( ))`) follow EUCAST’s own descriptions; ensure downstream tools respect those meanings.

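
The fixed delay mentioned in the notes can be illustrated with a small throttled fetch loop. This is a sketch under stated assumptions, not the code in `02_download_html.py`: `fetch_page` and `download_throttled` are hypothetical names, and the per-species URL pattern is not documented here, so the caller has to supply the URLs. The `fetch` and `sleep` parameters exist only so the loop can be exercised without network access.

```python
import time
import urllib.request


def fetch_page(url):
    """Download one page and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_throttled(urls, delay_s=0.5, fetch=fetch_page, sleep=time.sleep):
    """Fetch URLs one at a time, pausing delay_s seconds between requests."""
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_s)  # fixed pause before every request after the first
        pages[url] = fetch(url)
    return pages
```

Requests are issued strictly one after another, so the server never sees more than one request per `delay_s` seconds from this loop.
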
The validation script reports certain inconsistencies (e.g., MIC sums not matching `observation_count`, or `-` shown despite ≥3 distributions). These issues are already present in the original source and are not altered by this pipeline; treat the validator output as a heads-up for manual review. For reference, the current run surfaces the following upstream discrepancies:

- Combo 71 (Bacteroides fragilis + Moxifloxacin): sum of MIC counts 2238 != observation_count 2237
- Combo 301 (Enterococcus faecalis + Linezolid): sum of MIC counts 31415 != observation_count 31441
- Combo 347 (Enterococcus faecium + Linezolid): sum of MIC counts 14392 != observation_count 14404
- Combo 440 (Escherichia coli + Ciprofloxacin): sum of MIC counts 15813 != observation_count 15667
- Combo 517 (Haemophilus influenzae + Moxifloxacin): sum of MIC counts 11365 != observation_count 15011
- Combo 612 (Klebsiella pneumoniae + Ciprofloxacin): sum of MIC counts 3778 != observation_count 3788
- Combo 717 (Moraxella catarrhalis + Moxifloxacin): sum of MIC counts 3835 != observation_count 4036
- Combo 919 (Pseudomonas aeruginosa + Ciprofloxacin): sum of MIC counts 26990 != observation_count 26996
- Combo 928 (Pseudomonas aeruginosa + Gatifloxacin): sum of MIC counts 6482 != observation_count 6465
- Combo 936 (Pseudomonas aeruginosa + Moxifloxacin): sum of MIC counts 3065 != observation_count 5089
- Combo 1087 (Staphylococcus aureus + Ciprofloxacin): sum of MIC counts 41721 != observation_count 41812
- Combo 1101 (Staphylococcus aureus + Gatifloxacin): sum of MIC counts 2020 != observation_count 2021
- Combo 1108 (Staphylococcus aureus + Linezolid): sum of MIC counts 66761 != observation_count 67705
- Combo 1231 (Staphylococcus epidermidis + Moxifloxacin): sum of MIC counts 9776 != observation_count 10014
- Combo 1332 (Staphylococcus saprophyticus + Ciprofloxacin): sum of MIC counts 742 != observation_count 739
- Combo 1348 (Stenotrophomonas maltophilia + Ciprofloxacin): sum of MIC counts 2961 != observation_count 2962
- Combo 1521 (Streptococcus pneumoniae + Benzylpenicillin): sum of MIC counts 15170 != observation_count 15161
- Combo 1533 (Streptococcus pneumoniae + Ciprofloxacin): sum of MIC counts 73053 != observation_count 73054
- Combo 1540 (Streptococcus pneumoniae + Erythromycin): sum of MIC counts 39854 != observation_count 39847
- Combo 1541 (Streptococcus pneumoniae + Gatifloxacin): sum of MIC counts 14709 != observation_count 14704
- Combo 1545 (Streptococcus pneumoniae + Linezolid): sum of MIC counts 60180 != observation_count 60207
- Combo 1548 (Streptococcus pneumoniae + Moxifloxacin): sum of MIC counts 26858 != observation_count 27471
- Combo 135 (Campylobacter jejuni + Sulfamethoxazole): ECOFF annotation less_than_three but distribution_count=5
- Combo 344 (Enterococcus faecium + Lasalocid): ECOFF annotation less_than_three but distribution_count=5
- Combo 349 (Enterococcus faecium + Monensin): ECOFF annotation less_than_three but distribution_count=5
- Combo 355 (Enterococcus faecium + Salinomycin): ECOFF annotation less_than_three but distribution_count=13
- Combo 656 (Lactobacillus rhamnosus + Streptomycin): ECOFF annotation less_than_three but distribution_count=3
- Combo 759 (Mycobacterium avium ATCC 700898 + Amikacin): ECOFF annotation less_than_three but distribution_count=4
- Combo 760 (Mycobacterium avium ATCC 700898 + Clarithromycin): ECOFF annotation less_than_three but distribution_count=4
- Combo 761 (Mycobacterium avium ATCC 700898 + Ethambutol): ECOFF annotation less_than_three but distribution_count=4
- Combo 762 (Mycobacterium avium ATCC 700898 + Linezolid): ECOFF annotation less_than_three but distribution_count=4
- Combo 763 (Mycobacterium avium ATCC 700898 + Moxifloxacin): ECOFF annotation less_than_three but distribution_count=4
- Combo 764 (Mycobacterium avium ATCC 700898 + Rifabutin): ECOFF annotation less_than_three but distribution_count=3
- Combo 765 (Mycobacterium avium ATCC 700898 + Rifampicin): ECOFF annotation less_than_three but distribution_count=4
- Combo 766 (Mycobacterium avium ATCC 700898 + Trimethoprim-sulfamethoxazole): ECOFF annotation less_than_three but distribution_count=4
- Combo 924 (Pseudomonas aeruginosa + Enrofloxacin): ECOFF annotation less_than_three but distribution_count=4
- Combo 1003 (Salmonella enterica + Enrofloxacin): ECOFF annotation less_than_three but distribution_count=6
- Combo 1044 (Serratia marcescens + Cefiderocol): ECOFF annotation less_than_three but distribution_count=3
- Combo 1147 (Staphylococcus aureus ATCC 29213 + Vancomycin): ECOFF annotation less_than_three but distribution_count=3
- Combo 1215 (Staphylococcus coagulase negative + Pirlimycin): ECOFF annotation less_than_three but distribution_count=3
- Combo 1349 (Stenotrophomonas maltophilia + Colistin): ECOFF annotation less_than_three but distribution_count=5
- Combo 1408 (Streptococcus anginosus + Dalbavancin): ECOFF annotation less_than_three but distribution_count=3
- Combo 1418 (Streptococcus bovis + Lefamulin): ECOFF annotation less_than_three but distribution_count=4
- Combo 1421 (Streptococcus canis + Enrofloxacin): ECOFF annotation less_than_three but distribution_count=3
- Combo 1426 (Streptococcus canis + Pirlimycin): ECOFF annotation less_than_three but distribution_count=3
- Combo 1434 (Streptococcus constellatus + Delafloxacin): ECOFF annotation less_than_three but distribution_count=3
- Combo 1614 (Streptococcus suis + Enrofloxacin): ECOFF annotation less_than_three but distribution_count=5
- Combo 1618 (Streptococcus suis + Pirlimycin): ECOFF annotation less_than_three but distribution_count=6
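
The two kinds of message in the list above correspond to two simple per-combination checks. The sketch below shows how such checks might look; `find_discrepancies` is a hypothetical helper, the threshold of three distributions is taken from the `less_than_three` annotation name, and the sample rows are made up. It is not the implementation in `04_validate_outputs.py`.

```python
def find_discrepancies(combo, counts):
    """Return human-readable issues for one combination.

    combo: dict with observation_count, distribution_count, ecoff_annotation.
    counts: per-dilution isolate counts from mic_values.csv for that combo.
    """
    issues = []
    total = sum(counts)
    if total != combo["observation_count"]:
        issues.append(
            f"sum of MIC counts {total} != "
            f"observation_count {combo['observation_count']}"
        )
    # "less_than_three" should only appear when fewer than 3 distributions exist.
    if combo["ecoff_annotation"] == "less_than_three" and combo["distribution_count"] >= 3:
        issues.append(
            "ECOFF annotation less_than_three but "
            f"distribution_count={combo['distribution_count']}"
        )
    return issues


# Made-up row: the per-dilution counts add up to 101, not 100.
sample = {"observation_count": 100, "distribution_count": 12, "ecoff_annotation": "value"}
print(find_discrepancies(sample, [60, 41]))
# ['sum of MIC counts 101 != observation_count 100']
```

Returning messages instead of raising keeps the check usable as a report: every combination is inspected and all findings are collected for manual review.
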
This project is provided strictly for didactic/demonstration purposes. The data and scripts do not support conclusions about real-world resistance prevalence or clinical decision-making. No guarantees are made regarding the accuracy, completeness, or suitability of the code, intermediate artifacts, or derived datasets. By using this repository, you acknowledge that you—and not the authors—bear all responsibility for validation, compliance, and downstream use. The authors disclaim any liability for direct or indirect consequences arising from its use.