mimic-icd-coder

Multi-label ICD-10 auto-coder for hospital discharge summaries — a reproducible clinical NLP + MLOps reference build.

End-to-end clinical NLP pipeline on Azure Databricks (Delta Lake + Unity Catalog + MLflow + Model Serving), built on MIMIC-IV-Note v2.2 + MIMIC-IV v3.1 Hosp. Reproducible on a single workstation or in the cloud without code branches; every methodological choice is pre-registered in DECISIONS.md and defended in reports/.

Methodological note. This project is inspired by Mullenbach et al. 2018 (CAML), which established the multi-label ICD coding benchmark on MIMIC-III/ICD-9. This work targets MIMIC-IV/ICD-10 — a different dataset and coding system. Numerical results are reported on their own terms, not as an apples-to-apples replication. See §6 Evaluation for the full caveat.

Purpose and compliance posture

Scientific-research use only under the PhysioNet Credentialed Health Data License v1.5.0. This repository builds a reproducible MIMIC-IV/ICD-10 multi-label coding pipeline and demonstrates production-grade MLOps methodology for credentialed clinical NLP. Not a clinical product. Not a commercial service. Not for clinical decision support. No MIMIC data or trained weights are redistributed through this repository — only aggregate research results (metrics, methodology, code, synthetic examples). Reproduction requires independent PhysioNet credentialing and CITI training.

Headline results

Test-split results on MIMIC-IV-Note v2.2 + MIMIC-IV v3.1 Hosp, top-50 ICD-10 codes. Patient-level held-out test split, n=12,091 admissions, seed=42.

Metric	Baseline (TF-IDF + LR)	Chunked Bio_ClinicalBERT
Micro F1	0.6174	TBD
Macro F1	0.5843	TBD
P@5	0.5259	TBD
P@8	0.4326	TBD
P@15	0.2935	TBD
Micro AUC	0.9284	TBD
Macro AUC	0.9097	TBD
Micro AUPRC	0.6263	TBD
Macro AUPRC	0.5739	TBD

Backing MLflow run: 4e577699a67a4027bc27628e9b237ac5 (local file store, data/mlruns/).

Baseline uses OneVsRestClassifier(LogisticRegression(class_weight="balanced")) over TF-IDF (1–2 grams, min_df=5, 200K vocab cap) with per-label decision thresholds tuned on validation by F1 maximization. The class_weight="balanced" + per-label F1 threshold combination deliberately trades ranked-precision (P@k) for per-label F1 — a documented baseline choice (DECISIONS.md 2026-04-23). Recovering P@k is an explicit objective of the transformer branch.

Validation-to-test drift is ≤ 0.005 on every metric, which indicates that the validation-tuned thresholds generalize. Train/val/test are disjoint by subject_id, checked by tests/test_smoke.py::test_patient_split_disjoint. Patient-level splits prevent the writing-style leakage that admission-level splits allow, where notes about the same patient can land in both train and test.
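
The patient-level split can be illustrated with a minimal sketch. It is not the code in src/mimic_icd_coder/data/splits.py: the function names, the toy subject_id values, and the shuffle-then-slice strategy are assumptions for illustration. Only the 80/10/10 fractions, the seed (42), and the disjointness property come from this README.

```python
import random
from collections import defaultdict


def patient_level_split(subject_ids, fractions=(0.8, 0.1, 0.1), seed=42):
    """Assign every admission row to a split according to its patient.

    subject_ids holds one subject_id per admission row. The returned list
    of split names is aligned to the input rows.
    """
    patients = sorted(set(subject_ids))       # sort so the shuffle is reproducible
    random.Random(seed).shuffle(patients)
    n_train = int(fractions[0] * len(patients))
    n_val = int(fractions[1] * len(patients))
    split_of = {}
    for i, pid in enumerate(patients):
        if i < n_train:
            split_of[pid] = "train"
        elif i < n_train + n_val:
            split_of[pid] = "val"
        else:
            split_of[pid] = "test"
    return [split_of[pid] for pid in subject_ids]


def assert_disjoint(subject_ids, splits):
    """Fail if any patient has admissions in more than one split."""
    seen = defaultdict(set)
    for pid, split in zip(subject_ids, splits):
        seen[pid].add(split)
    leaked = [pid for pid, names in seen.items() if len(names) > 1]
    assert not leaked, f"patients in more than one split: {leaked[:5]}"


# Toy data: 13 admissions across 10 hypothetical patients.
subject_ids = [101, 101, 102, 103, 103, 103, 104, 105, 106, 107, 108, 109, 110]
splits = patient_level_split(subject_ids)
assert_disjoint(subject_ids, splits)
```

Because the split is drawn over patients rather than rows, the admission-level fractions only approximate 80/10/10.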

Reproducibility

Running mic run-all --config configs/dev.yml on fresh MIMIC-IV v3.1 Hosp + MIMIC-IV-Note v2.2 raw CSVs reproduces every headline metric to 15+ decimal places (verified 2026-04-24, MLflow run 6e809d5dfd3b46dbafae84ddba710bd7). Reproducibility is architectural: a fixed patient-split seed (42), the deterministic liblinear solver, and file-on-disk stage boundaries that make each step independently re-runnable. A reviewer with PhysioNet credentials who clones this repository and installs matching dependency versions should get identical numbers end-to-end.

For the full data card, model card, EDA paper, and evaluation methodology, see reports/. AI-assistance disclosure: ACKNOWLEDGMENTS.md.

1. Study summary

Attribute	Value
Domain	Clinical NLP — automated medical coding
Input	Free-text discharge summaries (MIMIC-IV-Note v2.2)
Output	Per-code probability + thresholded binary labels over top-50 ICD-10 codes
Training cohort	122,288 admissions across 65,665 patients (MIMIC-IV v3.1 ICD-10 cohort ∩ v2.2 notes)
Top-50 coverage	94.12% of cohort admissions
License — code	Apache-2.0
License — data	PhysioNet Credentialed Health Data License v1.5.0 (not redistributed)

Performance targets

Targets for the chunked Bio_ClinicalBERT branch on the patient-level test split. The TF-IDF+LR baseline floor is Micro F1 ≥ 0.55 — below that, something upstream is broken (cohort filter, split leakage, or label misalignment).

Metric	Target	Floor
Micro F1	≥ 0.70	0.55
Macro F1	≥ 0.55	0.40
P@5	≥ 0.70	—
P@8	≥ 0.65	—

Targets are absolute, not benchmark-relative. They reflect the operational threshold for "the transformer branch is delivering value over the baseline" rather than a literature comparison.
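
As a rough sketch of how the floors and targets above could be checked programmatically, the snippet below compares a metrics dictionary against both. The metric key names and the gate function are hypothetical. The thresholds are the ones in the table, and the baseline numbers are the test-split values reported under Headline results; they are used only to exercise the check, since the targets apply to the transformer branch.

```python
# Floors and targets from the table above; key names are illustrative.
FLOORS = {"micro_f1": 0.55, "macro_f1": 0.40}
TARGETS = {"micro_f1": 0.70, "macro_f1": 0.55, "p_at_5": 0.70, "p_at_8": 0.65}


def gate(metrics):
    """Return (metrics below their floor, metrics below their target)."""
    below_floor = [m for m, floor in FLOORS.items() if metrics[m] < floor]
    missed_target = [m for m, target in TARGETS.items() if metrics[m] < target]
    return below_floor, missed_target


# Reported baseline test-split metrics.
baseline = {"micro_f1": 0.6174, "macro_f1": 0.5843, "p_at_5": 0.5259, "p_at_8": 0.4326}
below_floor, missed_target = gate(baseline)
print("below floor:", below_floor)
print("missed target:", missed_target)
```

An empty below-floor list means nothing upstream looks broken. A non-empty missed-target list is expected for the baseline and is what the transformer branch is meant to close.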

2. System architecture

2.1 Logical topology

                         Raw MIMIC-IV (PhysioNet)
                         discharge.csv.gz
                         diagnoses_icd.csv.gz
                         admissions.csv.gz
                         patients.csv.gz
                         d_icd_diagnoses.csv.gz
                                 │
                                 ▼
     ┌────────────── Bronze ─────────────┐     Raw mirror (Parquet / Delta)
     │  gz CSV → columnar; no transforms │
     └────────────────┬──────────────────┘
                      ▼
     ┌────────────── Silver ─────────────┐     Cleaned notes, one per hadm_id
     │  de-id collapse, dedup, min 100tk │
     └────────────────┬──────────────────┘
                      ▼
     ┌────────────── Gold ───────────────┐     Model-ready artifacts
     │  top-50 ICD-10 multi-hot matrix   │     labels.npz, label_names.json,
     │  patient-level split manifest     │     splits.parquet
     └────────────────┬──────────────────┘
                      ▼
     ┌────────── Training / Eval ────────┐     MLflow tracking + Model Registry
     │  • TF-IDF + LogReg (baseline)     │     per-label threshold tuning
     │  • Chunked Bio_ClinicalBERT       │
     │  • Clinical-Longformer (fallback) │
     └────────────────┬──────────────────┘
                      ▼
     ┌────────── Serving + Monitoring ───┐     Databricks Model Serving (GPU)
     │  FastAPI-compatible scoring API   │     Evidently drift checks
     └───────────────────────────────────┘

2.2 Deployment surfaces

The same pipeline runs in two environments with no code branching. Only config paths change.

Surface	Storage	Compute	Orchestration	Tracking	Use
Local workstation	Parquet on local disk	CPU (16 threads)	`mic` CLI	File-backed MLflow	Cohort construction, EDA, baseline iteration, tests
Azure Databricks	ADLS Gen2 + Delta	CPU + GPU job clusters (NC6s_v3 for transformer)	Databricks Asset Bundles	Managed MLflow + Unity Catalog Model Registry	Transformer fine-tune, Model Serving, drift monitoring

3. Data contracts

Full cohort composition and preprocessing logic live in reports/data_card.md. Quick reference:

3.1 Inputs

Source	Version	Key fields	Notes
`mimic-iv-note/note/discharge.csv.gz`	v2.2 (Jan 2023)	`note_id, subject_id, hadm_id, note_type, note_seq, charttime, text`	331,793 rows; `note_type = 'DS'` is the only value
`mimic-iv/hosp/diagnoses_icd.csv.gz`	v3.1 (Oct 2024)	`subject_id, hadm_id, seq_num, icd_code, icd_version`	6,364,488 rows; `icd_version ∈ {9, 10}`
`mimic-iv/hosp/admissions.csv.gz`	v3.1	`subject_id, hadm_id, admittime, dischtime, ...`	546,028 rows
`mimic-iv/hosp/patients.csv.gz`	v3.1	`subject_id, gender, anchor_age, ...`	364,627 rows
`mimic-iv/hosp/d_icd_diagnoses.csv.gz`	v3.1	`icd_code, icd_version, long_title`	ICD dictionary for human-readable descriptions

The v2.2/v3.1 version mismatch is deliberate. hadm_id is stable across versions; only 61 of 331,793 notes (0.018%) are orphaned, i.e. their hadm_id has no match in the v3.1 tables. Full rationale in DECISIONS.md (2026-04-20).

3.2 Stage outputs

Stage	Artifact	Shape	Contract
Bronze	`bronze/{discharge_notes,diagnoses_icd,admissions,patients,d_icd_diagnoses}.parquet`	source schema	Lossless columnar mirror
Silver	`silver/notes.parquet`	`hadm_id, subject_id, text, n_tokens`	One row per admission; `n_tokens ≥ 100`
Gold	`gold/labels.npz`	CSR `(n_admissions, 50)`	Rows aligned 1:1 to `silver/notes.parquet`
Gold	`gold/label_names.json`	`list[str]` length 50	ICD-10 codes in column order
Gold	`gold/hadm_ids.parquet`	`hadm_id`	Row-to-`hadm_id` lookup
Gold	`gold/splits.parquet`	`row_idx, split`	Patient-level 80/10/10; no `subject_id` spans splits
Gold	`gold/baseline_model.joblib`	vectorizer + 50 LR heads	Output of `fit_baseline`
Gold	`gold/baseline_thresholds.npy`	`float64[50]`	Per-label thresholds tuned on val

Alignment invariant (pipeline.py): len(silver) == labels.shape[0]. Violation means Gold must be rebuilt.
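
A minimal sketch of such an invariant check follows. It is not the code in pipeline.py: the check_alignment function and the GoldMisalignedError exception are hypothetical names, and a SimpleNamespace stands in for the CSR label matrix so the example runs without scipy. The second check (label_names length equals the number of columns) follows from the label_names.json contract in the table above.

```python
from types import SimpleNamespace


class GoldMisalignedError(RuntimeError):
    """Raised when Gold artifacts no longer match Silver row-for-row."""


def check_alignment(silver_rows, labels, label_names):
    """Verify that the label matrix lines up with Silver and the label list."""
    n_rows, n_labels = labels.shape
    if len(silver_rows) != n_rows:
        raise GoldMisalignedError(
            f"silver has {len(silver_rows)} rows but labels has {n_rows}; rebuild Gold"
        )
    if len(label_names) != n_labels:
        raise GoldMisalignedError(
            f"{len(label_names)} label names for {n_labels} label columns; rebuild Gold"
        )


# Toy stand-ins: four Silver rows and a 4 x 3 label matrix.
silver = [{"hadm_id": h} for h in (2001, 2002, 2003, 2004)]
names = ["I10", "I50.9", "E11.9"]
check_alignment(silver, SimpleNamespace(shape=(4, 3)), names)  # passes silently

# A stale Gold matrix with one row fewer than Silver is rejected.
try:
    check_alignment(silver, SimpleNamespace(shape=(3, 3)), names)
    stale_detected = False
except GoldMisalignedError:
    stale_detected = True
```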

3.3 Cohort rules

Defined in configs/*.yml under cohort:.

Rule	Default	Rationale
`icd_version`	10	ICD-10 is operationally current; mixing fragments the label space
`note_types`	`['DS']`	Discharge summaries only; v2.2 contains only DS
`min_note_tokens`	100	Drops near-empty notes that hurt baseline precision
`top_k_labels`	50	Standard label-set size in the multi-label coding literature
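
To make the top_k_labels rule concrete, here is a small sketch of building a top-k multi-hot label matrix. It assumes "top" means most frequent by number of admissions, with alphabetical tie-breaking for determinism; the real builder in src/mimic_icd_coder/data/labels.py may differ. The function name, the toy hadm_id values, and the dense list-of-lists output (instead of the CSR matrix the pipeline writes) are illustrative.

```python
from collections import Counter


def build_top_k_labels(codes_by_admission, k=50):
    """Return (hadm_ids, label_names, multi-hot matrix) for the k most frequent codes."""
    # Count each code at most once per admission.
    counts = Counter(
        code for codes in codes_by_admission.values() for code in set(codes)
    )
    # Most frequent first; ties broken alphabetically so the column order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    label_names = [code for code, _ in ranked[:k]]
    column = {code: j for j, code in enumerate(label_names)}

    hadm_ids = sorted(codes_by_admission)
    matrix = [[0] * len(label_names) for _ in hadm_ids]
    for i, hadm_id in enumerate(hadm_ids):
        for code in codes_by_admission[hadm_id]:
            if code in column:
                matrix[i][column[code]] = 1
    return hadm_ids, label_names, matrix


# Toy cohort with k=2 so the cut-off is visible.
toy = {
    2001: ["I10", "E11.9", "N18.6"],
    2002: ["I10", "I50.9"],
    2003: ["I10", "E11.9"],
    2004: ["J18.9"],
}
hadm_ids, label_names, matrix = build_top_k_labels(toy, k=2)
```

In the toy cohort, admission 2004 ends up with an all-zero row because none of its codes made the cut. How the real pipeline treats such admissions is not specified here.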

4. Pipeline stages

Each stage is an idempotent function in src/mimic_icd_coder/pipeline.py with a Parquet or npz checkpoint. Downstream stages read from disk, so any step can be re-run without recomputing upstream.

Stage	Entry point	Reads	Writes	Runtime (laptop)
Bronze	`mic ingest`	5 gz CSVs	5 Parquet mirrors	5–10 min
Silver	`mic silver`	`bronze/discharge_notes.parquet`	`silver/notes.parquet`	2–3 min
Gold	`mic gold`	Silver + `bronze/diagnoses_icd.parquet`	`labels.npz`, `label_names.json`, `hadm_ids.parquet`	~30 s
Splits	`mic splits`	Silver	`splits.parquet`	< 10 s
Baseline train	`mic train-baseline`	Silver + Gold + Splits	`baseline_model.joblib`, `baseline_thresholds.npy`, MLflow run	15–25 min (CPU, 16 threads)
Test eval	`mic evaluate-test`	Silver + Gold + Splits + saved model	Test metrics	~1 min
Run-all	`mic run-all`	raw gz	everything	25–40 min end-to-end

Checkpoint layout, rooted at Paths.root (default ./data):

data/
  bronze/   discharge_notes.parquet  diagnoses_icd.parquet  admissions.parquet
            patients.parquet         d_icd_diagnoses.parquet
  silver/   notes.parquet
  gold/     labels.npz  label_names.json  hadm_ids.parquet  splits.parquet
            baseline_model.joblib  baseline_thresholds.npy
  mlruns/   <MLflow experiment tree>
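
One way to get re-runnable stages with on-disk boundaries is sketched below. This is an assumption about the general pattern, not the implementation in pipeline.py: the run_stage helper is hypothetical, it skips work when the checkpoint already exists (the real stages may simply overwrite), and it writes JSON rather than Parquet so the example needs only the standard library.

```python
import json
import tempfile
from pathlib import Path


def run_stage(output, build, force=False):
    """Run build() unless its checkpoint already exists, then return the result."""
    if output.exists() and not force:
        return json.loads(output.read_text())   # reuse the checkpoint on disk
    result = build()
    output.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a crash never leaves a partial checkpoint.
    tmp = output.with_name(output.name + ".tmp")
    tmp.write_text(json.dumps(result))
    tmp.replace(output)
    return result


calls = []


def build_silver():
    """Stand-in for a real stage; records that it actually ran."""
    calls.append("silver")
    return {"rows": 3}


with tempfile.TemporaryDirectory() as root:
    out = Path(root) / "silver" / "notes.json"
    first = run_stage(out, build_silver)               # runs the stage
    second = run_stage(out, build_silver)              # served from the checkpoint
    third = run_stage(out, build_silver, force=True)   # explicit re-run
```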

5. Models

Full details, architecture rationale, and ethics in reports/model_card.md.

5.1 Baseline — TF-IDF + one-vs-rest LogisticRegression

src/mimic_icd_coder/models/baseline.py

Parameter	Default	Config key
n-gram range	(1, 2)	`baseline.tfidf_ngram_range`
min doc freq	5	`baseline.tfidf_min_df`
max features	200,000	`baseline.tfidf_max_features`
LR C	1.0	`baseline.logreg_c`
`class_weight`	`balanced`	`baseline.logreg_class_weight`

Per-label decision thresholds are tuned on the validation split by maximizing per-label F1 (src/mimic_icd_coder/thresholds.py).
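
The idea behind per-label threshold tuning can be sketched in a few lines. This is an illustration, not the contents of thresholds.py: the grid search over fixed candidate thresholds, the function names, and the toy scores are all assumptions. The README states only that each label's threshold is chosen on validation data by maximizing that label's F1.

```python
def f1(y_true, y_pred):
    """Binary F1 from two equal-length sequences of truthy/falsy values."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_thresholds(y_true, y_score, grid=None):
    """Pick, for each label column, the grid threshold with the best validation F1.

    y_true and y_score are row-major lists shaped [n_rows][n_labels].
    Ties go to the lowest threshold in the grid.
    """
    grid = grid or [i / 100 for i in range(5, 100, 5)]
    thresholds = []
    for j in range(len(y_true[0])):
        truth = [row[j] for row in y_true]
        scores = [row[j] for row in y_score]
        best_t, best_f1 = 0.5, -1.0
        for t in grid:
            score = f1(truth, [s >= t for s in scores])
            if score > best_f1:
                best_t, best_f1 = t, score
        thresholds.append(best_t)
    return thresholds


# Toy validation set: 4 rows, 2 labels.
y_true = [[1, 0], [1, 0], [0, 1], [0, 0]]
y_score = [[0.90, 0.20], [0.42, 0.10], [0.31, 0.62], [0.10, 0.57]]
thresholds = tune_thresholds(y_true, y_score)
print(thresholds)
```

The two labels end up with different thresholds, which is the point of tuning per label: a single global cut-off would be wrong for at least one of them.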

5.2 Transformer — Chunked Bio_ClinicalBERT (primary)

src/mimic_icd_coder/models/transformer.py, jobs/train_transformer.py

Each note is split into contiguous 512-token chunks, and each chunk runs through the BERT encoder. Per-label logits are then max-pooled across chunks. This recovers signal that a single 512-token window would lose: 98.74% of notes exceed 512 whitespace tokens (reports/eda_report.md §3, token-length analysis), and WordPiece tokenization only makes them longer.
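
The chunk-then-pool step can be sketched without the model itself. The snippet below is not the code in transformer.py: the function names are hypothetical and the per-chunk logits are made-up numbers standing in for encoder output. It also ignores special tokens; a real tokenizer pipeline typically reserves positions for them inside each 512-token window.

```python
def chunk_tokens(token_ids, max_length=512):
    """Split a token sequence into contiguous, non-overlapping chunks."""
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), max_length)]


def max_pool_logits(chunk_logits):
    """Reduce [n_chunks][n_labels] logits to [n_labels] by taking the max per label."""
    return [max(column) for column in zip(*chunk_logits)]


# A 1,200-token note becomes three chunks.
token_ids = list(range(1200))
chunks = chunk_tokens(token_ids)
print([len(c) for c in chunks])

# Made-up logits for 3 labels, one row per chunk (stand-in for encoder output).
chunk_logits = [
    [-1.2, 0.3, -2.0],
    [2.1, -0.5, -1.7],
    [0.4, -0.1, -3.0],
]
note_logits = max_pool_logits(chunk_logits)
print(note_logits)
```

With max-pooling, a label fires for the note if any single chunk supports it strongly, which suits codes that are evidenced in only one section of a long discharge summary.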

Parameter	Default	Config key
model	`emilyalsentzer/Bio_ClinicalBERT` (Hugging Face, public weights)	`transformer.model_name`
max sequence length per chunk	512	`transformer.max_length`
batch size	16	`transformer.batch_size`
learning rate	2e-5	`transformer.learning_rate`
epochs	3	`transformer.epochs`
warmup ratio	0.1	`transformer.warmup_ratio`
weight decay	0.01	`transformer.weight_decay`
fp16	true	`transformer.fp16`

Early stopping monitors validation Macro F1.

5.3 Fallback — Clinical-Longformer

Triggered only if chunked Bio_ClinicalBERT misses the Micro F1 target by more than 3 points, i.e. Micro F1 below 0.67 against the 0.70 target. 4K-token context; ~3–5× slower training. Rationale in DECISIONS.md (2026-04-20).

6. Evaluation

Full methodology in reports/eval_report.qmd.

Test-split results

Held-out patient-level test split, n=12,091 admissions, 6,567 patients. Seed 42. MLflow run 4e577699a67a4027bc27628e9b237ac5.

Metric	Baseline (TF-IDF + LR)
Micro F1	0.6174
Macro F1	0.5843
P@5	0.5259
P@8	0.4326
P@15	0.2935
Micro AUC	0.9284
Macro AUC	0.9097
Micro AUPRC	0.6263
Macro AUPRC	0.5739

Metrics used

Metric	Use
Micro F1	Primary operational metric — stable under class imbalance
Macro F1	Rare-label performance across all 50 codes, equally weighted
P@5 / P@8 / P@15	Ranked-prediction precision for coder-assist workflow
Per-label F1	Error analysis on worst-performing labels
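
For readers less familiar with these metrics, the sketch below computes Micro F1, Macro F1, and P@k on a toy example. It is not the code in evaluate.py; the function names and toy data are illustrative. P@k here follows the usual convention of dividing by k, so an admission with fewer than k true labels cannot reach a precision of 1.0 at that k.

```python
def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro F1, macro F1) for row-major [n_rows][n_labels] binary lists."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    micro = _f1(sum(tp), sum(fp), sum(fn))                      # pool counts over labels
    macro = sum(_f1(*c) for c in zip(tp, fp, fn)) / n_labels    # average per-label F1
    return micro, macro


def precision_at_k(y_true, y_score, k):
    """Mean over rows of (true labels among the k highest-scored labels) / k."""
    total = 0.0
    for t_row, s_row in zip(y_true, y_score):
        top = sorted(range(len(s_row)), key=lambda j: s_row[j], reverse=True)[:k]
        total += sum(t_row[j] for j in top) / k
    return total / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]
y_score = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.6], [0.7, 0.3, 0.2]]

micro, macro = micro_macro_f1(y_true, y_pred)
p_at_1 = precision_at_k(y_true, y_score, 1)
p_at_2 = precision_at_k(y_true, y_score, 2)
```

In this toy case the third label is always wrong, which pulls Macro F1 below Micro F1 because every label counts equally in the macro average.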

On comparisons to prior work

This work does not replicate Mullenbach et al. 2018 in a methodologically valid sense, and does not claim to. The differences are:

Different dataset. Mullenbach used MIMIC-III v1.4. This work uses MIMIC-IV v3.1 + MIMIC-IV-Note v2.2.
Different coding system. Mullenbach used ICD-9-CM. This work uses ICD-10-CM. The label spaces are non-overlapping; the top-50 codes in each are different sets covering different clinical concepts.
Different cohort. Different inclusion criteria, different size, different distributional properties.
Different difficulty. ICD-10 is more granular than ICD-9 (~70K codes vs ~14K). Top-50 ICD-10 prediction is a different problem from top-50 ICD-9 prediction.

Numerical differences between this work's results and any number reported in Mullenbach 2018 (or downstream work on MIMIC-III) are confounded by all four factors. Such comparisons would be non-equivalent and methodologically invalid, and are not reported.

What this work does take from Mullenbach 2018: the multi-label classification framing, the patient-level evaluation discipline, the use of P@k as a coder-assist-relevant metric, and the top-50 label cardinality as a tractable problem size. These are methodological inheritances, not benchmark equivalences. Future work to produce an apples-to-apples MIMIC-III/ICD-9 reproduction is tracked in DECISIONS.md.

7. Interfaces

7.1 Local CLI

Entry points registered in pyproject.toml, implemented in src/mimic_icd_coder/cli.py.

mic ingest          --config configs/dev.yml
mic silver          --config configs/dev.yml
mic gold            --config configs/dev.yml
mic splits          --config configs/dev.yml
mic train-baseline  --config configs/dev.yml
mic evaluate-test   --config configs/dev.yml
mic run-all         --config configs/dev.yml

configs/dev.yml is gitignored; copy configs/dev.example.yml and fill in your MIMIC paths. --artifacts <dir> overrides the default ./data checkpoint root.

7.2 Databricks Asset Bundle

Defined in databricks.yml, with two targets:

Target	Catalog	Run-as	Compute
`dev`	`mimic_icd_dev`	workspace user	Standard_DS4_v2 × 2 (Bronze), Standard_DS5_v2 × 2 (baseline)
`prod`	`mimic_icd`	service principal `mimic-icd-sp`	same + Standard_NC6s_v3 single-node (1× V100) for transformer

databricks bundle validate --target dev
databricks bundle deploy   --target dev
databricks bundle run ingest_bronze     --target dev
databricks bundle run train_baseline    --target dev
databricks bundle run train_transformer --target prod

7.3 Model Serving API

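Databricks Model Serving endpoint, GPU-backed. The endpoint is not yet deployed (see §14); the request and response below describe the intended contract.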

POST /serving-endpoints/mimic-icd-discharge/invocations
{
  "dataframe_records": [
    {"text": "<discharge summary text>"}
  ]
}

Response:

{
  "predictions": [
    {
      "codes":        ["I10", "I50.9", "N18.6", "E11.9"],
      "scores":       [0.94, 0.87, 0.72, 0.68],
      "thresholded":  ["I10", "I50.9"]
    }
  ]
}
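
A client for this contract might look like the sketch below, which builds the request and parses a response shaped like the example above without sending anything over the network. The host name and token are placeholders, the helper names are hypothetical, and bearer-token authentication is assumed as the usual scheme for Databricks serving endpoints.

```python
import json
import urllib.request

ENDPOINT_PATH = "/serving-endpoints/mimic-icd-discharge/invocations"


def build_request(host, token, notes):
    """Build (but do not send) a POST request carrying one record per note."""
    body = json.dumps({"dataframe_records": [{"text": n} for n in notes]})
    return urllib.request.Request(
        url=f"https://{host}{ENDPOINT_PATH}",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_response(raw):
    """Return, per note, a ranked list of (code, score, passed_threshold) tuples."""
    results = []
    for pred in json.loads(raw)["predictions"]:
        accepted = set(pred["thresholded"])
        results.append(
            [(code, score, code in accepted)
             for code, score in zip(pred["codes"], pred["scores"])]
        )
    return results


# Placeholder host and token; nothing is sent.
req = build_request("example.azuredatabricks.net", "<token>", ["<discharge summary text>"])

# Response body copied from the example above.
sample = json.dumps({
    "predictions": [{
        "codes": ["I10", "I50.9", "N18.6", "E11.9"],
        "scores": [0.94, 0.87, 0.72, 0.68],
        "thresholded": ["I10", "I50.9"],
    }]
})
ranked = parse_response(sample)
```

Keeping the ranked scores and the thresholded set together lets a coder-assist UI show all suggestions while highlighting only the codes that cleared their per-label threshold.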

8. Configuration

Template: configs/dev.example.yml. User overrides go in configs/dev.yml or configs/dev.<username>.yml, both gitignored. Schema validated by Pydantic AppConfig in src/mimic_icd_coder/config.py.

Section	Purpose
`unity_catalog`	Catalog + schema names for Bronze / Silver / Gold / Models
`data`	Input paths (local gz or ADLS `abfss://`), including `d_icd_path`
`cohort`	Cohort filters (see §3.3)
`split`	Train/val/test fractions, seed, strategy
`baseline`	TF-IDF + LR hyperparameters
`transformer`	Bio_ClinicalBERT hyperparameters
`evaluation`	Threshold strategy, top-k metric list
`mlflow`	Experiment name, registry model name
`logging`	Level + format (console or JSON)

9. Observability

Channel	Backing	Captured
Structured logs	`structlog` — console locally, JSON on Databricks	Stage start/end, row counts, label density, metric values
MLflow runs	Local file store (`data/mlruns`) or Databricks-managed	Params, metrics, model artifact, signature, thresholds, label list
Model Registry	Unity Catalog (`mimic_icd.models.discharge_top50`)	Staging / Production aliases; train-data fingerprint and git SHA tags
Drift monitoring	Evidently scheduled job (prod only)	Input distribution, prediction, and label drift

10. Security & compliance

Full details in reports/data_card.md. Headlines:

Scientific-research use only under the PhysioNet Credentialed Health Data License v1.5.0. Not a clinical product, commercial service, decision-support tool, or clinical-care application.
Re-identification. No attempt to identify patients or institutions is made. Only aggregate cohort statistics, label-level metrics, and synthetic examples are published; no note text, admission IDs, or subject IDs leave local disk.
Credentialing. Reproducing results from raw data requires the reviewer to independently complete PhysioNet credentialing (CITI training + DUA agreement) before accessing MIMIC-IV. This repository does not grant any access to the underlying data.
.gitignore blocks CSV, Parquet, gz, npz, joblib, and user-specific configs. No raw data enters this repository.
Training runs in the user's own Azure tenant (single-tenant Databricks workspace, private ADLS Gen2).
Clinical text is never sent to third-party LLM APIs. Only open-weights models hosted inside the workspace are used.
CI runs only on synthetic fixtures in tests/fixtures/synthetic_notes.py.
Notebook outputs are PHI-scanned by scripts/check_notebook_phi.py in CI and pre-commit.
Service-principal credentials are stored in Databricks secret scopes.

11. Quality gates

Gate	Tool	Enforced in
Lint	`ruff check`	Pre-commit + CI
Format	`black --check` (line length 100)	Pre-commit + CI
Types	`mypy src` (strict)	CI
Unit + integration tests	`pytest` (56 tests on synthetic fixtures, ~25 s)	CI + local
Notebook output hygiene	`nbstripout` + PHI scanner	Pre-commit + CI
Data-safety guards	Large-file check, private-key detection	Pre-commit
Bundle validity	`databricks bundle validate --target dev`	Pre-deploy
Metric floor	Baseline Micro F1 ≥ 0.55 on dev split	Manual review gate after `mic train-baseline`

Pre-commit config: .pre-commit-config.yaml. CI workflow: .github/workflows/ci.yml.

12. Quick start

12.1 Local (no credentialed data required)

git clone git@github.com:nancytanaka1/mimic-icd-coder.git
cd mimic-icd-coder

python -m venv .venv
# Windows: .\.venv\Scripts\activate    POSIX: source .venv/bin/activate
pip install -e ".[dev]"
pre-commit install

pytest -q                     # synthetic fixtures only

12.2 Local end-to-end on real MIMIC (requires PhysioNet credentials)

See LOCAL_SETUP.md for the workstation walkthrough (memory profile, expected row counts, runtime envelopes, GPU prerequisites).

cp configs/dev.example.yml configs/dev.yml        # then edit data paths
mic run-all --config configs/dev.yml
mlflow ui --backend-store-uri file:./data/mlruns --port 5000

12.3 Databricks

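# `databricks bundle` needs the newer standalone Databricks CLI.
# The legacy `databricks-cli` pip package does not include the bundle commands;
# install the CLI as described in the Databricks CLI installation docs, then: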
databricks configure --token
databricks bundle validate --target dev
databricks bundle deploy   --target dev
databricks bundle run ingest_bronze --target dev

13. Repository layout

mimic-icd-coder/
├── src/mimic_icd_coder/
│   ├── cli.py                CLI entry points (mic ...)
│   ├── config.py             Pydantic AppConfig
│   ├── pipeline.py           Stage orchestration + Paths
│   ├── logging_utils.py      structlog configuration
│   ├── eda.py                EDA analysis helpers
│   ├── evaluate.py           Metrics
│   ├── thresholds.py         Per-label threshold tuner
│   ├── data/
│   │   ├── ingest.py         gz CSV → DataFrame readers (pyarrow CSV engine)
│   │   ├── clean.py          Silver transforms
│   │   ├── labels.py         Top-K multi-hot label builder
│   │   └── splits.py         Patient-level splitter
│   └── models/
│       ├── baseline.py       TF-IDF + LogReg + MLflow logger
│       └── transformer.py    Chunked Bio_ClinicalBERT fine-tune wrapper
├── jobs/                     Databricks-entry-point scripts
├── notebooks/01_eda.ipynb    Cohort + label distribution EDA
├── scripts/check_notebook_phi.py
├── configs/                  dev.example.yml + gitignored user configs
├── tests/                    Unit + integration tests on synthetic fixtures
├── reports/
│   ├── data_card.md
│   ├── model_card.md
│   ├── eval_report.qmd
│   ├── eda_report.md
│   └── EDA_Report.docx
├── .github/workflows/ci.yml
├── .pre-commit-config.yaml
├── databricks.yml
├── pyproject.toml
├── DECISIONS.md
├── LOCAL_SETUP.md
├── ACKNOWLEDGMENTS.md
├── LICENSE
└── README.md

14. Implementation status

Component	Status
Scaffold, CI, pre-commit, Asset Bundle	Ready
EDA notebook + paper + data card + model card + eval report	Complete
Bronze ingestion (5 tables including ICD dictionary)	Implemented and run on real data
Silver (clean + min-token filter)	Shipped
Gold (top-50 label matrix + patient splits)	Shipped — 50-label matrix on 122,288 admissions
TF-IDF + LR baseline	Shipped — test Micro F1 0.6174, Macro F1 0.5843
Per-label threshold tuning	Implemented
Evaluation (Micro/Macro F1, P@k, AUC, AUPRC)	Implemented
Per-label error analysis + calibration + confusion patterns	Shipped — see `reports/baseline_error_analysis.md`
MLflow tracking	Local file store wired; Unity Catalog Registry write on Databricks only
Chunked Bio_ClinicalBERT fine-tune	Scaffolded — `jobs/train_transformer.py`; pre-registered predictions in error analysis doc
Clinical-Longformer fallback	Not started — trigger-driven
Azure Databricks workspace + Unity Catalog bootstrap	Not started — branched from external bootstrap project
Model Serving endpoint	Not started
Evidently drift monitoring	Not started

15. References

Mullenbach, Wiegreffe, Duke, Sun, Eisenstein (2018). Explainable Prediction of Medical Codes from Clinical Text. NAACL. https://arxiv.org/abs/1802.05695
Alsentzer, Murphy, Boag, Weng, Jin, Naumann, McDermott (2019). Publicly Available Clinical BERT Embeddings. ClinicalNLP Workshop. https://arxiv.org/abs/1904.03323
Beltagy, Peters, Cohan (2020). Longformer: The Long-Document Transformer. https://arxiv.org/abs/2004.05150
Devlin, Chang, Lee, Toutanova (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL.
Johnson et al. (2023). MIMIC-IV-Note: Deidentified free-text clinical notes. PhysioNet.
Mitchell et al. (2019). Model Cards for Model Reporting. FAccT.
Pushkarna, Zaldivar, Kjartansson (2022). Data Cards: Purposeful and Transparent Dataset Documentation. FAccT.

16. License

Code licensed under Apache-2.0 (LICENSE). MIMIC data is licensed separately under the PhysioNet Credentialed Health Data License v1.5.0 and is not redistributed via this repository.
