[MICCAI 26] cgDDI: Controllable Generation of Diverse Dermatological Imagery for Fair and Efficient Malignancy Classification
Accurate dermatological diagnosis naturally necessitates equitable performance across diverse populations, yet a systematic lack of expertly annotated images, especially for underrepresented skin tones and rare diseases, impedes progress toward measurably fair methods. We introduce cgDDI (Controllable Generation of Diverse Dermatological Imagery), a hybrid framework that (1) synthesizes realistic healthy skin samples without disturbing other input properties, (2) maps single-sample rare lesions onto novel skin-tones and locations non-parametrically, and (3) allows for efficient parametric generation with as few as 10 training samples. The framework supports both human and automated segmentation masking, enabling scalability to datasets without pre-made lesion masks. We grow a 656-image dataset by more than 400× and validate across two datasets: biopsy-confirmed Diverse Dermatology Images (DDI) and expert-verified Fitzpatrick17k (F17k). On the DDI benchmark, we achieve malignancy classification accuracy of 86.4% under synthetic-only training and 90.9% state-of-the-art performance with real data fine-tuning, alongside leading fairness metrics. Cross-dataset experiments show +13.9% accuracy improvements on unseen F17k data despite minimal disease overlap. We openly release 266k+ synthetic images, code, and generative models to further support fairness research at https://github.com/hectorcarrion/ControllableGenDDI.
The cgDDI framework. Original images, masks, and prompts produce healthy synthetics. These serve as targets for lesion-mapped synthetics, as prior-preservation anchors, and as semantic prompts. Disease-specific concepts are learned via textual inversion and used to fine-tune latent diffusion models from which semantic synthetics are sampled. The aggregated data trains fair classifiers.
The MICCAI 2026 camera-ready PDF is hosted directly in this repository: Paper-1473.pdf. The arXiv version is embargoed by Springer until approximately September 2027 (one year after open-access publication at MICCAI 2026). The version-of-record will appear in the Lecture Notes in Computer Science proceedings volume shortly before the conference (Sept 27 – Oct 1, 2026, Strasbourg).
| Method | Mean Acc. | Light | Med. | Dark | PQD | DPM | EOM |
|---|---|---|---|---|---|---|---|
| Baseline (Real) | 82.4 | 83.3 | 74.6 | 89.7 | 77.0 | 75.2 | 58.7 |
| FairDisCo | 83.8 | 88.6 | 71.7 | 92.0 | 78.0 | 72.8 | 63.7 |
| PatchAlign | 87.4 | 89.6 | 80.3 | 92.3 | 86.9 | 74.9 | 69.6 |
| cgDDI (synth only) | 86.4 | 88.9 | 84.1 | 86.0 | 94.6 | 82.0 | 81.9 |
| cgDDI (synth + real) | 90.9 | 93.3 | 86.4 | 93.0 | 92.5 | 68.8 | 86.6 |
All values are percentages. For PQD (Predictive Quality Disparity), DPM (Demographic Parity), and EOM (Equality of Opportunity), higher is fairer. With real-data fine-tuning, cgDDI raises EOM, the most important fairness metric, from 69.6 (PatchAlign, the strongest prior method on this metric) to 86.6 while reaching the best accuracy in every skin-tone subgroup.
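The three fairness metrics are not defined in this README. As a rough sketch, assuming they follow the definitions commonly used in the FairDisCo line of work (PQD as the ratio of the lowest to the highest subgroup accuracy; DPM and EOM as the class-averaged ratio of the lowest to the highest subgroup rate of predictions and of correct predictions, respectively), they could be computed as follows. The function names and toy data are illustrative and are not taken from the cgDDI code.

```python
def min_max_ratio(values):
    """Smallest value divided by the largest; 1.0 means all groups are equal."""
    values = list(values)
    top = max(values)
    return min(values) / top if top > 0 else 0.0


def pqd(group_accuracies):
    """Assumed PQD: worst subgroup accuracy relative to the best one."""
    return min_max_ratio(group_accuracies)


def dpm(y_pred, groups, classes):
    """Assumed DPM: per class, compare how often each group is predicted as it."""
    per_class = []
    for c in classes:
        rates = []
        for g in sorted(set(groups)):
            preds = [p for p, s in zip(y_pred, groups) if s == g]
            rates.append(sum(p == c for p in preds) / len(preds))
        per_class.append(min_max_ratio(rates))
    return sum(per_class) / len(per_class)


def eom(y_true, y_pred, groups, classes):
    """Assumed EOM: per class, compare the per-group rate of correct predictions."""
    per_class = []
    for c in classes:
        rates = []
        for g in sorted(set(groups)):
            preds = [p for t, p, s in zip(y_true, y_pred, groups)
                     if s == g and t == c]
            if preds:  # skip groups that have no samples of this class
                rates.append(sum(p == c for p in preds) / len(preds))
        if rates:
            per_class.append(min_max_ratio(rates))
    return sum(per_class) / len(per_class) if per_class else 0.0


# Subgroup accuracies (Light, Med., Dark) of the "cgDDI (synth only)" row.
print(round(100 * pqd([88.9, 84.1, 86.0]), 1))  # 94.6

# Toy predictions for two skin-tone groups (made-up data).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["light"] * 4 + ["dark"] * 4
print(round(dpm(y_pred, groups, classes=[0, 1]), 3))  # 0.333
print(eom(y_true, y_pred, groups, classes=[0, 1]))    # 0.5
```

Under this assumed definition, the rounded subgroup accuracies of the synth-only row reproduce its reported PQD of 94.6; values recomputed from rounded table entries can differ from the reported ones in the last digit.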
Row 1: real images used as prompts. Row 2: lesion from a donor transplanted onto the prompt image (lesion mapping). Rows 3–4: semantic synthetics conditioned on the prompt, a target disease (malignant/benign), and target skin tone (light, medium, dark).
cgDDI produces healthy skin images that stay within the clinical image distribution and span the full Fitzpatrick I–VI range, a resource absent from prior synthetic dermatology datasets.
Released openly on Hugging Face: hcarrion/ControllableGenDDI. The dataset contains:
- Healthy synthetics — 309 pixel-perfect in-distribution healthy skin images
- Lesion-mapped synthetics — single-sample rare-disease augmentations
- Semantic synthetics — textual-inversion + LoRA latent-diffusion samples
- Fairness labels — Fitzpatrick skin-tone tags for stratified evaluation
In total: 266,136 skin-tone-balanced synthetic images across 65+ disease classes.
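The Fitzpatrick tags allow accuracy to be reported per skin-tone group, as in the Light / Med. / Dark columns of the results table. A minimal sketch follows, assuming the DDI convention of grouping Fitzpatrick types I–II, III–IV, and V–VI. The record fields (`fitzpatrick`, `label`, `pred`) are hypothetical and do not reflect the released metadata schema.

```python
from collections import defaultdict

# Assumed grouping (DDI convention): Fitzpatrick I-II, III-IV, V-VI.
TONE_GROUP = {1: "light", 2: "light", 3: "medium",
              4: "medium", 5: "dark", 6: "dark"}


def accuracy_by_tone(records):
    """Return {tone group: accuracy} for records with hypothetical fields."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = TONE_GROUP[r["fitzpatrick"]]
        total[group] += 1
        correct[group] += int(r["pred"] == r["label"])
    return {g: correct[g] / total[g] for g in total}


# Made-up predictions, for illustration only.
records = [
    {"fitzpatrick": 1, "label": 1, "pred": 1},
    {"fitzpatrick": 2, "label": 0, "pred": 1},
    {"fitzpatrick": 4, "label": 0, "pred": 0},
    {"fitzpatrick": 5, "label": 0, "pred": 0},
    {"fitzpatrick": 6, "label": 1, "pred": 1},
]
per_group = accuracy_by_tone(records)
print(per_group)
```

Per-group accuracies computed this way can then feed a disparity measure such as the ratio of the lowest to the highest group accuracy.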
```bash
# Download dataset via the HF CLI
hf download hcarrion/ControllableGenDDI --repo-type dataset --local-dir data/cgddi
```

You will also need the original DDI dataset (Diverse Dermatology Images, Daneshjou et al. 2022) and sDDI masks (FEDD) to reproduce the lesion-mapping pipeline. We do not re-host these artifacts.
cgDDI also releases 65 disease-conditioned LoRA checkpoints, one per condition, on Hugging Face. Each is a textual-inversion + LoRA adapter built on top of the base diffusion backbone.
Code is provided as Jupyter notebooks. The pipeline was developed and primarily executed on Colab; most dependencies come pre-installed there, and notebooks include !pip install calls for the rest. Notebooks contain embedded result visualizations that GitHub may not render — open locally or in Colab if needed.
| Stage | Notebook / script | Notes |
|---|---|---|
| 1. Healthy synthetics | healthy_gen.ipynb | Latent-diffusion in-painting on real DDI images using sDDI masks. |
| 2. Lesion mapping | lession_mapping.ipynb | Non-parametric transplant of single-sample rare lesions onto new skin tones / locations. |
| 3. Textual inversion | textual_inversion.ipynb / textual_inversion.py | Learn disease-specific concept tokens. A CSV of DDI images with disease labels is required (template in notebook). Driver: train_ti.bash. |
| 4. LoRA fine-tuning | train_lora.ipynb / train_lora.py | Fine-tune the diffusion model with disease tokens; prior-preservation loss anchored on healthy synthetics. |
| 5. Semantic sampling | semantic_sampling.ipynb | Sample disease- and skin-tone-conditioned synthetics from the LoRA-fine-tuned model. |
| 6. Fair classification | external: PatchAlign24 | Replace input directories with the cgDDI ones. Drop lesion-mapped rows generated from prompts inside the test split to avoid leakage. |
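The leakage note in stage 6 can be sketched as a simple filter over the training manifest. This is an illustration only: the row fields (`kind`, `prompt_id`, `path`) and the example identifiers are hypothetical, and the actual metadata layout of the release may differ.

```python
def drop_leaky_rows(rows, test_prompt_ids):
    """Remove lesion-mapped synthetics whose prompt image is in the test split.

    rows: list of dicts with hypothetical keys "kind" and "prompt_id".
    test_prompt_ids: identifiers of real images held out for testing.
    """
    test_prompt_ids = set(test_prompt_ids)
    return [
        r for r in rows
        if not (r["kind"] == "lesion_mapped" and r["prompt_id"] in test_prompt_ids)
    ]


# Made-up manifest, for illustration only.
train_rows = [
    {"path": "a.png", "kind": "lesion_mapped", "prompt_id": "img_001"},
    {"path": "b.png", "kind": "lesion_mapped", "prompt_id": "img_002"},
    {"path": "c.png", "kind": "semantic", "prompt_id": "img_001"},
]
test_ids = ["img_001"]

clean_rows = drop_leaky_rows(train_rows, test_ids)
print([r["path"] for r in clean_rows])  # ['b.png', 'c.png']
```

The sketch filters only lesion-mapped rows, matching the note in the table; whether other synthetic types also need filtering depends on how they were generated and is not stated here.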
We release this data, code and models with the intent of academic use and to promote fairness research. We do not condone unethical use of these artifacts.
We thank the open-source communities behind Hugging Face Diffusers, for the in-painting and LoRA implementations, and behind Skin-Diff, FairDisCo, and PatchAlign, for the baselines and the classification protocol on which this repository builds. We also rely on FEDD for the sDDI segmentation masks.
```bibtex
@inproceedings{carrion2026cgddi,
  title = {Controllable Generation of Diverse Dermatological Imagery for Fair and Efficient Malignancy Classification},
  author = {Carri{\'o}n, H{\'e}ctor and Norouzi, Narges},
  booktitle = {Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
  year = {2026},
  publisher = {Springer},
  series = {Lecture Notes in Computer Science}
}
```

See LICENSE for details.


