Inference and reproducibility code for GenBloom, a genetically-aligned foundation model for peripheral blood smears.
- Model weights: MarrLab/GenBloom
- Patient embeddings: MarrLab/DinoBloom_hemato_embeddings
conda create -n genbloom python=3.9 -y
conda activate genbloom
pip install -e .
inference_genbloom.ipynb: a minimal notebook that downloads the GenBloom-V and GenBloom-G checkpoints plus one example patient from HuggingFace and runs inference end-to-end. Start here.
The steps below recreate the WSI classification numbers (AML-Hehr, APL-AML, cAItomorph binary fold).
from huggingface_hub import snapshot_download
snapshot_download("MarrLab/GenBloom", local_dir="checkpoints")
Layout:
checkpoints/genbloom_v/genbloom_v.pth
checkpoints/genbloom_g/genbloom_g_fold{0..4}.pth
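Before launching an evaluation, it can help to confirm that the download produced the layout above. The following is a minimal sketch, not part of the repository: the helper names `expected_checkpoints` and `missing_checkpoints` are hypothetical, and the only assumption is that `genbloom_g_fold{0..4}` expands to folds 0 through 4.

```python
from pathlib import Path


def expected_checkpoints(root="checkpoints", n_folds=5):
    """Return the checkpoint paths implied by the layout listed above."""
    root = Path(root)
    # One GenBloom-V checkpoint plus one GenBloom-G checkpoint per fold.
    paths = [root / "genbloom_v" / "genbloom_v.pth"]
    paths += [
        root / "genbloom_g" / f"genbloom_g_fold{i}.pth" for i in range(n_folds)
    ]
    return paths


def missing_checkpoints(root="checkpoints", n_folds=5):
    """Return the expected paths that are not present on disk."""
    return [p for p in expected_checkpoints(root, n_folds) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All checkpoints found.")
```

If any path is reported missing, re-running `snapshot_download` is the likely fix.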
Each dataset directory must contain one <patient>.h5 per patient with a features dataset of shape (N_cells, 768). If you don't have them locally, the same embeddings are released at MarrLab/DinoBloom_hemato_embeddings.
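To check a dataset directory against this format, a sketch along the following lines may be useful. It assumes the third-party `h5py` package is installed; the helper names `is_valid_feature_shape` and `check_dataset_dir` are hypothetical and do not come from the repository.

```python
from pathlib import Path

EXPECTED_DIM = 768  # embedding width stated in the format description


def is_valid_feature_shape(shape, expected_dim=EXPECTED_DIM):
    """True if shape is two-dimensional, i.e. (N_cells, expected_dim)."""
    return len(shape) == 2 and shape[1] == expected_dim


def check_dataset_dir(data_dir):
    """Return {file name: shape or None} for every .h5 file that fails the check.

    None means the file has no dataset named "features".
    """
    import h5py  # third-party; imported lazily so the shape helper works without it

    bad = {}
    for path in sorted(Path(data_dir).glob("*.h5")):
        with h5py.File(path, "r") as f:
            shape = f["features"].shape if "features" in f else None
        if shape is None or not is_valid_feature_shape(shape):
            bad[path.name] = shape
    return bad
```

An empty dictionary from `check_dataset_dir("/path/to/aml_hehr")` would indicate that every patient file in that directory matches the expected shape.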
export AML_HEHR_DATA_DIR=/path/to/aml_hehr
export APL_AML_DATA_DIR=/path/to/apl_aml
export CATIOMORPH_DATA_DIR=/path/to/catiomorph
GenBloom-V (vision encoder only):
python dinov2/eval/multi_dataset_eval.py \
--genbloom-v-checkpoint checkpoints/genbloom_v/genbloom_v.pth \
--output-dir outputs/classification/genbloom_v
GenBloom-G, single fold:
python dinov2/eval/multi_dataset_eval.py \
--genbloom-g-checkpoint checkpoints/genbloom_g/genbloom_g_fold0.pth \
--fold 0 \
--output-dir outputs/classification/fold_0
GenBloom-G, all 5 folds on SLURM:
sbatch eval_genbloom_g.slurm # 5-fold array job
sbatch eval_genbloom_v.slurm # GenBloom-V baseline
Results land in outputs/classification/.../all_metrics.csv.
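Once several runs have finished, the per-run result files can be gathered for inspection. The sketch below is an illustration only: it makes no assumption about the column names in `all_metrics.csv` and finds the files at any depth under the output directory. The function name `collect_metrics` and the added `source_dir` key are hypothetical.

```python
import csv
from pathlib import Path


def collect_metrics(root="outputs/classification"):
    """Read every all_metrics.csv under root into a list of row dicts."""
    root = Path(root)
    rows = []
    for csv_path in sorted(root.rglob("all_metrics.csv")):
        with csv_path.open(newline="") as f:
            for row in csv.DictReader(f):
                # Record which run directory the row came from.
                # "source_dir" is an added key, not a column of the CSV.
                row["source_dir"] = csv_path.parent.relative_to(root).as_posix()
                rows.append(row)
    return rows


if __name__ == "__main__":
    for row in collect_metrics():
        print(row)
```

Each returned row keeps the original CSV columns as strings, so numeric columns need converting before any averaging across folds.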
python plot_barplots.py --output-dir figures
Licensed under Apache 2.0; see LICENSE. Derived from Meta AI's DINOv2.