Whole Slide Image Classification with Spatially-Aware Multiple Instance Learning
This repository contains code and models for GABMIL, a spatially-aware extension of Attention-Based MIL (ABMIL) for digital pathology.
GABMIL enhances traditional ABMIL by incorporating spatial relationships between patches without adding significant computational cost.
- Uses a lightweight Spatial Information Mixing Module (SIMM) to model interactions between patches.
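- Improves performance over ABMIL by up to 7 percentage points in AUPRC and 5 in Kappa (for example, BLOCK_3 reaches 0.74 vs. 0.67 AUPRC and 0.62 vs. 0.57 Kappa on TCGA-BRCA).
- More computationally efficient than Transformer-based methods such as TransMIL (94M vs. 614M FLOPs on TCGA-BRCA).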
Figure 1: Overview of GABMIL. The input WSI is divided into patches, features are extracted with a pretrained model, spatial information is integrated via SIMM, and slide-level predictions are obtained with ABMIL.
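The final ABMIL step of this pipeline can be illustrated with a minimal NumPy sketch of attention-based pooling (the non-gated form: a softmax over `w^T tanh(V h_k)`). This is not the repository's implementation. The function name `abmil_pool`, the parameter shapes, and the random inputs are assumptions chosen for illustration only.

```python
import numpy as np

def abmil_pool(feats, V, w):
    """Attention-based MIL pooling over one bag of patch features.

    feats: (N, D) patch features; V: (D, L) projection; w: (L,) scoring vector.
    Returns the (D,) slide embedding and the (N,) attention weights.
    All names and shapes here are illustrative assumptions.
    """
    scores = np.tanh(feats @ V) @ w           # one scalar score per patch, shape (N,)
    scores = scores - scores.max()            # shift for a numerically stable softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    slide = attn @ feats                      # attention-weighted sum of patch features
    return slide, attn

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))               # 6 hypothetical patches, 8-dim features
V = rng.normal(size=(8, 4))
w = rng.normal(size=4)

slide, attn = abmil_pool(feats, V, w)

# Shuffling the patches leaves the slide embedding unchanged.
perm = rng.permutation(len(feats))
slide_shuffled, _ = abmil_pool(feats[perm], V, w)
```

Because the pooling is permutation-invariant, plain ABMIL cannot use where patches sit on the slide. That is the gap a spatial mixing step applied before pooling is meant to address.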
Figure 2: SIMM module configurations. (a) BOTH: BLOCK attention combined with GRID attention. (b) BLOCK attention captures local spatial information with MLPs. (c) GRID attention models grid-level spatial interactions.
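The difference between local (BLOCK) and grid-level (GRID) grouping can be sketched with MaxViT-style partitions of a 2D feature map. This is an assumption based on the MaxViT acknowledgment in this README, not the repository's SIMM code: the function names, the `(H, W, C)` layout, and the window and grid sizes are hypothetical, and the MLP mixing applied within each group is omitted.

```python
import numpy as np

def block_partition(x, p):
    """Group an (H, W, C) map into non-overlapping p x p local windows.

    Returns (num_windows, p*p, C). H and W must be divisible by p.
    """
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)            # (h/p, w/p, p, p, c)
    return x.reshape(-1, p * p, c)

def grid_partition(x, g):
    """Group an (H, W, C) map into g x g sets of evenly strided positions.

    Returns (num_groups, g*g, C). H and W must be divisible by g.
    """
    h, w, c = x.shape
    x = x.reshape(g, h // g, g, w // g, c)
    x = x.transpose(1, 3, 0, 2, 4)            # (h/g, w/g, g, g, c)
    return x.reshape(-1, g * g, c)

# A 4 x 4 map whose single channel holds each position's flat index.
ids = np.arange(16).reshape(4, 4, 1)

first_block = block_partition(ids, 2)[0, :, 0].tolist()   # neighbouring positions
first_grid = grid_partition(ids, 2)[0, :, 0].tolist()     # positions spread across the map
```

In this toy example the first block group holds positions `[0, 1, 4, 5]` (one 2x2 neighbourhood), while the first grid group holds `[0, 2, 8, 10]` (positions two steps apart), so mixing within a block group is local and mixing within a grid group spans the whole map.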
Slide-Level Classification on TCGA-BRCA using ImageNet-pretrained ResNet50
| Model | AUC | F1 | Recall | Kappa | AUPRC | FLOPs |
|---|---|---|---|---|---|---|
| ABMIL | 0.88 ± 0.05 | 0.78 ± 0.06 | 0.78 ± 0.07 | 0.57 ± 0.12 | 0.67 ± 0.11 | 94M |
| TransMIL | 0.89 ± 0.05 | 0.77 ± 0.06 | 0.77 ± 0.08 | 0.55 ± 0.12 | 0.71 ± 0.11 | 614M |
| BLOCK_3 | 0.91 ± 0.04 | 0.81 ± 0.05 | 0.80 ± 0.07 | 0.62 ± 0.10 | 0.74 ± 0.09 | 94M |
| GRID_4 | 0.90 ± 0.04 | 0.79 ± 0.04 | 0.78 ± 0.05 | 0.59 ± 0.07 | 0.72 ± 0.10 | 94M |
| BOTH_4 | 0.89 ± 0.05 | 0.79 ± 0.08 | 0.78 ± 0.08 | 0.58 ± 0.16 | 0.71 ± 0.15 | 94M |
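For readers unfamiliar with the Kappa column, the following is a minimal sketch of how such a score can be computed, assuming it refers to Cohen's kappa (the README does not say which variant is used). The labels below are made up for illustration and are not taken from the experiments.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability both pick class k independently, summed over classes.
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical slide-level labels and predictions (illustrative only).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1]
kappa = cohen_kappa(y_true, y_pred)
```

Here 6 of 8 predictions agree (0.75) while chance agreement is 0.5, which gives a kappa of 0.5.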
We thank the authors of MLP-Mixer and MaxViT for their valuable contributions.
Please consider citing the following paper if you use this work:
@article{keshvarikhojasteh2025spatially,
title={A Spatially-Aware Multiple Instance Learning Framework for Digital Pathology},
author={Keshvarikhojasteh, Hassan and Tifrea, Mihail and Hess, Sibylle and Pluim, Josien P.W. and Veta, Mitko},
journal={arXiv preprint arXiv:2504.17379},
year={2025}
}