
WSI Toolbox

Note: This package is currently unstable. API may change without notice.

A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.

Installation

# From PyPI
pip install wsi-toolbox

# From GitHub (latest)
pip install git+https://github.com/technoplasm/wsi-toolbox.git

Supported Models

The following foundation models are available:

Model            Arch      Params  Dim   HuggingFace
uni              ViT-L/16  300M    1024  MahmoodLab/UNI
uni2 (default)   ViT-H/14  681M    1536  MahmoodLab/UNI2-h
gigapath         ViT-g/14  1.1B    1536  prov-gigapath/prov-gigapath
virchow          ViT-H/14  632M    1280  paige-ai/Virchow
virchow2         ViT-H/14  632M    1280  paige-ai/Virchow2
h-optimus-0      ViT-g/14  1.1B    1536  bioptimus/H-optimus-0
conch15          ViT-L/16  300M    1024  MahmoodLab/conchv1_5
conch15_768      ViT-L/16  300M    768   MahmoodLab/conchv1_5
midnight         ViT-g/14  1.1B    1536  SophontAI/OpenMidnight
phikon2          ViT-L/16  300M    1024  owkin/phikon-v2

conch15_768 outputs FC-projected features (not cls_token), intended for TITAN input.

Setup: These models require HuggingFace authentication. Accept the license on each model page, then:

huggingface-cli login

GPU Configuration

Device selection is controlled by --device / -D (CLI) or set_default_device() (Python). Default is auto.

Value            Behavior
auto (default)   Detect all GPUs. Multiple GPUs → parallel inference. Single GPU → cuda:0. No GPU → cpu (with warning)
cuda:0           Use GPU 0 only. Falls back to cpu if unavailable (with warning)
cuda:1           Use GPU 1 only
cuda:0,1,3       Use the specified GPUs in parallel
cpu              CPU only

CLI:

wt extract -i sample.ndpi -D auto           # Auto-detect (default)
wt extract -i sample.ndpi -D cuda:0         # Single GPU
wt extract -i sample.ndpi -D cuda:0,1       # 2 GPUs in parallel

Python:

wt.set_default_device('cuda:0,1')  # Use GPUs 0 and 1

For the Streamlit app, set via environment variable:

WT_DEVICE=cuda:0 uv run task app

Quick Start

# 1. Extract features from WSI
wt extract -i sample.ndpi -o sample.h5

# 2. Run clustering
wt cluster -i sample.h5

# 3. Generate preview image (requires sample.ndpi in same directory)
wt preview -i sample.h5
Python:

import wsi_toolbox as wt

wt.set_default_model_preset('uni2')
wt.set_default_device('auto')

# 1. Extract
cmd = wt.FeatureExtractionCommand(batch_size=256)
cmd('sample.h5', wsi_path='sample.ndpi')

# 2. Cluster
cluster_cmd = wt.ClusteringCommand(resolution=1.0)
cluster_cmd(['sample.h5'])

# 3. Preview
preview_cmd = wt.PreviewClustersCommand()
img = preview_cmd('sample.h5')
img.save('sample_preview.jpg')

Important: preview / preview-score commands require the original WSI file with the same stem in the same directory (e.g., sample.h5 needs sample.ndpi).
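As a sketch of that requirement, the matching WSI can be located by swapping the extension on the HDF5 path. The helper name and candidate extension list below are illustrative, not the package's actual implementation:

```python
from pathlib import Path

# Illustrative helper: find the WSI sharing a stem with the HDF5 file.
# The extension list is an assumption, not exhaustive.
WSI_EXTENSIONS = ('.ndpi', '.svs', '.tiff')

def find_matching_wsi(h5_path):
    h5 = Path(h5_path)
    for ext in WSI_EXTENSIONS:
        candidate = h5.with_suffix(ext)
        if candidate.exists():
            return candidate
    return None  # preview would fail: no WSI with the same stem
```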

Commands

The CLI is available as wsi-toolbox or wt. Every command supports --help.


extract

Extract patch embeddings from WSI using foundation models.

CLI:

wt extract -i sample.ndpi -o sample.h5
wt extract -i sample.ndpi -M gigapath      # Use Gigapath model
wt extract -i sample.ndpi -M virchow2      # Use Virchow2 model
wt extract -i sample.ndpi -M conch15_768   # CONCH v1.5 (768D via AttentionalPooler)
wt extract -i sample.ndpi -M midnight      # OpenMidnight model
wt extract -i sample.ndpi -L               # Include latent features
wt extract -i sample.ndpi -D cuda:0,1      # Multi-GPU parallel

Python:

cmd = wt.FeatureExtractionCommand(batch_size=256, with_latent=True)
result = cmd('sample.h5', wsi_path='sample.ndpi')
# result.feature_dim, result.patch_count

cluster

Run Leiden clustering on embeddings.

CLI:

wt cluster -i sample.h5
wt cluster -i sample.h5 --resolution 0.5   # Fewer clusters

Python:

cmd = wt.ClusteringCommand(resolution=1.0)
result = cmd(['sample.h5'])
# result.cluster_count, result.target_path

See Advanced Usage for multi-file clustering and sub-clustering.


preview

Generate cluster overlay image. Requires WSI with same stem.

CLI:

wt preview -i sample.h5
wt preview -i sample.h5 -f 1 2 3           # Filter to clusters 1,2,3
wt preview -i sample.h5 --size 32          # Smaller thumbnails

Python:

cmd = wt.PreviewClustersCommand(size=64)
img = cmd('sample.h5', namespace='default')
img.save('preview.jpg')

umap

Compute UMAP projection.

CLI:

wt umap -i sample.h5
wt umap -i sample.h5 --show                # Display plot
wt umap -i sample.h5 --save                # Save plot

Python:

cmd = wt.UmapCommand(n_neighbors=15, min_dist=0.1)
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/umap'

pca

Compute PCA projection.

CLI:

wt pca -i sample.h5
wt pca -i sample.h5 -n 2                   # 2 components
wt pca -i sample.h5 --show                 # Display plot

Python:

cmd = wt.PCACommand(n_components=1, scaler='minmax')
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/pca1'

preview-score

Generate score heatmap overlay. Requires WSI with same stem.

CLI:

wt preview-score -i sample.h5 -n pca1
wt preview-score -i sample.h5 -n pca1 --cmap viridis
wt preview-score -i sample.h5 -n pca1 --invert

Python:

cmd = wt.PreviewScoresCommand(size=64)
img = cmd('sample.h5', score_name='pca1', cmap_name='jet')
img.save('pca_heatmap.jpg')

show

Display HDF5 file structure.

CLI:

wt show -i sample.h5
wt show -i sample.h5 -v                    # Verbose

Python:

ShowCommand()('sample.h5')

thumb

Generate thumbnail from WSI.

CLI:

wt thumb -i sample.ndpi
wt thumb -i sample.ndpi -w 1024            # Specify width

Python:

wsi.generate_thumbnail()

dzi

Export WSI to Deep Zoom Image format (for OpenSeadragon).

CLI:

wt dzi -i sample.ndpi -o ./output
wt dzi -i sample.ndpi -o ./output -t 512   # Tile size

Python:

DziCommand()(wsi_path, output_dir, name)

cache (optional)

Pre-cache patch images for repeated access:

wt cache -i sample.ndpi -o sample.h5
wt extract -i sample.h5   # Uses cache
wt preview -i sample.h5   # Uses cache

Structure:

cache/{patch_size}/
├── patches       # [N, H, W, 3] images
└── coordinates   # [N, 2] coords
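Reading the cache back with h5py is straightforward. A minimal sketch, assuming the group layout in the diagram above (the helper name is ours; patch size 256 is an example value):

```python
import h5py

# Sketch: read cached patches and coordinates. Group paths follow the
# cache/{patch_size}/ layout shown above; 256 is just an example size.
def read_cache(h5_path, patch_size=256):
    with h5py.File(h5_path, 'r') as f:
        patches = f[f'cache/{patch_size}/patches'][:]      # [N, H, W, 3] images
        coords = f[f'cache/{patch_size}/coordinates'][:]   # [N, 2] coords
    return patches, coords
```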

migrate

Migrate old HDF5 format to new format.

wt migrate -i sample.h5
wt migrate -i sample1.h5 sample2.h5      # Multiple files

HDF5 File Structure

All data is stored in a single HDF5 file. Use wt show -i sample.h5 to inspect.

Root Attributes (Metadata)

with h5py.File('sample.h5', 'r') as f:
    # WSI metadata
    f.attrs['original_mpp']      # Original microns per pixel
    f.attrs['original_width']    # Original width (px)
    f.attrs['original_height']   # Original height (px)

    # Patch grid info
    f.attrs['mpp']               # Actual mpp used
    f.attrs['patch_size']        # Patch size (e.g., 256)
    f.attrs['patch_count']       # Total patches
    f.attrs['cols']              # Grid columns
    f.attrs['rows']              # Grid rows
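The grid attributes relate in the obvious way: cols × patch_size and rows × patch_size give the pixel extent the patch grid covers. A small sketch using the attribute names listed above:

```python
# Sketch: derive the patch grid's pixel extent from the root attributes.
# Attribute names follow the listing above; works on any mapping-like attrs.
def grid_extent_px(attrs):
    cols, rows, patch = attrs['cols'], attrs['rows'], attrs['patch_size']
    return cols * patch, rows * patch  # (width_px, height_px) of the grid
```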

Model Features

Features are stored under {model}/. Supported models: uni, uni2, gigapath, virchow, virchow2, h-optimus-0, conch15, conch15_768, midnight, phikon2.

{model}/
├── features        # [N, D] patch embeddings
│                   #   uni: D=1024
│                   #   uni2: D=1536
│                   #   gigapath: D=1536
│                   #   virchow: D=1280
│                   #   virchow2: D=1280
│                   #   h-optimus-0: D=1536
│                   #   conch15: D=1024
│                   #   conch15_768: D=768
│                   #   midnight: D=1536
│                   #   phikon2: D=1024
├── coordinates     # [N, 2] patch coordinates (x, y pixels)
└── latent_features # [N, L, D] optional (with -L flag)

with h5py.File('sample.h5', 'r') as f:
    features = f['uni/features'][:]         # (N, 1024)
    coords = f['uni/coordinates'][:]        # (N, 2)

Analysis Results (Hierarchical)

Results are stored under {model}/{namespace}/.

{model}/{namespace}/
├── clusters     # [N] cluster labels (int)
├── umap         # [N, 2] UMAP coordinates
└── pca1         # [N] PCA scores
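A sketch of reading these results back for the single-file case, assuming the paths in the layout above (model 'uni', namespace 'default'; the helper itself is ours, not package API):

```python
import h5py
import numpy as np

# Sketch: load cluster labels and count patches per cluster.
# Dataset path follows the {model}/{namespace}/clusters layout above.
def cluster_sizes(h5_path, model='uni', namespace='default'):
    with h5py.File(h5_path, 'r') as f:
        clusters = f[f'{model}/{namespace}/clusters'][:]  # [N] int labels
    return np.bincount(clusters)  # patches per cluster label
```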

Namespace:

  • Single file: default
  • Multi-file: file1+file2+... (auto-generated)
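The multi-file namespace rule can be sketched as joining the file stems; this is a hypothetical reconstruction of the behavior described above, not the package's own function:

```python
from pathlib import Path

# Hypothetical reconstruction of the namespace rule: 'default' for a
# single file, '+'-joined file stems for multi-file runs.
def auto_namespace(h5_paths):
    if len(h5_paths) == 1:
        return 'default'
    return '+'.join(Path(p).stem for p in h5_paths)
```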

Sub-clustering (filter hierarchy):

uni/default/clusters                           # Base
uni/default/filter/1+2+3/clusters              # Sub-cluster of 1,2,3
uni/default/filter/1+2+3/filter/0+1/clusters   # Further nesting
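Because sub-clustering nests arbitrarily deep, enumerating all cluster results takes a recursive walk. A sketch using h5py's visit() (the helper is illustrative, not package API):

```python
import h5py

# Sketch: list every 'clusters' dataset in the file, including nested
# filter/... sub-clustering results, via h5py's recursive visit().
def list_cluster_paths(h5_path):
    found = []
    with h5py.File(h5_path, 'r') as f:
        f.visit(lambda name: found.append(name) if name.split('/')[-1] == 'clusters' else None)
    return found
```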

See Advanced Usage for examples.

Writing Status

Large datasets have a writing attribute (True during write, False when complete).

if f['uni/features'].attrs.get('writing', False):
    raise RuntimeError('Dataset is incomplete')
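Instead of raising, a consumer can poll until the flag clears. A minimal sketch built on the attribute semantics above (the helper name, interval, and timeout are our choices):

```python
import time
import h5py

# Sketch: block until a dataset's 'writing' attribute is absent or False.
# Returns True when the dataset is complete, False on timeout.
def wait_until_written(h5_path, dataset, interval=1.0, timeout=300.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with h5py.File(h5_path, 'r') as f:
            if not f[dataset].attrs.get('writing', False):
                return True
        time.sleep(interval)  # writer still active; poll again
    return False
```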

Advanced Usage

Multi-file Joint Clustering

Cluster multiple WSIs together to find common patterns across samples.

# 1. Extract features from each WSI
wt extract -i sample1.ndpi -o sample1.h5
wt extract -i sample2.ndpi -o sample2.h5

# 2. Joint clustering (namespace auto-generated as "sample1+sample2")
wt cluster -i sample1.h5 sample2.h5

# 3. Analysis on joint clusters
wt pca -i sample1.h5 sample2.h5
wt umap -i sample1.h5 sample2.h5

# 4. Preview each file (uses shared cluster labels)
wt preview -i sample1.h5 -N sample1+sample2
wt preview -i sample2.h5 -N sample1+sample2
Python:

# Joint clustering
cmd = wt.ClusteringCommand()
result = cmd(['sample1.h5', 'sample2.h5'])
# → namespace: 'sample1+sample2'
# → uni/sample1+sample2/clusters in both files

Sub-clustering

Analyze a subset of clusters in more detail.

# Sub-cluster within clusters 1,2,3
wt cluster -i sample1.h5 sample2.h5 -f 1 2 3

# PCA/UMAP on filtered subset
wt pca -i sample1.h5 sample2.h5 -f 1 2 3
wt umap -i sample1.h5 sample2.h5 -f 1 2 3

# Preview filtered clusters
wt preview -i sample1.h5 -N sample1+sample2 -f 1 2 3
Python:

# Sub-cluster
cmd = wt.ClusteringCommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/clusters

# PCA on filtered subset
cmd = wt.PCACommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/pca1

Streamlit App

uv run task app

# Environment variables
WT_MODEL=gigapath WT_DEVICE=cuda:1 WT_PREFETCH=2 uv run task app

Development

git clone https://github.com/technoplasm/wsi-toolbox.git
cd wsi-toolbox
uv sync

uv run wt --help
uv run task app

License

MIT