
WSI Toolbox

Note: This package is currently unstable. API may change without notice.

A comprehensive toolkit for Whole Slide Image (WSI) processing, feature extraction, and clustering analysis.

Installation

# From PyPI
pip install wsi-toolbox

# From GitHub (latest)
pip install git+https://github.com/technoplasm/wsi-toolbox.git

Supported Models

The following foundation models are available:

Model            Arch      Params  Dim   HuggingFace
uni              ViT-L/16  300M    1024  MahmoodLab/UNI
uni2 (default)   ViT-H/14  681M    1536  MahmoodLab/UNI2-h
gigapath         ViT-g/14  1.1B    1536  prov-gigapath/prov-gigapath
virchow          ViT-H/14  632M    1280  paige-ai/Virchow
virchow2         ViT-H/14  632M    1280  paige-ai/Virchow2
h-optimus-0      ViT-g/14  1.1B    1536  bioptimus/H-optimus-0
conch15          ViT-L/16  300M    1024  MahmoodLab/conchv1_5
conch15_768      ViT-L/16  300M    768   MahmoodLab/conchv1_5
midnight         ViT-g/14  1.1B    1536  SophontAI/OpenMidnight
phikon2          ViT-L/16  300M    1024  owkin/phikon-v2

conch15_768 outputs FC-projected features (not cls_token), intended for TITAN input.

Setup: These models require HuggingFace authentication. Accept the license on each model page, then:

huggingface-cli login

GPU Configuration

Device selection is controlled by --device / -D (CLI) or set_default_device() (Python). Default is auto.

Value            Behavior
auto (default)   Detect all GPUs. Multiple GPUs → parallel inference. Single GPU → cuda:0. No GPU → cpu (with warning)
cuda:0           Use GPU 0 only. Falls back to cpu if unavailable (with warning)
cuda:1           Use GPU 1 only
cuda:0,1,3       Use the specified GPUs in parallel
cpu              CPU only

CLI:

wt extract -i sample.ndpi -D auto           # Auto-detect (default)
wt extract -i sample.ndpi -D cuda:0         # Single GPU
wt extract -i sample.ndpi -D cuda:0,1       # 2 GPUs in parallel

Python:

wt.set_default_device('cuda:0,1')  # Use GPUs 0 and 1

For the Streamlit app, set via environment variable:

WT_DEVICE=cuda:0 uv run task app

Quick Start

# 1. Extract features from WSI
wt extract -i sample.ndpi -o sample.h5

# 2. Run clustering
wt cluster -i sample.h5

# 3. Generate preview image (requires sample.ndpi in same directory)
wt preview -i sample.h5
Python:

import wsi_toolbox as wt

wt.set_default_model_preset('uni2')
wt.set_default_device('auto')

# 1. Extract
cmd = wt.FeatureExtractionCommand(batch_size=256)
cmd('sample.h5', wsi_path='sample.ndpi')

# 2. Cluster
cluster_cmd = wt.ClusteringCommand(resolution=1.0)
cluster_cmd(['sample.h5'])

# 3. Preview
preview_cmd = wt.PreviewClustersCommand()
img = preview_cmd('sample.h5')
img.save('sample_preview.jpg')

Important: preview / preview-score commands require the original WSI file with the same stem in the same directory (e.g., sample.h5 needs sample.ndpi).
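As a sketch of that requirement, the matching WSI can be located by swapping the extension on the HDF5 path. The helper name and candidate extension list below are illustrative, not the package's actual implementation:

```python
from pathlib import Path

# Illustrative helper: find the WSI sharing a stem with the HDF5 file.
# The extension list is an assumption, not exhaustive.
WSI_EXTENSIONS = ('.ndpi', '.svs', '.tiff')

def find_matching_wsi(h5_path):
    h5 = Path(h5_path)
    for ext in WSI_EXTENSIONS:
        candidate = h5.with_suffix(ext)
        if candidate.exists():
            return candidate
    return None  # preview would fail: no WSI with the same stem
```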

Commands

The CLI is available as wsi-toolbox or wt. Every command supports --help.


extract

Extract patch embeddings from WSI using foundation models.

CLI:

wt extract -i sample.ndpi -o sample.h5
wt extract -i sample.ndpi -M gigapath      # Use Gigapath model
wt extract -i sample.ndpi -M virchow2      # Use Virchow2 model
wt extract -i sample.ndpi -M conch15_768   # CONCH v1.5 (768D via AttentionalPooler)
wt extract -i sample.ndpi -M midnight      # OpenMidnight model
wt extract -i sample.ndpi -L               # Include latent features
wt extract -i sample.ndpi -D cuda:0,1      # Multi-GPU parallel

Python:

cmd = wt.FeatureExtractionCommand(batch_size=256, with_latent=True)
result = cmd('sample.h5', wsi_path='sample.ndpi')
# result.feature_dim, result.patch_count

cluster

Run Leiden clustering on embeddings.

CLI:

wt cluster -i sample.h5
wt cluster -i sample.h5 --resolution 0.5   # Fewer clusters

Python:

cmd = wt.ClusteringCommand(resolution=1.0)
result = cmd(['sample.h5'])
# result.cluster_count, result.target_path

See Advanced Usage for multi-file clustering and sub-clustering.


preview

Generate cluster overlay image. Requires WSI with same stem.

CLI:

wt preview -i sample.h5
wt preview -i sample.h5 -f 1 2 3           # Filter to clusters 1,2,3
wt preview -i sample.h5 --size 32          # Smaller thumbnails

Python:

cmd = wt.PreviewClustersCommand(size=64)
img = cmd('sample.h5', namespace='default')
img.save('preview.jpg')

umap

Compute UMAP projection.

CLI:

wt umap -i sample.h5
wt umap -i sample.h5 --show                # Display plot
wt umap -i sample.h5 --save                # Save plot

Python:

cmd = wt.UmapCommand(n_neighbors=15, min_dist=0.1)
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/umap'

pca

Compute PCA projection.

CLI:

wt pca -i sample.h5
wt pca -i sample.h5 -n 2                   # 2 components
wt pca -i sample.h5 --show                 # Display plot

Python:

cmd = wt.PCACommand(n_components=1, scaler='minmax')
result = cmd(['sample.h5'])
# result.target_path → 'uni/default/pca1'

preview-score

Generate score heatmap overlay. Requires WSI with same stem.

CLI:

wt preview-score -i sample.h5 -n pca1
wt preview-score -i sample.h5 -n pca1 --cmap viridis
wt preview-score -i sample.h5 -n pca1 --invert

Python:

cmd = wt.PreviewScoresCommand(size=64)
img = cmd('sample.h5', score_name='pca1', cmap_name='jet')
img.save('pca_heatmap.jpg')

show

Display HDF5 file structure.

CLI:

wt show -i sample.h5
wt show -i sample.h5 -v                    # Verbose

Python:

ShowCommand()('sample.h5')

thumb

Generate thumbnail from WSI.

CLI:

wt thumb -i sample.ndpi
wt thumb -i sample.ndpi -w 1024            # Specify width

Python:

wsi.generate_thumbnail()

dzi

Export WSI to Deep Zoom Image format (for OpenSeadragon).

CLI:

wt dzi -i sample.ndpi -o ./output
wt dzi -i sample.ndpi -o ./output -t 512   # Tile size

Python:

DziCommand()(wsi_path, output_dir, name)

cache (optional)

Pre-cache patch images for repeated access:

wt cache -i sample.ndpi -o sample.h5
wt extract -i sample.h5   # Uses cache
wt preview -i sample.h5   # Uses cache

Structure:

cache/{patch_size}/
├── patches       # [N, H, W, 3] images
└── coordinates   # [N, 2] coords
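Reading the cache back with h5py is straightforward. A minimal sketch, assuming the group layout in the diagram above (the helper name is ours; patch size 256 is an example value):

```python
import h5py

# Sketch: read cached patches and coordinates. Group paths follow the
# cache/{patch_size}/ layout shown above; 256 is just an example size.
def read_cache(h5_path, patch_size=256):
    with h5py.File(h5_path, 'r') as f:
        patches = f[f'cache/{patch_size}/patches'][:]      # [N, H, W, 3] images
        coords = f[f'cache/{patch_size}/coordinates'][:]   # [N, 2] coords
    return patches, coords
```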

migrate

Migrate old HDF5 format to new format.

wt migrate -i sample.h5
wt migrate -i sample1.h5 sample2.h5      # Multiple files

HDF5 File Structure

All data is stored in a single HDF5 file. Use wt show -i sample.h5 to inspect.

Root Attributes (Metadata)

with h5py.File('sample.h5', 'r') as f:
    # WSI metadata
    f.attrs['original_mpp']      # Original microns per pixel
    f.attrs['original_width']    # Original width (px)
    f.attrs['original_height']   # Original height (px)

    # Patch grid info
    f.attrs['mpp']               # Actual mpp used
    f.attrs['patch_size']        # Patch size (e.g., 256)
    f.attrs['patch_count']       # Total patches
    f.attrs['cols']              # Grid columns
    f.attrs['rows']              # Grid rows
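The grid attributes relate in the obvious way: cols × patch_size and rows × patch_size give the pixel extent the patch grid covers. A small sketch using the attribute names listed above:

```python
# Sketch: derive the patch grid's pixel extent from the root attributes.
# Attribute names follow the listing above; works on any mapping-like attrs.
def grid_extent_px(attrs):
    cols, rows, patch = attrs['cols'], attrs['rows'], attrs['patch_size']
    return cols * patch, rows * patch  # (width_px, height_px) of the grid
```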

Model Features

Features are stored under {model}/. Supported models: uni, uni2, gigapath, virchow, virchow2, h-optimus-0, conch15, conch15_768, midnight, phikon2.

{model}/
├── features        # [N, D] patch embeddings
│                   #   uni: D=1024
│                   #   uni2: D=1536
│                   #   gigapath: D=1536
│                   #   virchow: D=1280
│                   #   virchow2: D=1280
│                   #   h-optimus-0: D=1536
│                   #   conch15: D=1024
│                   #   conch15_768: D=768
│                   #   midnight: D=1536
│                   #   phikon2: D=1024
├── coordinates     # [N, 2] patch coordinates (x, y pixels)
└── latent_features # [N, L, D] optional (with -L flag)

with h5py.File('sample.h5', 'r') as f:
    features = f['uni/features'][:]         # (N, 1024)
    coords = f['uni/coordinates'][:]        # (N, 2)

Analysis Results (Hierarchical)

Results are stored under {model}/{namespace}/.

{model}/{namespace}/
├── clusters     # [N] cluster labels (int)
├── umap         # [N, 2] UMAP coordinates
└── pca1         # [N] PCA scores
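A sketch of reading these results back for the single-file case, assuming the paths in the layout above (model 'uni', namespace 'default'; the helper itself is ours, not package API):

```python
import h5py
import numpy as np

# Sketch: load cluster labels and count patches per cluster.
# Dataset path follows the {model}/{namespace}/clusters layout above.
def cluster_sizes(h5_path, model='uni', namespace='default'):
    with h5py.File(h5_path, 'r') as f:
        clusters = f[f'{model}/{namespace}/clusters'][:]  # [N] int labels
    return np.bincount(clusters)  # patches per cluster label
```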

Namespace:

  • Single file: default
  • Multi-file: file1+file2+... (auto-generated)
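The multi-file namespace rule can be sketched as joining the file stems; this is a hypothetical reconstruction of the behavior described above, not the package's own function:

```python
from pathlib import Path

# Hypothetical reconstruction of the namespace rule: 'default' for a
# single file, '+'-joined file stems for multi-file runs.
def auto_namespace(h5_paths):
    if len(h5_paths) == 1:
        return 'default'
    return '+'.join(Path(p).stem for p in h5_paths)
```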

Sub-clustering (filter hierarchy):

uni/default/clusters                           # Base
uni/default/filter/1+2+3/clusters              # Sub-cluster of 1,2,3
uni/default/filter/1+2+3/filter/0+1/clusters   # Further nesting
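Because sub-clustering nests arbitrarily deep, enumerating all cluster results takes a recursive walk. A sketch using h5py's visit() (the helper is illustrative, not package API):

```python
import h5py

# Sketch: list every 'clusters' dataset in the file, including nested
# filter/... sub-clustering results, via h5py's recursive visit().
def list_cluster_paths(h5_path):
    found = []
    with h5py.File(h5_path, 'r') as f:
        f.visit(lambda name: found.append(name) if name.split('/')[-1] == 'clusters' else None)
    return found
```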

See Advanced Usage for examples.

Writing Status

Large datasets have a writing attribute (True during write, False when complete).

if f['uni/features'].attrs.get('writing', False):
    raise RuntimeError('Dataset is incomplete')
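Instead of raising, a consumer can poll until the flag clears. A minimal sketch built on the attribute semantics above (the helper name, interval, and timeout are our choices):

```python
import time
import h5py

# Sketch: block until a dataset's 'writing' attribute is absent or False.
# Returns True when the dataset is complete, False on timeout.
def wait_until_written(h5_path, dataset, interval=1.0, timeout=300.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with h5py.File(h5_path, 'r') as f:
            if not f[dataset].attrs.get('writing', False):
                return True
        time.sleep(interval)  # writer still active; poll again
    return False
```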

Advanced Usage

Multi-file Joint Clustering

Cluster multiple WSIs together to find common patterns across samples.

# 1. Extract features from each WSI
wt extract -i sample1.ndpi -o sample1.h5
wt extract -i sample2.ndpi -o sample2.h5

# 2. Joint clustering (namespace auto-generated as "sample1+sample2")
wt cluster -i sample1.h5 sample2.h5

# 3. Analysis on joint clusters
wt pca -i sample1.h5 sample2.h5
wt umap -i sample1.h5 sample2.h5

# 4. Preview each file (uses shared cluster labels)
wt preview -i sample1.h5 -N sample1+sample2
wt preview -i sample2.h5 -N sample1+sample2
Python:

# Joint clustering
cmd = wt.ClusteringCommand()
result = cmd(['sample1.h5', 'sample2.h5'])
# → namespace: 'sample1+sample2'
# → uni/sample1+sample2/clusters in both files

Sub-clustering

Analyze a subset of clusters in more detail.

# Sub-cluster within clusters 1,2,3
wt cluster -i sample1.h5 sample2.h5 -f 1 2 3

# PCA/UMAP on filtered subset
wt pca -i sample1.h5 sample2.h5 -f 1 2 3
wt umap -i sample1.h5 sample2.h5 -f 1 2 3

# Preview filtered clusters
wt preview -i sample1.h5 -N sample1+sample2 -f 1 2 3
Python:

# Sub-cluster
cmd = wt.ClusteringCommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/clusters

# PCA on filtered subset
cmd = wt.PCACommand(parent_filters=[[1, 2, 3]])
cmd(['sample1.h5', 'sample2.h5'])
# → uni/sample1+sample2/filter/1+2+3/pca1

Streamlit App

uv run task app

# Environment variables
WT_MODEL=gigapath WT_DEVICE=cuda:1 WT_PREFETCH=2 uv run task app

Development

git clone https://github.com/technoplasm/wsi-toolbox.git
cd wsi-toolbox
uv sync

uv run wt --help
uv run task app

License

MIT