
Conversation


@shaneahmed shaneahmed commented Mar 31, 2023

  • Improve Engines performance and implementation
  • Redesigns PatchPredictor engine using the new EngineABC base class.
  • WSIs are now processed through the same code path as patches, using a WSI-based dataloader.
  • The intermediate WSI output is saved as Zarr to resolve memory issues.
  • Model architectures should now return their output as a dictionary.
  • The output can be saved as an AnnotationStore for visualisation in TIAViz (see the usage sketch after this list).
  • Fix mypy Type Checks for cli/common.py
  • Add PatchPredictor Engine based on EngineABC
  • Add return_probabilities option to Params
  • Removes merge_predictions option in PatchPredictor engine.
  • Defines post_process_cache_mode, which allows the post-processing algorithm to run on WSIs.
  • Add infer_wsi for WSI inference
  • Removes save_wsi_output as this is not required after post processing.
  • Removes merge_predictions and fixes docstring in EngineABCRunParams
  • compile_model has been moved to the EngineABC init.
  • Fixes bug with _calculate_scale_factor
  • Fixes a bug in class_dict definition.
  • _get_zarr_array is now a public function get_zarr_array in misc
  • patch_predictions_as_annotations runs the loop on patch_coords instead of class_probs
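
For orientation, here is a minimal usage sketch of the redesigned engine. The constructor and `run()` arguments (model name, paths, and option spellings) are assumptions based on this description, not a stable public signature.

```python
# Minimal sketch of the redesigned PatchPredictor built on EngineABC;
# argument names and values here are illustrative assumptions.
from tiatoolbox.models.engine.patch_predictor import PatchPredictor

predictor = PatchPredictor(model="resnet18-kather100k", batch_size=32)

# WSI mode: intermediate output is cached as Zarr, and the final output can be
# requested as an AnnotationStore for visualisation in TIAViz.
output = predictor.run(
    images=["sample_wsi.svs"],
    patch_mode=False,
    output_type="annotationstore",
    save_dir="results/",
)
```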

@shaneahmed shaneahmed self-assigned this Mar 31, 2023
@shaneahmed shaneahmed added the enhancement New feature or request label Mar 31, 2023

codecov bot commented Mar 31, 2023

Codecov Report

❌ Patch coverage is 94.72914% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.30%. Comparing base (f38e809) to head (a20ec2b).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| tiatoolbox/models/dataset/dataset_abc.py | 73.97% | 38 Missing ⚠️ |
| tiatoolbox/models/engine/io_config.py | 56.75% | 32 Missing ⚠️ |
| tiatoolbox/cli/nucleus_instance_segment.py | 66.66% | 1 Missing ⚠️ |
| ...iatoolbox/models/architecture/timm_efficientnet.py | 99.19% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #578      +/-   ##
===========================================
- Coverage    99.37%   95.30%   -4.08%     
===========================================
  Files           71       79       +8     
  Lines         9175     9954     +779     
  Branches      1197     1280      +83     
===========================================
+ Hits          9118     9487     +369     
- Misses          31      431     +400     
- Partials        26       36      +10     

☔ View full report in Codecov by Sentry.

- Refactor engines_abc.py
@shaneahmed shaneahmed changed the title ⚡ Improve Engines Performance and Implementation ⚡ Improve Engine Performance and Implementation Apr 28, 2023
shaneahmed and others added 23 commits October 2, 2025 10:29
# Conflicts:
#	tests/models/test_patch_predictor.py
# Conflicts:
#	tests/models/test_feature_extractor.py
#	tests/models/test_multi_task_segmentor.py
#	tests/models/test_nucleus_instance_segmentor.py
#	tests/models/test_patch_predictor.py
#	tests/models/test_semantic_segmentation.py
#	tiatoolbox/models/architecture/__init__.py
## Summary of Changes

### Major Additions
- **Dask Integration:**  
  - Added `dask` as a dependency and integrated Dask arrays and lazy computation throughout the engine and patch predictor code.
  - Added Dask-based merging, chunking, and memory-aware processing for large images and WSIs.

- **Zarr Output Support:**  
  - Added support for saving model predictions and intermediate results directly to Zarr format.
  - New CLI options and internal logic for Zarr output, including memory thresholding and chunked writes.

- **SemanticSegmentor Engine:**  
  - Added a new `SemanticSegmentor` engine with Dask/Zarr support and new test coverage (`test_semantic_segmentor.py`).
  - Added CLI entrypoint for `semantic_segmentor` and removed the old `semantic_segment` CLI.

- **Enhanced CLI and Config:**  
  - Added CLI options for memory threshold, unified worker options, and improved mask handling.
  - Updated YAML configs and sample data for new models and test images.

- **Utilities and Validation:**  
  - Added utility functions for minimal dtype casting, patch/stride validation, and improved error handling (e.g., `DimensionMismatchError`).
  - Improved annotation store conversion for Dask arrays and Zarr-backed outputs.

- **Changes to `kwargs`**
  - Added `memory-threshold` (see the sketch after this list).
  - Unified `num-loader-workers` and `num-postproc-workers` into `num-workers`.
  - Removed `cache_mode`, as cache mode is now handled automatically.
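
A rough sketch of the memory-aware Dask-to-Zarr flow described above follows; the function name, threshold handling, and merge strategy are illustrative assumptions, not the engine's actual internals.

```python
# Illustrative sketch of memory-aware output handling with Dask and Zarr.
# `memory_threshold_gb` and the merge strategy are assumptions for clarity.
import dask.array as da
import numpy as np

def merge_and_save(batch_outputs, memory_threshold_gb=2.0, zarr_path="output.zarr"):
    """Lazily merge per-batch predictions; spill to Zarr if they exceed the threshold."""
    # Wrap each batch as a lazy Dask array and concatenate along the batch axis.
    lazy = da.concatenate(
        [da.from_array(np.asarray(batch), chunks="auto") for batch in batch_outputs],
        axis=0,
    )
    if lazy.nbytes / 1e9 <= memory_threshold_gb:
        return lazy.compute()                    # small result: keep in memory
    lazy.to_zarr(zarr_path, overwrite=True)      # large result: chunked write to Zarr
    return zarr_path
```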

---

### Major Removals/Refactors
- **Removed Old CLI and Redundant Code:**  
  - Deleted the old `semantic_segment.py` CLI and replaced it with `semantic_segmentor.py`.
  - Removed legacy cache mode and patch prediction Zarr store tests.

- **Refactored Model and Dataset APIs:**  
  - Unified and simplified model inference APIs to always return arrays (not dicts) for batch outputs (see the sketch after this list).
  - Refactored dataset classes to enforce patch shape validation and remove legacy “mode” logic.

- **Test Cleanup:**  
  - Removed or updated tests that relied on old APIs or cache mode.
  - Refactored test assertions for new output types and Dask array handling.

- **API Consistency:**  
  - Standardized function and argument names across engines, CLI, and utility modules.
  - Updated docstrings and type hints for clarity and consistency.
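
As referenced above, here is a minimal sketch of the array-only batch-output contract; `ToyModel`, its layers, and the keyword arguments of `infer_batch` are illustrative assumptions, not TIAToolbox code.

```python
# Sketch of the unified batch-output contract: infer_batch-style methods now
# return a plain array rather than a dict keyed by output name.
import numpy as np
import torch

class ToyModel(torch.nn.Module):
    """Illustrative stand-in for a model architecture."""

    def __init__(self, num_classes: int = 9):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.AdaptiveAvgPool2d(1),   # NCHW -> N x C x 1 x 1
            torch.nn.Flatten(),
            torch.nn.Linear(3, num_classes),
        )

    def forward(self, x):
        return self.net(x)

    @staticmethod
    def infer_batch(model, batch_data, *, device="cpu"):
        model.to(device).eval()
        with torch.inference_mode():
            probs = torch.softmax(model(batch_data.to(device)), dim=-1)
        return probs.cpu().numpy()  # plain array output, not {"probabilities": ...}

batch = torch.rand(4, 3, 224, 224)
output = ToyModel.infer_batch(ToyModel(), batch)
assert isinstance(output, np.ndarray) and output.shape == (4, 9)
```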

---

### Notable File Changes
- **New:**  
  - `tiatoolbox/cli/semantic_segmentor.py`
  - `tests/engines/test_semantic_segmentor.py`

- **Removed:**  
  - `tiatoolbox/cli/semantic_segment.py`
  - Old cache mode and patch Zarr store tests

- **Heavily Modified:**  
  - `engine_abc.py`, `patch_predictor.py`, `semantic_segmentor.py`
  - CLI modules and test suites
  - Dataset and utility modules for Dask/Zarr compatibility

---

### Impact

- Enables scalable, parallel, and memory-efficient inference and output saving for large images.
- Simplifies downstream analysis by supporting Zarr as a native output format.
- Lays the groundwork for further Dask-based optimizations in TIAToolbox.


---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
## 🚀Summary
This PR introduces a new **[GrandQC Tissue Detection Model](https://github.com/cpath-ukk/grandqc/tree/main)** for digital pathology quality control and integrates **EfficientNet-based encoder architecture** into the TIAToolbox framework.

---

## ✨Key Changes
- **New Model Architecture**
  - Added `grandqc.py` implementing a UNet++ decoder with EfficientNet encoder for tissue segmentation.
  - Includes preprocessing (JPEG compression + ImageNet normalization), postprocessing (argmin-based mask generation), and batch inference utilities (a preprocessing sketch follows this list).
- **EfficientNet Encoder**
  - Added `timm_efficientnet.py` providing configurable EfficientNet encoders with dilation support and custom input channels.
- **Pretrained Model Config**
  - Updated `pretrained_model.yaml` to register `grandqc_tissue_detection_mpp10` with associated IO configuration.
  - Corrected `IOSegmentorConfig` references and adjusted resolutions for SCCNN models.
- **Testing**
  - Added comprehensive unit tests for:
    - `GrandQCModel` functionality, preprocessing/postprocessing, and decoder blocks.
    - EfficientNet encoder utilities and scaling logic.
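
The preprocessing step mentioned above is sketched below; the JPEG quality and normalisation constants are assumptions for illustration and may differ from the registered model's defaults.

```python
# Sketch of the GrandQC-style preprocessing described above: a JPEG round trip
# to mimic compression artefacts, followed by ImageNet normalisation.
# quality=80 and the mean/std values are illustrative assumptions.
import io
import numpy as np
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preproc(patch: np.ndarray, jpeg_quality: int = 80) -> np.ndarray:
    """Compress the patch as JPEG, decode it, and normalise to ImageNet statistics."""
    buffer = io.BytesIO()
    Image.fromarray(patch.astype(np.uint8)).save(buffer, format="JPEG", quality=jpeg_quality)
    buffer.seek(0)
    decoded = np.asarray(Image.open(buffer), dtype=np.float32) / 255.0
    return (decoded - IMAGENET_MEAN) / IMAGENET_STD
```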
  
## Impact
- Enables high-resolution tissue detection for WSI quality control using state-of-the-art architectures.
- Improves flexibility for segmentation tasks with EfficientNet encoders.
- Enhances code quality and consistency through updated linting and formatting tools.


## Tasks
- [x] Re-host GrandQC model weights on TIA Hugging Face
- [x] Update `pretrained_model.yaml`
- [x] Update `requirements.txt`
- [x] Define GrandQC model architecture
- [x] Add example usage
- [x] Remove segmentation-models-pytorch dependency
- [x] Wait for response from GrandQC authors
- [x] Add tests
- [x] Tidy up

---------

Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
# 🚀 Summary

This PR introduces a new **`DeepFeatureExtractor` engine** to the TIAToolbox framework, enabling extraction of intermediate CNN feature representations from whole slide images (WSIs) or image patches. These features can be used for downstream tasks such as clustering, visualization, or training other models. The update also includes:

- A **command-line interface (CLI)** for the new engine.
- Extended **CLI utilities** for flexible input/output configurations.
- Comprehensive **unit tests** covering patch-based and WSI-based workflows, multi-GPU support, and CLI functionality.
- Integration with TIAToolbox’s model registry and CLI ecosystem.

---

## ✨ Key Features

### **New Engine: `DeepFeatureExtractor`**
- Extracts intermediate CNN features from WSIs or patches.
- Outputs feature embeddings and spatial coordinates in **Zarr** or **dict** format.
- Implements **memory-aware caching** for large-scale WSI processing.
- Compatible with:
  - TIAToolbox pretrained models.
  - Torchvision CNN backbones (e.g., ResNet, DenseNet, MobileNet).
  - **All timm architectures via `timm.list_models()`**, including HuggingFace-hosted models.
- Supports both **patch-mode** and **WSI-mode** workflows.
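
A minimal patch-mode sketch is shown below; the constructor and `run()` arguments are assumptions inferred from this description rather than a fixed public signature.

```python
# Hedged sketch of patch-mode feature extraction; argument names are
# assumptions based on this PR's description.
from tiatoolbox.models import DeepFeatureExtractor

extractor = DeepFeatureExtractor(model="resnet18", batch_size=32)
output = extractor.run(
    images=["patch_0.png", "patch_1.png"],  # or a WSI path with patch_mode=False
    patch_mode=True,
    device="cuda",
    output_type="zarr",  # features and coordinates can also be returned as a dict
)
```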

### **CLI Integration**
- Adds `deep-feature-extractor` command to TIAToolbox CLI.
- Supports options for:
  - Input/output paths and file types.
  - Model selection (`resnet18`, `efficientnet_b0`, timm-based backbones, etc.).
  - Patch extraction parameters (`patch_input_shape`, `stride_shape`, `input_resolutions`).
  - Batch size, device selection, memory threshold, overwrite behavior.
- Flexible JSON-based CLI options for resolutions and class mappings.

### **Extended CLI Utilities**
- New reusable options:
  - `--input-resolutions`, `--output-resolutions` (JSON list of dicts).
  - `--patch-input-shape`, `--stride-shape`, `--scale-factor`.
  - `--class-dict` for mapping class indices to names.
  - `--overwrite` and `--output-file` for fine-grained control.
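
The JSON values expected by these options can be built as plain strings; the keys below follow TIAToolbox's usual resolution/units convention, and the class names are illustrative examples.

```python
# Illustrative JSON payloads for the options above.
import json

input_resolutions = json.dumps([{"units": "mpp", "resolution": 0.5}])
class_dict = json.dumps({"0": "background", "1": "tissue"})
# These strings would be passed to --input-resolutions and --class-dict.
```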

### **Unit Tests**
- **Engine Tests**:
  - Patch-based and WSI-based feature extraction.
  - Validation of Zarr outputs (features and coordinates).
  - Multi-GPU functionality.
- **Model Compatibility**:
  - Tests with `CNNBackbone` and `TimmBackbone` models.
- **CLI Tests**:
  - Single-file and parameterized runs.
  - Validation of JSON parsing for CLI options.

### **Codebase Integration**
- Registers `DeepFeatureExtractor` in `tiatoolbox.models` and engine registry.
- Adds CLI command in `tiatoolbox.cli.__init__.py`.
- Updates architecture utilities to support timm-based backbones and HuggingFace models.
- Introduces dictionaries for Torch and timm backbones (`torch_cnn_backbone_dict`, `timm_arch_dict`).
# 🚀 Summary

This PR introduces a new **`NucleusDetector` engine** to the TIAToolbox framework, enabling detection of nuclei from whole slide images (WSIs) or image patches using models such as **`MapDe`** and **`SCCNN`**. It supersedes PR #538 by leveraging **`dask`** for efficient, parallelized post-processing and result merging. The update also includes the key features, implementation details, and output formats described below.

---

## ✨ Key Features

### **New Engine: `NucleusDetector`**
- Detects nuclei centroids and probabilities from WSIs or patches.  
- Produces a **detection map** aligned with segmentation dimensions.  
- Serializes detections into **detection arrays** (`[[x], [y], [type], [probs]]`).  
- Supports multiple output backends:  
  - **SQLiteStore** (chunked storage for WSI/patch).  
  - **Dictionary** (flat or patch-indexed).  
  - **Zarr** (arrays for coordinates, classes, probabilities).  
- Compatible with nucleus detection models:  
  - **MapDe** (implemented).  
  - **SCCNN** (integration in progress/debugging).  
- Supports both **patch-mode** and **WSI-mode** workflows.  

### Technical Implementation
The detection pipeline operates as follows:
1.  **Segmentation**: A WSI-level segmentation map (dask array) is generated using `SemanticSegmentor.infer_wsi()`.
2.  **Parallel Post-processing**: For WSI inference, `dask.array.map_overlap` is used to apply the model's post-processing function across the entire segmentation map. This lets the function execute in parallel on chunks, after which the results are automatically merged back into a unified **"detection_map"** and saved as Zarr in a cache directory for further processing (see the sketch after this list).
3.  **Detection Map**:
    * Maintains the same dimensions as the segmentation map.
    * Nuclei centroids contain the detection probability values (defaults to `1` if the model does not produce probabilities).
4.  **Serialization**: The "detection_map" is converted into **"detection_arrays"** (format: `[[x], [y], [type], [probs]]`) representing the detected nuclei. These records are then saved into `SQLiteStore` (chunk-by-chunk), `zarr`, or a `dict` (patch mode only).
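
Step 2 is sketched below; `postproc_fn`, the overlap depth, and the cache path are illustrative assumptions rather than the engine's actual internals.

```python
# Sketch of chunk-wise post-processing with dask.array.map_overlap.
import dask.array as da

def build_detection_map(segmentation_map: da.Array, postproc_fn, overlap: int = 32):
    """Apply post-processing per chunk (with a halo) and cache the merged result."""
    detection_map = da.map_overlap(
        postproc_fn,                 # runs on each chunk plus its halo
        segmentation_map,
        depth=overlap,               # halo prevents splitting nuclei on chunk borders
        boundary="none",
        dtype=segmentation_map.dtype,
    )
    detection_map.to_zarr("cache_dir/detection_map.zarr", overwrite=True)
    return detection_map
```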

### Output Formats

#### SQLiteStore
* **WSI Mode**: Returns a single `SQLiteStore`.
* **Patch Mode**: Returns one `SQLiteStore` per patch.
* **Format**:
    ```python
    Annotation(Point(x,y), properties={'type': 'nuclei', 'probs': 0.9})
    ```
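
A sketch of writing detection arrays into an `SQLiteStore` in this format follows; the helper name and batching are assumptions, and the actual PR code may differ.

```python
# Sketch of serialising detection arrays into an SQLiteStore using the
# annotation format shown above.
from shapely.geometry import Point
from tiatoolbox.annotation.storage import Annotation, SQLiteStore

def detections_to_store(xs, ys, types, probs, path="detections.db") -> SQLiteStore:
    store = SQLiteStore(path)
    annotations = [
        Annotation(Point(x, y), properties={"type": str(t), "probs": float(p)})
        for x, y, t, p in zip(xs, ys, types, probs)
    ]
    store.append_many(annotations)  # chunk-by-chunk appends work the same way
    store.commit()
    return store
```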

#### Dictionary
* **WSI Mode**:
    ```python
    {
        'x': [...],
        'y': [...],
        'classes': [...],
        'probs': [...]
    }
    ```
* **Patch Mode** (One sub-dictionary per patch index):
    ```python
    {
        0: {
            'x': [...],
            'y': [...],
            'classes': [...],
            'probs': [...]
        },
        1: { ... }
    }
    ```

#### Zarr
* **WSI Mode**:
    ```python
    {
        'x': [...],
        'y': [...],
        'classes': [...],
        'probs': [...]
    }
    ```
* **Patch Mode**: Each key maps to a list of `da.array` objects, where each array corresponds to a patch.
    ```python
    {
        'x': [[...], ...],
        'y': [[...], ...],
        'classes': [[...], ...],
        'probs': [[...], ...]
    }
    ```
### **Codebase Integration**
- Registers `NucleusDetector` in `tiatoolbox.models` and engine registry.  
- Refactors detection logic from PR #538 into modular components.  
- Updates `MapDe` implementation to use the new engine.  
- Begins integration of `SCCNN` with `NucleusDetector`.  
- Adds utilities for serialization into SQLite, dict, and zarr formats.  
- Introduces unit tests for detection workflows.  
- Removes unused parameters `prediction_shape` and `prediction_dtype` from `post_process_patches()` and `post_process_wsi()` functions in all engines.
- `post_process_patches()` and `post_process_wsi()` now take `raw_predictions` instead of `raw_predictions["probabilities"]`.

### Tasks
- [x] Port code from PR #538 to supersede and close it.
- [x] Add `NucleusDetector` engine.
- [x] Update existing detection models (`MapDe` implementation complete; `SCCNN` implementation in progress/debugging).
- [x] Add unit tests.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>
Co-authored-by: Jiaqi Lv <jiaqilv@Jiaqis-MacBook-Pro.local>
## Summary

This PR updates the patch‑prediction example to align with the new `PatchPredictor` engine and fixes a long‑standing issue in `EngineABC` related to model‑attribute retrieval when using `DataParallel`.

---

## What’s Changed

### 🔧 Example Notebook Updates
- Updated **`examples/05-patch-prediction.ipynb`** to use the new `PatchPredictor` engine API.
- Added a new **“Visualize in TIAViz”** section, allowing readers to directly inspect prediction results inside **TIAViz** for a smoother, more interactive workflow.

### 🐛 EngineABC Bug Fix
- Fixed a bug in **`EngineABC`** where model attributes were incorrectly retrieved from a `DataParallel` wrapper.
- Introduced `_get_model_attr()` to safely unwrap the underlying model when needed.
- This resolves multi‑GPU crashes caused by attributes living on the wrapped module instead of the actual model.
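
A minimal sketch of the unwrapping helper is shown below; the real `_get_model_attr()` in `EngineABC` may handle more cases.

```python
# Minimal sketch of the DataParallel unwrapping described above.
import torch

def _get_model_attr(model: torch.nn.Module, attr: str):
    """Return an attribute from the underlying model, unwrapping DataParallel."""
    if isinstance(model, torch.nn.DataParallel):
        model = model.module  # attributes live on the wrapped module
    return getattr(model, attr)
```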

---

## Why This Matters
- Ensures the patch‑prediction example stays up‑to‑date with the latest engine design.
- Improves multi‑GPU stability and prevents confusing attribute‑access errors.
- Enhances the user experience by integrating TIAViz visualization directly into the example workflow.

---

## Testing
- Verified that the updated notebook runs end‑to‑end with the new engine.
- Confirmed that multi‑GPU training and inference no longer crash when accessing model attributes.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>

shaneahmed and others added 6 commits January 10, 2026 11:13
# Conflicts:
#	tests/models/test_dataset.py
#	tests/models/test_patch_predictor.py
#	tiatoolbox/data/remote_samples.yaml
This pull request updates sample data URLs in some of the notebooks and all the tests to use the Hugging Face dataset repository instead of the previous TIA server. 

**Migration of sample data URLs to Hugging Face:**

* `examples/02-stain-normalization.ipynb` 
*  `examples/03-tissue-masking.ipynb` 
* `examples/04-patch-extraction.ipynb` 
* `tests/test_utils.py`, `tests/test_wsireader.py`
* Updated example usage in the docstring of `tiatoolbox/utils/tiff_to_fsspec.py` to use the new Hugging Face sample WSI URL.