
Conversation


@shaneahmed shaneahmed commented Mar 31, 2023

  • Improve Engines performance and implementation
  • Redesigns PatchPredictor engine using the new EngineABC base class.
  • WSIs are now processed through the same code path as patches, using a WSI-based dataloader.
  • The intermediate WSI output is saved as Zarr to resolve memory issues.
  • Model architectures should now return their output as a dictionary.
  • The output can be saved as an AnnotationStore for visualisation in TIAViz (see the usage sketch after this list).
  • Fix mypy Type Checks for cli/common.py
  • Add PatchPredictor Engine based on EngineABC
  • Add return_probabilities option to Params
  • Removes merge_predictions option in PatchPredictor engine.
  • Defines post_process_cache_mode, which allows the post-processing algorithm to run on WSIs.
  • Add infer_wsi for WSI inference
  • Removes save_wsi_output as this is not required after post processing.
  • Removes merge_predictions and fixes docstring in EngineABCRunParams
  • compile_model has been moved to the EngineABC init.
  • Fixes bug with _calculate_scale_factor
  • Fixes a bug in class_dict definition.
  • _get_zarr_array is now a public function get_zarr_array in misc
  • patch_predictions_as_annotations runs the loop on patch_coords instead of class_probs
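
For orientation, here is a minimal usage sketch of the redesigned engine. The constructor and `run()` arguments (model name, paths, and option spellings) are assumptions based on this description, not a stable public signature.

```python
# Minimal sketch of the redesigned PatchPredictor built on EngineABC;
# argument names and values here are illustrative assumptions.
from tiatoolbox.models.engine.patch_predictor import PatchPredictor

predictor = PatchPredictor(model="resnet18-kather100k", batch_size=32)

# WSI mode: intermediate output is cached as Zarr, and the final output can be
# requested as an AnnotationStore for visualisation in TIAViz.
output = predictor.run(
    images=["sample_wsi.svs"],
    patch_mode=False,
    output_type="annotationstore",
    save_dir="results/",
)
```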

@shaneahmed shaneahmed self-assigned this Mar 31, 2023
@shaneahmed shaneahmed added the enhancement New feature or request label Mar 31, 2023

codecov bot commented Mar 31, 2023

Codecov Report

❌ Patch coverage is 94.72914% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.30%. Comparing base (f38e809) to head (a20ec2b).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| tiatoolbox/models/dataset/dataset_abc.py | 73.97% | 38 Missing ⚠️ |
| tiatoolbox/models/engine/io_config.py | 56.75% | 32 Missing ⚠️ |
| tiatoolbox/cli/nucleus_instance_segment.py | 66.66% | 1 Missing ⚠️ |
| ...iatoolbox/models/architecture/timm_efficientnet.py | 99.19% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #578      +/-   ##
===========================================
- Coverage    99.37%   95.30%   -4.08%     
===========================================
  Files           71       79       +8     
  Lines         9175     9954     +779     
  Branches      1197     1280      +83     
===========================================
+ Hits          9118     9487     +369     
- Misses          31      431     +400     
- Partials        26       36      +10     

☔ View full report in Codecov by Sentry.

- Refactor engines_abc.py
@shaneahmed shaneahmed changed the title ⚡ Improve Engines Performance and Implementation ⚡ Improve Engine Performance and Implementation Apr 28, 2023
shaneahmed and others added 23 commits October 2, 2025 10:29
# Conflicts:
#	tests/models/test_patch_predictor.py
# Conflicts:
#	tests/models/test_feature_extractor.py
#	tests/models/test_multi_task_segmentor.py
#	tests/models/test_nucleus_instance_segmentor.py
#	tests/models/test_patch_predictor.py
#	tests/models/test_semantic_segmentation.py
#	tiatoolbox/models/architecture/__init__.py
## Summary of Changes

### Major Additions
- **Dask Integration:**  
  - Added `dask` as a dependency and integrated Dask arrays and lazy computation throughout the engine and patch predictor code.
  - Added Dask-based merging, chunking, and memory-aware processing for large images and WSIs.

- **Zarr Output Support:**  
  - Added support for saving model predictions and intermediate results directly to Zarr format.
  - New CLI options and internal logic for Zarr output, including memory thresholding and chunked writes.

- **SemanticSegmentor Engine:**  
  - Added a new `SemanticSegmentor` engine with Dask/Zarr support and new test coverage (`test_semantic_segmentor.py`).
  - Added CLI entrypoint for `semantic_segmentor` and removed the old `semantic_segment` CLI.

- **Enhanced CLI and Config:**  
  - Added CLI options for memory threshold, unified worker options, and improved mask handling.
  - Updated YAML configs and sample data for new models and test images.

- **Utilities and Validation:**  
  - Added utility functions for minimal dtype casting, patch/stride validation, and improved error handling (e.g., `DimensionMismatchError`).
  - Improved annotation store conversion for Dask arrays and Zarr-backed outputs.

- **Changes to `kwargs`**
  - Added `memory-threshold` (see the sketch after this list).
  - Unified `num-loader-workers` and `num-postproc-workers` into `num-workers`.
  - Removed `cache_mode`, as cache mode is now handled automatically.
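
A rough sketch of the memory-aware Dask-to-Zarr flow described above follows; the function name, threshold handling, and merge strategy are illustrative assumptions, not the engine's actual internals.

```python
# Illustrative sketch of memory-aware output handling with Dask and Zarr.
# `memory_threshold_gb` and the merge strategy are assumptions for clarity.
import dask.array as da
import numpy as np

def merge_and_save(batch_outputs, memory_threshold_gb=2.0, zarr_path="output.zarr"):
    """Lazily merge per-batch predictions; spill to Zarr if they exceed the threshold."""
    # Wrap each batch as a lazy Dask array and concatenate along the batch axis.
    lazy = da.concatenate(
        [da.from_array(np.asarray(batch), chunks="auto") for batch in batch_outputs],
        axis=0,
    )
    if lazy.nbytes / 1e9 <= memory_threshold_gb:
        return lazy.compute()                    # small result: keep in memory
    lazy.to_zarr(zarr_path, overwrite=True)      # large result: chunked write to Zarr
    return zarr_path
```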

---

### Major Removals/Refactors
- **Removed Old CLI and Redundant Code:**  
  - Deleted the old `semantic_segment.py` CLI and replaced it with `semantic_segmentor.py`.
  - Removed legacy cache mode and patch prediction Zarr store tests.

- **Refactored Model and Dataset APIs:**  
  - Unified and simplified model inference APIs to always return arrays (not dicts) for batch outputs (see the sketch after this list).
  - Refactored dataset classes to enforce patch shape validation and remove legacy “mode” logic.

- **Test Cleanup:**  
  - Removed or updated tests that relied on old APIs or cache mode.
  - Refactored test assertions for new output types and Dask array handling.

- **API Consistency:**  
  - Standardized function and argument names across engines, CLI, and utility modules.
  - Updated docstrings and type hints for clarity and consistency.
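
As referenced above, here is a minimal sketch of the array-only batch-output contract; `ToyModel`, its layers, and the keyword arguments of `infer_batch` are illustrative assumptions, not TIAToolbox code.

```python
# Sketch of the unified batch-output contract: infer_batch-style methods now
# return a plain array rather than a dict keyed by output name.
import numpy as np
import torch

class ToyModel(torch.nn.Module):
    """Illustrative stand-in for a model architecture."""

    def __init__(self, num_classes: int = 9):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.AdaptiveAvgPool2d(1),   # NCHW -> N x C x 1 x 1
            torch.nn.Flatten(),
            torch.nn.Linear(3, num_classes),
        )

    def forward(self, x):
        return self.net(x)

    @staticmethod
    def infer_batch(model, batch_data, *, device="cpu"):
        model.to(device).eval()
        with torch.inference_mode():
            probs = torch.softmax(model(batch_data.to(device)), dim=-1)
        return probs.cpu().numpy()  # plain array output, not {"probabilities": ...}

batch = torch.rand(4, 3, 224, 224)
output = ToyModel.infer_batch(ToyModel(), batch)
assert isinstance(output, np.ndarray) and output.shape == (4, 9)
```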

---

### Notable File Changes
- **New:**  
  - `tiatoolbox/cli/semantic_segmentor.py`
  - `tests/engines/test_semantic_segmentor.py`

- **Removed:**  
  - `tiatoolbox/cli/semantic_segment.py`
  - Old cache mode and patch Zarr store tests

- **Heavily Modified:**  
  - `engine_abc.py`, `patch_predictor.py`, `semantic_segmentor.py`
  - CLI modules and test suites
  - Dataset and utility modules for Dask/Zarr compatibility

---

### Impact

- Enables scalable, parallel, and memory-efficient inference and output saving for large images.
- Simplifies downstream analysis by supporting Zarr as a native output format.
- Lays the groundwork for further Dask-based optimizations in TIAToolbox.


---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
## 🚀Summary
This PR introduces a new **[GrandQC Tissue Detection Model](https://github.com/cpath-ukk/grandqc/tree/main)** for digital pathology quality control and integrates **EfficientNet-based encoder architecture** into the TIAToolbox framework.

---

## ✨Key Changes
- **New Model Architecture**
  - Added `grandqc.py` implementing a UNet++ decoder with EfficientNet encoder for tissue segmentation.
  - Includes preprocessing (JPEG compression + ImageNet normalization), postprocessing (argmin-based mask generation), and batch inference utilities (a preprocessing sketch follows this list).
- **EfficientNet Encoder**
  - Added `timm_efficientnet.py` providing configurable EfficientNet encoders with dilation support and custom input channels.
- **Pretrained Model Config**
  - Updated `pretrained_model.yaml` to register `grandqc_tissue_detection_mpp10` with associated IO configuration.
  - Corrected `IOSegmentorConfig` references and adjusted resolutions for SCCNN models.
- **Testing**
  - Added comprehensive unit tests for:
    - `GrandQCModel` functionality, preprocessing/postprocessing, and decoder blocks.
    - EfficientNet encoder utilities and scaling logic.
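
The preprocessing step mentioned above is sketched below; the JPEG quality and normalisation constants are assumptions for illustration and may differ from the registered model's defaults.

```python
# Sketch of the GrandQC-style preprocessing described above: a JPEG round trip
# to mimic compression artefacts, followed by ImageNet normalisation.
# quality=80 and the mean/std values are illustrative assumptions.
import io
import numpy as np
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preproc(patch: np.ndarray, jpeg_quality: int = 80) -> np.ndarray:
    """Compress the patch as JPEG, decode it, and normalise to ImageNet statistics."""
    buffer = io.BytesIO()
    Image.fromarray(patch.astype(np.uint8)).save(buffer, format="JPEG", quality=jpeg_quality)
    buffer.seek(0)
    decoded = np.asarray(Image.open(buffer), dtype=np.float32) / 255.0
    return (decoded - IMAGENET_MEAN) / IMAGENET_STD
```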
  
## Impact
- Enables high-resolution tissue detection for WSI quality control using state-of-the-art architectures.
- Improves flexibility for segmentation tasks with EfficientNet encoders.
- Enhances code quality and consistency through updated linting and formatting tools.


## Tasks
- [x] Re-host GrandQC model weights on TIA Hugging Face
- [x] Update `pretrained_model.yaml`
- [x] Update `requirements.txt`
- [x] Define GrandQC model architecture
- [x] Add example usage
- [x] Remove segmentation-models-pytorch dependency
- [x] Wait for response from GrandQC authors
- [x] Add tests
- [x] Tidy up

---------

Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
# 🚀 Summary

This PR introduces a new **`DeepFeatureExtractor` engine** to the TIAToolbox framework, enabling extraction of intermediate CNN feature representations from whole slide images (WSIs) or image patches. These features can be used for downstream tasks such as clustering, visualization, or training other models. The update also includes:

- A **command-line interface (CLI)** for the new engine.
- Extended **CLI utilities** for flexible input/output configurations.
- Comprehensive **unit tests** covering patch-based and WSI-based workflows, multi-GPU support, and CLI functionality.
- Integration with TIAToolbox’s model registry and CLI ecosystem.

---

## ✨ Key Features

### **New Engine: `DeepFeatureExtractor`**
- Extracts intermediate CNN features from WSIs or patches.
- Outputs feature embeddings and spatial coordinates in **Zarr** or **dict** format.
- Implements **memory-aware caching** for large-scale WSI processing.
- Compatible with:
  - TIAToolbox pretrained models.
  - Torchvision CNN backbones (e.g., ResNet, DenseNet, MobileNet).
  - **All timm architectures via `timm.list_models()`**, including HuggingFace-hosted models.
- Supports both **patch-mode** and **WSI-mode** workflows.
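
A minimal patch-mode sketch is shown below; the constructor and `run()` arguments are assumptions inferred from this description rather than a fixed public signature.

```python
# Hedged sketch of patch-mode feature extraction; argument names are
# assumptions based on this PR's description.
from tiatoolbox.models import DeepFeatureExtractor

extractor = DeepFeatureExtractor(model="resnet18", batch_size=32)
output = extractor.run(
    images=["patch_0.png", "patch_1.png"],  # or a WSI path with patch_mode=False
    patch_mode=True,
    device="cuda",
    output_type="zarr",  # features and coordinates can also be returned as a dict
)
```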

### **CLI Integration**
- Adds `deep-feature-extractor` command to TIAToolbox CLI.
- Supports options for:
  - Input/output paths and file types.
  - Model selection (`resnet18`, `efficientnet_b0`, timm-based backbones, etc.).
  - Patch extraction parameters (`patch_input_shape`, `stride_shape`, `input_resolutions`).
  - Batch size, device selection, memory threshold, overwrite behavior.
- Flexible JSON-based CLI options for resolutions and class mappings.

### **Extended CLI Utilities**
- New reusable options:
  - `--input-resolutions`, `--output-resolutions` (JSON list of dicts).
  - `--patch-input-shape`, `--stride-shape`, `--scale-factor`.
  - `--class-dict` for mapping class indices to names.
  - `--overwrite` and `--output-file` for fine-grained control.
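
The JSON values expected by these options can be built as plain strings; the keys below follow TIAToolbox's usual resolution/units convention, and the class names are illustrative examples.

```python
# Illustrative JSON payloads for the options above.
import json

input_resolutions = json.dumps([{"units": "mpp", "resolution": 0.5}])
class_dict = json.dumps({"0": "background", "1": "tissue"})
# These strings would be passed to --input-resolutions and --class-dict.
```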

### **Unit Tests**
- **Engine Tests**:
  - Patch-based and WSI-based feature extraction.
  - Validation of Zarr outputs (features and coordinates).
  - Multi-GPU functionality.
- **Model Compatibility**:
  - Tests with `CNNBackbone` and `TimmBackbone` models.
- **CLI Tests**:
  - Single-file and parameterized runs.
  - Validation of JSON parsing for CLI options.

### **Codebase Integration**
- Registers `DeepFeatureExtractor` in `tiatoolbox.models` and engine registry.
- Adds CLI command in `tiatoolbox.cli.__init__.py`.
- Updates architecture utilities to support timm-based backbones and HuggingFace models.
- Introduces dictionaries for Torch and timm backbones (`torch_cnn_backbone_dict`, `timm_arch_dict`).
# 🚀 Summary

This PR introduces a new **`NucleusDetector` engine** to the TIAToolbox framework, enabling detection of nuclei from whole slide images (WSIs) or image patches using models such as **`MapDe`** and **`SCCNN`**. It supersedes PR #538 by leveraging **`dask`** for efficient, parallelized post-processing and result merging. The update also includes the key features, implementation details, and output formats described below.

---

## ✨ Key Features

### **New Engine: `NucleusDetector`**
- Detects nuclei centroids and probabilities from WSIs or patches.  
- Produces a **detection map** aligned with segmentation dimensions.  
- Serializes detections into **detection arrays** (`[[x], [y], [type], [probs]]`).  
- Supports multiple output backends:  
  - **SQLiteStore** (chunked storage for WSI/patch).  
  - **Dictionary** (flat or patch-indexed).  
  - **Zarr** (arrays for coordinates, classes, probabilities).  
- Compatible with nucleus detection models:  
  - **MapDe** (implemented).  
  - **SCCNN** (integration in progress/debugging).  
- Supports both **patch-mode** and **WSI-mode** workflows.  

### Technical Implementation
The detection pipeline operates as follows:
1.  **Segmentation**: A WSI-level segmentation map (dask array) is generated using `SemanticSegmentor.infer_wsi()`.
2.  **Parallel Post-processing**: For WSI inference, `dask.array.map_overlap` is used to apply the model's post-processing function across the entire segmentation map. This lets the function execute in parallel on chunks, after which the results are automatically merged back into a unified **"detection_map"** and saved as Zarr in a cache directory for further processing (see the sketch after this list).
3.  **Detection Map**:
    * Maintains the same dimensions as the segmentation map.
    * Nuclei centroids contain the detection probability values (defaults to `1` if the model does not produce probabilities).
4.  **Serialization**: The "detection_map" is converted into **"detection_arrays"** (format: `[[x], [y], [type], [probs]]`) representing the detected nuclei. These records are then saved into `SQLiteStore` (chunk-by-chunk), `zarr`, or a `dict` (patch mode only).
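
Step 2 is sketched below; `postproc_fn`, the overlap depth, and the cache path are illustrative assumptions rather than the engine's actual internals.

```python
# Sketch of chunk-wise post-processing with dask.array.map_overlap.
import dask.array as da

def build_detection_map(segmentation_map: da.Array, postproc_fn, overlap: int = 32):
    """Apply post-processing per chunk (with a halo) and cache the merged result."""
    detection_map = da.map_overlap(
        postproc_fn,                 # runs on each chunk plus its halo
        segmentation_map,
        depth=overlap,               # halo prevents splitting nuclei on chunk borders
        boundary="none",
        dtype=segmentation_map.dtype,
    )
    detection_map.to_zarr("cache_dir/detection_map.zarr", overwrite=True)
    return detection_map
```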

### Output Formats

#### SQLiteStore
* **WSI Mode**: Returns a single `SQLiteStore`.
* **Patch Mode**: Returns one `SQLiteStore` per patch.
* **Format**:
    ```python
    Annotation(Point(x,y), properties={'type': 'nuclei', 'probs': 0.9})
    ```
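
A sketch of writing detection arrays into an `SQLiteStore` in this format follows; the helper name and batching are assumptions, and the actual PR code may differ.

```python
# Sketch of serialising detection arrays into an SQLiteStore using the
# annotation format shown above.
from shapely.geometry import Point
from tiatoolbox.annotation.storage import Annotation, SQLiteStore

def detections_to_store(xs, ys, types, probs, path="detections.db") -> SQLiteStore:
    store = SQLiteStore(path)
    annotations = [
        Annotation(Point(x, y), properties={"type": str(t), "probs": float(p)})
        for x, y, t, p in zip(xs, ys, types, probs)
    ]
    store.append_many(annotations)  # chunk-by-chunk appends work the same way
    store.commit()
    return store
```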

#### Dictionary
* **WSI Mode**:
    ```python
    {
        'x': [...],
        'y': [...],
        'classes': [...],
        'probs': [...]
    }
    ```
* **Patch Mode** (One sub-dictionary per patch index):
    ```python
    {
        0: {
            'x': [...],
            'y': [...],
            'classes': [...],
            'probs': [...]
        },
        1: { ... }
    }
    ```

#### Zarr
* **WSI Mode**:
    ```python
    {
        'x': [...],
        'y': [...],
        'classes': [...],
        'probs': [...]
    }
    ```
* **Patch Mode**: Each key maps to a list of `da.array` objects, where each array corresponds to a patch.
    ```python
    {
        'x': [[...], ...],
        'y': [[...], ...],
        'classes': [[...], ...],
        'probs': [[...], ...]
    }
    ```
### **Codebase Integration**
- Registers `NucleusDetector` in `tiatoolbox.models` and engine registry.  
- Refactors detection logic from PR #538 into modular components.  
- Updates `MapDe` implementation to use the new engine.  
- Begins integration of `SCCNN` with `NucleusDetector`.  
- Adds utilities for serialization into SQLite, dict, and zarr formats.  
- Introduces unit tests for detection workflows.  
- Removes unused parameters `prediction_shape` and `prediction_dtype` from `post_process_patches()` and `post_process_wsi()` functions in all engines.
- `post_process_patches()` and `post_process_wsi()` now take `raw_predictions` instead of `raw_predictions["probabilities"]`.

### Tasks
- [x] Port code from PR #538 to supersede and close it.
- [x] Add `NucleusDetector` engine.
- [x] Update existing detection models (`MapDe` implementation complete; `SCCNN` implementation in progress/debugging).
- [x] Add unit tests.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>
Co-authored-by: Jiaqi Lv <jiaqilv@Jiaqis-MacBook-Pro.local>
## Summary

This PR updates the patch‑prediction example to align with the new `PatchPredictor` engine and fixes a long‑standing issue in `EngineABC` related to model‑attribute retrieval when using `DataParallel`.

---

## What’s Changed

### 🔧 Example Notebook Updates
- Updated **`examples/05-patch-prediction.ipynb`** to use the new `PatchPredictor` engine API.
- Added a new **“Visualize in TIAViz”** section, allowing readers to directly inspect prediction results inside **TIAViz** for a smoother, more interactive workflow.

### 🐛 EngineABC Bug Fix
- Fixed a bug in **`EngineABC`** where model attributes were incorrectly retrieved from a `DataParallel` wrapper.
- Introduced `_get_model_attr()` to safely unwrap the underlying model when needed.
- This resolves multi‑GPU crashes caused by attributes living on the wrapped module instead of the actual model.
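
A minimal sketch of the unwrapping helper is shown below; the real `_get_model_attr()` in `EngineABC` may handle more cases.

```python
# Minimal sketch of the DataParallel unwrapping described above.
import torch

def _get_model_attr(model: torch.nn.Module, attr: str):
    """Return an attribute from the underlying model, unwrapping DataParallel."""
    if isinstance(model, torch.nn.DataParallel):
        model = model.module  # attributes live on the wrapped module
    return getattr(model, attr)
```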

---

## Why This Matters
- Ensures the patch‑prediction example stays up‑to‑date with the latest engine design.
- Improves multi‑GPU stability and prevents confusing attribute‑access errors.
- Enhances the user experience by integrating TIAViz visualization directly into the example workflow.

---

## Testing
- Verified that the updated notebook runs end‑to‑end with the new engine.
- Confirmed that multi‑GPU training and inference no longer crash when accessing model attributes.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>

shaneahmed and others added 6 commits January 10, 2026 11:13
# Conflicts:
#	tests/models/test_dataset.py
#	tests/models/test_patch_predictor.py
#	tiatoolbox/data/remote_samples.yaml
This pull request updates sample data URLs in some of the notebooks and all the tests to use the Hugging Face dataset repository instead of the previous TIA server. 

**Migration of sample data URLs to Hugging Face:**

* `examples/02-stain-normalization.ipynb` 
*  `examples/03-tissue-masking.ipynb` 
* `examples/04-patch-extraction.ipynb` 
* `tests/test_utils.py`, `tests/test_wsireader.py`
* Updated example usage in the docstring of `tiatoolbox/utils/tiff_to_fsspec.py` to use the new Hugging Face sample WSI URL.