A comprehensive automated pipeline for preprocessing stroke MRI data, designed to convert DICOM images to NIfTI format, perform image registration, skull stripping, and organize files for machine learning and analysis workflows.
- Purpose
- Key Features
- Environment Setup
- Input Data Format
- Usage
- Pipeline Steps
- Output Format
- Error Handling
- Troubleshooting
This pipeline is specifically designed for preprocessing stroke MRI datasets, particularly for machine learning applications such as brain lesion segmentation. It addresses common challenges in medical image processing:
- Multi-modal integration: Handles DWI, ADC, and FLAIR sequences with intelligent fallback strategies
- Data standardization: Converts heterogeneous DICOM data to standardized NIfTI format
- Spatial alignment: Registers FLAIR images to DWI space for consistent multi-modal analysis
- Preprocessing automation: Performs skull stripping and prepares data for downstream analysis
- Robust handling: Manages missing modalities, corrupted files, and edge cases gracefully
- DICOM Decompression: Automated decompression of compressed DICOM files
- Multi-modal Conversion: Converts DWI, ADC, and FLAIR sequences to NIfTI format
- Intelligent Modality Handling: Creates zero-filled NIfTI files for missing modalities
- Image Registration: FLAIR-to-DWI registration using ANTs with fallback strategies
- Skull Stripping: HD-BET-based brain extraction with GPU acceleration
- File Organization: Standardized naming and directory structure
- Edge Case Management: Handles zero-valued images, missing modalities, and corrupted data
- Selective Execution: Run only specific pipeline steps (e.g., conversion only, registration only)
- Parallel Processing: Single-worker mode (multi-worker support under development)
- Comprehensive Logging: Detailed logging with progress tracking and error reporting
- Safe File Operations: Conflict detection and resolution during file operations
- Complete: All three modalities (DWI, ADC, FLAIR) present
- DWI Only: Creates zero-filled ADC and FLAIR
- FLAIR Only: Creates zero-filled DWI and ADC
- Empty DWI: Uses ADC and FLAIR, creates zero-filled DWI
- Intelligent Registration: Adapts registration strategy based on available modalities
- Operating System: Linux (recommended), macOS, or Windows
- Python Version: 3.10
- Memory: 8GB minimum, 16GB recommended for large datasets
- Storage: 2-3x input dataset size for intermediate files
- GPU: Optional but recommended for HD-BET acceleration
The pipeline requires three main tools. Please refer to their official documentation for detailed installation instructions:
-
dcm2niix - DICOM to NIfTI converter
- Official repository: https://github.com/rordenlab/dcm2niix
- Installation via conda:
conda install -c conda-forge dcm2niix
-
ANTs - Advanced Normalization Tools for image registration
- Official repository: https://github.com/ANTsX/ANTs
- Installation via conda:
conda install -c conda-forge ants
-
HD-BET - Brain extraction tool
- Official repository: https://github.com/MIC-DKFZ/HD-BET
- Installation via pip:
pip install hd-bet
-
Create Virtual Environment:
conda create -n brain_mri_env python=3.10 conda activate brain_mri_env
-
Install Dependencies:
# Install dcm2niix (refer to official docs for latest version) conda install -c conda-forge dcm2niix # Install ANTs (refer to official docs for latest version) conda install -c conda-forge ants # Install HD-BET (refer to official docs for latest version) pip install hd-bet # Install additional requirements pip install nibabel numpy pathlib2 tqdm
-
Verify Installation:
dcm2niix -h antsRegistration --help hd-bet -h
The pipeline expects input data organized in a hierarchical directory structure:
input_directory/
├── Patient_ID_1/
│ ├── Case_ID_1/
│ │ ├── DWI/ # DWI DICOM files
│ │ ├── ADC/ # ADC DICOM files
│ │ └── FLAIR/ # FLAIR DICOM files
│ └── Case_ID_2/
│ ├── DWI/
│ ├── ADC/
│ └── FLAIR/
└── Patient_ID_2/
└── Case_ID_3/
├── DWI/
├── ADC/
└── FLAIR/
- DWI: Diffusion-weighted imaging sequences
- ADC: Apparent diffusion coefficient maps
- FLAIR: Fluid-attenuated inversion recovery sequences
The pipeline intelligently handles various modality combinations:
- ✅ Complete cases: All three modalities present
- ✅ DWI-only cases: Only DWI sequences available
- ✅ FLAIR-only cases: Only FLAIR sequences available
- ✅ Missing DWI: ADC and FLAIR available, DWI missing/corrupted
- ✅ Zero-valued images: Handles all-zero images in registration
- Input: DICOM files (.dcm, .IMA, or any DICOM format)
- Intermediate: NIfTI compressed format (.nii.gz)
- Output: Organized NIfTI files with standardized naming
-
Activate Environment:
conda activate brain_mri_env
-
Run Complete Pipeline:
python -m brain_mri_preprocess_pipeline.medical_image_pipeline \ --input /path/to/input/data \ --output /path/to/output \ --logs /path/to/logs \ --workers 1
⚠️ Important Note: Currently, only single-worker mode (--workers 1) is recommended due to concurrency issues in the skull stripping tool (HD-BET). Multi-worker support is under optimization.
Run Only Conversion Step:
python -m brain_mri_preprocess_pipeline.medical_image_pipeline \
--input /path/to/input \
--output /path/to/output \
--logs /path/to/logs \
--only-steps conversionRun Only Registration Step:
python -m brain_mri_preprocess_pipeline.medical_image_pipeline \
--input /path/to/input \
--output /path/to/output \
--logs /path/to/logs \
--only-steps registrationRun Skull Stripping and Subsequent Steps:
python -m brain_mri_preprocess_pipeline.medical_image_pipeline \
--input /path/to/input \
--output /path/to/output \
--logs /path/to/logs \
--only-steps skull_stripping,organizationRun Multiple Specific Steps:
python -m brain_mri_preprocess_pipeline.medical_image_pipeline \
--input /path/to/input \
--output /path/to/output \
--logs /path/to/logs \
--only-steps conversion,registration,skull_strippingusage: medical_image_pipeline.py [-h] --input INPUT [--output OUTPUT]
[--logs LOGS] [--workers WORKERS]
[--only-steps ONLY_STEPS] [--skip-steps SKIP_STEPS]
Arguments:
--input INPUT, -i INPUT
Input directory containing DICOM files (required)
--output OUTPUT, -o OUTPUT
Output directory (default: output)
--logs LOGS, -l LOGS
Log directory (default: logs)
--workers WORKERS, -w WORKERS
Number of parallel workers (default: 1, recommended: 1)
--only-steps ONLY_STEPS
Run only specified steps (comma-separated)
Available: decompression,conversion,registration,skull_stripping,organization
--skip-steps SKIP_STEPS
Skip specified steps (comma-separated)
The pipeline consists of six main processing steps:
- Decompresses compressed DICOM files
- Preserves directory structure
- Continues processing if some files fail
- Converts DICOM files to NIfTI format using dcm2niix
- Handles modality-specific naming conventions
- Creates zero-filled NIfTI files for missing modalities
- Assigns standardized channel suffixes:
0000: DWI (Diffusion-weighted imaging)0001: ADC (Apparent diffusion coefficient)0002: FLAIR (Fluid-attenuated inversion recovery)
- Registers FLAIR images to DWI space using ANTs
- Intelligent strategy selection based on available modalities:
- Normal: DWI → FLAIR registration
- Zero DWI: ADC → FLAIR registration
- Zero FLAIR: Skip registration (already aligned)
- All zero: Skip registration with warning
- Brain extraction using HD-BET
- GPU acceleration when available
- Processes all modalities consistently
- Organizes files into standardized directory structure
- Applies consistent naming convention
- Generates processing statistics
output_directory/
├── nifti/ # Converted NIfTI files
│ └── Case_ID/
│ ├── ISLES2022_CaseID_0000.nii.gz # DWI
│ ├── ISLES2022_CaseID_0001.nii.gz # ADC
│ └── ISLES2022_CaseID_0002.nii.gz # FLAIR
├── skull_stripped/ # Brain-extracted files
│ └── Case_ID/
│ ├── ISLES2022_CaseID_0000_skull_stripped.nii.gz
│ ├── ISLES2022_CaseID_0001_skull_stripped.nii.gz
│ └── ISLES2022_CaseID_0002_skull_stripped.nii.gz
└── organized/ # Final organized files
└── Dataset002_ISLES2022_all/
└── imagesTs/
├── ISLES2022_CaseID_0000.nii.gz
├── ISLES2022_CaseID_0001.nii.gz
└── ISLES2022_CaseID_0002.nii.gz
- Pattern:
ISLES2022_{CaseID}_{ChannelSuffix}.nii.gz - Channel Suffixes:
0000: DWI modality0001: ADC modality0002: FLAIR modality
- Graceful degradation: Continues processing when individual cases fail
- Comprehensive logging: Detailed error reporting with context
- Fallback strategies: Alternative processing methods for edge cases
- Safe file operations: Prevents data loss during file conflicts
- Missing modalities: Creates zero-filled substitutes
- Corrupted DICOM files: Skips corrupted files, processes remainder
- Registration failures: Falls back to alternative templates
- File conflicts: Safely resolves naming conflicts
Missing Dependencies:
# Verify installations
which dcm2niix antsRegistration hd-betPython Package Issues:
pip install --upgrade nibabel numpy pathlib2 tqdmDICOM Conversion Failures:
- Check DICOM file integrity
- Verify modality directory structure
- Review conversion logs for specific errors
Registration Failures:
- Verify NIfTI files contain valid image data
- Check for all-zero images in logs
- Ensure sufficient disk space for temporary files
Skull Stripping Issues:
- Verify HD-BET installation:
hd-bet -h - Check GPU availability if using GPU acceleration
- Always use
--workers 1(see parallel processing limitations below)
Parallel Processing Limitations:
- Current Limitation: Multi-worker processing (
--workers > 1) is not supported - Reason: The skull stripping tool (HD-BET) has concurrency issues that can cause random failures during parallel execution
- Recommendation: Always use
--workers 1for stable and reliable processing - Future: Multi-worker support is under development and optimization
Detailed logs are generated for each step:
- Main log: Overall pipeline progress and summary
- Step-specific logs: Detailed information for each processing step
- Error logs: Specific error messages and stack traces
Example log locations:
logs/
├── pipeline_YYYYMMDD_HHMMSS.log # Main pipeline log
├── conversion_YYYYMMDD_HHMMSS.log # Conversion step log
├── registration_YYYYMMDD_HHMMSS.log # Registration step log
└── skull_stripping_YYYYMMDD_HHMMSS.log # Skull stripping log
- Single worker mode: Required - Currently the only supported mode (
--workers 1) - Parallel processing: Not supported due to HD-BET concurrency limitations
- Sufficient disk space: Ensure 2-3x input size available
- Memory management: Monitor RAM usage during processing
- GPU utilization: Enable GPU for HD-BET if available for faster skull stripping