Eurekashen/R2Seg


R2-Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection

Shuaike Shen1*, Ke Liu3*, Jiaqing Xie4, Shangde Gao3,
Chunhua Shen3, Ge Liu5, Mireia Crispin-Ortuzar2, Shangqi Gao2†

1 Carnegie Mellon University  ·  2 University of Cambridge  ·  3 Zhejiang University
4 ETH Zurich  ·  5 University of Illinois Urbana-Champaign
* Equal contribution  ·  † Corresponding author
shuaikes@andrew.cmu.edu · kliu@zju.edu.cn · sg2162@cam.ac.uk

Paper · Website · Built on BiomedParse · CVPR 2026 Oral

Foundation models for medical image segmentation struggle under out-of-distribution (OOD) shifts, often producing fragmented false positives on OOD tumors. We introduce R2-Seg, a training-free framework for robust OOD tumor segmentation that operates via a two-stage Reason-and-Reject process. First, the Reason step employs an LLM-guided anatomical reasoning planner to localize organ anchors and generate multi-scale ROIs. Second, the Reject step applies two-sample statistical testing to candidates generated by a frozen foundation model (BiomedParse) within these ROIs. This statistical rejection filter retains only candidates significantly different from normal tissue, effectively suppressing false positives. Our framework requires no parameter updates, making it compatible with zero-update test-time augmentation and avoiding catastrophic forgetting. On multi-center and multi-modal tumor segmentation benchmarks, R2-Seg substantially improves Dice, specificity, and sensitivity over strong baselines and the original foundation models.

Experiment Visualization

✨ Highlights

  • OOD tumor segmentation: Robust segmentation of out-of-distribution tumors without any model fine-tuning.


  • LLM reasoning for anatomy: tta_pipeline.llm_inference queries an OpenAI-compatible endpoint to propose anchor organs, ROI instructions, and auditing rationale.

  • Statistical screening: Connected components are tested with GPU-backed MMD permutation tests plus Benjamini–Hochberg FDR control to down-rank noisy candidates.

  • False-positive gating: Hierarchical gating that filters false-positive segmentation results.
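The statistical screening above can be sketched in plain NumPy. The repo's `tta_pipeline/statistical_test.py` runs this on the GPU; the RBF kernel, function names, and parameters below are illustrative, not the repo's actual API:

```python
import numpy as np

def mmd_statistic(x, y, gamma=1.0):
    """Biased MMD^2 estimate with an RBF kernel between two 1-D samples."""
    def k(a, b):
        return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def mmd_permutation_test(x, y, n_perm=500, gamma=1.0, seed=None):
    """Two-sample permutation test: p-value of observed MMD^2 under pooling."""
    rng = np.random.default_rng(seed)
    observed = mmd_statistic(x, y, gamma)
    pooled = np.concatenate([x, y])
    n, count = len(x), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled samples
        if mmd_statistic(pooled[:n], pooled[n:], gamma) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)  # +1 avoids p = 0

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of candidates kept under Benjamini-Hochberg FDR control."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    passed = p[order] <= alpha * (np.arange(1, m + 1) / m)
    keep = np.zeros(m, dtype=bool)
    idx = np.nonzero(passed)[0]
    if idx.size:
        keep[order[: idx.max() + 1]] = True  # keep everything up to the cutoff
    return keep
```

A candidate whose intensity distribution clearly differs from the reference normal tissue gets a small p-value and survives the FDR cut; a noise component drawn from the same distribution as normal tissue does not.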

R2Seg Overview

🗂 Repository Layout

| Module | Responsibility |
| --- | --- |
| `tta_pipeline/config.py` | Dataclasses for the planner, BiomedParse wrapper, ROI builder, stats stack, and mitigation thresholds. |
| `tta_pipeline/llm_inference.py` | `AnatomyPlanner` client that requests ROI plans from an OpenAI-compatible endpoint (or fallback templates) and parses JSON responses safely. |
| `tta_pipeline/biomedparse_inference.py` | Lazily loads BiomedParse, runs prompts over the full image and ROI crops, and fuses identity/flip predictions (`tta_aggregation` supports max/mean/median). |
| `tta_pipeline/roi_planning.py` | Converts anchor masks into padded, jittered square ROIs and tracks transforms for scattering predictions back to full resolution. |
| `tta_pipeline/postprocessing.py` | Extracts connected components, merges ROI/full-frame masks, and prepares candidate descriptors. |
| `tta_pipeline/statistical_test.py` | Implements GPU-friendly two-sample MMD permutation tests plus Benjamini–Hochberg FDR correction to score each candidate. |
| `tta_pipeline/prompt_normalisation.py` | Optional component that maps free-form prompts to the BiomedParse vocabulary using either the LLM or heuristics. |
| `tta_pipeline/pipeline.py` | High-level orchestration (`PromptedSegmentationPipeline.run`) that stitches together the planner, segmenter, ROI planner, and statistics. |
| `tta_pipeline/utils/image_io.py` | Minimal PNG reading/writing helpers used by the CLI entry points. |
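The ROI planning step (anchor mask to padded square ROI, with padding specified in millimetres and converted via pixel spacing) can be sketched as follows. All names and the jitter/clamping details are illustrative assumptions, not the actual `roi_planning.py` code:

```python
import numpy as np

def mask_to_square_roi(mask, pad_mm, pixel_spacing, jitter_px=0, rng=None):
    """Hypothetical sketch: binary anchor mask -> padded square ROI box.

    pad_mm is converted to pixels per axis using pixel_spacing = (row_mm, col_mm),
    mirroring how runtime.pixel_spacing is described in this README.
    Returns (r0, r1, c0, c1), clamped to the image bounds.
    """
    rows, cols = np.nonzero(mask)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    # Pad the tight bounding box by pad_mm on every side.
    r0 -= int(round(pad_mm / pixel_spacing[0])); r1 += int(round(pad_mm / pixel_spacing[0]))
    c0 -= int(round(pad_mm / pixel_spacing[1])); c1 += int(round(pad_mm / pixel_spacing[1]))
    # Square the box around its centre so ROI crops share an aspect ratio.
    side = max(r1 - r0, c1 - c0)
    cr, cc = (r0 + r1) // 2, (c0 + c1) // 2
    r0, c0 = cr - side // 2, cc - side // 2
    r1, c1 = r0 + side, c0 + side
    if jitter_px and rng is not None:  # optional random offset for multi-scale ROIs
        dr, dc = rng.integers(-jitter_px, jitter_px + 1, size=2)
        r0, r1, c0, c1 = r0 + dr, r1 + dr, c0 + dc, c1 + dc
    h, w = mask.shape
    return max(r0, 0), min(r1, h), max(c0, 0), min(c1, w)
```

The inverse transform (scattering ROI predictions back to full resolution) only needs the returned box, which is why the real module tracks these transforms alongside each crop.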

⚙️ Environment Setup

  1. Clone

    git clone https://github.com/Eurekashen/R2Seg.git
    cd R2Seg
  2. Create a Python environment

    • Option A (recommended):
      conda env create -f environment.yml
      conda activate biomedparse
    • Option B (manual, see INSTALLATION.md for more context):
      conda create -n biomedparse python=3.9.19
      conda activate biomedparse
      # Install PyTorch that matches your CUDA stack
      conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
      # Install Python dependencies
      pip install -r assets/requirements/requirements.txt
  3. LLM credentials
    The planner and prompt normaliser expect an OpenAI-compatible key:

    export OPENAI_API_KEY=sk-...
    # Optional: point to an alternative endpoint
    export OPENAI_BASE_URL=https://your-endpoint/v1

    If no key is provided, the pipeline falls back to deterministic templates defined in tta_pipeline.llm_inference.DEFAULT_PLANS.
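The key-or-fallback behaviour amounts to a simple selection step. `DEFAULT_PLANS` is real per this README, but the plan fields and the `plan_for`/`call_llm` names below are hypothetical illustrations of the logic, not the repo's code:

```python
import os

# Illustrative template; the real DEFAULT_PLANS lives in tta_pipeline.llm_inference.
DEFAULT_PLANS = {
    "bladder tumor": {"anchor_organ": "bladder", "roi_pad_mm": 15.0},
}

def plan_for(concept, call_llm=None):
    """Use the LLM planner when an API key is configured, else a deterministic template."""
    if os.environ.get("OPENAI_API_KEY") and call_llm is not None:
        return call_llm(concept)  # e.g. AnatomyPlanner hitting the OpenAI-compatible endpoint
    # Deterministic fallback: no network, reproducible plans.
    return DEFAULT_PLANS.get(concept, {"anchor_organ": None, "roi_pad_mm": 10.0})
```

The fallback keeps the pipeline runnable offline, at the cost of anatomy plans that are fixed per concept rather than image-specific.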

🚀 Running the R2-Seg Pipeline

Single-image inference

python run_tta_pipeline.py \
  runtime.image=/path/to/slice.png \
  runtime.concept="bladder tumor" \
  runtime.modality=MRI \
  runtime.pixel_spacing=[1.0,1.0] \
  runtime.output_dir=outputs/tta_example \
  runtime.enable_tta=true \
  runtime.save_intermediates=false \
  pipeline.llm.model=qwen3-0.6b

Key runtime knobs (all overridable at launch time):

  • runtime.image: path to a PNG slice (see tta_pipeline/utils/image_io.py for accepted formats).
  • runtime.concept / runtime.modality: free-form text; concepts are normalised against inference_utils/target_dist.json.
  • runtime.pixel_spacing: [row_mm, col_mm] used to convert ROI padding from millimetres to pixels.
  • runtime.enable_fp_mitigation=true: toggles additional morphology/probability gates defined in pipeline.mitigation.
  • runtime.save_intermediates=true: persists ROIs, probability maps, overlays, LLM plans, and candidate statistics under pipeline_outputs/<concept>/.

Batch inference

batch_infer.py mirrors the runtime arguments from run_tta_pipeline.py but walks an entire directory (optionally recursively) and reports progress via tqdm:

python batch_infer.py \
  runtime.input_dir=data/slices \
  runtime.output_dir=outputs/batch \
  runtime.concept="bladder tumor" \
  runtime.modality=MRI \
  runtime.pixel_spacing=[1.0,1.0] \
  runtime.save_intermediates=false

Set runtime.summary=true in conf/batch_infer.yaml to emit a JSON summary per case. The evaluation/baseline drivers described in implement.md will follow the same Hydra pattern once they are added to the repo.

All Hydra scripts aggressively free the CUDA cache between slices and record run summaries (summary.json, metrics.json) for reproducibility.
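The per-slice loop can be sketched like this; `infer` stands in for the pipeline call, and the file layout and names are illustrative rather than the repo's exact code. The `torch` import is guarded so the sketch also runs without a GPU:

```python
import gc
import json
import pathlib

def run_batch(paths, infer, out_dir):
    """Sketch: run inference per slice, free memory, write a summary.json."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    summaries = []
    for p in paths:
        summaries.append({"image": str(p), **infer(p)})
        gc.collect()  # drop Python-side references first
        try:
            import torch
            torch.cuda.empty_cache()  # release cached CUDA blocks between slices
        except ImportError:
            pass  # CPU-only environment: nothing to free
    (out / "summary.json").write_text(json.dumps(summaries, indent=2))
    return summaries
```

Freeing the cache between slices trades a little allocator churn for headroom, which matters when large ROI crops and full-frame predictions alternate on the same GPU.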

📚 Citation

If you build on this repository, please cite us and the original BiomedParse publication:

@misc{shen2025r2segtrainingfreeoodmedical,
      title={R$^{2}$Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection}, 
      author={Shuaike Shen and Ke Liu and Jiaqing Xie and Shangde Gao and Chunhua Shen and Ge Liu and Mireia Crispin-Ortuzar and Shangqi Gao},
      year={2025},
      eprint={2511.12691},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.12691}, 
}

@article{zhao2025foundation,
  title={A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities},
  author={Zhao, Theodore and Gu, Yu and Yang, Jianwei and Usuyama, Naoto and Lee, Ho Hin and Kiblawi, Sid and Naumann, Tristan and Gao, Jianfeng and Crabtree, Angela and Abel, Jacob and others},
  journal={Nature Methods},
  volume={22},
  number={1},
  pages={166--176},
  year={2025},
  publisher={Nature Publishing Group US New York}
}

R2-Seg inherits its license from BiomedParse. The model is intended for research and development use only.
