Eurekashen/R2Seg


R2-Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection

Shuaike Shen1*, Ke Liu3*, Jiaqing Xie4, Shangde Gao3,
Chunhua Shen3, Ge Liu5, Mireia Crispin-Ortuzar2, Shangqi Gao2†

1 Carnegie Mellon University  ·  2 University of Cambridge  ·  3 Zhejiang University
4 ETH Zurich  ·  5 University of Illinois Urbana-Champaign
* Equal contribution  ·  † Corresponding author
shuaikes@andrew.cmu.edu · kliu@zju.edu.cn · sg2162@cam.ac.uk

Paper · Website · Built on BiomedParse · CVPR 2026 Oral

Foundation models for medical image segmentation struggle under out-of-distribution (OOD) shifts, often producing fragmented false positives on OOD tumors. We introduce R2-Seg, a training-free framework for robust OOD tumor segmentation that operates via a two-stage Reason-and-Reject process. First, the Reason step employs an LLM-guided anatomical reasoning planner to localize organ anchors and generate multi-scale ROIs. Second, the Reject step applies two-sample statistical testing to candidates generated by a frozen foundation model (BiomedParse) within these ROIs. This statistical rejection filter retains only candidates significantly different from normal tissue, effectively suppressing false positives. Our framework requires no parameter updates, making it compatible with zero-update test-time augmentation and avoiding catastrophic forgetting. On multi-center and multi-modal tumor segmentation benchmarks, R2-Seg substantially improves Dice, specificity, and sensitivity over strong baselines and the original foundation models.

Experiment Visualization

✨ Highlights

  • OOD tumor segmentation: Robust segmentation of out-of-distribution tumors without any model fine-tuning.


  • LLM reasoning for anatomy: tta_pipeline.llm_inference queries an OpenAI-compatible endpoint to propose anchor organs, ROI instructions, and auditing rationale.

  • Statistical screening: Connected components are tested with GPU-backed MMD permutation tests plus Benjamini–Hochberg FDR control to down-rank noisy candidates.

  • False-positive gating: Hierarchical gating that filters false-positive segmentation results.
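The statistical screening above can be sketched in plain NumPy. The repo's `tta_pipeline/statistical_test.py` runs this on the GPU; the RBF kernel, function names, and parameters below are illustrative, not the repo's actual API:

```python
import numpy as np

def mmd_statistic(x, y, gamma=1.0):
    """Biased MMD^2 estimate with an RBF kernel between two 1-D samples."""
    def k(a, b):
        return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def mmd_permutation_test(x, y, n_perm=500, gamma=1.0, seed=None):
    """Two-sample permutation test: p-value of observed MMD^2 under pooling."""
    rng = np.random.default_rng(seed)
    observed = mmd_statistic(x, y, gamma)
    pooled = np.concatenate([x, y])
    n, count = len(x), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled samples
        if mmd_statistic(pooled[:n], pooled[n:], gamma) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)  # +1 avoids p = 0

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of candidates kept under Benjamini-Hochberg FDR control."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    passed = p[order] <= alpha * (np.arange(1, m + 1) / m)
    keep = np.zeros(m, dtype=bool)
    idx = np.nonzero(passed)[0]
    if idx.size:
        keep[order[: idx.max() + 1]] = True  # keep everything up to the cutoff
    return keep
```

A candidate whose intensity distribution clearly differs from the reference normal tissue gets a small p-value and survives the FDR cut; a noise component drawn from the same distribution as normal tissue does not.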

R2Seg Overview

🗂 Repository Layout

| Module | Responsibility |
| --- | --- |
| `tta_pipeline/config.py` | Dataclasses for the planner, BiomedParse wrapper, ROI builder, stats stack, and mitigation thresholds. |
| `tta_pipeline/llm_inference.py` | `AnatomyPlanner` client that requests ROI plans from an OpenAI-compatible endpoint (or fallback templates) and parses JSON responses safely. |
| `tta_pipeline/biomedparse_inference.py` | Lazily loads BiomedParse, runs prompts over the full image and ROI crops, and fuses identity/flip predictions (`tta_aggregation` supports max/mean/median). |
| `tta_pipeline/roi_planning.py` | Converts anchor masks into padded, jittered square ROIs and tracks transforms for scattering predictions back to full resolution. |
| `tta_pipeline/postprocessing.py` | Extracts connected components, merges ROI/full-frame masks, and prepares candidate descriptors. |
| `tta_pipeline/statistical_test.py` | Implements GPU-friendly two-sample MMD permutation tests plus Benjamini–Hochberg FDR correction to score each candidate. |
| `tta_pipeline/prompt_normalisation.py` | Optional component that maps free-form prompts to the BiomedParse vocabulary using either the LLM or heuristics. |
| `tta_pipeline/pipeline.py` | High-level orchestration (`PromptedSegmentationPipeline.run`) that stitches together the planner, segmenter, ROI planner, and statistics. |
| `tta_pipeline/utils/image_io.py` | Minimal PNG reading/writing helpers used by the CLI entry points. |
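The ROI planning step (anchor mask to padded square ROI, with padding specified in millimetres and converted via pixel spacing) can be sketched as follows. All names and the jitter/clamping details are illustrative assumptions, not the actual `roi_planning.py` code:

```python
import numpy as np

def mask_to_square_roi(mask, pad_mm, pixel_spacing, jitter_px=0, rng=None):
    """Hypothetical sketch: binary anchor mask -> padded square ROI box.

    pad_mm is converted to pixels per axis using pixel_spacing = (row_mm, col_mm),
    mirroring how runtime.pixel_spacing is described in this README.
    Returns (r0, r1, c0, c1), clamped to the image bounds.
    """
    rows, cols = np.nonzero(mask)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    # Pad the tight bounding box by pad_mm on every side.
    r0 -= int(round(pad_mm / pixel_spacing[0])); r1 += int(round(pad_mm / pixel_spacing[0]))
    c0 -= int(round(pad_mm / pixel_spacing[1])); c1 += int(round(pad_mm / pixel_spacing[1]))
    # Square the box around its centre so ROI crops share an aspect ratio.
    side = max(r1 - r0, c1 - c0)
    cr, cc = (r0 + r1) // 2, (c0 + c1) // 2
    r0, c0 = cr - side // 2, cc - side // 2
    r1, c1 = r0 + side, c0 + side
    if jitter_px and rng is not None:  # optional random offset for multi-scale ROIs
        dr, dc = rng.integers(-jitter_px, jitter_px + 1, size=2)
        r0, r1, c0, c1 = r0 + dr, r1 + dr, c0 + dc, c1 + dc
    h, w = mask.shape
    return max(r0, 0), min(r1, h), max(c0, 0), min(c1, w)
```

The inverse transform (scattering ROI predictions back to full resolution) only needs the returned box, which is why the real module tracks these transforms alongside each crop.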

⚙️ Environment Setup

  1. Clone

    git clone https://github.com/Eurekashen/R2Seg.git
    cd R2Seg
  2. Create a Python environment

    • Option A (recommended):
      conda env create -f environment.yml
      conda activate biomedparse
    • Option B (manual, see INSTALLATION.md for more context):
      conda create -n biomedparse python=3.9.19
      conda activate biomedparse
      # Install PyTorch that matches your CUDA stack
      conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
      # Install Python dependencies
      pip install -r assets/requirements/requirements.txt
  3. LLM credentials
    The planner and prompt normaliser expect an OpenAI-compatible key:

    export OPENAI_API_KEY=sk-...
    # Optional: point to an alternative endpoint
    export OPENAI_BASE_URL=https://your-endpoint/v1

    If no key is provided, the pipeline falls back to deterministic templates defined in tta_pipeline.llm_inference.DEFAULT_PLANS.
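The key-or-fallback behaviour amounts to a simple selection step. `DEFAULT_PLANS` is real per this README, but the plan fields and the `plan_for`/`call_llm` names below are hypothetical illustrations of the logic, not the repo's code:

```python
import os

# Illustrative template; the real DEFAULT_PLANS lives in tta_pipeline.llm_inference.
DEFAULT_PLANS = {
    "bladder tumor": {"anchor_organ": "bladder", "roi_pad_mm": 15.0},
}

def plan_for(concept, call_llm=None):
    """Use the LLM planner when an API key is configured, else a deterministic template."""
    if os.environ.get("OPENAI_API_KEY") and call_llm is not None:
        return call_llm(concept)  # e.g. AnatomyPlanner hitting the OpenAI-compatible endpoint
    # Deterministic fallback: no network, reproducible plans.
    return DEFAULT_PLANS.get(concept, {"anchor_organ": None, "roi_pad_mm": 10.0})
```

The fallback keeps the pipeline runnable offline, at the cost of anatomy plans that are fixed per concept rather than image-specific.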

🚀 Running the R2-Seg Pipeline

Single-image inference

python run_tta_pipeline.py \
  runtime.image=/path/to/slice.png \
  runtime.concept="bladder tumor" \
  runtime.modality=MRI \
  runtime.pixel_spacing=[1.0,1.0] \
  runtime.output_dir=outputs/tta_example \
  runtime.enable_tta=true \
  runtime.save_intermediates=false \
  pipeline.llm.model=qwen3-0.6b

Key runtime knobs (all overridable at launch time):

  • runtime.image: path to a PNG slice (see tta_pipeline/utils/image_io.py for accepted formats).
  • runtime.concept / runtime.modality: free-form text; concepts are normalised against inference_utils/target_dist.json.
  • runtime.pixel_spacing: [row_mm, col_mm] used to convert ROI padding from millimetres to pixels.
  • runtime.enable_fp_mitigation=true: toggles additional morphology/probability gates defined in pipeline.mitigation.
  • runtime.save_intermediates=true: persists ROIs, probability maps, overlays, LLM plans, and candidate statistics under pipeline_outputs/<concept>/.

Batch inference

batch_infer.py mirrors the runtime arguments from run_tta_pipeline.py but walks an entire directory (optionally recursively) and reports progress via tqdm:

python batch_infer.py \
  runtime.input_dir=data/slices \
  runtime.output_dir=outputs/batch \
  runtime.concept="bladder tumor" \
  runtime.modality=MRI \
  runtime.pixel_spacing=[1.0,1.0] \
  runtime.save_intermediates=false

Set runtime.summary=true in conf/batch_infer.yaml to emit a JSON summary per case. The evaluation/baseline drivers described in implement.md will follow the same Hydra pattern once they are added to the repo.

All Hydra scripts aggressively free the CUDA cache between slices and record run summaries (summary.json, metrics.json) for reproducibility.
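The per-slice loop can be sketched like this; `infer` stands in for the pipeline call, and the file layout and names are illustrative rather than the repo's exact code. The `torch` import is guarded so the sketch also runs without a GPU:

```python
import gc
import json
import pathlib

def run_batch(paths, infer, out_dir):
    """Sketch: run inference per slice, free memory, write a summary.json."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    summaries = []
    for p in paths:
        summaries.append({"image": str(p), **infer(p)})
        gc.collect()  # drop Python-side references first
        try:
            import torch
            torch.cuda.empty_cache()  # release cached CUDA blocks between slices
        except ImportError:
            pass  # CPU-only environment: nothing to free
    (out / "summary.json").write_text(json.dumps(summaries, indent=2))
    return summaries
```

Freeing the cache between slices trades a little allocator churn for headroom, which matters when large ROI crops and full-frame predictions alternate on the same GPU.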

📚 Citation

If you build on this repository, please cite us and the original BiomedParse publication:

@misc{shen2025r2segtrainingfreeoodmedical,
      title={R$^{2}$Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection}, 
      author={Shuaike Shen and Ke Liu and Jiaqing Xie and Shangde Gao and Chunhua Shen and Ge Liu and Mireia Crispin-Ortuzar and Shangqi Gao},
      year={2025},
      eprint={2511.12691},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.12691}, 
}

@article{zhao2025foundation,
  title={A foundation model for joint segmentation, detection and recognition of biomedical objects across nine modalities},
  author={Zhao, Theodore and Gu, Yu and Yang, Jianwei and Usuyama, Naoto and Lee, Ho Hin and Kiblawi, Sid and Naumann, Tristan and Gao, Jianfeng and Crabtree, Angela and Abel, Jacob and others},
  journal={Nature Methods},
  volume={22},
  number={1},
  pages={166--176},
  year={2025},
  publisher={Nature Publishing Group US New York}
}

R2-Seg inherits its license from BiomedParse. The model is intended for research and development use only.
