Surgical intelligence requires more than general visual understanding. It involves perception of surgical scenes, temporal understanding of procedural progress, and reasoning over instruments, anatomy, actions, and safety.
SurgVLM is a unified vision-language model designed for surgical intelligence. It supports diverse surgical tasks within a single modeling pipeline, spanning visual perception, temporal analysis, and high-level reasoning.
Our project includes:
- SurgVLM-DB: a surgical multimodal corpus for training surgical vision-language models
- SurgVLM Models: two specialized variants for instruction following and reasoning
- SurgVLM-Bench: a comprehensive surgical benchmark for VLMs
Highlights:
- Built specifically for surgical intelligence
- Supports visual perception, temporal understanding, and reasoning
- Covers 10 surgical tasks
- Includes two specialized models built on Qwen3.5-9B
- Designed for general surgical understanding and complex reasoning scenarios
- Part of the training data is publicly available on Hugging Face
SurgVLM-DB is a multimodal surgical corpus designed for training domain-specific vision-language models.
- Integrates diverse surgical data sources
- Covers multiple surgical procedures, anatomical structures, and task types
- Supports model training from low-level perception to high-level reasoning
- Provides a unified foundation for surgical vision-language learning
Part of the training data has been publicly released through SurgSigma-DB on Hugging Face:
- Hugging Face: SurgSigma/SurgSigma-DB
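For readers who want to pull the released portion locally, the following is a minimal sketch rather than an official loader. It assumes the repository id listed above is a dataset repository on the Hugging Face Hub and that the `huggingface_hub` package is installed. The helper names `dataset_page_url` and `download_release` and the `data/SurgSigma-DB` target directory are hypothetical, and nothing is assumed about the file layout inside the repository.

```python
from pathlib import Path

# Repository id as listed above. Everything else in this sketch is an assumption.
REPO_ID = "SurgSigma/SurgSigma-DB"


def dataset_page_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face Hub page URL for a dataset repository."""
    return f"https://huggingface.co/datasets/{repo_id}"


def download_release(local_dir: str = "data/SurgSigma-DB") -> Path:
    """Download the publicly released files and return the local folder.

    Needs network access and `pip install huggingface_hub`.
    The default local_dir is a hypothetical location; change it as needed.
    """
    # Imported lazily so dataset_page_url() works without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # assumed to be a dataset repo, not a model repo
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_release()` fetches a snapshot of whatever files are currently public. Because only part of the corpus is released, the contents may change as further data is published.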
Additional resources and updates will be released through the project page and repository.
SurgVLM provides two specialized model variants, both built on Qwen3.5-9B, to support different surgical intelligence scenarios.
| Model | Backbone | Description |
|---|---|---|
| SurgVLM-9B-Instruct | Qwen3.5-9B | An instruction-tuned model for general surgical vision-language understanding |
| SurgVLM-9B-Reasoning | Qwen3.5-9B | A reasoning-oriented model for more complex surgical analysis |
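
The split between the two variants can be sketched as a small lookup. This is an illustration only: the `Variant` class, the `VARIANTS` registry, and the `pick_variant` helper are hypothetical names, not part of any released SurgVLM API, and the routing rule simply mirrors the descriptions in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    backbone: str
    description: str


# Entries copied from the table above; the dictionary keys are illustrative.
VARIANTS = {
    "instruct": Variant(
        "SurgVLM-9B-Instruct",
        "Qwen3.5-9B",
        "Instruction-tuned model for general surgical vision-language understanding",
    ),
    "reasoning": Variant(
        "SurgVLM-9B-Reasoning",
        "Qwen3.5-9B",
        "Reasoning-oriented model for more complex surgical analysis",
    ),
}


def pick_variant(needs_complex_reasoning: bool) -> Variant:
    """Return the reasoning variant for complex analysis, else the instruct variant."""
    key = "reasoning" if needs_complex_reasoning else "instruct"
    return VARIANTS[key]
```

For example, `pick_variant(False)` returns the `SurgVLM-9B-Instruct` entry and `pick_variant(True)` returns the `SurgVLM-9B-Reasoning` entry.
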
| Resource | Download | Status | Description |
|---|---|---|---|
| SurgVLM-DB | Hugging Face | Partially released | Surgical multimodal corpus for model training |