jinlab-imvr/SurgVLM

A Unified Vision-Language Foundation Model for Surgical Intelligence

Paper · Project Page · Dataset · GitHub


Overview

Surgical intelligence requires more than general visual understanding. It involves perception of surgical scenes, temporal understanding of procedural progress, and reasoning over instruments, anatomy, actions, and safety.

SurgVLM is a unified vision-language model designed for surgical intelligence. It supports diverse surgical tasks within a single modeling pipeline, spanning visual perception, temporal analysis, and high-level reasoning.

Our project includes:

  • SurgVLM-DB: a surgical multimodal corpus for training surgical vision-language models
  • SurgVLM Models: two specialized variants for instruction following and reasoning
  • SurgVLM-Bench: a comprehensive surgical benchmark for evaluating vision-language models

Highlights

  • Built specifically for surgical intelligence
  • Supports visual perception, temporal understanding, and reasoning
  • Covers 10 surgical tasks
  • Includes two specialized models built on Qwen3.5-9B
  • Designed for general surgical understanding and complex reasoning scenarios
  • Part of the training data is publicly available on Hugging Face

SurgVLM-DB

SurgVLM-DB is a multimodal surgical corpus designed for training domain-specific vision-language models.

Key Characteristics

  • Integrates diverse surgical data sources
  • Covers multiple surgical procedures, anatomical structures, and task types
  • Supports model training from low-level perception to high-level reasoning
  • Provides a unified foundation for surgical vision-language learning

Public Release

Part of the training data has been publicly released as SurgVLM-DB on Hugging Face.

Additional resources and updates will be released through the project page and repository.


Models

SurgVLM provides two specialized model variants, both built on Qwen3.5-9B, to support different surgical intelligence scenarios.

| Model | Backbone | Description |
| --- | --- | --- |
| SurgVLM-9B-Instruct | Qwen3.5-9B | Instruction-tuned model for general surgical vision-language understanding |
| SurgVLM-9B-Reasoning | Qwen3.5-9B | Reasoning-oriented model for more complex surgical analysis |

Data

| Resource | Download | Status | Description |
| --- | --- | --- | --- |
| SurgVLM-DB | Hugging Face | Partially released | Surgical multimodal corpus for model training |
