jinlab-imvr/SurgVLM

A Unified Vision-Language Foundation Model for Surgical Intelligence

Paper · Project Page · Dataset · GitHub


Overview

Surgical intelligence requires more than general visual understanding. It involves perception of surgical scenes, temporal understanding of procedural progress, and reasoning over instruments, anatomy, actions, and safety.

SurgVLM is a unified vision-language model designed for surgical intelligence. It supports diverse surgical tasks within a single modeling pipeline, spanning visual perception, temporal analysis, and high-level reasoning.

Our project includes:

  • SurgVLM-DB: a surgical multimodal corpus for training surgical vision-language models
  • SurgVLM Models: two specialized variants for instruction following and reasoning
  • SurgVLM-Bench: a comprehensive surgical benchmark for evaluating vision-language models

Highlights

  • Built specifically for surgical intelligence
  • Supports visual perception, temporal understanding, and reasoning
  • Covers 10 surgical tasks
  • Includes two specialized models built on Qwen3.5-9B
  • Designed for general surgical understanding and complex reasoning scenarios
  • Part of the training data is publicly available on Hugging Face

SurgVLM-DB

SurgVLM-DB is a multimodal surgical corpus designed for training domain-specific vision-language models.

Key Characteristics

  • Integrates diverse surgical data sources
  • Covers multiple surgical procedures, anatomical structures, and task types
  • Supports model training from low-level perception to high-level reasoning
  • Provides a unified foundation for surgical vision-language learning

Public Release

Part of the training data has been publicly released as SurgVLM-DB on Hugging Face.

Additional resources and updates will be released through the project page and repository.


Models

SurgVLM provides two specialized model variants, both built on Qwen3.5-9B, to support different surgical intelligence scenarios.

| Model | Backbone | Description |
| --- | --- | --- |
| SurgVLM-9B-Instruct | Qwen3.5-9B | Instruction-tuned model for general surgical vision-language understanding |
| SurgVLM-9B-Reasoning | Qwen3.5-9B | Reasoning-oriented model for more complex surgical analysis |

Data

| Resource | Download | Status | Description |
| --- | --- | --- | --- |
| SurgVLM-DB | Hugging Face | Partially released | Surgical multimodal corpus for model training |
