SAM3-Gemma4-CUDA

SAM3-Gemma4-CUDA is an experimental computer vision and multimodal reasoning application that combines Meta's Segment Anything Model 3 (SAM3) with Google's Gemma 4 vision-language model. It provides an interactive web interface for image object detection, video segmentation, and interactive click-based tracking. SAM3 generates candidate segmentation masks, and Gemma 4 filters and reasons about those proposals against the user's prompt, so the final selection reflects the context of the request rather than the raw detector output alone. The interface is built on Gradio with a customized "Steel Blue" theme, and the application targets CUDA-enabled GPUs, giving researchers and developers an interactive environment for exploring vision-language workflows.

Key Features

Image Detection with AI Filtering: Utilizes SAM3 to generate candidate regions from a text prompt. Gemma 4 then analyzes these proposals and filters them to match the user's request, providing both a visual output and a natural language explanation of its selections.
Automated Video Segmentation: Processes video files using SAM3's video tracking model. Users can input a text prompt to automatically segment objects across frames, rendering either colored mask overlays alone or bounding boxes and contours on the video output.
Interactive Click Segmentation: Features an interactive canvas where users can click directly on target objects. The SAM3 tracker instantly generates and overlays precise segmentation masks based on the cumulative point data.
Custom User Interface: Built on Gradio with a deeply customized, responsive "Steel Blue" theme using advanced CSS. It includes intuitive upload zones, real-time status indicators, and integrated JSON/text explanation outputs.
Hardware Optimization: Uses dynamic memory management, inference mode, and reduced-precision data types (bfloat16/float16) to run the vision and language models efficiently on CUDA-enabled GPUs.
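
As a rough illustration of the Image Detection with AI Filtering step, the sketch below shows one way a vision-language model's reply could be turned into a filtered candidate list. It is not taken from app.py: the function names, the candidate dictionaries, and the assumption that the model answers with a JSON list of zero-based candidate indices are all hypothetical.

```python
import json
import re


def parse_selected_indices(reply: str, num_candidates: int) -> list[int]:
    """Pull the first JSON-style list out of a free-form reply (assumed format)."""
    match = re.search(r"\[[^\[\]]*\]", reply)
    if match is None:
        return []
    try:
        raw = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []

    selected: list[int] = []
    seen: set[int] = set()
    for item in raw:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(item, int) or isinstance(item, bool):
            continue
        # Drop out-of-range and repeated indices instead of failing.
        if 0 <= item < num_candidates and item not in seen:
            seen.add(item)
            selected.append(item)
    return selected


def filter_candidates(candidates: list, reply: str) -> list:
    """Keep only the candidates whose indices the reply names."""
    keep = parse_selected_indices(reply, len(candidates))
    return [candidates[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical candidate records; real SAM3 outputs carry masks as well.
    candidates = [
        {"label": "car", "score": 0.91},
        {"label": "truck", "score": 0.78},
        {"label": "car", "score": 0.66},
    ]
    reply = "Candidates [0, 2] match the request; candidate 1 is a truck."
    print(filter_candidates(candidates, reply))
```

Ignoring malformed or out-of-range indices keeps the interface responsive when the language model answers in an unexpected format; a stricter application might surface the parse failure to the user instead.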

Repository Structure

├── examples/
│   ├── 1.jpg
│   ├── 1V.mp4
│   ├── 2.jpg
│   └── 3.jpg
├── app.py
├── LICENSE.txt
├── pre-requirements.txt
├── README.md
└── requirements.txt

Installation and Requirements

To run SAM3-Gemma4-CUDA locally, you must configure a Python environment with the following dependencies. A compatible CUDA-enabled GPU is highly recommended to handle the intensive model loads.

1. Install Pre-requirements: Run the following command to update pip to the required version before installing the main dependencies:

pip install "pip>=26.0.0"

2. Install Core Requirements: Install the necessary machine learning, computer vision, and UI libraries. You can place these in a requirements.txt file and run pip install -r requirements.txt.

torch
torchvision
transformers==5.5.0
supervision
scipy
timm
accelerate
gradio
einops
ninja
decord
scikit-learn
scikit-image
matplotlib
modelscope
pycocotools
opencv-python
sentencepiece
qwen-vl-utils
pillow
kernels
requests
av
hf_xet
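
Before launching, it can help to confirm that the packages above are actually installed. The following is a minimal sketch using only the standard library; the helper name check_environment and the shortened package list are illustrative choices, not part of the repository.

```python
from importlib import metadata

# A subset of the requirements listed above; extend as needed.
REQUIRED = ["torch", "torchvision", "transformers", "gradio", "opencv-python", "pillow"]

# Only transformers is pinned to an exact version in the list above.
PINNED = {"transformers": "5.5.0"}


def check_environment(names: list[str], pins: dict[str, str]) -> list[str]:
    """Return a human-readable problem for each missing or mismatched package."""
    problems = []
    for name in names:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        wanted = pins.get(name)
        if wanted is not None and installed != wanted:
            problems.append(f"{name}: found {installed}, expected {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_environment(REQUIRED, PINNED)
    if issues:
        print("Environment problems:")
        for issue in issues:
            print("  " + issue)
    else:
        print("All checked packages are installed.")
```

The check looks up distribution names as pip knows them (for example opencv-python rather than cv2), so it does not import the heavy libraries themselves.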

Usage

Once your environment is set up and the dependencies are successfully installed, you can launch the application by running the main Python script:

python app.py

Upon initialization, the script loads the SAM3 image, tracker, and video models alongside the Gemma 4 model into GPU memory (VRAM). Once loading completes, it prints a local web address (usually http://127.0.0.1:7860/) that you can open in your browser. You can navigate between the tabs to experiment with text-prompted image filtering, video mask propagation, or interactive click-based tracking.
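
The precision choice mentioned under Hardware Optimization (bfloat16/float16) could be made roughly as follows when the models are loaded. This is a sketch under stated assumptions, not the logic in app.py: the function names are hypothetical, and the fallback to float32 on CPU is an assumption the README does not describe.

```python
import importlib.util


def pick_precision(cuda_available: bool, bf16_supported: bool) -> tuple[str, str]:
    """Map hardware capabilities to a (device, dtype name) pair."""
    if not cuda_available:
        # Assumption: fall back to full precision when no GPU is present.
        return ("cpu", "float32")
    # Prefer bfloat16 where the GPU supports it, otherwise float16.
    return ("cuda", "bfloat16" if bf16_supported else "float16")


def detect_precision() -> tuple[str, str]:
    """Query PyTorch, if installed, for the current machine's capabilities."""
    if importlib.util.find_spec("torch") is None:
        return pick_precision(False, False)
    import torch

    cuda = torch.cuda.is_available()
    bf16 = cuda and torch.cuda.is_bf16_supported()
    return pick_precision(cuda, bf16)


if __name__ == "__main__":
    device, dtype_name = detect_precision()
    print(f"device={device} dtype={dtype_name}")
```

With PyTorch installed, the dtype name can be resolved with getattr(torch, dtype_name), and inference can be wrapped in torch.inference_mode() to skip autograd bookkeeping.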

License and Source

License: SAM License (see LICENSE.txt)
GitHub Repository: https://github.com/PRITHIVSAKTHIUR/SAM3-Gemma4-CUDA.git
