SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation

Oindrila Saha¹, Vojtech Krs², Radomir Mech²,
Subhransu Maji¹, Kevin Blackburn-Matzen²*, Matheus Gadelha²*

¹ University of Massachusetts Amherst ² Adobe Research

* denotes equal advising


Overview

SIGMA-Gen enables multi-subject image generation and editing in a single forward pass. Given reference images of subjects, spatial masks indicating where each subject should appear, optional depth maps (precise or box-level), and a text prompt, SIGMA-Gen composes them into a coherent scene while preserving each subject's identity.

This code supports two modes:

Generation — compose subjects into a new scene from scratch
Editing / Insertion — insert subjects into an existing image with latent blending for seamless background preservation

Results

Multi-Control Generation with 2D box, 3D box, and precise depth combined in a single generation	Subject Insertion with precise mask and depth
"a bowl, a can and a toy on a table"	"a woman parasailing over a yacht"

Getting Started

Requirements

Python 3.10+
CUDA-capable GPU (32GB+ VRAM recommended)
Access to FLUX.1-Kontext-dev (gated model — accept the license on HuggingFace)

Installation

pip install torch torchvision
pip install "diffusers>=0.35.0" transformers peft accelerate
pip install scipy numpy Pillow

Model Weights

LoRA weights are hosted on HuggingFace and downloaded automatically on first run:

Repository	Description
`oindrila13saha/sigma-gen-lora`	Dual LoRA adapters (identity + spatial conditioning)
`black-forest-labs/FLUX.1-Kontext-dev`	Base model (downloaded automatically)

Usage

`run.py` — CLI Inference

run.py reads a folder of inputs and generates an image.

python run.py \
  --folder examples/multi_control \
  --prompt "a bowl, a can and a toy on a table" \
  --out outputs/multi_control.png
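
To run several example folders in one go, the command above can be wrapped in a small script. The following is a minimal sketch that assumes only the three flags shown above (`--folder`, `--prompt`, `--out`); the helper names `build_command` and `run_batch`, and the `outputs/<folder name>.png` naming scheme, are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def build_command(folder, prompt, out):
    """Build the argument list for one run.py invocation (flags as shown above)."""
    return [
        sys.executable, "run.py",
        "--folder", str(folder),
        "--prompt", prompt,
        "--out", str(out),
    ]


def run_batch(jobs, dry_run=True):
    """Collect one command per (folder, prompt) job; execute them unless dry_run."""
    commands = []
    for folder, prompt in jobs:
        # Hypothetical naming scheme: outputs/<folder name>.png
        out = Path("outputs") / f"{Path(folder).name}.png"
        cmd = build_command(folder, prompt, out)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    jobs = [("examples/multi_control", "a bowl, a can and a toy on a table")]
    for cmd in run_batch(jobs, dry_run=True):
        print(" ".join(cmd))
```

With `dry_run=True` the script only prints the commands, which is a cheap way to check paths before committing GPU time.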

Input Folder Format

Each example is a folder with the following files:

example_folder/
  generated_nobg_0.png        # Subject 0 (RGBA, background removed)
  generated_nobg_1.png        # Subject 1
  ...
  mask_object_0.png           # Binary mask for subject 0 (white = subject region)
  mask_object_1.png           # Binary mask for subject 1
  ...
  source.png                  # (Optional) Background image for edit/insertion mode
  depth_precise.png           # (Optional) Depth map for precise regions
  depth_box.png               # (Optional) Depth map for box regions

File Descriptions

File	Required	Description
`generated_nobg_*.png`	Yes	Subject reference images with background removed (RGBA). Numbered starting from 0.
`mask_object_*.png`	Yes	Per-subject binary masks (white on black). Each mask defines where the corresponding subject should be placed. Numbered to match subjects.
`source.png`	No	Source image for insertion/editing. When present, the pipeline uses latent blending to preserve unmasked regions. When absent, a new image is generated from scratch.
`depth_precise.png`	No	Depth information for precise object regions.
`depth_box.png`	No	Depth information for bounding box regions.
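
Before launching inference, it can help to check that a folder follows this layout. The sketch below is not part of the repository: `inspect_example_folder` is a hypothetical helper that relies only on the file names listed above and infers the mode from whether `source.png` exists. Requiring gap-free numbering is an assumption on top of "numbered starting from 0".

```python
import re
from pathlib import Path


def _indices(folder, prefix):
    """Return the sorted integer indices of files named <prefix><N>.png."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.png$")
    found = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match:
            found.append(int(match.group(1)))
    return sorted(found)


def inspect_example_folder(folder):
    """Check subject/mask numbering and report the mode and optional depth maps."""
    folder = Path(folder)
    subjects = _indices(folder, "generated_nobg_")
    masks = _indices(folder, "mask_object_")

    if not subjects:
        raise ValueError("no generated_nobg_*.png files found")
    # Assumption: indices start at 0 and have no gaps.
    if subjects != list(range(len(subjects))):
        raise ValueError(f"subject indices must be 0..N-1, got {subjects}")
    if masks != subjects:
        raise ValueError(f"mask indices {masks} do not match subjects {subjects}")

    return {
        "num_subjects": len(subjects),
        "mode": "edit" if (folder / "source.png").exists() else "generate",
        "has_depth_precise": (folder / "depth_precise.png").exists(),
        "has_depth_box": (folder / "depth_box.png").exists(),
    }
```

The check only looks at file names; it does not open the images or verify that the subjects are RGBA.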

How It Works

Subject tiles — Each generated_nobg_*.png is cropped to its content, resized to fit a 512x512 tile, and encoded as a separate identity condition with a unique subject_id.
Spatial condition — The masks and optional depth maps are combined into an RGB condition image:
- R channel: Indexed mask (each subject gets a unique intensity value)
- G channel: depth_precise (depth for precise object silhouettes)
- B channel: depth_box (depth for bounding box regions)
Dual LoRA — Two LoRA adapters work together:
- cond1: Identity conditioning (encodes subject appearance)
- cond2: Spatial conditioning (encodes layout and structure)
Edit mode (when source.png is present) — The source image is noise-encoded at each timestep, and a blend mask ensures the background is preserved while subjects are seamlessly inserted.
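
The spatial-condition and edit-mode steps above can be illustrated with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the evenly spaced R-channel intensities are a guess at how subjects might be indexed, and the linear noising rule in `blend_latents` is an assumed rectified-flow style interpolation.

```python
import numpy as np


def build_spatial_condition(masks, depth_precise=None, depth_box=None):
    """Pack masks and optional depth maps into one H x W x 3 uint8 image.

    masks: list of H x W boolean arrays, one per subject.
    depth_precise, depth_box: optional H x W uint8 arrays.
    """
    height, width = masks[0].shape
    cond = np.zeros((height, width, 3), dtype=np.uint8)

    # R channel: indexed mask. Evenly spaced intensities are an assumption;
    # where masks overlap, the later subject overwrites the earlier one.
    num_subjects = len(masks)
    for index, mask in enumerate(masks):
        cond[mask, 0] = (index + 1) * 255 // num_subjects

    # G and B channels: depth maps, left at zero when not provided.
    if depth_precise is not None:
        cond[..., 1] = depth_precise
    if depth_box is not None:
        cond[..., 2] = depth_box
    return cond


def blend_latents(generated, source, noise, mask, t):
    """Keep generated content inside the mask and re-noised source outside it.

    t is the noise level in [0, 1]. The linear interpolation is an assumed
    noising rule for illustration, not taken from the repository.
    """
    noised_source = (1.0 - t) * source + t * noise
    return mask * generated + (1.0 - mask) * noised_source
```

At `t = 0` the blend returns the clean source outside the mask, which is what preserves the background in the final image.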

Interactive Demo

A Gradio-based interactive demo is also included and hosted on HuggingFace Spaces:

Try the live demo

To run the demo locally:

# Install additional demo dependencies
pip install "gradio>=5.0.0" ./custom_components/dist/gradio_maskeditor-0.0.1-py3-none-any.whl

# Launch
python gradio_demo.py

The demo opens at http://localhost:7860 and supports:

Drawing subject regions on a canvas with color-coded rectangles (Red = Subject 1, Green = Subject 2, etc.)
Automatic background removal on uploaded subject images
Auto-detected mode — upload a background image to edit, or leave blank to generate from scratch
Pre-loaded examples for quick experimentation

Acknowledgements

The pipeline code is adapted from OminiControl.

Citation

If you find our work useful, please cite:

@article{saha2025sigma,
  title={SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation},
  author={Saha, Oindrila and Krs, Vojtech and Mech, Radomir and Maji, Subhransu and Blackburn-Matzen, Kevin and Gadelha, Matheus},
  journal={arXiv preprint arXiv:2510.06469},
  year={2025}
}
