Oindrila Saha1,
Vojtech Krs2,
Radomir Mech2,
Subhransu Maji1,
Kevin Blackburn-Matzen2*,
Matheus Gadelha2*
1 University of Massachusetts Amherst 2 Adobe Research
* denotes equal advising
SIGMA-Gen enables multi-subject image generation and editing in a single forward pass. Given reference images of the subjects, spatial masks indicating where each subject should appear, optional depth maps (precise or box-level), and a text prompt, SIGMA-Gen composes them into a coherent scene while preserving each subject's identity.
This code supports two modes:
- Generation — compose subjects into a new scene from scratch
- Editing / Insertion — insert subjects into an existing image with latent blending for seamless background preservation
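As a rough illustration of what latent blending means for the editing mode, the sketch below keeps the model's output inside the subject regions and a re-noised copy of the source everywhere else. It is not the repository's implementation: the function names, the linear noising schedule, and the mask convention (1 inside subject regions) are all assumptions made for this example.

```python
import numpy as np


def noise_source(source_latent, noise, t):
    """Noise the clean source latent to level t in [0, 1].

    Assumption: linear interpolation (t=0 is clean, t=1 is pure noise).
    """
    return (1.0 - t) * source_latent + t * noise


def blend_step(current_latent, source_latent, noise, t, subject_mask):
    """Keep the current latent inside the mask and the re-noised source
    everywhere else.

    subject_mask: array broadcastable to the latent shape, 1 inside the
    subject regions and 0 in the background (assumed convention).
    """
    noised_source = noise_source(source_latent, noise, t)
    return subject_mask * current_latent + (1.0 - subject_mask) * noised_source


# Toy example on a 1x4x4 "latent".
rng = np.random.default_rng(0)
source = rng.normal(size=(1, 4, 4))
current = rng.normal(size=(1, 4, 4))
noise = rng.normal(size=(1, 4, 4))
mask = np.zeros((1, 4, 4))
mask[:, 1:3, 1:3] = 1.0  # subject occupies the centre of the latent

# At t=0 the background is exactly the clean source latent.
blended = blend_step(current, source, noise, t=0.0, subject_mask=mask)
```

Applying such a blend after every denoising step is what would let the unmasked background track the source image while the masked regions remain free to change.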
- Python 3.10+
- CUDA-capable GPU (32GB+ VRAM recommended)
- Access to FLUX.1-Kontext-dev (gated model — accept the license on HuggingFace)
pip install torch torchvision
pip install "diffusers>=0.35.0" transformers peft accelerate
pip install scipy numpy Pillow

LoRA weights are hosted on HuggingFace and downloaded automatically on first run:
| Repository | Description |
|---|---|
| `oindrila13saha/sigma-gen-lora` | Dual LoRA adapters (identity + spatial conditioning) |
| `black-forest-labs/FLUX.1-Kontext-dev` | Base model (downloaded automatically) |
run.py reads a folder of inputs and generates an image.
python run.py \
--folder examples/multi_control \
--prompt "a bowl, a can and a toy on a table" \
--out outputs/multi_control.png

Each example is a folder with the following files:
example_folder/
generated_nobg_0.png # Subject 0 (RGBA, background removed)
generated_nobg_1.png # Subject 1
...
mask_object_0.png # Binary mask for subject 0 (white = subject region)
mask_object_1.png # Binary mask for subject 1
...
source.png # (Optional) Background image for edit/insertion mode
depth_precise.png # (Optional) Depth map for precise regions
depth_box.png # (Optional) Depth map for box regions
| File | Required | Description |
|---|---|---|
| `generated_nobg_*.png` | Yes | Subject reference images with background removed (RGBA). Numbered starting from 0. |
| `mask_object_*.png` | Yes | Per-subject binary masks (white on black). Each mask defines where the corresponding subject should be placed. Numbered to match subjects. |
| `source.png` | No | Source image for insertion/editing. When present, the pipeline uses latent blending to preserve unmasked regions. When absent, a new image is generated from scratch. |
| `depth_precise.png` | No | Depth information for precise object regions. |
| `depth_box.png` | No | Depth information for bounding box regions. |
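
To make the folder convention concrete, here is a minimal sketch of how an example folder could be scanned and validated before running the pipeline. The helper `scan_example_folder` and the dictionary it returns are hypothetical and not part of `run.py`. Only the file names and the rule that the mode follows from the presence of `source.png` come from the table above.

```python
import re
from pathlib import Path


def scan_example_folder(folder):
    """Collect subject/mask pairs and optional inputs from an example folder.

    Hypothetical helper for illustration; not the repository's loader.
    """
    folder = Path(folder)

    def indexed(prefix):
        # Map the numeric suffix to its path, e.g. {0: .../mask_object_0.png}.
        pattern = re.compile(rf"{re.escape(prefix)}_(\d+)\.png")
        found = {}
        for path in folder.glob(f"{prefix}_*.png"):
            match = pattern.fullmatch(path.name)
            if match:
                found[int(match.group(1))] = path
        return found

    def optional(name):
        path = folder / name
        return path if path.exists() else None

    subjects = indexed("generated_nobg")
    masks = indexed("mask_object")
    if not subjects:
        raise FileNotFoundError(f"no generated_nobg_*.png files in {folder}")
    if set(subjects) != set(masks):
        raise ValueError("subject and mask indices do not match")

    source = optional("source.png")
    order = sorted(subjects)  # numeric order, so 10 comes after 2
    return {
        "subjects": [subjects[i] for i in order],
        "masks": [masks[i] for i in order],
        "source": source,
        "depth_precise": optional("depth_precise.png"),
        "depth_box": optional("depth_box.png"),
        # Edit mode is selected purely by the presence of source.png.
        "mode": "edit" if source is not None else "generation",
    }
```

Checking that every subject has a matching mask up front gives a clearer error than a failure deep inside the pipeline.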
- Subject tiles: Each `generated_nobg_*.png` is cropped to its content, resized to fit a 512x512 tile, and encoded as a separate identity condition with a unique `subject_id`.
- Spatial condition: The masks and optional depth maps are combined into an RGB condition image:
  - R channel: Indexed mask (each subject gets a unique intensity value)
  - G channel: `depth_precise` (depth for precise object silhouettes)
  - B channel: `depth_box` (depth for bounding box regions)
- Dual LoRA: Two LoRA adapters work together:
  - `cond1`: Identity conditioning (encodes subject appearance)
  - `cond2`: Spatial conditioning (encodes layout and structure)
- Edit mode (when `source.png` is present): The source image is noise-encoded at each timestep, and a blend mask ensures the background is preserved while subjects are seamlessly inserted.
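
The spatial condition described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the pipeline's actual code. The function name `build_spatial_condition` is hypothetical, and the README does not specify the exact intensity assigned to each subject. The sketch assumes evenly spaced non-zero values, lets later subjects overwrite earlier ones where masks overlap, and leaves a channel at zero when its depth map is absent.

```python
import numpy as np


def build_spatial_condition(masks, depth_precise=None, depth_box=None):
    """Pack masks and optional depth maps into one HxWx3 uint8 image.

    masks: list of HxW boolean arrays, one per subject.
    depth_precise, depth_box: optional HxW uint8 arrays.
    """
    if not masks:
        raise ValueError("at least one subject mask is required")
    height, width = masks[0].shape
    cond = np.zeros((height, width, 3), dtype=np.uint8)

    # R channel: indexed mask. Assumed encoding: subject i gets the
    # intensity (i + 1) * 255 // n, so 0 stays reserved for background.
    red = cond[..., 0]  # a view, so writes go straight into cond
    for i, mask in enumerate(masks):
        red[mask] = (i + 1) * 255 // len(masks)

    # G and B channels: depth maps, left at zero when not provided.
    if depth_precise is not None:
        cond[..., 1] = depth_precise
    if depth_box is not None:
        cond[..., 2] = depth_box
    return cond


# Toy example: two subjects on a 4x4 canvas, box depth only.
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[:2, :2] = True
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[2:, 2:] = True
box_depth = np.full((4, 4), 200, dtype=np.uint8)

condition = build_spatial_condition([mask_a, mask_b], depth_box=box_depth)
```

Packing everything into one RGB image means the spatial adapter receives the layout and both depth cues through a single conditioning input.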
A Gradio-based interactive demo is also included and hosted on HuggingFace Spaces:
To run the demo locally:
# Install additional demo dependencies
pip install "gradio>=5.0.0" ./custom_components/dist/gradio_maskeditor-0.0.1-py3-none-any.whl
# Launch
python gradio_demo.py

The demo opens at http://localhost:7860 and supports:
- Drawing subject regions on a canvas with color-coded rectangles (Red = Subject 1, Green = Subject 2, etc.)
- Automatic background removal on uploaded subject images
- Auto-detected mode — upload a background image to edit, or leave blank to generate from scratch
- Pre-loaded examples for quick experimentation
The pipeline code is adapted from OminiControl.
If you find our work useful, please cite:
@article{saha2025sigma,
title={SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation},
author={Saha, Oindrila and Krs, Vojtech and Mech, Radomir and Maji, Subhransu and Blackburn-Matzen, Kevin and Gadelha, Matheus},
journal={arXiv preprint arXiv:2510.06469},
year={2025}
}
