This repository provides a script for running Stable Diffusion XL with multiple ControlNets (Canny, Depth) using the Hugging Face diffusers library. A refiner pass is applied at the end to tidy up the result. The refiner does not currently use denoising_start/denoising_end with latents; that could be an improvement, but it is not clear whether a ControlNet pipeline supports it.
```bash
python sdxl_diffusers_control.py
```

Runs in interactive mode where you can:
- Generate images using default configuration
- Create/modify a `config.json` file with your settings
- Press ENTER to regenerate with new settings
- Press Ctrl+C to exit
The script will monitor config.json and reload it for each generation, allowing you to experiment with different settings without restarting the script.
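The interactive loop is roughly equivalent to the sketch below. The real implementation lives in `sdxl_diffusers_control.py` and may differ; `generate_image` and `load_config` here are hypothetical helpers used only to illustrate the behavior described above.

```python
import json
from pathlib import Path

CONFIG_PATH = Path("config.json")

def load_config() -> dict:
    """Read config.json if it exists, otherwise fall back to Config defaults."""
    if CONFIG_PATH.exists():
        return json.loads(CONFIG_PATH.read_text())
    return {}  # empty dict -> use the Config class defaults

def interactive_loop(generate_image):
    """Regenerate on ENTER, exit on Ctrl+C; the config is re-read every pass."""
    try:
        while True:
            generate_image(load_config())
            input("Press ENTER to regenerate (Ctrl+C to exit)... ")
    except KeyboardInterrupt:
        print("Exiting.")
```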
```bash
python sdxl_diffusers_control.py --config batch_config.json
```

Processes multiple configurations from a JSON file. The JSON file must contain an array of configuration objects.
The configuration file must be a JSON array containing one or more configuration objects:
```json
[
  {
    "prompt": "anime style artwork",
    "input_image": "./inputs/image1.png",
    "output_dir": "outputs/batch1",
    "final_output": "outputs/batch1/anime.png",
    "seed": 42
  },
  {
    "prompt": "studio ghibli style",
    "input_image": "./inputs/image2.png",
    "output_dir": "outputs/batch2",
    "final_output": "outputs/batch2/ghibli.png",
    "seed": 123
  }
]
```

The following environment variables can be set to control the pipeline behavior:
- `LOCAL_FILES_ONLY`: Set to "true" (default) to only use locally cached models, "false" to allow downloading
- `DEPTH_MODEL_TYPE`: Choose depth estimation model - "dpt" or "depth_anything_v2" (default)
- `DEPTH_ANYTHING_MODEL`: Specify the Depth Anything model when using depth_anything_v2 (default: "depth-anything/Depth-Anything-V2-Base-hf")
Example:
```bash
export DEPTH_MODEL_TYPE="depth_anything_v2"
export DEPTH_ANYTHING_MODEL="depth-anything/Depth-Anything-V2-Large-hf"
export LOCAL_FILES_ONLY="false"
python sdxl_diffusers_control.py
```

All parameters are optional. If not specified, the default value from the Config class will be used.
- `model_repo`: HuggingFace model repository
- `depth_controlnet_path`: Path to depth controlnet model (diffusers format)
- `canny_controlnet_path`: Path to canny controlnet model (diffusers format)
- `depth_model`: Depth estimation model (default: "Intel/dpt-hybrid-midas")
- `input_image`: Path to input image
- `output_dir`: Output directory
- `depth_output`: Path for depth control image
- `canny_output`: Path for canny control image
- `final_output`: Path for final generated image
- `prompt`: Text prompt for generation
- `negative_prompt`: Negative prompt
- `aspect_ratio`: Aspect ratio mode - "auto" (default), "square", "landscape", or "portrait"
  - "auto": Automatically detect from input image
  - "square": 1024x1024
  - "landscape": 1344x768 (16:9)
  - "portrait": 768x1344 (9:16)
- `height`: Image height (set automatically based on aspect_ratio)
- `width`: Image width (set automatically based on aspect_ratio)
- `num_inference_steps`: Number of denoising steps (default: 60)
- `guidance_scale`: Classifier-free guidance scale (default: 3.5)
- `seed`: Random seed for reproducibility
- `depth_controlnet_conditioning_scale`: Depth control strength (0.0-1.0)
- `canny_controlnet_conditioning_scale`: Canny control strength (0.0-1.0)
- `depth_control_guidance_start`: When to start depth control (0.0-1.0)
- `canny_control_guidance_start`: When to start canny control (0.0-1.0)
- `depth_control_guidance_end`: When to end depth control (0.0-1.0)
- `canny_control_guidance_end`: When to end canny control (0.0-1.0)
- `canny_low_threshold`: Lower threshold for Canny edge detection (default: 50)
- `canny_high_threshold`: Upper threshold for Canny edge detection (default: 200)
- `device`: Torch device (default: "cuda")
- `cache_dir`: Cache directory for models
- `local_files_only`: Only use local files (default: true)
- `offline_mode`: Run in offline mode (default: true)
- `log_level`: Logging level ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
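For orientation, here is a hedged sketch of how these parameters typically map onto the diffusers multi-ControlNet SDXL API (the refiner pass is omitted). This is not the script's actual code: the ControlNet repo ids, the `cfg` dict, and the overall structure are illustrative assumptions.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from transformers import pipeline as hf_pipeline

# Illustrative values; in the script these come from config.json / Config defaults.
cfg = {
    "prompt": "anime style artwork",
    "negative_prompt": "blurry, low quality",
    "input_image": "./inputs/image1.png",
    "num_inference_steps": 60,
    "guidance_scale": 3.5,
    "seed": 42,
    "depth_controlnet_conditioning_scale": 0.5,
    "canny_controlnet_conditioning_scale": 0.7,
    "canny_low_threshold": 50,
    "canny_high_threshold": 200,
}
SIZES = {"square": (1024, 1024), "landscape": (1344, 768), "portrait": (768, 1344)}
width, height = SIZES["square"]

source = Image.open(cfg["input_image"]).convert("RGB").resize((width, height))

# Canny control map via OpenCV, using the configured thresholds.
edges = cv2.Canny(np.array(source), cfg["canny_low_threshold"], cfg["canny_high_threshold"])
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Depth control map via a transformers depth-estimation pipeline.
depth_estimator = hf_pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Base-hf")
depth_image = depth_estimator(source)["depth"].convert("RGB").resize((width, height))

# One pipeline, two ControlNets; list order must match the `image=` list below.
controlnets = [
    ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16),
]
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnets, torch_dtype=torch.float16
).to("cuda")

result = pipe(
    prompt=cfg["prompt"],
    negative_prompt=cfg["negative_prompt"],
    image=[depth_image, canny_image],
    width=width,
    height=height,
    num_inference_steps=cfg["num_inference_steps"],
    guidance_scale=cfg["guidance_scale"],
    controlnet_conditioning_scale=[
        cfg["depth_controlnet_conditioning_scale"],
        cfg["canny_controlnet_conditioning_scale"],
    ],
    control_guidance_start=[0.0, 0.0],
    control_guidance_end=[1.0, 1.0],
    generator=torch.Generator("cuda").manual_seed(cfg["seed"]),
).images[0]
result.save("final.png")
```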
Example batch configuration comparing styles on the same input:

```json
[
  {
    "prompt": "anime style, vibrant colors",
    "input_image": "./inputs/portrait.png",
    "output_dir": "outputs/styles",
    "final_output": "outputs/styles/anime.png",
    "canny_controlnet_conditioning_scale": 0.9,
    "num_inference_steps": 50,
    "seed": 42
  },
  {
    "prompt": "oil painting style, classical",
    "input_image": "./inputs/portrait.png",
    "output_dir": "outputs/styles",
    "final_output": "outputs/styles/oil_painting.png",
    "canny_controlnet_conditioning_scale": 0.7,
    "depth_controlnet_conditioning_scale": 0.3,
    "num_inference_steps": 60,
    "seed": 42
  },
  {
    "prompt": "watercolor style, soft edges",
    "input_image": "./inputs/portrait.png",
    "output_dir": "outputs/styles",
    "final_output": "outputs/styles/watercolor.png",
    "canny_controlnet_conditioning_scale": 0.5,
    "num_inference_steps": 40,
    "seed": 42
  }
]
```

- The pipeline and models are loaded once at startup
- For each configuration in the array:
  - Load the input image
  - Generate the depth and Canny control maps
  - Generate the final image using the control maps
  - Save all outputs to the specified paths
- Report success/failure statistics at the end
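In outline, batch processing behaves like the following sketch, where `generate_image` stands in for the per-configuration generation step (a hypothetical helper, not the script's actual function name).

```python
import json

def run_batch(config_path: str, generate_image) -> None:
    """Run every configuration in a JSON array and report simple statistics."""
    with open(config_path) as f:
        configs = json.load(f)  # must be a list of configuration objects

    succeeded, failed = 0, 0
    for i, cfg in enumerate(configs, start=1):
        try:
            generate_image(cfg)  # load input, build control maps, generate, save
            succeeded += 1
        except Exception as exc:
            failed += 1
            print(f"[{i}/{len(configs)}] failed: {exc}")

    print(f"Done: {succeeded} succeeded, {failed} failed out of {len(configs)}.")
```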
Memory requirements depend on your models. Everything should just fit on an L4 or A10G (24 GB of VRAM needed).
With depth-anything/Depth-Anything-V2-Base-hf:

```
| 0 NVIDIA RTX A5000 On | 00000000:D1:00.0 Off | Off |
| 30% 30C P8 23W / 230W | 22757MiB / 24564MiB | 0% Default |
```
depth-anything/Depth-Anything-V2-Large-hf gives better-quality depth maps but is too big to fit in 48 GB alongside everything else.
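If the models do not fit on your GPU, diffusers provides CPU offloading as a standard way to trade speed for VRAM. The script does not necessarily use this, but it can be worth trying:

```python
# `pipe` is a loaded StableDiffusionXLControlNetPipeline (see the sketch above).
# Moves sub-models to the GPU only while they are needed; lowers peak VRAM
# at the cost of slower generation (requires the accelerate package).
pipe.enable_model_cpu_offload()
```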
- Keep all configurations in a batch using the same model paths for efficiency
- Use consistent output directory structure for easy organization
- Set specific seeds for reproducible results
- Adjust control scales to balance between prompt adherence and structural control