- [2025-12-15] Released the training code with Hugging Face dataset support (plus an example dataset on Hugging Face).
- [2025-09-02] PartEdit local Gradio demo released, and the Hugging Face demo is live.
- [2025-09-01] PartEdit embeddings and the custom training data released on Hugging Face.
- [2025-06-02] An updated version of PartEdit is now on arXiv.
- [2025-04-01] PartEdit was accepted to the SIGGRAPH 2025 conference track.
- [2025-03-09] PartEdit benchmark available on Hugging Face.
- [2025-02-06] PartEdit is now available on arXiv.
In this paper, we present the first text-based image editing approach for object parts based on pre-trained diffusion models. Diffusion-based image editing approaches capitalize on diffusion models' deep understanding of image semantics to perform a variety of edits. However, existing diffusion models lack sufficient understanding of many object parts, hindering fine-grained edits requested by users. To address this, we propose to expand the knowledge of pre-trained diffusion models so that they understand various object parts, enabling them to perform fine-grained edits. We achieve this by learning special textual tokens that correspond to different object parts through an efficient token optimization process. These tokens are optimized to produce reliable localization masks at each inference step to localize the editing region. Leveraging these masks, we design feature-blending and adaptive thresholding strategies to execute the edits seamlessly. To evaluate our approach, we establish a benchmark and an evaluation protocol for part editing. Experiments show that our approach outperforms existing editing methods on all metrics and is preferred by users 66-90% of the time in conducted user studies.
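The mask-driven editing idea above can be sketched as follows. This is an illustrative sketch only: `adaptive_threshold` and `blend_features` are hypothetical names, and the percentile rule stands in for the paper's actual adaptive thresholding strategy.

```python
import numpy as np

def adaptive_threshold(attn_map, percentile=90):
    # Hypothetical sketch: binarize a per-step localization map into an edit
    # mask by thresholding at a percentile (the paper's strategy may differ).
    tau = np.percentile(attn_map, percentile)
    return (attn_map >= tau).astype(attn_map.dtype)

def blend_features(feat_orig, feat_edit, mask):
    # Blend edited features into the original ones only inside the masked
    # region; the mask broadcasts over feature channels.
    return mask * feat_edit + (1.0 - mask) * feat_orig
```

Applied at each denoising step, this keeps the unedited region of the image untouched while the edit takes effect only where the learned token localizes the part.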
```shell
# from the folder containing environment.yaml
conda env create -f environment.yaml
# (or, faster)
mamba env create -f environment.yaml
```

followed by

```shell
conda activate partedit
```

Note: newer PyTorch versions are distributed via pip only.
The Jupyter notebook `getting_started.ipynb` contains a full example of how to use PartEdit with SDXL.
To run the demo, simply execute the following; the model and embeddings will be downloaded automatically:

```shell
hf login  # if you have a token
# get a token from https://huggingface.co/settings/tokens
# older versions use `huggingface-cli login`
```

followed by

```shell
python app.py
```

Then open your browser at http://localhost:7860 (or the link provided in the terminal).
The current code has been tested with the diffusers library, but there might be minor differences in some samples between library versions.
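Since samples can differ slightly across diffusers versions, it can help to record the installed version alongside any generated results. A small helper (the function name is illustrative) can query it without importing the library itself:

```python
import importlib.metadata

def diffusers_version():
    # Return the installed diffusers version string, or None if the package
    # is not installed, so results can be tagged with the exact version.
    try:
        return importlib.metadata.version("diffusers")
    except importlib.metadata.PackageNotFoundError:
        return None
```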
The datasets generated in the experiments can be found at Pascal Part and PartImageNet. We train Human Torso, Human Head, and Human Hair from Pascal Part, and use PartImageNet for the remaining non-custom parts. An example dataset is hosted on Hugging Face, and the hard dependency on detectron2 previously used for training has been removed.
We want to thank the authors of Prompt-to-Prompt-with-sdxl, DAAM, StabilityAI (Stable Diffusion XL), OVAM (used as the training base for OVAMXL), and SLiMe (layer-selection optimization).
- Fix fp16 training
To train, install from the updated environment.yaml or update your existing environment with:

```shell
pip install torchmetrics git+https://github.com/Gorluxor/ovamxl.git
```

Then run:

```shell
python -m src.unified_training --config configs/quadruped_head.yaml
```

Check the dataset for other classes, or create your own from synthetic or real data.
Note: training takes around 64 GB of GPU memory with fp32 and 8 selected layers on 100 images for 2000 steps (approx. ~1.5 hours on an Nvidia A100 80GB). Training uses full gradients over the small dataset.
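As a toy illustration of the token-optimization loop shape: PartEdit's actual training supervises cross-attention localization masks, whereas the sketch below (all names hypothetical) just runs gradient descent on a single embedding vector toward a target.

```python
import numpy as np

def optimize_token(target, steps=200, lr=0.1, seed=0):
    # Toy gradient descent on one token embedding toward a target vector.
    # The real objective optimizes tokens for reliable localization masks;
    # this only illustrates the shape of the optimization loop.
    rng = np.random.default_rng(seed)
    token = rng.normal(size=target.shape)
    for _ in range(steps):
        grad = 2.0 * (token - target)  # gradient of ||token - target||^2
        token = token - lr * grad
    return token
```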
BibTeX:
@inproceedings{cvejic2025partedit,
title={PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models},
author={Cvejic, Aleksandar and Eldesokey, Abdelrahman and Wonka, Peter},
booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
pages={1--11},
year={2025}
}

APA:
Cvejic, A., Eldesokey, A., & Wonka, P. (2025, August). PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models. In Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers (pp. 1-11).
