MT522/enhance_inpainting
A Transformer-Based Generalization Pipeline for Inpainting Models

This project fine-tunes a BLIP model to generate better inpainting captions.

A preprint of the paper is available at this link.

📦 Requirements

Install dependencies (preferably in a virtual environment):

```
pip install torch torchvision transformers pandas tqdm pillow
```

Optional (for GPU support):

```
pip install torch --index-url https://download.pytorch.org/whl/cu118
```
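A quick way to confirm the dependencies installed correctly is to probe for each package with the standard library. This is a minimal sketch, not part of the project; note that Pillow imports under the name `PIL`:

```python
import importlib.util

# Packages from the pip command above; Pillow's import name is "PIL".
REQUIRED = ["torch", "torchvision", "transformers", "pandas", "tqdm", "PIL"]

def missing_packages(packages):
    """Return the subset of packages that cannot be found by the import system."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```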

🧠 Model Overview

- Base: BLIP, StableDiffusion-Inpaint
- Head: MLP regressor that outputs three values: [SSIM, PSNR, CLIP score]
- Loss: weighted MSE / custom weighted difference loss
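The weighted loss over the three predicted metrics can be sketched as below. The per-metric weights here are illustrative placeholders, not the values the project actually uses:

```python
def weighted_mse(pred, target, weights=(1.0, 1.0, 1.0)):
    """Weighted MSE over the three metric predictions [SSIM, PSNR, CLIP score].

    `weights` are illustrative placeholders, not the project's actual values.
    """
    assert len(pred) == len(target) == len(weights)
    return sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, target)) / len(pred)
```

With equal weights this reduces to the ordinary MSE; raising one weight biases training toward that metric.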

🚀 Running the Script

  1. Place your data: put images in a directory such as `test2014/`, and download a pretrained BLIP model (e.g., from Hugging Face) into a `blip/` folder.

  2. Run main:

```
python main.py
```

The script generates a CSV file of all the losses, which is then used to train the MLP head and fine-tune BLIP.
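A minimal sketch of consuming that CSV before training, assuming columns named `image`, `ssim`, `psnr`, and `clip_score` (the actual column names main.py emits may differ):

```python
import csv

def load_metrics(csv_path):
    """Read the per-image losses CSV into (image_path, [ssim, psnr, clip_score]) pairs.

    Column names are assumptions; adjust them to match the CSV main.py produces.
    """
    rows = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            targets = [float(row["ssim"]), float(row["psnr"]), float(row["clip_score"])]
            rows.append((row["image"], targets))
    return rows
```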

  3. Run training:

```
python finetune_blip.py
```

The script will:

- Train the MLP head for `epochs_mlp` epochs
- Fine-tune selected layers of BLIP for `epochs_blip` epochs
- Save the final model to `blip-v2/fine_tuned_blip_with_metrics.pth`

  4. Run `main.py` again with the updated BLIP model.
