Generating Fit Check Videos with a Handheld Camera

University of Washington

Given two static mirror selfies (front and back) and motion data captured with a handheld mobile device, we synthesize a full-body video with a new scene background and consistent lighting. Our method introduces (1) a parameter-free frame generation strategy for video diffusion models, (2) a multi-reference attention mechanism to integrate appearance from both front and back photos, and (3) an image-based fine-tuning strategy to enhance sharpness and improve shadow/reflection rendering.
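
The core of contribution (2) can be pictured as cross-attention over a concatenated reference context. The sketch below is a minimal, hedged illustration of that idea (module and tensor names are ours, not from the released code); the actual model integrates this inside an SVD-style video diffusion UNet.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiReferenceAttention(nn.Module):
    # Illustrative sketch (not the released implementation): video latents
    # attend to keys/values from BOTH reference images by concatenating the
    # two reference token streams along the sequence axis.
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def forward(self, latent, ref_front, ref_back):
        # latent: (B, N, C) video tokens; ref_*: (B, M, C) reference tokens
        q = self.to_q(latent)
        ctx = torch.cat([ref_front, ref_back], dim=1)  # (B, 2M, C)
        k, v = self.to_k(ctx), self.to_v(ctx)
        return F.scaled_dot_product_attention(q, k, v)

Concatenating along the token axis lets each video latent pull appearance cues from whichever view (front or back) is relevant for a given pose.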

Installation

The code runs under Python 3.11, PyTorch 2.3.1, and CUDA 11.8.

We use uv for dependency management. Install uv if you don't have it:

pip install uv

Then install all dependencies with one command:

uv sync

Activate the environment:

source .venv/bin/activate
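
Optionally, a quick check (a generic snippet, not part of the repo) confirms the versions listed above:

import torch

# Expect a PyTorch 2.3.1 build against CUDA 11.8 and a visible GPU.
print(torch.__version__)          # e.g. 2.3.1+cu118
print(torch.version.cuda)         # e.g. 11.8
print(torch.cuda.is_available())  # should be True for GPU inference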

Download Pretrained Models

Download the SVD base model:

git lfs install
git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1

Download our model checkpoint and DWPose checkpoints into ckpt/:

huggingface-cli download boweiche/fit-check-videogen checkpoint-220000.pth --local-dir ckpt
huggingface-cli download boweiche/fit-check-videogen DWPose/yolox_l.onnx DWPose/dw-ll_ucoco_384.onnx --local-dir ckpt
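
Equivalently, if you prefer Python over the CLI, huggingface_hub can fetch the same files (same repo and filenames as the commands above):

from huggingface_hub import hf_hub_download

# Mirrors the huggingface-cli commands above: fetch the model checkpoint
# and the two DWPose ONNX files into ckpt/.
for filename in [
    "checkpoint-220000.pth",
    "DWPose/yolox_l.onnx",
    "DWPose/dw-ll_ucoco_384.onnx",
]:
    hf_hub_download(
        repo_id="boweiche/fit-check-videogen",
        filename=filename,
        local_dir="ckpt",
    )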

Inference

Prepare Inputs

The repo ships with a sample test case:

data/
├── ref_videos/
│   └── demo/
│       ├── front.jpg          # front-facing mirror selfie
│       └── back.jpg           # back-facing mirror selfie
├── motions/
│   └── demo/
│       ├── dwpose/            # per-frame pose files (.npy + .png)
│       └── rgb/               # per-frame RGB images (.jpg), used with --motion_rgb
└── backgrounds/
    └── demo.jpg

To use your own data, place front/back selfies under data/ref_videos/<name>/, motion data under data/motions/<name>/, and a background image under data/backgrounds/, then point inference.py at them via the --motion_name, --ref_video_name, and --background_name flags shown under Run below.
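
Before running inference on a new test case, a small pre-flight check like the one below (a hypothetical helper, not part of the repo) can confirm the layout matches the tree above:

from pathlib import Path

def check_test_case(name: str, data_root: str = "data") -> None:
    # Hypothetical pre-flight check mirroring the layout shown above.
    root = Path(data_root)
    required = [
        root / "ref_videos" / name / "front.jpg",
        root / "ref_videos" / name / "back.jpg",
        root / "motions" / name / "dwpose",  # or motions/<name>/rgb with --motion_rgb
        root / "backgrounds" / f"{name}.jpg",
    ]
    missing = [p for p in required if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing inputs: {missing}")

check_test_case("demo")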

Run

Using pre-extracted DWPose files:

CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=8 python inference.py --inference_config configs/test.yaml \
    --motion_name demo --ref_video_name demo --background_name demo

Running DWPose detection on motion RGB frames at inference time:

CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=8 python inference.py --inference_config configs/test.yaml \
    --motion_name demo --ref_video_name demo --background_name demo --motion_rgb

Results are saved to results/ and include out.mp4, reference image visualizations, and pose visualizations.
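
To spot-check the generated video programmatically, a few lines of OpenCV work; the output path below is assumed from the description above and may sit in a subdirectory of results/.

import cv2

# Quick inspection of the generated video: frame count, resolution, fps.
cap = cv2.VideoCapture("results/out.mp4")
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)
print(f"{n_frames} frames at {width}x{height}, {fps:.1f} fps")
cap.release()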

Note: The released code does not include the face refinement step described in the paper, so output videos may show lower face quality than the results reported in the paper.

Acknowledgement

This codebase is adapted from MimicMotion and stable-video-diffusion.

Citation

If you find our work useful for your research, please consider citing the paper:

@article{chen2025fitcheck,
  title={Generating Fit Check Videos with a Handheld Camera},
  author={Chen, Bowei and Curless, Brian and Kemelmacher-Shlizerman, Ira and Seitz, Steven M.},
  journal={arXiv preprint arXiv:2505.23886},
  year={2025}
}
