This repository contains 8 analysis scripts (analysis-{model}.py) to evaluate the reconstruction quality of various autoencoders with different latent shapes on the ImageNet-1K validation set. The evaluations measure pixel- and frequency-space reconstruction loss, perceptual similarity (LPIPS), and feature similarity using pretrained CLIP and SAM. The scripts are designed to run in distributed settings and generate reconstruction visualizations.
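The pixel-space, frequency-space, and feature-similarity measurements can be illustrated with a minimal NumPy sketch. It is not the scripts' implementation. The function names are hypothetical, and images are assumed to be float arrays in [0, 1] with shape (H, W, C). The frequency-space loss is assumed to be the mean absolute difference of 2-D FFT magnitudes, and random vectors stand in for CLIP or SAM embeddings. LPIPS is left out because it requires a pretrained network.

```python
import numpy as np


def pixel_mse(x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between two images in pixel space."""
    return float(np.mean((x - y) ** 2))


def frequency_l1(x: np.ndarray, y: np.ndarray) -> float:
    """Mean absolute difference of 2-D FFT magnitudes.

    The FFT runs over the spatial axes (H, W) of each channel separately.
    This particular definition is an assumption made for illustration.
    """
    fx = np.abs(np.fft.fft2(x, axes=(0, 1)))
    fy = np.abs(np.fft.fft2(y, axes=(0, 1)))
    return float(np.mean(np.abs(fx - fy)))


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors (flattened first)."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Synthetic stand-ins: a random "ground truth" image and a noisy "reconstruction".
rng = np.random.default_rng(0)
gt = rng.random((64, 64, 3))
recon = np.clip(gt + 0.05 * rng.standard_normal(gt.shape), 0.0, 1.0)

# Random vectors standing in for CLIP/SAM embeddings of the two images.
feat_gt = rng.standard_normal(512)
feat_recon = feat_gt + 0.1 * rng.standard_normal(512)

print(f"pixel MSE:      {pixel_mse(gt, recon):.5f}")
print(f"frequency L1:   {frequency_l1(gt, recon):.4f}")
print(f"feature cosine: {cosine_similarity(feat_gt, feat_recon):.4f}")
```

Lower MSE and frequency loss, and a cosine similarity closer to 1, indicate a closer reconstruction under these assumed definitions.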
Below are sample batches showing ground-truth and reconstructed images:
| Ground Truth | Reconstruction (VA-VAE, f16c32) |
|---|---|
| ![]() | ![]() |
Note: The reconstructed outputs tend to lack sharp textures and fine details, particularly in regions dominated by high-frequency information such as text or small faces.
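The observation above can be quantified with a simple spectral statistic. The following is a hypothetical helper, not part of the analysis scripts. It measures the fraction of an image's mean-removed spectral energy that lies above a radial frequency cutoff. The 0.25 cycles/pixel default and the box blur used to mimic lost detail are arbitrary choices for illustration.

```python
import numpy as np


def high_freq_energy_ratio(img: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy at radial frequencies above `cutoff`.

    img:    grayscale image, shape (H, W). For RGB, average the channels first.
    cutoff: radius in cycles/pixel (Nyquist is 0.5); an arbitrary default.
    """
    h, w = img.shape
    # Remove the mean so the DC term does not dominate the total energy.
    power = np.abs(np.fft.fft2(img - img.mean())) ** 2
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequency of each row bin
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequency of each column bin
    radius = np.sqrt(fx**2 + fy**2)
    total = power.sum()
    if total == 0:
        return 0.0  # constant image: no energy outside DC
    return float(power[radius > cutoff].sum() / total)


def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k mean filter with edge padding; a crude stand-in for detail loss."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


rng = np.random.default_rng(0)
sharp = rng.random((128, 128))  # white noise: energy spread over all frequencies
blurred = box_blur(sharp)

print(f"sharp:   {high_freq_energy_ratio(sharp):.3f}")
print(f"blurred: {high_freq_energy_ratio(blurred):.3f}")
```

Applied to a ground-truth image and its reconstruction (converted to grayscale, e.g. with `img.mean(axis=-1)`), a lower ratio for the reconstruction would be consistent with the missing high-frequency detail described above.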
Each analysis script (analysis-{model}.py) evaluates reconstruction quality for a specific autoencoder model.
To run successfully, each script must be placed inside the corresponding original model repository (e.g., analysis-ldm.py inside the latent-diffusion repo).
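A small helper can make the placement step repeatable. This is a hypothetical sketch. It assumes the script belongs at the root of the model repository and that you supply both directories as local paths.

```python
import shutil
from pathlib import Path


def install_analysis_script(model: str, scripts_dir: str, repo_dir: str) -> Path:
    """Copy analysis-{model}.py into the root of a model repository (hypothetical helper).

    model:       suffix used in the script name, e.g. "ldm".
    scripts_dir: directory of this repository that holds the analysis scripts.
    repo_dir:    local clone of the original model repository,
                 e.g. a clone of latent-diffusion for "ldm".
    """
    src = Path(scripts_dir) / f"analysis-{model}.py"
    dst_dir = Path(repo_dir)
    if not src.is_file():
        raise FileNotFoundError(f"missing script: {src}")
    if not dst_dir.is_dir():
        raise NotADirectoryError(f"repository not found: {dst_dir}")
    # copy2 also preserves file metadata such as modification time.
    return Path(shutil.copy2(src, dst_dir / src.name))
```

For example, `install_analysis_script("ldm", ".", "../latent-diffusion")` would copy `analysis-ldm.py` into a sibling clone of the latent-diffusion repo (both paths are illustrative).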
📄 NeurIPS 2024
Image Tokenization with Only 32 Tokens for Both Reconstruction and Generation
GitHub Repository
| Environment | Script |
|---|---|
| `conda activate var` | `analysis-1d-tokenizer.py` |
📄 ICCV 2023
Scalable Diffusion Models with Transformers
GitHub Repository
| Environment | Script |
|---|---|
| `conda activate var` | `analysis-DiT.py` |
📄 ICCV 2023
Efficient Vision Foundation Models for High-Resolution Generation and Perception
GitHub Repository
| Environment | Script |
|---|---|
| `conda activate var` | `analysis-efficientvit.py` |
📄 CVPR 2022
High-Resolution Image Synthesis with Latent Diffusion Models
GitHub Repository
| Environment | Script |
|---|---|
| `conda activate ldm` | `analysis-ldm.py` |
Pretrained weights are available from the official repositories:
- VQ-VAEs
- KL-VAEs
📄 NeurIPS 2024
Autoregressive Image Generation without Vector Quantization
GitHub Repository
| Environment | Script |
|---|---|
| `conda activate ldm` | `analysis-mar.py` |
📄 CVPR 2022
Autoregressive Image Generation using Residual Quantization
GitHub Repository
| Environment | Script |
|---|---|
| `conda activate var` | `analysis-rq-vae.py` |
📄 CVPR 2025
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
GitHub Repository
| Environment | Script |
|---|---|
| `conda activate ldm` | `analysis-va-vae.py` |
📄 NeurIPS 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
GitHub Repository
| Environment | Script |
|---|---|
| `conda activate var` | `analysis-var.py` |
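Run the evaluation in a distributed setting using `torchrun`. The command below uses 2 nodes with 2 GPUs each. Run it on every node, changing `--node_rank` to `1` on the second node.

```bash
# --nnodes:         number of nodes
# --nproc_per_node: number of GPUs per node
# --node_rank:      rank of the current node (0 for the master node)
# Replace analysis-{model}.py with the script for the model being evaluated.
torchrun \
  --nnodes=2 \
  --nproc_per_node=2 \
  --node_rank=0 \
  --master_addr=10.0.0.1 \
  --master_port=29500 \
  analysis-{model}.py \
  --data_path /path/to/imagenet-1k \
  --batch_size 64 \
  --resos 256
```

ImageNet Structure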
The `--data_path` directory is expected to follow the standard ImageNet-1K layout:

```
/path/to/imagenet-1k/
    train/
        n01440764/
            *.JPEG
        n01443537/
            *.JPEG
    val/
        n01440764/
            ILSVRC2012_val_00000293.JPEG
            ...
        n01443537/
            ILSVRC2012_val_00000236.JPEG
            ...
```
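Because the scripts are launched with `torchrun` and read this layout, the following shows how an evaluation process could enumerate the validation split and select its share. This is a minimal sketch under assumptions, not the scripts' actual data pipeline. The function names are hypothetical. Class indices follow sorted synset folder names, the same convention as torchvision's `ImageFolder`. Sharding uses the `RANK` and `WORLD_SIZE` environment variables that `torchrun` sets for each process.

```python
import os
from pathlib import Path


def list_val_images(root: str) -> list[tuple[str, int]]:
    """Return (path, class_index) pairs for root/val/<wnid>/*.JPEG (hypothetical helper)."""
    val_dir = Path(root) / "val"
    # Sorted synset folder names define the class indices 0..N-1.
    classes = sorted(d.name for d in val_dir.iterdir() if d.is_dir())
    samples = []
    for idx, wnid in enumerate(classes):
        for f in sorted((val_dir / wnid).iterdir()):
            if f.is_file() and f.suffix.upper() == ".JPEG":
                samples.append((str(f), idx))
    return samples


def shard_for_this_process(samples: list) -> list:
    """Give each torchrun process a disjoint, strided slice of the samples.

    Defaults let the function run outside torchrun as a single process.
    """
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    return samples[rank::world_size]
```

The strided slice resembles what `torch.utils.data.DistributedSampler` does with `shuffle=False`, minus its padding to equal shard sizes. With the launch configuration above (2 nodes × 2 GPUs, so `WORLD_SIZE=4`), each process would handle 12,500 of the 50,000 validation images.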

