HY-Wong/Autoencoder-Analysis

Autoencoder-Analysis

This repository contains 8 analysis scripts (analysis-{model}.py) to evaluate the reconstruction quality of various autoencoders with different latent shapes on the ImageNet-1K validation set. The evaluations measure pixel- and frequency-space reconstruction loss, perceptual similarity (LPIPS), and feature similarity using pretrained CLIP and SAM. The scripts are designed to run in distributed settings and generate reconstruction visualizations.
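The README does not spell out how the pixel- and frequency-space losses are defined. As a rough illustration only, the sketch below assumes mean squared error in pixel space and a mean absolute difference between 2D FFT amplitude spectra in frequency space. The function names `pixel_mse` and `frequency_l1` are hypothetical, and NumPy stands in for whatever tensor library the scripts actually use.

```python
import numpy as np


def pixel_mse(original: np.ndarray, reconstruction: np.ndarray) -> float:
    """Mean squared error over all pixels (assumed pixel-space loss)."""
    diff = original.astype(np.float64) - reconstruction.astype(np.float64)
    return float(np.mean(diff ** 2))


def frequency_l1(original: np.ndarray, reconstruction: np.ndarray) -> float:
    """Mean absolute difference of 2D FFT amplitudes (assumed frequency-space loss)."""
    # FFT over the two spatial axes (H, W); leading axes act as batch/channel.
    amp_orig = np.abs(np.fft.fft2(original.astype(np.float64), axes=(-2, -1)))
    amp_rec = np.abs(np.fft.fft2(reconstruction.astype(np.float64), axes=(-2, -1)))
    return float(np.mean(np.abs(amp_orig - amp_rec)))


rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))  # toy C x H x W image with values in [0, 1)
blurred = image.copy()
# Crude vertical smoothing to mimic a reconstruction that loses fine detail.
blurred[:, 1:, :] = 0.5 * (image[:, 1:, :] + image[:, :-1, :])

print("pixel MSE:", pixel_mse(image, blurred))
print("frequency L1:", frequency_l1(image, blurred))
```

Both functions return 0 for a perfect reconstruction and grow as the reconstruction drifts from the original.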

Sample Visualization

Below are sample batches showing ground-truth and reconstructed images:

Ground Truth | Reconstruction (VA-VAE, f16c32)

Note: The reconstructed outputs tend to lack sharp textures and fine details, particularly in regions dominated by high-frequency information such as text or small faces.

Model Overview

Each analysis script (analysis-{model}.py) evaluates reconstruction quality for a specific autoencoder model.
To run successfully, each script must be placed inside the corresponding original model repository (e.g., analysis-ldm.py inside the latent-diffusion repo).
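Placing a script can be as simple as copying it into the cloned model repository. The sketch below assumes the script belongs at the repository root (the README only says "inside" the repository). `place_script` and the directory names in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def place_script(script: Path, model_repo: Path) -> Path:
    """Copy an analysis script into a cloned model repository and return the new path."""
    if not script.is_file():
        raise FileNotFoundError(f"analysis script not found: {script}")
    if not model_repo.is_dir():
        raise NotADirectoryError(f"model repository not found: {model_repo}")
    # Assumption: the script is expected at the repository root.
    return Path(shutil.copy2(script, model_repo / script.name))


# Demo in a temporary directory so nothing outside it is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    script = root / "Autoencoder-Analysis" / "analysis-ldm.py"
    script.parent.mkdir()
    script.write_text("# placeholder\n")
    repo = root / "latent-diffusion"
    repo.mkdir()
    placed = place_script(script, repo)
    placed_name = placed.name
    placed_parent = placed.parent.name
    print(placed_parent, placed_name)
```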


1D-Tokenizer

📄 NeurIPS 2024
Image Tokenization with Only 32 Tokens for Both Reconstruction and Generation
GitHub Repository

Environment: conda activate var
Script: analysis-1d-tokenizer.py

DiT

📄 ICCV 2023
Scalable Diffusion Models with Transformers
GitHub Repository

Environment: conda activate var
Script: analysis-DiT.py

EfficientViT

📄 ICCV 2023
Efficient Vision Foundation Models for High-Resolution Generation and Perception
GitHub Repository

Environment: conda activate var
Script: analysis-efficientvit.py

Latent Diffusion (LDM)

📄 CVPR 2022
High-Resolution Image Synthesis with Latent Diffusion Models
GitHub Repository

Environment: conda activate ldm
Script: analysis-ldm.py

Checkpoints

Pretrained weights are available from the official repositories:

VQ-VAEs

KL-VAEs


MAR

📄 NeurIPS 2024
Autoregressive Image Generation without Vector Quantization
GitHub Repository

Environment: conda activate ldm
Script: analysis-mar.py

RQ-VAE Transformer

📄 CVPR 2022
Autoregressive Image Generation using Residual Quantization
GitHub Repository

Environment: conda activate var
Script: analysis-rq-vae.py

VA-VAE

📄 CVPR 2025
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
GitHub Repository

Environment: conda activate ldm
Script: analysis-va-vae.py

VAR

📄 NeurIPS 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
GitHub Repository

Environment: conda activate var
Script: analysis-var.py

Distributed Settings

Run the evaluation in a distributed setting using torchrun.

# Run this command on every node, changing --node_rank on each.
#   --nnodes             number of nodes
#   --nproc_per_node     number of GPUs per node
#   --node_rank          rank of the current node (0 for master)
#   analysis-{model}.py  replace with the specific model script
torchrun \
  --nnodes=2 \
  --nproc_per_node=2 \
  --node_rank=0 \
  --master_addr=10.0.0.1 \
  --master_port=29500 \
  analysis-{model}.py \
  --data_path /path/to/imagenet-1k \
  --batch_size 64 \
  --resos 256
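
torchrun exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to every process it launches, so a script started this way can use them to pick its GPU and its share of the validation set. The sketch below is an assumption about how such sharding might look, not the scripts' actual logic. `read_dist_env` and `shard_indices` are hypothetical helpers. With the command above (2 nodes with 2 GPUs each), `WORLD_SIZE` is 4.

```python
import os


def read_dist_env(env=os.environ) -> tuple[int, int, int]:
    """Return (rank, local_rank, world_size), defaulting to a single-process run."""
    rank = int(env.get("RANK", 0))
    local_rank = int(env.get("LOCAL_RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    return rank, local_rank, world_size


def shard_indices(num_samples: int, rank: int, world_size: int) -> list[int]:
    """Strided split: each rank takes every world_size-th sample, starting at its rank."""
    return list(range(rank, num_samples, world_size))


# Simulate the second GPU on the second node of a 2 x 2 launch.
fake_env = {"RANK": "3", "LOCAL_RANK": "1", "WORLD_SIZE": "4"}
rank, local_rank, world_size = read_dist_env(fake_env)
shard = shard_indices(50_000, rank, world_size)  # ImageNet-1K val has 50,000 images
print(rank, local_rank, world_size, len(shard), shard[:3])
```

A strided split keeps the per-rank workload balanced to within one sample, whatever the dataset size.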

Dataset

ImageNet Structure
/path/to/imagenet-1k/
    train/
        n01440764/
            *.JPEG
        n01443537/
            *.JPEG
    val/
        n01440764/
            ILSVRC2012_val_00000293.JPEG ...
        n01443537/
            ILSVRC2012_val_00000236.JPEG ...
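
The layout above follows the class-per-folder convention, with one directory per WordNet ID. A minimal sketch of how the validation split could be enumerated with only the standard library follows. `list_val_images` is a hypothetical helper, and the scripts themselves may rely on a dataset class instead.

```python
import tempfile
from pathlib import Path


def list_val_images(data_path: Path) -> list[tuple[Path, int]]:
    """Return (image_path, class_index) pairs for <data_path>/val/<wnid>/*.JPEG."""
    val_dir = data_path / "val"
    # Sort class folders so the index assignment is deterministic.
    wnids = sorted(p.name for p in val_dir.iterdir() if p.is_dir())
    class_index = {wnid: i for i, wnid in enumerate(wnids)}
    samples = []
    for wnid in wnids:
        for image in sorted((val_dir / wnid).glob("*.JPEG")):
            samples.append((image, class_index[wnid]))
    return samples


# Build a tiny fake tree mirroring the structure above and list it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for wnid, name in [("n01440764", "ILSVRC2012_val_00000293.JPEG"),
                       ("n01443537", "ILSVRC2012_val_00000236.JPEG")]:
        (root / "val" / wnid).mkdir(parents=True)
        (root / "val" / wnid / name).touch()
    samples = [(p.name, label) for p, label in list_val_images(root)]
    print(samples)
```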
