A diffusion-based framework for protein backmapping from coarse-grained to all-atom structures.
Overview • Installation • Datasets • Usage • Citation
- [2026.01] Our paper has been accepted by the Journal of Chemical Theory and Computation (JCTC)! 🎉
- [2025.10] Initial release (v0.1).
CODLAD (Constraint-Decoupled Latent Diffusion) is a two-stage framework for reconstructing all-atom protein structures from coarse-grained representations. It addresses the efficiency and stability bottlenecks of protein backmapping by decoupling structural constraints from generation in the latent space.
- ⚡ Two-Stage Architecture:
- Compression: Encodes atomic structures while preserving structural constraints (VQ-VAE).
- Generation: Performs diffusion in a simplified latent space (Latent Diffusion).
- 🔬 Physically Realistic: Structural validity is enforced in the compression stage and carried through to generation.
- 🚀 Efficient: Significantly lower computational cost than existing all-atom generation methods.
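The two stages can be illustrated with a toy sketch. Dimensions are borrowed from the config names in this README (latent size 3, a 4096-entry codebook); everything else, including the single-step "denoiser", is a placeholder and not the repo's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1 (compression, VQ-VAE): a codebook of 4096 latent vectors.
codebook = rng.standard_normal((4096, 3))

def quantize(z):
    """VQ step: snap each latent vector to its nearest codebook entry."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[d.argmin(axis=1)]

# Stage 2 (generation, latent diffusion): reverse diffusion starts from
# Gaussian noise in the latent space; one scaling step stands in for the
# trained denoiser here.
z = rng.standard_normal((36, 3))   # latents for 36 residues
z = 0.9 * z                        # placeholder denoising step
z_quant = quantize(z)              # the decoder would map these to atoms

print(z_quant.shape)               # (36, 3)
```

Diffusing in this compact latent space, rather than over raw atom coordinates, is where the efficiency gain comes from.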
We recommend using Anaconda to manage the environment.
# 1. Clone the repository
git clone https://github.com/xiaoxiaokuye/CODLAD.git
cd CODLAD
# 2. Create and activate conda environment
conda create -n codlad python=3.11
conda activate codlad
# 3. Install dependencies (CUDA 12.1 required)
pip install -r requirements.txt

You can download the PDB and PED datasets and the pretrained checkpoints from our Google Drive: 👉 Download Link
After downloading, please organize the files as follows:
CODLAD/
├── results/ # Place the downloaded checkpoints here
├── datasets/
│ ├── protein/
│ │ ├── PDB/ # Place PDB data files here
│ │ ├── PED/ # Place PED data files here
│ │ └── Atlas/ # See instructions below
│ └── ...
└── ...
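A quick way to confirm the layout above before training is a small path check. This helper is not part of the repo; the expected subdirectories are taken from the tree shown above.

```python
from pathlib import Path

# Subdirectories expected by the layout shown in this README.
EXPECTED = [
    "results",
    "datasets/protein/PDB",
    "datasets/protein/PED",
    "datasets/protein/Atlas",
]

def check_layout(root, expected=EXPECTED):
    """Return the expected subdirectories missing under root."""
    root = Path(root)
    return [p for p in expected if not (root / p).is_dir()]

if __name__ == "__main__":
    missing = check_layout("CODLAD")
    print("Missing:", missing if missing else "none")
```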
For the Atlas dataset, please use the provided script to download and set it up:
cd scripts
bash download_atlas.sh
# The script will handle downloading and moving files to ../datasets/protein/Atlas/

This stage compresses the all-atom structure into a coarse-grained latent representation.
1. Data Preprocessing
python extract_features.py \
--process_data \
--dataname PED # Options: PED, Atlas, PDB

2. Training VQ-VAE
python train_vqvae.py -load_json ./scripts/Vae_vqvae_PED_ns36_vq3_vq4096.json

3. VAE Inference (Reconstruction)
python test.py \
--backbone mpnn_diffusion \
--vae_type N6 \
--num_sampling_steps 100 \
--experiment recon \
--data_type PED \
--num_ensemble 10 # Note: N6 for PED, K3 for PDB, K4 for Atlas

This stage learns the distribution of the latent representations conditioned on coarse-grained structures.
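The training objective in this stage can be sketched as standard denoising score matching on the latents, with the coarse-grained structure as a conditioning input. Shapes, the linear "model", and the noise schedule below are toy placeholders, not the repo's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_loss(z, c, W, alpha_bar):
    """One conditional denoising step: predict the noise added to z at a
    random timestep, given coarse-grained condition features c."""
    t = rng.integers(len(alpha_bar))                 # random timestep
    eps = rng.standard_normal(z.shape)               # target noise
    z_t = np.sqrt(alpha_bar[t]) * z + np.sqrt(1 - alpha_bar[t]) * eps
    eps_hat = np.concatenate([z_t, c], axis=-1) @ W  # conditional "denoiser"
    return ((eps_hat - eps) ** 2).mean()

alpha_bar = np.linspace(0.999, 0.01, 100)  # toy noise schedule
z = rng.standard_normal((36, 3))           # latents from the frozen VQ-VAE
c = rng.standard_normal((36, 8))           # toy coarse-grained features
W = np.zeros((11, 3))                      # untrained linear denoiser
loss = diffusion_loss(z, c, W, alpha_bar)
```

The `--class_dropout_prob` and `--cfg_scale` flags below suggest classifier-free guidance is supported; with both at 0, sampling is purely conditional.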
1. Extract Latent Features
python extract_features.py \
--extract_features \
--data-path ./datasets/preproccess_PED \
--features-path ./datasets/features_N6 \
--vae_type N6 \
--dataname PED

2. Training Latent Diffusion Model
accelerate launch --multi_gpu \
./train_latent.py \
--lr 3e-4 \
--warmup 80000 \
--schedule_step 1200000 \
--final_lr 1e-5 \
--batch_size 128 \
--model diffusion \
--class_dropout_prob 0 \
--latent_size 3 \
--backbone mpnn_diffusion \
--feature_path './datasets/features_N6' \
--exp './Diff_PED_mpnnnew'

3. Diffusion Inference
python test.py \
--exp "Diff_PED_mpnnnew" \
--backbone mpnn_diffusion \
--model diffusion \
--cfg_scale 0.0 \
--latent_size 3 \
--vae_type N6 \
--num_sampling_steps 100 \
--experiment latent \
--data_type PED \
--num_ensemble 10

If you find this code useful in your research, please cite our paper:
@article{CODLAD2026,
title={Constraint Decoupled Latent Diffusion for Protein Backmapping},
author={Han, Xu and Sun, Yuancheng and Chen, Kai and Ren, Yuxuan and Liu, Kang and Ye, Qiwei},
journal={Journal of Chemical Theory and Computation},
year={2026},
publisher={ACS Publications},
doi={10.1021/acs.jctc.5c01364},
note={In Press}
}