A comprehensive list of papers about Large-Language-Diffusion-Models.
Important
Contributions welcome:
- If you have a relevant paper that is not included in the library, please contact us or submit a pull request directly.
- If you think your paper is better suited to another category, please contact us or submit a pull request.
- If your paper has been accepted, please consider updating the relevant information.

Thank you!
- 🔥🔥🔥 Awsome-Large-LDM is now open!
| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs | 2023 | NAACL | |
| Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning | 2023 | arXiv | |
| TESS 2: A Large-Scale Generalist Diffusion Language Model | 2025 | ACL | Adapted from Mistral-7B-v0.1 |
| Scaling Diffusion Language Models via Adaptation from Autoregressive Models | 2025 | ICLR | 127M~7B (GPT2, LLaMA2) |
| Large Language Diffusion Models | 2025 | arXiv | LLaDA-8B |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | 2025 | arXiv | |
| Large Language Models to Diffusion Finetuning | 2025 | arXiv | |

| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion | 2025 | arXiv | |
| dKV-Cache: The Cache for Diffusion Language Models | 2025 | arXiv | |
| Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding | 2025 | arXiv | |

| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | 2025 | arXiv | |
| d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | 2025 | arXiv | |
| Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models | 2024 | NeurIPS | |

| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| MMaDA: Multimodal Large Diffusion Language Models | 2025 | arXiv | |
| LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning | 2025 | arXiv | |

| Paper Title | Year | Conference/Journal | Remark |
|---|---|---|---|
| Diffusion-LM Improves Controllable Text Generation | 2022 | NeurIPS | Embedding |
| DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models | 2023 | ICLR | Embedding |
| DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models | 2023 | ACL | Masked |
| Latent Diffusion for Language Generation | 2023 | NeurIPS | Latent |
| Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution | 2024 | ICML | Masked |
| SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control | 2023 | ACL | Simplex, Blockwise |
| AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation | 2023 | NeurIPS | AR-like noise |
| Likelihood-Based Diffusion Language Models | 2023 | NeurIPS | Plaid1B |
| Scaling up Masked Diffusion Models on Text | 2024 | ICLR | 1.1B |
| Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models | 2025 | ICLR | |

We welcome all researchers to contribute to this repository.
If you have a related paper that was not added to the library, please contact us.
Email: jake630@snu.ac.kr / wjk9904@snu.ac.kr