Sankalp Sinha* · Mohammad Sadil Khan* · Muhammad Usama · Shino Sam · Didier Stricker · Sk Aziz Ali · Muhammad Zeshan Afzal
* Equal contribution
MARVEL-FX3D is the text-to-3D generation component of the MARVEL-40M+ paper, which introduces a dataset of 40M multi-level text annotations for 8.9M+ 3D assets; the dataset and annotation pipeline are described in the main paper.
MARVEL-FX3D is trained on MARVEL-40M+, the largest 3D captioning dataset to date.
| Property | Value |
|---|---|
| Total Annotations | 40 million |
| 3D Assets | 8.9 million+ |
| Source Datasets | 7 major 3D repositories |
| Annotation Levels | Detailed (150–200 words) → Tags (10–20 words) |
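To make the multi-level annotation structure concrete, a single record might look like the sketch below. This is a hypothetical shape only: the field names (`asset_id`, `source`, `annotations`) and the two levels shown are illustrative assumptions, not the actual MARVEL-40M+ schema, which spans multiple levels between detailed prose and short tags.

```python
# Hypothetical shape of one MARVEL-40M+ record; field names are
# illustrative assumptions, not the dataset's actual schema.
record = {
    "asset_id": "objaverse_000123",   # ID within the source 3D repository (made up)
    "source": "Objaverse",            # one of the 7 source repositories
    "annotations": {
        # Endpoints of the multi-level spectrum from the table above:
        "detailed": "A vintage motorcycle with chrome fenders ...",  # 150-200 words
        "tags": "motorcycle, vintage, chrome",                       # 10-20 words
    },
}

# Each asset carries every annotation level, from detailed prose down to tags.
levels = list(record["annotations"].keys())
print(levels)  # -> ['detailed', 'tags']
```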
The multi-stage annotation pipeline combines open-source multi-view VLMs and LLMs with human metadata from source datasets, reducing hallucinations and improving domain-specific accuracy.
🔗 Dataset on Hugging Face: MARVEL-40M+
```bash
# Clone the repository
git clone https://github.com/SadilKhan/MARVEL-FX3D.git
cd MARVEL-FX3D

# Create a conda environment (recommended)
conda create -n marvel python=3.10
conda activate marvel

# Install dependencies
pip install -r requirements.txt
```

Requirements: Python ≥ 3.10, PyTorch ≥ 2.0, CUDA ≥ 11.8 (recommended)
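After installing, a quick sanity check can confirm the environment meets the requirements above. This is a minimal sketch using only the standard library plus an optional `torch` import; it is not part of the repository.

```python
import sys

# Check the interpreter against the stated requirement (Python >= 3.10).
ok = sys.version_info >= (3, 10)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}:",
      "OK" if ok else "upgrade required")

# Report PyTorch / CUDA availability if torch is already installed.
try:
    import torch
    print("PyTorch:", torch.__version__,
          "| CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet - run `pip install -r requirements.txt` first")
```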
```bash
# Generate a 3D textured mesh from a text prompt
python generate.py --prompt "A Harley Davidson motorcycle with a black leather seat and dual exhaust pipes"
```

If you find MARVEL-FX3D or MARVEL-40M+ useful in your research, please cite:
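To generate several meshes in one go, the CLI call above can be scripted from Python. The sketch below only builds and prints the commands (the second prompt is a made-up example); replacing the `print` with `subprocess.run(cmd)` would execute them, assuming the repository is checked out and the environment is set up as described.

```python
import shlex

# Prompts to render; each becomes one generate.py invocation.
prompts = [
    "A Harley Davidson motorcycle with a black leather seat and dual exhaust pipes",
    "A medieval knight's helmet with a red plume",  # hypothetical second prompt
]

# Build the CLI commands exactly as in the single-prompt example above.
commands = [["python", "generate.py", "--prompt", p] for p in prompts]

for cmd in commands:
    # subprocess.run(cmd) would launch generation here; printing keeps
    # this sketch runnable without the repository present.
    print(shlex.join(cmd))
```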
```bibtex
@InProceedings{Sinha_2025_CVPR,
    author    = {Sinha, Sankalp and Khan, Mohammad Sadil and Usama, Muhammad and Sam, Shino and Stricker, Didier and Ali, Sk Aziz and Afzal, Muhammad Zeshan},
    title     = {MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
}
```

This project is released under the MIT License.
