[Great News] 🎉🎉🎉 Our paper has been accepted by the WWW'25 Resource Track
This is the official repo for the paper: RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
The dataset is publicly available on Zenodo and Hugging Face:

https://zenodo.org/records/11406538

https://huggingface.co/datasets/zzha6204/RU-AI-origin

The noise-augmented dataset is publicly available on Hugging Face:

https://huggingface.co/datasets/zzha6204/RU-AI-noise

| Dataset | Modality | Content | Real/Human | Machine Generated Content | Task |
|---|---|---|---|---|---|
| M4 | Text | General | 10,019,311 | 122,481 | Multi-lingual AI Text Detection |
| DeepfakeTextDetect | Text | General | 447,674 | 447,674 | Generalised AI Text Detection |
| ArguGPT | Text | Essay | 4,115 | 4,038 | Language Learner-AI Text Detection |
| HC3 | Text | Question Answers | 80,805 | 44,425 | AI Answer Detection |
| CNNSpot | Image | General | 362,000 | 362,000 | AI Image Detection |
| DE-FAKE | Image | General | 20,000 | 191,946 | AI Image Detection |
| GenImage | Image | General | 1,331,167 | 1,350,000 | AI Image Detection |
| WaveFake | Voice | General | 13,600 | 104,885 | Fake Voice Detection |
| Sprocket-VC | Voice | General | 3,132 | 3,456 | Fake Voice Detection |
| FakeAVCeleb | Video-Voice | Face | 500 | 19,500 | DeepFake Detection |
| ForgeryNet | Video-Image | Face | 1,438,201 | 1,457,861 | DeepFake Detection |
| DFDC | Video-Image-Voice | Face | 23,654 | 104,500 | DeepFake Detection |
| DGM4 | Text-Image | General | 77,426 | 152,574 | Media Manipulation Detection |
| Ours | Text-Image-Voice | General | 245,895 | 1,229,475 | AI Text Image Voice Detection |
The full dataset requires at least 500GB of disk space.
Model inference requires an NVIDIA GPU with at least 16GB of VRAM. We recommend an NVIDIA RTX 3090 (24GB) or better for this project.
We highly recommend installing this package inside a virtual environment such as conda or venv.
Environment requirements:
- Python >= 3.8
- PyTorch >= 1.13.1
- CUDA >= 11.6
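The Python-side requirement above can be checked before installing anything; a minimal sketch (the `env_ok` helper is illustrative, not part of this repo):

```python
import sys

def env_ok(minimum=(3, 8)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum

if not env_ok():
    sys.exit("RU-AI requires Python >= 3.8")
print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK")

# PyTorch and CUDA can be checked the same way once installed:
try:
    import torch
    print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet - see the installation steps below")
```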
Clone the project:

```shell
git clone https://github.com/ZhihaoZhang97/RU-AI.git
```

Create the virtual environment via conda with Python 3.8:

```shell
conda create -n ruai python=3.8
```

Activate the environment:

```shell
conda activate ruai
```

Move into the project directory:

```shell
cd RU-AI
```

Install the dependencies:

```shell
pip3 install -r requirements.txt
```

We provide a quick tutorial on downloading and inspecting the dataset in the data-example.ipynb notebook.
You can also directly run the following command to download sample data sourced from Flickr8k:

```shell
python ./download_flickr.py
```

Alternatively, you can download the full dataset with the command below.
Please note the whole dataset is over 157GB compressed and can take up to 500GB after decompression.
The download will take a while; the actual speed depends on your internet connection.

```shell
python ./download_all.py
```

After downloading, you can also go to ./data to inspect the data manually.
Here is the directory tree after downloading all the data:
```
├── audio
│   ├── coco
│   │   ├── efficientspeech
│   │   ├── real
│   │   ├── styletts2
│   │   ├── vits
│   │   ├── xtts2
│   │   └── yourtts
│   ├── flickr8k
│   │   ├── efficientspeech
│   │   ├── real
│   │   ├── styletts2
│   │   ├── vits
│   │   ├── xtts2
│   │   └── yourtts
│   └── place
│       ├── efficientspeech
│       ├── real
│       ├── styletts2
│       ├── vits
│       ├── xtts2
│       └── yourtts
├── image
│   ├── coco
│   │   ├── real
│   │   ├── stable-diffusion-images-absolutereality-remove-black
│   │   ├── stable-diffusion-images-epicrealism-remove-black
│   │   ├── stable-diffusion-images-v1-5
│   │   ├── stable-diffusion-images-v6-0-remove-black
│   │   └── stable-diffusion-images-xl-v3-0-remove-black
│   ├── flickr8k
│   │   ├── real
│   │   ├── stable-diffusion-images-absolutereality
│   │   ├── stable-diffusion-images-epicrealism
│   │   ├── stable-diffusion-images-v1-5
│   │   ├── stable-diffusion-images-v6-0
│   │   └── stable-diffusion-images-xl-v3-0
│   └── place
│       ├── real
│       ├── stable-diffusion-images-absolutereality-remove-black
│       ├── stable-diffusion-images-epicrealism-remove-black
│       ├── stable-diffusion-images-v1-5
│       ├── stable-diffusion-images-v6-0-remove-black
│       └── stable-diffusion-images-xl-v3-0-remove-black
└── text
    ├── coco
    ├── flickr8k
    └── place
```
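The layout above can also be inspected programmatically; here is a minimal sketch that tallies files per modality and source, assuming the tree is rooted at ./data as created by the download scripts (the `count_files` helper is illustrative, not part of this repo):

```python
import os
from collections import Counter

def count_files(root="./data"):
    """Count files under each (modality, source) pair, e.g. ("audio", "coco")."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        if len(parts) >= 2 and filenames:
            counts[(parts[0], parts[1])] += len(filenames)
    return counts

if __name__ == "__main__":
    for (modality, source), n in sorted(count_files().items()):
        print(f"{modality}/{source}: {n} files")
```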
Before running model inference, replace image_data_paths, audio_data_paths, and text_data in the infer_imagebind_model.py and infer_languagebind_model.py files with real data / data paths.
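As a sketch, those variables might be filled in as follows; the filenames below are placeholders, not files shipped with the dataset:

```python
# Hypothetical example values - substitute paths that exist on your machine.
image_data_paths = [
    "./data/image/flickr8k/real/example.jpg",
    "./data/image/flickr8k/stable-diffusion-images-v1-5/example.jpg",
]
audio_data_paths = [
    "./data/audio/flickr8k/real/example.wav",
    "./data/audio/flickr8k/vits/example.wav",
]
text_data = [
    "A dog runs across a grassy field.",
]
```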
ImageBind-based model:

```shell
python infer_imagebind_model.py
```

LanguageBind-based model:

```shell
python infer_languagebind_model.py
```

We appreciate the open-source community for the datasets and the models.
Microsoft COCO: Common Objects in Context
Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics
Learning Deep Features for Scene Recognition using Places Database
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Unsupervised Learning of Spoken Language with Visual Context
Learning Word-Like Units from Joint Audio-Visual Analysis
ImageBind: One Embedding Space To Bind Them All
If you find our dataset or research useful, please cite:
```bibtex
@misc{huang2024ruai,
    title={RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection},
    author={Liting Huang and Zhihao Zhang and Yiran Zhang and Xiyue Zhou and Shoujin Wang},
    year={2024},
    eprint={2406.04906},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

