Official implementation of "Composite Data Augmentations for Synthetic Image Detection Against Real-World Perturbations"
Accepted at EUSIPCO 2025 (33rd European Signal Processing Conference)
This repository implements a deep learning framework for detecting GAN-generated synthetic images that stays robust to real-world perturbations. The approach combines multiple data augmentation techniques with a dual-criteria training pipeline to improve robustness against image distortions commonly encountered in practice. It also explores using a genetic algorithm to search for the best augmentations to apply during training.
- Composite Data Augmentations: Multiple augmentation strategies including JPEG compression, Gaussian blur, noise, sharpening, color transformations, and more
- Dual-Criteria Training Pipeline: Novel training approach combining two losses
- Genetic Algorithm Optimization: Automated augmentation strategy selection using PyGAD
- Comprehensive Evaluation: Testing across 13 different synthetic image datasets
- ResNet-50 Backbone: Pre-trained on ImageNet with custom binary classification head
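To illustrate the genetic algorithm idea from the feature list, here is a minimal sketch of selecting an augmentation subset as a binary mask. It is written in plain Python and does not use PyGAD, so it is not the repository's implementation. The augmentation names, the mask encoding, the mutation rate, and `toy_fitness` are all assumptions; in practice the fitness would presumably be a validation score of a model trained with the chosen augmentations.

```python
import random

# Augmentation names borrowed from the feature list; one gene (0/1) per augmentation.
# This encoding is an assumption made for illustration.
AUGMENTATIONS = ["jpeg", "gaussian_blur", "noise", "sharpen", "color"]


def toy_fitness(mask):
    """Hypothetical stand-in for the validation score of a model trained with `mask`."""
    target = [1, 1, 0, 0, 1]  # arbitrary "best" subset, for illustration only
    return sum(int(m == t) for m, t in zip(mask, target))


def evolve(fitness, num_genes, num_generations=10, solutions_per_population=8,
           mutation_rate=0.1, seed=0):
    """Simple elitist GA: keep the top half, refill with crossover plus mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(num_genes)]
                  for _ in range(solutions_per_population)]
    for _ in range(num_generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: solutions_per_population // 2]
        children = []
        while len(parents) + len(children) < solutions_per_population:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, num_genes - 1)      # single-point crossover
            child = a[:cut] + b[cut:]
            # Flip each gene independently with probability `mutation_rate`.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


best = evolve(toy_fitness, len(AUGMENTATIONS))
selected = [name for name, bit in zip(AUGMENTATIONS, best) if bit]
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. With a real fitness function, each evaluation would involve a training run, which is why the population and generation counts are kept small here.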
- Clone the repository
- Create and activate a virtual environment:
    # Create virtual environment
    python -m venv venv
    # Activate virtual environment
    # On Windows:
    venv\Scripts\activate
    # On macOS/Linux:
    source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Set up your dataset directory structure:
    dataset/
    ├── train/
    │   ├── class1/
    │   └── class2/
    ├── val/
    │   ├── class1/
    │   └── class2/
    └── test/
        ├── progan/
        ├── stylegan/
        ├── biggan/
        └── ...
Organize your datasets in the following structure:
dataset/
├── train/
│   ├── class1/
│   │   ├── 0_real
│   │   └── 1_fake
│   └── class2/
│       ├── 0_real
│       └── 1_fake
├── val/
│   ├── class1/
│   │   ├── 0_real
│   │   └── 1_fake
│   └── class2/
│       ├── 0_real
│       └── 1_fake
└── test/
    ├── progan/
    │   ├── 0_real
    │   └── 1_fake
    ├── stylegan/
    ├── biggan/
    ├── cyclegan/
    ├── stargan/
    ├── gaugan/
    ├── crn/
    ├── imle/
    ├── seeingdark/
    ├── san/
    ├── deepfake/
    ├── stylegan2/
    └── whichfaceisreal/
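As a quick check that a split follows this layout, the sketch below walks a directory and derives labels from the `0_real` and `1_fake` folder names. The function name `collect_samples`, the accepted file extensions, and the tiny temporary tree are assumptions for illustration; the repository's own data loader may work differently.

```python
import tempfile
from pathlib import Path

# Accepted image extensions are an assumption; adjust to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def collect_samples(split_dir):
    """Return sorted (path, label) pairs: 0 for files under 0_real, 1 under 1_fake."""
    samples = []
    for path in sorted(Path(split_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if path.parent.name == "0_real":
            samples.append((path, 0))
        elif path.parent.name == "1_fake":
            samples.append((path, 1))
    return samples


# Demo on a throwaway tree that mimics part of dataset/test/.
with tempfile.TemporaryDirectory() as root:
    for rel in ["progan/0_real/a.png", "progan/1_fake/b.png", "stylegan/1_fake/c.jpg"]:
        file_path = Path(root, rel)
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch()
    demo_labels = [label for _, label in collect_samples(root)]
```

Files that sit outside a `0_real` or `1_fake` folder are ignored, so stray files in the dataset root do not end up with a label.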
python train.py --name experiment_name --lr 0.001 --optim adam --num_epochs 25
python double_criteria_pipeline.py --name dual_experiment --lr 0.001 --optim adam
python sid_ga.py --name ga_experiment --num_generations 10 --solutions_per_population 8
python evaluate.py --model_path best_model --runner local # For local testing
python evaluate.py --model_path best_model --runner hpc # For full evaluation on HPC environments

The model outputs comprehensive evaluation metrics:
- Accuracy: Overall classification accuracy
- Average Precision (AP): Area under precision-recall curve
- Real Accuracy: Accuracy on real images
- Fake Accuracy: Accuracy on synthetic images
Results are automatically logged to TensorBoard and saved as formatted tables.
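For reference, the four metrics above can be computed from per-image scores roughly as follows. This is a sketch in plain Python and not the code in `evaluate.py`. It assumes each score is the predicted probability that an image is fake, labels are 0 for real and 1 for fake, and the decision threshold is 0.5. Average precision is computed as the mean precision at the rank of each fake image, without special handling of tied scores.

```python
def evaluate_scores(labels, scores, threshold=0.5):
    """Compute accuracy, per-class accuracy, and AP for labels 0 = real, 1 = fake."""
    preds = [int(s >= threshold) for s in scores]
    pairs = list(zip(preds, labels))

    def accuracy(subset):
        subset = list(subset)
        if not subset:
            return float("nan")  # undefined when the class is absent
        return sum(p == y for p, y in subset) / len(subset)

    # Rank images from most to least "fake" and record precision at each fake image.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    ap = sum(precisions) / len(precisions) if precisions else float("nan")

    return {
        "accuracy": accuracy(pairs),
        "real_accuracy": accuracy((p, y) for p, y in pairs if y == 0),
        "fake_accuracy": accuracy((p, y) for p, y in pairs if y == 1),
        "ap": ap,
    }


metrics = evaluate_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this small example one fake image scores below the threshold, so fake accuracy is lower than real accuracy even though the ranking-based AP stays fairly high. Reporting the per-class accuracies alongside AP makes that kind of threshold effect visible.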