██████╗ ███████╗███████╗██████╗ ███████╗ █████╗ ██╗ ██╗███████╗
██╔══██╗██╔════╝██╔════╝██╔══██╗██╔════╝██╔══██╗██║ ██╔╝██╔════╝
██║ ██║█████╗ █████╗ ██████╔╝█████╗ ███████║█████╔╝ █████╗
██║ ██║██╔══╝ ██╔══╝ ██╔═══╝ ██╔══╝ ██╔══██║██╔═██╗ ██╔══╝
██████╔╝███████╗███████╗██║ ██║ ██║ ██║██║ ██╗███████╗
╚═════╝ ╚══════╝╚══════╝╚═╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝
██████╗ ██████╗ █████╗ ███╗ ██╗███╗ ██╗███████╗██████╗
██╔════╝ ██╔════╝██╔══██╗████╗ ██║████╗ ██║██╔════╝██╔══██╗
╚█████╗ ██║ ███████║██╔██╗ ██║██╔██╗██║█████╗ ██████╔╝
╚═══██╗ ██║ ██╔══██║██║╚██╗██║██║╚████║██╔══╝ ██╔══██╗
██████╔╝ ╚██████╗██║ ██║██║ ╚████║██║ ╚███║███████╗██║ ██║
╚═════╝ ╚═════╝╚═╝ ╚═╝╚═╝ ╚═══╝╚═╝ ╚══╝╚══════╝╚═╝ ╚═╝
"Is That Face Real? Is That Voice Cloned? Our Neural Truth Engine Knows."
| 🖼️ Face-Swap Detection | 🤖 GAN/AI Image Detection | 🎵 Voice Clone Detection | ⚡ Real-Time | 🌐 Public API |
|---|---|---|---|---|
| EfficientNet-B0 | EfficientNet-B0 | AudioMLP + Spectral | < 200ms | FastAPI REST |
| 95.8% Accuracy | 98.1% AUC-ROC | 99.6% Accuracy | MPS / CPU | JSON Response |
| 140k Real & Fake Faces | CIFAKE Dataset | WaveFake Dataset | Dual-Model Fusion | Swagger UI |
╔══════════════════════════════════════════════════════════════════════════════╗
║ 🔬 DEEPFAKE SCANNER — SYSTEM ARCHITECTURE ║
╠══════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────┐ HTTPS ┌──────────────────────────────────────┐ ║
║ │ 🌐 USER │ ─────────────► │ STREAMLIT FRONTEND ║ ║
║ │ BROWSER │ │ ║ ║
║ └─────────────┘ │ ┌────────────┐ ┌────────────────┐ ║ ║
║ │ │ File │ │ Result │ ║ ║
║ ┌─────────────┐ │ │ Upload │ │ Dashboard │ ║ ║
║ │ 📱 MOBILE │ ─────────────► │ │ Handler │ │ Neon UI │ ║ ║
║ │ │ │ └────────────┘ └────────────────┘ ║ ║
║ └─────────────┘ └──────────────┬───────────────────────┘ ║
║ │ HTTP POST /detect ║
║ ┌──────────────────────────────────▼──────────────────────┐ ║
║ │ FastAPI Backend :8000 ║ ║
║ │ POST /detect POST /detect/image POST /detect/audio ║ ║
║ │ POST /detect/video GET /health GET /docs ║ ║
║ └───────┬───────────────────┬────────────────┬────────────┘ ║
║ │ │ │ ║
║ ┌───────▼───────┐ ┌────────▼──────┐ ┌─────▼──────────┐ ║
║ │ 🧠 FACE-SWAP │ │ 🤖 GAN/AI │ │ 🔊 AUDIO MLP ║ ║
║ │ EfficientNet │ │ EfficientNet │ │ 26 Spectral ║ ║
║ │ B0 · 95.8% │ │ B0 · 98.1%AUC │ │ Features MLP ║ ║
║ │ 140k Faces │ │ CIFAKE 60k │ │ WaveFake 11k ║ ║
║ └───────┬───────┘ └────────┬──────┘ └─────┬──────────┘ ║
║ │ │ │ ║
║ ┌────────────────────────────────────────────────────────┐ ║
║ │ 🔀 FUSION LAYER ║ ║
║ │ Images: max(fs,gan) if >0.8 else 0.45·fs + 0.55·gan ║ ║
║ │ Video: mean(12 frames) × 0.7 + audio × 0.3 ║ ║
║ └──────────────────────────────────────────────────────┘ ║
╚══════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║ ML PIPELINE — END TO END ║
╠══════════════════════════════════════════════════════════════════════════════╣
║ ║
║ RAW DATA PREPROCESSING MODEL OUTPUT ║
║ ───────── ─────────────── ────────────── ────── ║
║ 140k Faces ────────► Resize 224×224 ─────► EfficientNet-B0 ──► .pt ║
║ Real+Fake Normalize+Augment Face-Swap Detector ↓ ║
║ FastAPI ║
║ CIFAKE ────────► Resize 224×224 ─────► EfficientNet-B0 ──► /detect ║
║ 60k Images Normalize+Augment GAN/AI Detector ║
║ ║
║ WaveFake ────────► MFCC (20 coeff) ────► AudioMLP ──► .pt ║
║ 11k clips + 6 Spectral Feat. 26-dim input + .pkl ║
║ StandardScaler (scaler) ║
╚══════════════════════════════════════════════════════════════════════════════╝
FACE-SWAP DETECTOR GAN/AI IMAGE DETECTOR
────────────────── ─────────────────────
Accuracy ███████████████████░ 95.8% Accuracy █████████████████░░░ 93.5%
ROC-AUC ████████████████████ 99.3% ROC-AUC ███████████████████░ 98.1%
Dataset ▓▓▓▓▓ 140k Faces Dataset ▓▓▓▓▓▓ CIFAKE 60k
VOICE CLONE DETECTOR
────────────────────
Accuracy ████████████████████ 99.6%
ROC-AUC ████████████████████ 99.99%
Dataset ▓▓▓▓ WaveFake 11,778 clips
| Metric | 🖼️ Face-Swap | 🤖 GAN Detector | 🎵 Voice Clone |
|---|---|---|---|
| Accuracy | 95.8% | 93.5% | 99.6% |
| ROC-AUC | 0.9926 | 0.9810 | 0.9999 |
| Algorithm | EfficientNet-B0 | EfficientNet-B0 | AudioMLP |
| Training Samples | 20,000 | 20,000 | 11,778 |
| Inference Time | < 150ms | < 150ms | < 50ms |
USER UPLOADS FILE
│
┌──────────┴──────────┐
│ │
▼ ▼
IMAGE / VIDEO AUDIO FILE
│ │
▼ ▼
┌────────────────────┐ ┌──────────────────────┐
│ Face-Swap Model │ │ Feature Extraction │
│ EfficientNet-B0 │ │ MFCC × 20 │
│ → faceswap_score │ │ + 6 Spectral Feats │
└────────┬───────────┘ └──────────┬───────────┘
│ │
▼ ▼
┌────────────────────┐ ┌──────────────────────┐
│ GAN Detector │ │ AudioMLP Classifier │
│ EfficientNet-B0 │ │ StandardScaler │
│ → gan_score │ │ → fake_probability │
└────────┬───────────┘ └──────────┬───────────┘
│ │
▼ │
┌────────────────────┐ │
│ Dual Model Fusion │ │
│ max(fs,gn)>0.8 ? │ │
│ else weighted avg │ │
└────────┬───────────┘ │
└──────────┬───────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
prob < 0.5 0.5–0.65 prob > 0.65
✅ REAL ⚠️ SUSPICIOUS 🚨 FAKE
│
Confidence: HIGH / MEDIUM / LOW
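The decision flow above can be sketched as a small helper. The verdict thresholds come from the diagram; the confidence bands are an illustrative assumption — the repo does not spell out its exact cut-offs:

```python
def classify(fake_prob: float) -> tuple[str, str]:
    """Map a fused fake-probability to (verdict, confidence).

    Verdict thresholds follow the flow diagram above; the confidence
    bands (distance from the 0.5 boundary) are an assumption for
    illustration, not the repo's documented logic.
    """
    if fake_prob < 0.5:
        verdict = "REAL"
    elif fake_prob <= 0.65:
        verdict = "SUSPICIOUS"
    else:
        verdict = "FAKE"

    # Confidence proxy: how far the score sits from the decision boundary
    margin = abs(fake_prob - 0.5)
    if margin > 0.35:
        confidence = "HIGH"
    elif margin > 0.15:
        confidence = "MEDIUM"
    else:
        confidence = "LOW"
    return verdict, confidence

print(classify(0.9731))  # ('FAKE', 'HIGH')
```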
🖼️ Face-Swap Deepfake Detection
Detects face replacement in images and video frames using a fine-tuned EfficientNet-B0.
Training Data: 140,000 Real & Fake Face images (Kaggle)
| Deepfake Type | Description |
|---|---|
| Face2Face | Facial expression transfer |
| FaceSwap | Identity swap between subjects |
| Deepfakes | Neural face replacement |
| NeuralTextures | Texture-based manipulation |
Looks for blending artifacts, lighting inconsistencies, and unnatural facial boundaries at the pixel level.
🤖 GAN / AI-Generated Image Detection
Detects fully AI-generated faces from diffusion models and GANs.
Training Data: CIFAKE dataset — 60,000 real photos vs Stable Diffusion generated images
| Generator | Notes |
|---|---|
| StyleGAN2/3 | thispersondoesnotexist.com |
| Stable Diffusion | AI art generators |
| DALL-E | OpenAI image generation |
| Midjourney | AI image synthesis |
| Gemini Imagen | Google AI images |
Learns frequency-domain artifacts and texture patterns unique to neural synthesis.
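The frequency-domain cue can be made concrete with a toy NumPy sketch: upsampling in GAN/diffusion decoders often leaves periodic patterns that show up as off-centre peaks in the 2D FFT log-magnitude spectrum. This illustrates the cue only — it is not the repo's actual pipeline:

```python
import numpy as np

def log_spectrum(image: np.ndarray) -> np.ndarray:
    """Log-magnitude 2D FFT spectrum of a grayscale image.

    Periodic synthesis artifacts appear as peaks in this spectrum —
    the kind of signal a GAN detector can exploit. Toy sketch only.
    """
    f = np.fft.fftshift(np.fft.fft2(image))
    return np.log1p(np.abs(f))

# Synthetic example: noise plus a horizontal periodic "upsampling" pattern
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
img += 0.5 * np.cos(2 * np.pi * 8 * np.arange(64) / 64)

spec = log_spectrum(img)
print(spec.shape)  # (64, 64)
```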
🎵 Voice Clone / TTS Detection
Detects synthesised and cloned voices using spectral feature analysis.
Training Data: WaveFake dataset — 11,778 real & AI voice samples
Input Features (26-dim vector):
| Feature | Description |
|---|---|
| chroma_stft | Chromagram energy |
| rms | Root mean square energy |
| spectral_centroid | Frequency centre of mass |
| spectral_bandwidth | Frequency spread |
| rolloff | High-frequency roll-off |
| zero_crossing_rate | Signal sign changes |
| mfcc_1–20 | Mel-frequency cepstral coefficients |
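The classifier's input is a fixed-order 26-dim vector built from these features. The exact ordering used by audio_features.py is an assumption here; the sketch only shows how the per-clip summary values would be assembled:

```python
import numpy as np

# Feature names from the table above; their exact order in
# src/preprocessing/audio_features.py is assumed for illustration.
FEATURE_NAMES = [
    "chroma_stft", "rms", "spectral_centroid",
    "spectral_bandwidth", "rolloff", "zero_crossing_rate",
] + [f"mfcc_{i}" for i in range(1, 21)]

def to_vector(features: dict[str, float]) -> np.ndarray:
    """Assemble a per-clip feature dict into the fixed 26-dim model input."""
    return np.array([features[name] for name in FEATURE_NAMES], dtype=np.float32)

vec = to_vector({name: 0.0 for name in FEATURE_NAMES})
print(vec.shape)  # (26,)
```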
🔀 Dual-Model Fusion Engine
For image inputs, both the Face-Swap Detector and GAN Detector run simultaneously:
# If either model is very confident → trust it
if gan_score > 0.8 or faceswap_score > 0.8:
    final = max(faceswap_score, gan_score)
else:
    # Weighted average (GAN gets slightly more weight)
    final = faceswap_score * 0.45 + gan_score * 0.55

Catches both face-swapped videos AND fully AI-generated faces in one pass.
For video: final = 0.70 × video_score + 0.30 × audio_score
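Both fusion rules can be written as two small functions (the function names are mine; the formulas are the ones above):

```python
def fuse_image(faceswap_score: float, gan_score: float) -> float:
    """Dual-model image fusion: trust a very confident model, else blend."""
    if faceswap_score > 0.8 or gan_score > 0.8:
        return max(faceswap_score, gan_score)
    return 0.45 * faceswap_score + 0.55 * gan_score

def fuse_video(frame_scores: list[float], audio_score: float) -> float:
    """Video fusion: mean over sampled frames (12 in the pipeline), 70/30 with audio."""
    video_score = sum(frame_scores) / len(frame_scores)
    return 0.70 * video_score + 0.30 * audio_score

print(fuse_image(0.1823, 0.9731))   # confident GAN branch wins → 0.9731
print(fuse_video([0.6] * 12, 0.9))  # 0.7·0.6 + 0.3·0.9 ≈ 0.69
```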
PUBLIC ENDPOINTS
────────────────
GET /health System health + loaded models status
POST /detect Auto-detect file type → run appropriate model
POST /detect/image Image-only deepfake detection
POST /detect/audio Audio-only voice clone detection
POST /detect/video Video deepfake + optional audio fusion
GET /docs Swagger UI (interactive API explorer)
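The auto-detection behind POST /detect can be approximated as extension-based routing. The extension sets below are assumptions for illustration — the real server may inspect MIME types instead:

```python
from pathlib import Path

# Extension sets are illustrative assumptions, not the server's exact lists.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".ogg"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv"}

def route(filename: str) -> str:
    """Pick the specialised endpoint POST /detect would dispatch to."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        return "/detect/image"
    if ext in AUDIO_EXTS:
        return "/detect/audio"
    if ext in VIDEO_EXTS:
        return "/detect/video"
    raise ValueError(f"Unsupported file type: {ext}")

print(route("your_image.jpg"))  # /detect/image
```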
Request:
curl -X POST http://localhost:8000/detect \
  -F "file=@your_image.jpg"

Response:
{
  "verdict": "FAKE",
  "confidence": "HIGH",
  "fake_probability": 0.9731,
  "real_probability": 0.0269,
  "modality": "image",
  "latency_ms": 143.2,
  "detail": {
    "faceswap_score": 0.1823,
    "gan_score": 0.9731,
    "fusion": "dual_model"
  }
}

DeepFake-Detector/
│
├── 📄 api_server.py ← FastAPI backend (python3 api_server.py)
├── 📄 start.py ← Render / HF start script
├── 📋 requirements.txt ← Full dependencies
│
├── 📂 src/
│ ├── models/
│ │ ├── image_model.py ← EfficientNet-B0 face-swap detector
│ │ ├── video_model.py ← EfficientNet + BiLSTM video detector
│ │ └── audio_model.py ← AudioMLP (26-dim spectral features)
│ ├── train/
│ │ ├── train_image.py ← Image model training pipeline
│ │ ├── train_video.py ← Video model training pipeline
│ │ ├── train_audio.py ← Audio model training pipeline
│ │ └── train_audio_wavefake.py ← WaveFake-specific trainer
│ ├── preprocessing/
│ │ ├── extract_frames.py ← Video → frames → face crops
│ │ └── audio_features.py ← Audio → MFCC/mel/LFCC features
│ ├── inference/
│ │ └── detector.py ← Unified inference engine
│ ├── fusion/
│ │ └── ensemble.py ← Weighted avg / voting / meta-clf
│ └── utils/
│ └── helpers.py ← Seeds, metrics, checkpoints
│
├── 📂 models/ ← Trained weights (not in repo → download below)
│ ├── image_model/
│ │ └── best_model.pt ← Face-swap detector weights
│ ├── gan_detector/
│ │ └── best_model.pt ← GAN/AI image detector weights
│ └── audio_model/
│ ├── best_model.pt ← Voice clone detector weights
│ └── feature_scaler.pkl ← StandardScaler for audio features
│
├── 📂 ui/
│ └── app.py ← Neon glassmorphism Streamlit UI
│
├── 📂 configs/
│ └── config.yaml ← All hyperparameters & paths
│
├── 📂 scripts/
│ ├── download_datasets.py ← Kaggle dataset downloader
│ ├── train_all.py ← Train all 3 models sequentially
│ └── docker_start.sh ← Docker entrypoint
│
├── 📄 Dockerfile
├── 📄 docker-compose.yml
└── 📋 requirements.txt
python --version # Python 3.11+
brew install ffmpeg # macOS — required for video audio extraction
# sudo apt install ffmpeg # Ubuntu/Linux

git clone https://github.com/BhavyaKansal20/DeepFake-Detector.git
cd DeepFake-Detector

python3 -m venv venv
source venv/bin/activate # Linux/macOS
# venv\Scripts\activate # Windows

# PyTorch (Apple Silicon MPS)
pip install torch torchvision torchaudio
# Remaining dependencies
pip install -r requirements.txt

# Set up Kaggle API key at kaggle.com/settings → API → Create New Token
mkdir -p ~/.kaggle
# paste token into ~/.kaggle/kaggle.json
# Download datasets
kaggle datasets download -d xhlulu/140k-real-and-fake-faces -p data/raw/images/ --unzip
kaggle datasets download -d birdy654/cifake-real-and-ai-generated-synthetic-images -p data/raw/images/cifake/ --unzip
kaggle datasets download -d birdy654/deep-voice-deepfake-voice-recognition -p data/raw/audio/ --unzip
# Train all models (~3 hours on Apple M4 / ~2 hours on A100)
python scripts/train_all.py

# Terminal 1 — Start API backend
export PYTHONPATH=$(pwd)
python3 api_server.py
# → Running at http://localhost:8000
# → Swagger UI at http://localhost:8000/docs
# Terminal 2 — Start Streamlit UI
streamlit run app.py
# → Running at http://localhost:8501

The fastest way to get this live publicly for free:
- Go to huggingface.co/new-space
- SDK: Streamlit
- Visibility: Public
Upload your models/ folder to the Space files (or use Git LFS):
git lfs install
git lfs track "*.pt" "*.pkl"
git add .gitattributes
git add models/
git commit -m "Add model weights"
git push

git remote add space https://huggingface.co/spaces/YOUR_USERNAME/deepfake-scanner
git push space main

Your Space will auto-deploy. The app.py at the repo root starts the FastAPI backend in a background thread and serves the Streamlit UI as the main entry point.
Note: the HF free tier has limited RAM (~16GB). The image and audio models load fine; for video processing, upgrade the Space hardware if needed.
# Build
docker build -t deepfake-detector .
# Run (GPU)
docker run --gpus all -p 8000:8000 -p 8501:8501 \
-v $(pwd)/models:/app/models deepfake-detector
# Or with docker-compose
docker-compose up

| Layer | Technology | Purpose |
|---|---|---|
| Language | Python 3.11 | Core runtime |
| Deep Learning | PyTorch 2.2 | Model training & inference |
| CV Backbone | timm (EfficientNet-B0) | Pretrained ImageNet weights |
| Audio | librosa | MFCC & spectral feature extraction |
| Backend | FastAPI 0.109 | REST API, file upload, Swagger UI |
| Frontend | Streamlit 1.55 | Neon glassmorphism dashboard |
| Acceleration | Apple MPS / CUDA | GPU-accelerated inference |
| Containerization | Docker + docker-compose | Reproducible deployment |
| Deployment | Render · Hugging Face Spaces | Cloud hosting |
| Dataset | Modality | Size | Source |
|---|---|---|---|
| 140k Real & Fake Faces | Image | 140,000 images | Kaggle |
| CIFAKE | Image | 60,000 images | Kaggle |
| WaveFake | Audio | 11,778 clips | Kaggle |
| FaceForensics++ | Video | 1,000+ videos | Request Access |
| DFDC (Facebook) | Video | 100,000+ videos | Kaggle Competition |
| ASVspoof 2019 | Audio | 121,461 clips | Kaggle Mirror |
IMAGE MODELS
─────────────
Face-Swap Detector → Works best on: FaceSwap, Face2Face, Deepfake videos
May miss: fully AI-generated images (GAN detector handles those)
GAN Detector → Works best on: StyleGAN, DALL-E, Midjourney, Gemini
May miss: novel, unseen generator architectures
AUDIO MODEL
────────────
Voice Detector → Works best on: ElevenLabs, Murf, standard TTS systems
May miss: very high-quality unseen voice clones
GENERAL NOTE
─────────────
All models may perform differently on out-of-distribution samples.
Results are probabilistic — use as supporting evidence, not sole proof.
| Choice | Reason |
|---|---|
| EfficientNet-B0 over ResNet | Better accuracy/compute trade-off; stronger GAN artifact detection |
| AudioMLP over wav2vec2 | Lightweight 26-dim features; no heavy transformer required; 99.6% accuracy |
| Dual-model fusion for images | Catches both face-swapped AND fully AI-generated in one API call |
| 12-frame video sampling | Fast inference without sacrificing accuracy across temporal dimension |
| Focal Loss for images | Class imbalance common in real datasets; down-weights easy negatives |
| Mixed precision (AMP) | 2× speedup on modern GPUs with no accuracy loss |
| StandardScaler for audio | Essential — raw spectral features span vastly different value ranges |
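To see why scaling is essential, here is a NumPy stand-in for sklearn's StandardScaler: fit on the training features, persist the statistics, and reuse them at inference — which is why feature_scaler.pkl ships alongside the audio weights. Illustrative only:

```python
import numpy as np

class SimpleScaler:
    """NumPy stand-in for sklearn's StandardScaler (illustration only)."""

    def fit(self, X: np.ndarray) -> "SimpleScaler":
        # Statistics computed once on training data, reused at inference
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        self.scale_[self.scale_ == 0] = 1.0  # guard against constant features
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        return (X - self.mean_) / self.scale_

# Spectral features span wildly different ranges,
# e.g. RMS (~0.1) vs spectral centroid (~2000 Hz):
X = np.array([[0.10, 1800.0], [0.12, 2200.0], [0.08, 2000.0]])
Xs = SimpleScaler().fit(X).transform(X)
# After scaling, each column has mean ≈ 0 and std ≈ 1
```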
╔══════════════════════════════════════════════════════════════╗
║ ║
║ 👤 Bhavya Kansal ║
║ 🎓 AI Engineer · DeepTech Developer ║
║ 🏢 Founder & CEO — Multimodex AI ║
║ 🎓 Diploma CSE → B.Tech AI/ML (TIET, Patiala) ║
║ 🔬 AI/ML Industrial Trainee — NIELIT Ropar × IIT Ropar ║
║ 🌐 bhavyakansal.dev ║
║ 📧 kansalbhavya27@gmail.com ║
║ ║
╚══════════════════════════════════════════════════════════════╝
If DeepFake Scanner helped you or impressed you:
1. ⭐ Star this repository
2. 🍴 Fork and build on it
3. 📣 Share with your network
4. 🐛 Open issues / PRs
5. 🔗 Share the live demo link
Every star helps this project reach more developers and researchers. 🙏
╔══════════════════════════════════════════════════════╗
║ 🔬 D E E P F A K E S C A N N E R ║
║ Multimodex AI · © 2026 Bhavya Kansal ║
║ Built with ❤️ for a safer digital world ║
╚══════════════════════════════════════════════════════╝
# Clone the project
git clone <your-repo-url>
cd deepfake-detector
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# venv\Scripts\activate # Windows
# Install requirements
pip install -r requirements.txt

A GPU is required for training: minimum 8GB VRAM (16GB+ recommended for the video model). Google Colab Pro works great for training.
# Set up Kaggle API key first:
# kaggle.com → Settings → API → Create New Token → save to ~/.kaggle/kaggle.json
# Download all datasets
python scripts/download_datasets.py --dataset all
# Or test with sample data (no GPU needed)
python scripts/download_datasets.py --dataset sample

# Extract video frames + crop faces (FaceForensics++ + DFDC)
python -m src.preprocessing.extract_frames --dataset all
# Extract audio features (ASVspoof + FakeAVCeleb)
python -m src.preprocessing.audio_features --dataset all

# Train all models sequentially
python scripts/train_all.py
# Or train individually
python -m src.train.train_image --config configs/config.yaml
python -m src.train.train_video --config configs/config.yaml
python -m src.train.train_audio --config configs/config.yaml
# Skip slow video training (still get image + audio models)
python scripts/train_all.py --skip-video

Expected training times (on A100 GPU):
| Model | Time |
|---|---|
| Image (30 epochs) | ~2–3 hours |
| Video (25 epochs) | ~8–12 hours |
| Audio (20 epochs) | ~3–4 hours |
# Terminal 1 — Start API server
python -m api.main
# Terminal 2 — Start UI
streamlit run ui/app.py
# Open browser: http://localhost:8501
# API docs: http://localhost:8000/docs

# Build
docker build -t deepfake-detector .
# Run (GPU)
docker run --gpus all -p 8000:8000 -p 8501:8501 \
-v $(pwd)/models:/app/models deepfake-detector
# Or with docker-compose
docker-compose up

Auto-detects file type and runs the appropriate model(s).
curl -X POST http://localhost:8000/detect \
  -F "file=@your_image.jpg"

Response:
{
  "verdict": "FAKE",
  "confidence": "HIGH",
  "fake_probability": 0.9732,
  "real_probability": 0.0268,
  "modality": "image",
  "face_detected": true,
  "latency_ms": 43.2
}

deepfake-detector/
├── configs/
│ ├── config.yaml ← All hyperparameters & paths
│ └── nginx.conf ← Production reverse proxy
├── src/
│ ├── preprocessing/
│ │ ├── extract_frames.py ← Video → frames → face crops
│ │ └── audio_features.py ← Audio → MFCC/mel/LFCC
│ ├── models/
│ │ ├── image_model.py ← EfficientNet-B4 + Attention
│ │ ├── video_model.py ← EfficientNet-B4 + BiLSTM
│ │ └── audio_model.py ← wav2vec2 + LCNN dual-branch
│ ├── train/
│ │ ├── train_image.py ← Image training pipeline
│ │ ├── train_video.py ← Video training pipeline
│ │ └── train_audio.py ← Audio training pipeline
│ ├── inference/
│ │ └── detector.py ← Unified inference engine
│ ├── fusion/
│ │ └── ensemble.py ← Weighted avg / voting / meta-clf
│ └── utils/
│ └── helpers.py ← Seeds, metrics, checkpoints
├── api/
│ └── main.py ← FastAPI backend
├── ui/
│ └── app.py ← Streamlit dashboard
├── scripts/
│ ├── download_datasets.py
│ ├── train_all.py
│ └── docker_start.sh
├── tests/
│ └── test_pipeline.py ← Full test suite (pytest)
├── Dockerfile
├── docker-compose.yml
└── requirements.txt
pytest tests/ -v
# With coverage
pytest tests/ -v --cov=src --cov-report=html

Edit configs/config.yaml to change:
- Model architectures (efficientnet_b4 / xception / vit_base_patch16_224)
- Training hyperparameters (LR, batch size, epochs)
- Dataset paths
- Fusion weights and strategy
- API settings
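For orientation, a config.yaml fragment might look like the sketch below. Every key name is illustrative — consult the shipped configs/config.yaml for the real schema:

```yaml
# Illustrative shape only — see configs/config.yaml for the actual keys
model:
  image_backbone: efficientnet_b4   # or xception / vit_base_patch16_224
train:
  lr: 3.0e-4
  batch_size: 32
  epochs: 30
fusion:
  strategy: weighted_avg            # or voting / meta-clf
  weights: [0.5, 0.3, 0.2]          # image / video / audio (illustrative)
api:
  host: 0.0.0.0
  port: 8000
```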
| Model | Dataset | Accuracy | AUC-ROC | EER |
|---|---|---|---|---|
| Image | 140k Faces | ~99.1% | 0.999 | — |
| Video | FF++ C23 | ~97.3% | 0.991 | — |
| Audio | ASVspoof LA | ~97.8% | 0.998 | ~2.1% |
Results on held-out test sets. Generalisation to unseen generators will vary.
| Choice | Reason |
|---|---|
| EfficientNet-B4 over ResNet | Better accuracy/compute trade-off, stronger GAN artifact detection |
| BiLSTM over Transformer | Lower memory for long sequences; captures bidirectional temporal inconsistencies |
| wav2vec2 for audio | Pre-trained on 960h of real speech; excellent transfer for detecting synthesis artifacts |
| LCNN alongside wav2vec2 | Spectral features catch different forgery signatures than waveform features |
| Focal Loss for images | Class imbalance common in real datasets; Focal Loss down-weights easy negatives |
| Mixed precision (AMP) | 2× speedup on modern GPUs with no accuracy loss |
| Face crop before classification | Reduces irrelevant background; focuses model on manipulation region |
MIT License — use freely for research and commercial purposes.
- FaceForensics++ — Rössler et al.
- DFDC — Facebook AI
- ASVspoof 2019 — ASVspoof Challenge
- wav2vec 2.0 — Baevski et al., Facebook AI
- EfficientNet — Tan & Le, Google Brain