A custom CNN image classifier trained from scratch on CIFAR-10, served via a FastAPI backend with a fully custom HTML/JS frontend (no Gradio, no Streamlit). Deployed as a Docker container on HuggingFace Spaces.
Upload any image → get an instant prediction with top-3 class probabilities and confidence scores across all 10 classes.
🎯 ~75.1% test accuracy on CIFAR-10 using a custom CNN (3 convolutional + 2 fully connected layers) built entirely in PyTorch.
| Property | Details |
|---|---|
| Dataset | CIFAR-10 |
| Total Images | 60,000 (50K train / 10K test) |
| Image Size | 32 × 32 × 3 (RGB) |
| Classes | 10 balanced classes |
| Source | torchvision.datasets.CIFAR10 |
Classes: airplane · automobile · bird · cat · deer · dog · frog · horse · ship · truck
Image Upload → Resize (32×32) → Normalize → CNN Forward Pass → Softmax → Top-3 Predictions → JSON Response
- Input image resized to 32×32 using `transforms.Resize`
- Converted to a PyTorch tensor (values in `[0, 1]`), then normalized with mean `(0.5, 0.5, 0.5)` and std `(0.5, 0.5, 0.5)` so pixel values map to `[-1, 1]`
- Batched with `unsqueeze(0)`
- Single forward pass through the CNN
- `torch.softmax` applied to logits → probability distribution
- `torch.topk(probs, 3)` extracts top-3 predictions
- FastAPI `/predict` endpoint returns: top class, emoji, confidence %, top-3 predictions, and full probability distribution for all 10 classes
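The post-processing steps above can be sketched in plain Python, without PyTorch, to make the arithmetic explicit. This is a minimal illustration rather than the code in `app.py`: the function names, the response keys, and the example logits are all assumptions, and the emoji field is left out.

```python
import math

CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]


def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel already scaled to [0, 1] into [-1, 1]: (x - mean) / std."""
    return (pixel - mean) / std


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_response(logits, k=3):
    """Turn raw logits into a JSON-serialisable payload (hypothetical keys)."""
    probs = softmax(logits)
    # Class indices sorted by descending probability, like torch.topk.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = [{"label": CLASSES[i], "confidence": round(probs[i] * 100, 2)}
           for i in ranked[:k]]
    return {
        "prediction": top[0]["label"],
        "confidence": top[0]["confidence"],
        "top3": top,
        "probabilities": {CLASSES[i]: probs[i] for i in range(len(probs))},
    }


# Made-up logits for illustration only; real values come from the CNN.
example_logits = [0.1, -1.2, 0.4, 2.5, 0.0, 1.9, -0.3, 0.2, -0.8, -1.0]
response = build_response(example_logits)
print(response["prediction"], [t["label"] for t in response["top3"]])
```

Subtracting the maximum logit before exponentiating does not change the result but avoids overflow for large logits.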
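```python
import torch
import torch.nn as nn


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Three conv blocks; each MaxPool2d halves the spatial size.
        self.conv_layers = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),    # 32×32 → 16×16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),   # 16×16 → 8×8
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # 8×8 → 4×4
        )
        self.fc_layers = nn.Sequential(
            nn.Linear(4 * 4 * 128, 256), nn.ReLU(),
            nn.Linear(256, 10),
        )

    # Forward pass sketched as the conventional flatten-then-classify form (an assumption).
    def forward(self, x):
        x = self.conv_layers(x)
        x = torch.flatten(x, 1)  # (N, 128, 4, 4) -> (N, 2048)
        return self.fc_layers(x)
```

- Feature maps: 32 → 64 → 128 filters (progressive depth)
- Spatial reduction: 32×32 → 16×16 → 8×8 → 4×4 via MaxPooling
- FC layers: 2048 → 256 → 10 (output logits)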
- Inference: CPU-compatible, single forward pass, no TTA
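
The shapes and layer sizes listed above can be checked with a few lines of arithmetic. The sketch below is plain Python (no PyTorch needed) and assumes exactly the layers shown in the class definition, with biases enabled as is the default for `nn.Conv2d` and `nn.Linear`; the helper names are illustrative.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution (floor division, as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max pooling without padding."""
    return (size - kernel) // stride + 1


def conv_params(c_in, c_out, kernel=3):
    """Weights (c_out * c_in * k * k) plus one bias per output channel."""
    return c_out * c_in * kernel * kernel + c_out


def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


size, channels = 32, 3
sizes = [size]
total = 0
for c_out in (32, 64, 128):
    total += conv_params(channels, c_out)
    size = pool_out(conv_out(size))  # 3×3 conv with padding 1 keeps the size; pooling halves it
    sizes.append(size)
    channels = c_out

flat = size * size * channels  # input width of the first fully connected layer
total += linear_params(flat, 256) + linear_params(256, 10)

print("spatial sizes:", sizes)
print("flattened features:", flat)
print("trainable parameters:", total)
```

The flattened width is the value the first `nn.Linear` must receive, which is why that layer is declared with `4 * 4 * 128` inputs.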
| Metric | Value |
|---|---|
| Test Accuracy | ~75.1% |
| Architecture | Custom CNN (3 Conv + 2 FC) |
| Parameters | ~0.62M (620,362 for the layers shown) |
| Input Size | 32 × 32 × 3 |
| Output | 10-class softmax |
| Inference Mode | CPU (no GPU required) |
- 🧠 Progressive filter doubling (32 → 64 → 128) consistently improves feature extraction on CIFAR-10 without overfitting at this scale
- Resolution bottleneck is the primary accuracy ceiling: CIFAR-10's 32×32 images lose fine-grained detail, making classes like `cat` vs `dog` genuinely hard even for CNNs
- ⚠️ Softmax overconfidence is real: the model outputs high confidence even on out-of-distribution images; temperature scaling would help
- A ResNet-18 backbone on the same dataset would likely reach ~90–93%, so the custom CNN trades some accuracy for a much smaller parameter count
- 🐸 `frog`, `ship`, and `airplane` are typically the easiest classes due to distinct color distributions; `cat` and `dog` are the hardest
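
Temperature scaling, mentioned above as a remedy for overconfidence, divides the logits by a scalar T before the softmax. The sketch below is plain Python with made-up logits; the value T = 2.0 is an arbitrary illustration, since in practice T is fitted on a held-out validation set.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / T. T > 1 flattens the distribution; T = 1 is plain softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits with one dominant class, for illustration only.
logits = [6.0, 1.0, 0.5, 0.2, 0.0, -0.5, -1.0, -1.0, -2.0, -3.0]
plain = softmax_with_temperature(logits, temperature=1.0)
softened = softmax_with_temperature(logits, temperature=2.0)

print(f"max prob at T=1: {max(plain):.3f}")
print(f"max prob at T=2: {max(softened):.3f}")
```

Dividing by a positive T keeps the class ranking unchanged, so the predicted label stays the same; only the reported confidence moves.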
```
cifar10-classifier/
│
├── app.py              # FastAPI backend: model loading + /predict endpoint
├── index.html          # Custom frontend UI (drag & drop + results display)
├── cnn_cifar10.pth     # Trained model weights
├── requirements.txt    # Python dependencies
├── Dockerfile          # Docker container config for HF Spaces
├── limitations.txt     # Known model limitations & future improvements
└── README.md           # This file
```
Run locally:

```bash
# Clone the repo
git clone https://github.com/ronakrajput8882/CNN-Image-Classifier.git
cd CNN-Image-Classifier

# Install dependencies
pip install -r requirements.txt

# Start the server
python app.py
# → Open http://localhost:7860
```

Run with Docker:

```bash
docker build -t cifar10-classifier .
docker run -p 7860:7860 cifar10-classifier
```

Live demo: https://ronakrajput8882-cifar10-classifier.hf.space/
- Serving a PyTorch model with FastAPI is more flexible and production-ready than Gradio/Streamlit for custom UIs
- Docker on HuggingFace Spaces gives full control over the runtime environment, with no SDK lock-in
- CIFAR-10's 32×32 resolution limits what a small custom CNN can reach; deeper architectures combined with data augmentation (RandomCrop, RandomHorizontalFlip, Cutout) push past 90%
- Softmax probabilities are not calibrated: a 95% confidence score does not mean the prediction is correct 95% of the time; always mention this to end users
- Building the frontend from scratch (vs Gradio) teaches you exactly what the model API contract looks like in production
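
What "not calibrated" means can be made concrete with the expected calibration error (ECE), which compares average confidence against actual accuracy within confidence bins. This is a sketch of that standard metric, not something the project is stated to compute; the confidences and correctness flags below are toy values, not results from this model.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    if len(confidences) != len(correct):
        raise ValueError("inputs must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(accuracy - mean_conf)
    return ece


# Toy data: four predictions at 95% confidence, only two of them correct.
toy_conf = [0.95, 0.95, 0.95, 0.95]
toy_correct = [True, False, True, False]
toy_ece = expected_calibration_error(toy_conf, toy_correct)
print(f"ECE on toy data: {toy_ece:.2f}")
```

A perfectly calibrated model would score 0; the larger the value, the further the reported confidence is from the observed accuracy.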
| Tool | Use |
|---|---|
| PyTorch | Model definition, training, inference |
| torchvision | CIFAR-10 dataset, image transforms |
| FastAPI | REST API backend (/predict endpoint) |
| uvicorn | ASGI server |
| Pillow | Image loading and RGB conversion |
| Docker | Containerization for HF Spaces deployment |
| HTML/CSS/JS | Custom frontend UI |