This project classifies food & beverage images using a Convolutional Neural Network (CNN). The dataset consists of 9323 training images and 484 test images across 61 classes (e.g., water, pizza-margherita-baked, broccoli, salad, egg).
- Train a CNN model to classify food items from images.
- Improve generalization using data augmentation.
- Monitor training progress with validation curves.
- Provide model interpretability using Grad-CAM visualizations.
data/
├── training_set_128/ # 9323 images (train + validation)
└── test_set_128/ # 484 images (unlabeled test data)
loss_accuracy_curves/ # images for some of the models tested
├── accuracy/
└── loss/
saved_models/ # best model trained
images/ # for README
| Class | Number of Images |
|---|---|
| Water | 863 |
| Bread-White | 595 |
| Salad-Leaf | 535 |
| Pickle | 28 |
Below are some sample images from the training set. Some of them are difficult to recognize even for human eyes (hard-cheese is hard indeed!):
To improve generalization, the following augmentations were applied:
- Rotation (±30°)
- Zooming (20%)
- Shifting (20%)
- Horizontal Flipping
- Rescaling (0-255 → 0-1)
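The augmentations listed above map fairly directly onto Keras' `ImageDataGenerator`. The following is a minimal sketch, not necessarily the exact configuration used in this project: it assumes that "Shifting (20%)" means both width and height shifts, and the `eval_datagen` name and the choice to only rescale evaluation images are assumptions for illustration.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Settings taken from the augmentation list; intended for training images only.
train_datagen = ImageDataGenerator(
    rotation_range=30,        # random rotation within +/-30 degrees
    zoom_range=0.2,           # random zoom of up to 20%
    width_shift_range=0.2,    # assumption: 20% shift applies horizontally...
    height_shift_range=0.2,   # ...and vertically
    horizontal_flip=True,     # random left-right flip
    rescale=1.0 / 255,        # map pixel values from 0-255 to 0-1
)

# Assumption: validation and test images are rescaled but never augmented.
eval_datagen = ImageDataGenerator(rescale=1.0 / 255)
```

If the images are stored in one subfolder per class (an assumption about the layout of `data/training_set_128/`), `train_datagen.flow_from_directory(...)` with `target_size=(128, 128)` and `class_mode="categorical"` would yield augmented batches with one-hot labels.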
The CNN consists of:
- 4 Convolutional Blocks with ReLU activation
- Softmax Activation for multi-class classification
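To sanity-check the architecture, it can help to trace how the feature map shrinks through the four blocks and where the parameters end up. The sketch below uses the layer sizes from the model definition in this README and assumes Keras defaults (3x3 `valid` convolutions with stride 1, 2x2 max pooling that floors odd sizes). The `trace_cnn` helper is hypothetical and only does arithmetic; it does not build a model.

```python
def trace_cnn(input_size=128, input_channels=3, filters=(32, 64, 128, 256),
              dense_units=128, num_classes=61):
    """Return (final feature-map size, flattened length, total parameters)."""
    size, channels, params = input_size, input_channels, 0
    for f in filters:
        params += (3 * 3 * channels + 1) * f  # 3x3 kernel weights + one bias per filter
        size = (size - 2) // 2                # 'valid' conv shrinks by 2, pooling halves
        channels = f
    flat = size * size * channels             # length of the Flatten() output
    params += (flat + 1) * dense_units        # Dense(128)
    params += (dense_units + 1) * num_classes # Dense(61, softmax)
    return size, flat, params


size, flat, params = trace_cnn()
print(f"final feature map: {size}x{size}, flattened: {flat}, parameters: {params:,}")
# prints: final feature map: 6x6, flattened: 9216, parameters: 1,576,061
```

Under these assumptions roughly three quarters of the parameters sit in the first dense layer (9216 x 128 weights), which is worth keeping in mind when tuning the model.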
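from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # Four convolutional blocks: 3x3 convolution + ReLU, then 2x2 max pooling
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(256, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # Classifier head: one hidden dense layer, then a softmax over the 61 classes
    Flatten(),
    Dense(128, activation='relu'),
    Dense(61, activation='softmax')
])

- Optimizer: Adam (learning_rate=0.001)
- Loss Function: Categorical Crossentropy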
- Metrics: Accuracy, Precision, Recall
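The training settings listed above could be wired up as follows. This is a sketch under assumptions rather than the project's exact training script: the model is rebuilt in a compact loop so the block is self-contained, an explicit `Input` layer is used instead of `input_shape`, and the metric names `precision` and `recall` are chosen here for illustration.

```python
from tensorflow.keras import Input
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.metrics import Precision, Recall
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Same layer sizes as the model in this README, built in a loop for brevity.
model = Sequential([Input(shape=(128, 128, 3))])
for n_filters in (32, 64, 128, 256):
    model.add(Conv2D(n_filters, (3, 3), activation="relu"))
    model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(61, activation="softmax"))

model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss="categorical_crossentropy",  # expects one-hot encoded labels
    metrics=["accuracy", Precision(name="precision"), Recall(name="recall")],
)
```

Categorical crossentropy expects one-hot labels; with integer class ids, `sparse_categorical_crossentropy` would be the matching loss instead.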
Loss & Accuracy over epochs:
Visualizing misclassified images:
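Finding the misclassified images comes down to comparing predicted and true class ids on labeled data (for example the validation split, since the test set is unlabeled). A minimal NumPy sketch, assuming one-hot labels and softmax outputs; the `misclassified_indices` helper and the toy arrays are hypothetical:

```python
import numpy as np


def misclassified_indices(y_true, y_prob):
    """Return indices of wrong predictions plus their true and predicted class ids.

    y_true: (n_samples, n_classes) one-hot labels
    y_prob: (n_samples, n_classes) softmax outputs, e.g. from model.predict
    """
    true_ids = np.argmax(y_true, axis=1)
    pred_ids = np.argmax(y_prob, axis=1)
    wrong = np.flatnonzero(true_ids != pred_ids)
    return wrong, true_ids[wrong], pred_ids[wrong]


# Toy example with 3 samples and 3 classes; only the middle sample is wrong.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]])
wrong, true_ids, pred_ids = misclassified_indices(y_true, y_prob)
print(wrong, true_ids, pred_ids)  # prints: [1] [1] [0]
```

The returned indices can then be used to pull the corresponding images and plot them next to their true and predicted labels.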
We use Grad-CAM to highlight the image regions that most influenced a prediction.
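The core of Grad-CAM is small: average the gradients of the class score over each feature map to get one weight per channel, take the weighted sum of the feature maps, and keep only the positive part. The NumPy sketch below shows just that step and is not the project's own implementation. It assumes the activations of the last convolutional layer and the gradients of the predicted class score with respect to them have already been extracted (for instance with `tf.GradientTape`); the `grad_cam_heatmap` helper and the toy arrays are hypothetical.

```python
import numpy as np


def grad_cam_heatmap(activations, gradients):
    """Compute a normalized Grad-CAM heatmap for a single image.

    activations: (H, W, K) feature maps of the chosen convolutional layer
    gradients:   (H, W, K) gradient of the class score w.r.t. those feature maps
    """
    weights = gradients.mean(axis=(0, 1))                       # one weight per channel
    cam = np.tensordot(activations, weights, axes=([2], [0]))   # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                                  # ReLU: keep positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam                      # scale to 0-1 for display


# Toy example: channel 0 supports the class (top-left), channel 1 opposes it.
activations = np.zeros((2, 2, 2))
activations[0, 0, 0] = 1.0
activations[1, 1, 1] = 2.0
gradients = np.stack([np.ones((2, 2)), -np.ones((2, 2))], axis=-1)
heatmap = grad_cam_heatmap(activations, gradients)
```

For the model in this README, using the last `Conv2D` layer would give a 12x12 heatmap (256 channels), which is then resized to 128x128 and overlaid on the input image.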
- Use Transfer Learning (e.g., MobileNet, ResNet) for better accuracy
- Optimize hyperparameters using KerasTuner



