A comparison of multiple machine learning approaches for MNIST digit classification, implemented in PyTorch and NumPy.
This project implements and compares five different classification algorithms on the MNIST dataset:
- K-Nearest Neighbors (KNN)
- Naive Bayes
- Linear Classifier
- Multi-Layer Perceptron (MLP)
- Convolutional Neural Network (CNN)
Install dependencies using:

    pip install -r requirements.txt

Required packages:
- pandas
- matplotlib
- torch
- torchvision
- scikit-learn
- numpy
- Place your MNIST dataset in a zip file in the project directory
- Run the extraction script:

    python parse.py

This will extract the dataset to a folder named MNIST with the following structure:
MNIST/
├── 0/
├── 1/
├── ...
└── 9/
Each subdirectory should contain grayscale images (28×28 pixels) for the corresponding digit class.
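As an illustration of how this folder layout maps to labels, the sketch below walks a root directory and pairs every file with the digit named by its parent folder. The helper name `list_labeled_images` is hypothetical and is not taken from `data.py`; it uses only the standard library and does not decode the images.

```python
from pathlib import Path


def list_labeled_images(root):
    """Return sorted (image_path, label) pairs from a root/<digit>/ folder layout."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        # Only folders whose name is a number (0 through 9 here) count as classes.
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)
            for image_path in sorted(class_dir.iterdir()):
                if image_path.is_file():
                    samples.append((image_path, label))
    return samples
```

Calling `list_labeled_images("MNIST")` after extraction would return one pair per image, which a loader could then read and convert to arrays.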
Run the complete comparison pipeline:

    python main.py

The script will:
- Load and preprocess the MNIST dataset
- Split data into train/test sets (80/20 split with stratification)
- Train and evaluate all five models
- Display performance metrics (accuracy, precision, F1)
- Generate and save plots in the outputs/ folder (batch grid, confusion matrices, histograms, comparison bar chart)
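The stratified 80/20 split mentioned in the steps above keeps the per-digit proportions the same in both subsets. The following is a NumPy-only sketch of that idea; the function name `stratified_split` is an assumption, and the project may instead rely on scikit-learn's `train_test_split` with its `stratify` argument.

```python
import numpy as np


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)                       # shuffle within the class
        n_test = int(round(test_size * len(cls_idx)))
        test_idx.extend(cls_idx[:n_test])
        train_idx.extend(cls_idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


labels = np.repeat(np.arange(10), 50)              # toy data: 50 samples per digit
train_idx, test_idx = stratified_split(labels, test_size=0.2, seed=0)
```

On this toy input every digit contributes 10 samples to the test set and 40 to the training set.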
Modify config.py to adjust:
- ROOT_DIR: Dataset directory path
- TRAIN_BATCH_SIZE: Batch size for training
- TEST_SIZE: Test set proportion (default: 0.2)
- RANDOM_SEED: Random seed for reproducibility
- N_CLASSES: Number of classes (default: 10)
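For orientation, a config.py exposing the options listed above might look like the sketch below. Only the names and the two documented defaults come from this README; the values marked as placeholders are illustrative assumptions, not the project's actual settings.

```python
# Hypothetical config.py contents. Values marked "placeholder" are
# illustrative assumptions and should be checked against the real file.
ROOT_DIR = "MNIST"        # folder created by parse.py
TRAIN_BATCH_SIZE = 64     # placeholder
TEST_SIZE = 0.2           # documented default
RANDOM_SEED = 42          # placeholder
N_CLASSES = 10            # documented default
```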
├── main.py # Main execution script
├── config.py # Configuration and constants
├── data.py # Data loading and preprocessing
├── models.py # Neural network architectures
├── trainer.py # Training loops for Linear/MLP
├── cnn_train.py # CNN training implementation
├── knn.py # K-Nearest Neighbors implementation
├── naive_bayes.py # Naive Bayes implementation
├── metrics.py # Evaluation metrics and confusion matrix
├── figures.py # Visualization utilities
├── parse.py # Dataset extraction utility
└── requirements.txt # Python dependencies
K-Nearest Neighbors (KNN):
- Evaluated with k ∈ {1, 3, 5}
- Uses Euclidean distance
- Batch processing for efficiency
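A minimal NumPy sketch of batched Euclidean KNN follows. It is not the code in `knn.py`; the function name `knn_predict` and the default batch size are assumptions. Test rows are processed in batches so the full distance matrix never has to be held in memory at once, and ties in the vote go to the lowest class index.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=3, batch_size=256):
    """Classify test_x by majority vote among the k nearest training rows."""
    n_classes = int(train_y.max()) + 1
    train_sq = (train_x ** 2).sum(axis=1)            # ||a||^2 for every training row
    preds = np.empty(len(test_x), dtype=np.int64)
    for start in range(0, len(test_x), batch_size):
        batch = test_x[start:start + batch_size]
        batch_sq = (batch ** 2).sum(axis=1)
        # Squared Euclidean distance via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
        dists = train_sq[None, :] - 2.0 * (batch @ train_x.T) + batch_sq[:, None]
        nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest rows
        votes = train_y[nearest]                     # shape (batch, k)
        counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
        preds[start:start + len(batch)] = counts.argmax(axis=1)
    return preds


rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(1.0, 0.1, (20, 4))])
train_y = np.array([0] * 20 + [1] * 20)
test_x = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
preds = knn_predict(train_x, train_y, test_x, k=3)
```

Ranking by squared distance gives the same neighbors as ranking by Euclidean distance, so the square root is skipped.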
Naive Bayes:
- Bernoulli variant with binary features (threshold: 0.5)
- Laplace smoothing (α = 1.0)
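The Bernoulli variant can be sketched in NumPy as below. This assumes pixel intensities are scaled to [0, 1] before thresholding and that the class prior is smoothed with the same α; neither detail is confirmed by this README, and the function names are hypothetical.

```python
import numpy as np


def fit_bernoulli_nb(x, y, n_classes=10, alpha=1.0, threshold=0.5):
    """Estimate log-priors and per-pixel 'on' probabilities with Laplace smoothing."""
    xb = (x > threshold).astype(np.float64)          # binarize pixel intensities
    log_prior = np.empty(n_classes)
    theta = np.empty((n_classes, x.shape[1]))
    for c in range(n_classes):
        xc = xb[y == c]
        log_prior[c] = np.log((len(xc) + alpha) / (len(x) + alpha * n_classes))
        # Smoothed P(pixel = 1 | class): two outcomes per pixel, hence 2 * alpha.
        theta[c] = (xc.sum(axis=0) + alpha) / (len(xc) + 2.0 * alpha)
    return log_prior, theta


def predict_bernoulli_nb(x, log_prior, theta, threshold=0.5):
    """Pick the class with the highest log-posterior for each row of x."""
    xb = (x > threshold).astype(np.float64)
    # log P(x | c) = sum_j [ x_j log theta_cj + (1 - x_j) log(1 - theta_cj) ]
    log_lik = xb @ np.log(theta).T + (1.0 - xb) @ np.log(1.0 - theta).T
    return (log_lik + log_prior).argmax(axis=1)


x = np.array([[0.9, 0.8, 0.1, 0.0], [1.0, 0.7, 0.0, 0.2],
              [0.1, 0.0, 0.9, 1.0], [0.0, 0.2, 0.8, 0.9]])
y = np.array([0, 0, 1, 1])
log_prior, theta = fit_bernoulli_nb(x, y, n_classes=2)
preds = predict_bernoulli_nb(x, log_prior, theta)
```

With α = 1.0 no estimated probability is exactly 0 or 1, so the logarithms stay finite even for pixels that are always off in a class.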
Linear Classifier:
- Single fully-connected layer (784 → 10)
- Mean Squared Error (MSE) loss
- SGD optimizer with learning rate 0.1
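To show what training a single linear layer with MSE loss involves, here is a NumPy sketch that regresses onto one-hot targets. For brevity it uses full-batch gradient descent rather than the mini-batch SGD of the project, and the function name and epoch count are assumptions.

```python
import numpy as np


def train_linear_mse(x, y, n_classes=10, lr=0.1, epochs=200, seed=0):
    """Fit W, b of a single linear layer by minimizing MSE to one-hot targets."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, (x.shape[1], n_classes))
    b = np.zeros(n_classes)
    targets = np.eye(n_classes)[y]                   # one-hot encode the labels
    for _ in range(epochs):
        err = x @ w + b - targets                    # shape (n_samples, n_classes)
        # Gradient of mean((x W + b - t)^2) taken over all entries of err.
        scale = 2.0 / err.size
        w -= lr * scale * (x.T @ err)
        b -= lr * scale * err.sum(axis=0)
    return w, b


x = np.array([[0.9, 0.8, 0.1, 0.0], [1.0, 0.7, 0.0, 0.2],
              [0.1, 0.0, 0.9, 1.0], [0.0, 0.2, 0.8, 0.9]])
y = np.array([0, 0, 1, 1])
w, b = train_linear_mse(x, y, n_classes=2)
preds = (x @ w + b).argmax(axis=1)
```

Prediction takes the arg-max over the 10 outputs (2 in this toy case), so the outputs do not need to be valid probabilities.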
Multi-Layer Perceptron (MLP):
- Architecture: 784 → 256 → 128 → 10
- ReLU activations
- Cross-entropy loss
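The forward pass and loss for this architecture can be written out in NumPy as a sketch. The He-style initialization and all function names are assumptions; the actual model lives in `models.py` and is built with PyTorch.

```python
import numpy as np


def init_mlp(sizes=(784, 256, 128, 10), seed=0):
    """Create (weight, bias) pairs for each layer."""
    rng = np.random.default_rng(seed)
    # He-style initialization (an assumption; the README does not specify one).
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(x, params):
    """Return logits: ReLU after every layer except the last."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h


def cross_entropy(logits, y):
    """Mean negative log-likelihood of the true class under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()


params = init_mlp()
x = np.random.default_rng(1).random((4, 784))   # stand-in for 4 flattened images
logits = mlp_forward(x, params)
loss = cross_entropy(logits, np.array([3, 1, 4, 1]))
n_params = sum(w.size + b.size for w, b in params)
```

Counting weights and biases for 784 → 256 → 128 → 10 gives 235,146 trainable parameters.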
Convolutional Neural Network (CNN):
- Two convolutional blocks (1→32→64 channels)
- Max pooling after each block
- Fully-connected classifier (3136 → 128 → 10)
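The 3136 input features of the classifier follow from the feature-map size after two blocks. The arithmetic below assumes 3×3 convolutions with padding 1 and 2×2 max pooling; these settings are consistent with 3136 but are not stated in this README.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Spatial size after non-overlapping max pooling."""
    return size // kernel


size = 28                                  # MNIST images are 28x28
for _ in range(2):                         # two conv blocks: 1 -> 32 -> 64 channels
    size = pool_out(conv_out(size))        # conv keeps the size, pooling halves it
flattened = 64 * size * size               # 64 channels in the last block
print(size, flattened)                     # prints: 7 3136
```

Under these assumptions the spatial size goes 28 → 14 → 7, and 64 × 7 × 7 = 3136.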
The program outputs:
- Training progress for each model (loss and accuracy per epoch)
- Test set metrics: accuracy, precision, F1-score
- Comparison table of all models
- Figures saved in outputs/:
  - batch_grid.png: sample images from a batch
  - confusion_*.png: confusion matrices for each model
  - linear_weights.png: visualized weights of the linear classifier
  - hist_labels_train.png: training label distribution
  - hist_preds_cnn.png: CNN predicted label distribution
  - metrics_comparison.png: bar chart comparing model performance
Open the images directly in VS Code's Explorer under the outputs/ folder.
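The reported test metrics (accuracy, precision, F1-score) can all be derived from a confusion matrix. The sketch below assumes macro averaging across classes, which this README does not confirm, and its function names are not taken from `metrics.py`.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=10):
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def summarize(cm):
    """Accuracy plus macro-averaged precision and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)             # column sums: times each class was predicted
    actual = cm.sum(axis=1)                # row sums: true samples per class
    # Classes with an empty denominator score 0 instead of dividing by zero.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return tp.sum() / cm.sum(), precision.mean(), f1.mean()


y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
accuracy, macro_precision, macro_f1 = summarize(cm)
```

Macro averaging weights every digit equally, which is a reasonable choice when the split is stratified and the classes are roughly balanced.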
The code automatically uses CUDA if available and otherwise falls back to CPU. Check device usage in the output:

    Using device: cuda
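Device selection of this kind is commonly written with the PyTorch idiom below. This is a sketch of the usual pattern and not necessarily the exact code in `main.py`; the tensor `x` is only a stand-in for a batch of flattened images.

```python
import torch

# Prefer the GPU when PyTorch can see one, otherwise use the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Tensors (and models) must be moved to the chosen device explicitly.
x = torch.zeros(2, 784).to(device)
```

On a machine without a CUDA-capable GPU the printed line reads `Using device: cpu`.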