This project is part of my personal training in Computer Vision. It contains two folders: the Computer Vision module (compvis) and Practical Examples. The compvis module is a pipeline for building models in practice; it covers several CNN architectures as well as image preprocessing techniques.
The compvis module is inspired by the PyImageSearch module, so credit for it goes to Adrian Rosebrock, the author of that website. How did I get access to the module? I found this repository on GitHub 1 and this website 2; both contain the module. Why did I rename it? While typing it line by line during my training to understand it, I decided to rename the module and make some modifications (renaming and adding classes), for simplicity and coding preference.
The folder Practical Examples contains examples of how to use the module. As I am not enrolled at PyImageSearch, I followed the examples from 1. The problem was that this repository contains no explanation of how to use the module, nor any discussion of the results. I learned how to use it on my own, reading the PyImageSearch blog. To understand each model, I read the corresponding papers in depth. Each example presented here is accompanied by a model explanation, a discussion of the results and, of course, a detailed explanation of how to use the module.
compvis module
- callbacks sub-module
  - EpochCheckPoint class
  - TrainingMonitor class
- datasets sub-module
  - SimpleDatasetLoader class
- io sub-module
  - HDF5DatasetGenerator class
  - HDF5DatasetWriter class
- nn sub-module
  - ANN class
  - cnns sub-sub-module
    - AlexNet class
    - DeeperGoogLeNet class
    - FCHeadNet class
    - LeNet class
    - MiniGoogLeNet class
    - MiniVGG class
    - ShallowNet class
  - lr sub-sub-module
    - LRFunc class
- preprocessing sub-module
  - CropPreprocessor class
  - ImageToArrayPreprocessor class
  - MeanPreprocessor class
  - PatchPreprocessor class
  - ResizeAR class
  - SimplePreprocessor class
- utils sub-module
  - rank5_accuracy attribute
The practical examples follow a logical order: using the compvis module, we start from a simple image classifier and progress to more advanced models. Each folder in Practical Examples corresponds to a specific model or technique (image preprocessing, learning rate scheduling, regularization, etc.). Several datasets are used throughout the examples, and all of them are cited. Due to space limitations, I did not upload the datasets, but you will find the link to download each of them at the appropriate moment.
1 - Logistic Regression and K-Nearest Neighbors
Image classification with Logistic Regression and K-Nearest Neighbors on the Animals dataset. Logistic Regression reached 59% accuracy and K-Nearest Neighbors 60%.
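A minimal sketch of this first pipeline with scikit-learn, using synthetic blobs as a stand-in for the flattened Animals image vectors (dataset loading and the exact hyperparameters are assumptions, not the repository's code):

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for flattened image feature vectors; in the real example
# the Animals images would be resized and flattened first.
X, y = make_blobs(n_samples=600, centers=3, n_features=64, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

print("LogReg:", lr.score(X_test, y_test))
print("KNN:", knn.score(X_test, y_test))
```

On real images the scores are far lower (59-60% above), since raw pixels are weak features.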
2 - Artificial Neural Network
Image classification using an Artificial Neural Network. The example is built both with the ANN class from the compvis module and with the TensorFlow library. The datasets considered are Animals, MNIST (8x8) and CIFAR10. The accuracy is 59% on Animals, 97% on MNIST and 57% on CIFAR10.
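For the 8x8 MNIST variant, a comparable small feed-forward network can be sketched with scikit-learn's built-in digits dataset (the layer size and scaling choices here are illustrative assumptions, not the module's ANN class):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# The 8x8 digits dataset ships with scikit-learn, so no download is needed.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=42)

scaler = StandardScaler().fit(X_train)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=42)
mlp.fit(scaler.transform(X_train), y_train)
print("accuracy:", mlp.score(scaler.transform(X_test), y_test))
```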
3 - ShallowNet
Image classification using a Convolutional Neural Network, specifically the ShallowNet architecture from the compvis module. The ShallowNet model is composed of one convolutional layer followed by fully connected layers. The datasets used are Animals (70% accuracy) and CIFAR10 (62% accuracy).
4 - LeNet on MNIST
Image classification using the LeNet architecture on the MNIST (28x28) dataset. The obtained accuracy was 98%, without over-fitting.
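A LeNet-style architecture can be sketched in Keras as below; the filter counts follow the classic CONV => POOL => CONV => POOL => FC layout, but the exact layer parameters of the module's LeNet class may differ:

```python
from tensorflow.keras import layers, models

def build_lenet(width=28, height=28, depth=1, classes=10):
    # Two CONV => RELU => POOL blocks followed by two dense layers.
    model = models.Sequential([
        layers.Input(shape=(height, width, depth)),
        layers.Conv2D(20, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
        layers.Conv2D(50, (5, 5), padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
        layers.Flatten(),
        layers.Dense(500, activation="relu"),
        layers.Dense(classes, activation="softmax"),
    ])
    return model

model = build_lenet()
model.summary()
```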
5 - MiniVGG on CIFAR10
Image classification using the MiniVGG architecture on the CIFAR10 dataset. In this example, some regularization techniques were implemented. The model reached 80% accuracy on the test set.
6 - Learning Rate Scheduler
Image classification using the MiniVGG architecture on CIFAR10. In this example, we use the Learning Rate Scheduler from TensorFlow, passing a piecewise function that changes the learning rate every 5 epochs. The best result reached 79% accuracy. This result is slightly lower than in the previous example; on the other hand, over-fitting was reduced.
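The piecewise schedule can be written as a plain function of the epoch; the initial rate and decay factor below are illustrative values, not necessarily those used in the example:

```python
INIT_LR, FACTOR, DROP_EVERY = 0.01, 0.5, 5

def step_decay(epoch):
    # Piecewise schedule: halve the learning rate every DROP_EVERY epochs.
    return INIT_LR * (FACTOR ** (epoch // DROP_EVERY))

# With Keras this would be wired in via:
#   from tensorflow.keras.callbacks import LearningRateScheduler
#   model.fit(..., callbacks=[LearningRateScheduler(step_decay)])
print([step_decay(e) for e in (0, 4, 5, 9, 10)])
# -> [0.01, 0.01, 0.005, 0.005, 0.0025]
```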
7 - Data Augmentation and aspect ratio
Image classification using the MiniVGG network on the Animals dataset. In this example, Data Augmentation regularization was considered; for the image preprocessing, we resized all images while maintaining the aspect ratio. The accuracy was 74%, without over-fitting.
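The aspect-ratio-preserving resize (the role presumably played by the module's ResizeAR class) scales along the smaller relative dimension and then center-crops the overflow; the helper below is a hypothetical sketch of the dimension math, not code from the module:

```python
def aspect_aware_dims(w, h, target_w, target_h):
    # Scale so the smaller relative dimension matches its target;
    # the caller then center-crops the overflow on the other axis.
    if w < h:
        scale = target_w / w
    else:
        scale = target_h / h
    return int(round(w * scale)), int(round(h * scale))

# A 400x300 image headed for 256x256 is first resized to 341x256,
# then the extra columns are cropped from the sides.
print(aspect_aware_dims(400, 300, 256, 256))  # -> (341, 256)
```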
8 - Feature extraction and HDF5 file
Image classification using transfer learning. We perform feature extraction using the convolutional layers (the body of the network) of the VGG16 pre-trained model. To store all the extracted features, we use a dataset writer that outputs an HDF5 file. For the classification itself, we use the Logistic Regression model from Scikit-Learn. The accuracy on the Animals test set was 99%, our best result so far. For 17 Flowers, the accuracy was 92%.
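The extraction step looks roughly like the sketch below. Note the hedge: the real example uses weights="imagenet"; weights=None here only avoids the download, so the "features" produced are meaningless, and the random batch stands in for preprocessed images:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from tensorflow.keras.applications import VGG16

# weights=None avoids downloading ImageNet weights for this sketch;
# the real example uses weights="imagenet".
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))

images = np.random.rand(8, 224, 224, 3).astype("float32")  # stand-in batch
features = base.predict(images, verbose=0)
features = features.reshape(features.shape[0], -1)  # flatten to (N, 25088)
print(features.shape)

# The flattened features then feed an ordinary linear classifier.
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])
clf = LogisticRegression(max_iter=1000).fit(features, labels)
```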
9 - Transfer Learning with Fine Tuning
Image classification using transfer learning. We apply the fine-tuning method to a VGG16 pre-trained model. The training is done in two steps. In the first step, feature extraction and classification are performed by a newly defined head network; at this stage, the VGG16 weights are not updated. In the second step, the head is connected to some layers of the body, fine-tuning the weights of the pre-trained model. We used the 17 Flowers dataset and obtained 95% accuracy, a considerable improvement. On the other hand, the model seems to over-fit.
10 - Dogs vs Cats dataset with AlexNet and transfer learning
Using the HDF5 dataset writer and generator to manage the dataset. Implementation of the AlexNet architecture, training the classification model on the Dogs vs Cats dataset from Kaggle. Rank-1 accuracy: 94.08% on the test set using the crop method. Using feature extraction with ResNet50, we obtain an accuracy score of 0.9856. The classification model was trained with 2,500 images, just 10% of the original training set.
- data_building.ipynb
- training_alexnet.py
- testing_alexnet.ipynb
- features_extraction.ipynb
- training_on_features.ipynb
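The idea behind the HDF5 writer/generator pair can be sketched with plain h5py: write the arrays to disk once, then stream batches without holding everything in memory (this is an illustrative sketch, not the HDF5DatasetWriter/HDF5DatasetGenerator implementation):

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "features.hdf5")
features = np.random.rand(100, 512).astype("float32")
labels = np.random.randint(0, 2, size=100)

# Writer side: persist features and labels to an HDF5 file.
with h5py.File(path, "w") as f:
    f.create_dataset("features", data=features)
    f.create_dataset("labels", data=labels)

# Generator side: read one batch straight from disk.
with h5py.File(path, "r") as f:
    batch = f["features"][0:32]

print(batch.shape)  # -> (32, 512)
```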
11 - GoogLeNet
Training MiniGoogLeNet on CIFAR10 and DeeperGoogLeNet on the Tiny ImageNet dataset. MiniGoogLeNet reached 91% accuracy on the CIFAR10 training set, the best result so far with this module. The results on Tiny ImageNet were also good: we obtained an error rate of 0.55, a good result for the Tiny ImageNet challenge; not the best, but convincing. We also used the HDF5DatasetGenerator to read images per batch instead of allocating all images in memory.
12 - ResNet
Training the ResNet architecture on the CIFAR10 and Tiny ImageNet datasets. The accuracy on the CIFAR10 validation set was 92.8%, the best result in this training folder. On Tiny ImageNet, the error rate with ResNet was 51%, showing that the model achieved good generalization, although it is not yet efficient enough for real-world problems.
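ResNet's defining ingredient is the residual (shortcut) connection. A simplified pre-activation block can be sketched as below; the full ResNet class also uses bottleneck convolutions and projection shortcuts, which this sketch omits:

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    # Identity shortcut added back after two 3x3 convolutions.
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    return layers.Add()([shortcut, y])

inputs = layers.Input(shape=(32, 32, 16))
outputs = residual_block(inputs, 16)
print(outputs.shape)
```

Because the block learns only a residual on top of the identity, very deep stacks remain trainable.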
The examples presented here are trained on classical datasets such as MNIST, CIFAR-10, CALTECH, Tiny ImageNet, COCO, Kaggle challenge datasets and others.
If you want to use the compvis module, add the following line to your ~/.profile file.
export PYTHONPATH=$PYTHONPATH:/path/you/want/to/add
Throughout the project, I cite the papers describing each model or technique at the appropriate moment. I would also like to highlight the importance of Massive Open Online Courses (MOOCs), which offer many possibilities for practical learning (sometimes free, sometimes not, but always on the web). In this project I have used several of them, such as:
- PyImageSearch, which offers free examples of real applications in computer vision; in my opinion, the best website about Computer Vision.
- Learn OpenCV, a website specialized in OpenCV.
- Machine Learning Mastery, an excellent website that presents several practical Machine Learning projects on its blog.
- freeCodeCamp.org, a YouTube channel with code examples and technical courses.
- Deep Learning and Computer Vision A-Z™: OpenCV, SSD & GANs, on the Udemy platform.
Keep in mind that the best sources are always the original papers of each Convolutional Neural Network model. From them, we can better understand the idea behind each model.