This university project focuses on "Object Recognition" using the Pascal VOC 2012 dataset, which contains images of various objects in different contexts. The goal is to develop a classification system that determines which objects are present in the test images by assigning them to one of the twenty classes in the dataset. A Convolutional Neural Network (CNN)-based approach is adopted for visual feature extraction, and transfer learning is employed to exploit networks pre-trained on larger datasets. To classify multiple objects in a single image, a pyramid sliding window algorithm is combined with the InceptionV3 model: overlapping portions of the image are examined at different scales and classified individually.
In the code, initialize the variable main_path with the path to your workspace.
Then, once you have downloaded the dataset and extracted the folders, initialize the variable annot_path with the path to the "Annotations" folder and the variable images_path with the path to the "JPEGImages" folder.
Run the cells in the pre-processing part. They create a CSV file called annotFiles.csv, which holds the coordinates of the objects annotated in each image, and a CSV file called filesXML.csv, which lists all the XML file names.
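The pre-processing step can be pictured with the following minimal sketch, which reads Pascal VOC XML annotations and writes one CSV row per annotated object. The function names and the column layout (filename, class, xmin, ymin, xmax, ymax) are assumptions made for illustration; the notebook's actual columns may differ.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical column layout; the project's annotFiles.csv may use other names.
FIELDS = ["filename", "class", "xmin", "ymin", "xmax", "ymax"]


def parse_voc_annotation(xml_path):
    """Return one dict per annotated object in a Pascal VOC XML file."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    rows = []
    # Only direct <object> children are read; nested <part> boxes are skipped.
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        row = {"filename": filename, "class": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # Go through float() so values such as "174.0" are accepted too.
            row[key] = int(float(box.findtext(key)))
        rows.append(row)
    return rows


def write_annotations_csv(annot_dir, csv_path):
    """Parse every XML file in annot_dir and write the rows to csv_path."""
    rows = []
    for xml_file in sorted(Path(annot_dir).glob("*.xml")):
        rows.extend(parse_voc_annotation(xml_file))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Assuming annot_path is set as described above, a call such as write_annotations_csv(annot_path, "annotFiles.csv") would produce the file in the current directory.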
Run the cells for creating the background class images. This step creates a folder called background_img containing the images labelled as background, which are used for training. It also creates a CSV file called annotBackground.csv listing the names of all the generated background images.
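This README does not say how the background regions are chosen. One plausible approach, shown here purely as an assumption, is rejection sampling: draw random square boxes and keep the first one that overlaps no annotated object. The function names, the square shape and the max_tries limit are all hypothetical.

```python
import random


def boxes_overlap(a, b):
    """Return True if boxes (xmin, ymin, xmax, ymax) share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def sample_background_box(img_w, img_h, object_boxes, size, rng, max_tries=100):
    """Pick a size x size box that overlaps no annotated object.

    Returns None when the image is too small or no free region is found
    within max_tries random draws.
    """
    if size > img_w or size > img_h:
        return None
    for _ in range(max_tries):
        x = rng.randint(0, img_w - size)
        y = rng.randint(0, img_h - size)
        candidate = (x, y, x + size, y + size)
        if not any(boxes_overlap(candidate, box) for box in object_boxes):
            return candidate
    return None
```

Passing a seeded random.Random instance as rng keeps the generated background set reproducible between runs.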
Run the cells for cropping the object instances out of the images. This creates a folder called CuttedBoundingBoxImages containing one image per annotated instance in the dataset, cropped to its bounding box. A CSV file called BoundingBoxes.csv is also created, listing all the cropped files.
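As a rough illustration of the cropping step, the sketch below cuts one bounding box out of an image held as a NumPy array. The function name is hypothetical, and treating the box as a half-open pixel range is a simplification; the notebook may handle coordinates differently.

```python
import numpy as np


def crop_instance(image, box):
    """Crop box = (xmin, ymin, xmax, ymax) from an H x W x C image array."""
    height, width = image.shape[:2]
    xmin, ymin, xmax, ymax = box
    # Clamp to the image bounds so a slightly out-of-range box still works.
    xmin, xmax = max(0, xmin), min(width, xmax)
    ymin, ymax = max(0, ymin), min(height, ymax)
    if xmin >= xmax or ymin >= ymax:
        raise ValueError(f"empty crop for box {box}")
    # Rows are indexed by y and columns by x.
    return image[ymin:ymax, xmin:xmax].copy()
```

The copy() call detaches the crop from the source array, so later in-place edits to one do not affect the other.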
Run the cells that split the dataset and create the CSV files for the train set and the test set. At this stage two files are created: dataset_testing.csv and boundingBoxes_without_testing.csv. Class balancing is then performed on boundingBoxes_without_testing.csv, producing an additional CSV file called dataset_training.csv with the file names of the balanced images. This last file is the one used for the training phase.
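The balancing strategy is not specified here. A minimal sketch of one common option, random undersampling to the size of the rarest class, is shown below; the function name and the row format (one dict per image) are assumptions, and the notebook may balance the classes differently.

```python
import random
from collections import defaultdict


def undersample(rows, label_key, rng):
    """Keep the same number of rows per class: the size of the rarest class."""
    if not rows:
        return []
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    # Iterate in sorted label order so a seeded rng gives a repeatable result.
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], target))
    rng.shuffle(balanced)
    return balanced
```

Undersampling discards data from the frequent classes; oversampling the rare classes is the usual alternative when the dataset is small.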
Run the cells that create a folder called AugmentedImages containing the augmented images. A CSV file called augmImg.csv is also created with the file names of the newly created images.
To train on the unaugmented dataset, run the loading data cells. To train on the augmented dataset, run the loading data with data augmentation section instead.
Next, run the loading network section to perform the training, which saves the trained model in a .h5 file.
Run the cells in the sub-sections to perform the various tests.
- The first subsection measures accuracy on the test data and outputs a confusion matrix.
- The second subsection computes metrics such as precision, recall and F1 score.
- The third subsection performs the actual prediction on a whole image using the sliding window. You must specify the path of the image on which you want to run the prediction.
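The window generation behind a pyramid sliding window can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the function names, the stride and the scale_step of 1.5 are hypothetical, and the classifier call on each crop is left out. A window of 299 pixels matches the default input size of InceptionV3.

```python
def pyramid_scales(img_w, img_h, window, scale_step=1.5):
    """Yield (scale, w, h) while the downscaled image still fits one window."""
    scale = 1.0
    while True:
        w, h = int(img_w / scale), int(img_h / scale)
        if w < window or h < window:
            break
        yield scale, w, h
        scale *= scale_step


def sliding_windows(img_w, img_h, window, stride, scale_step=1.5):
    """Yield window boxes (xmin, ymin, xmax, ymax) in original-image pixels."""
    for scale, w, h in pyramid_scales(img_w, img_h, window, scale_step):
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                # Map the window from the downscaled level back to the
                # original image, where it covers a proportionally larger area.
                yield (int(x * scale), int(y * scale),
                       int((x + window) * scale), int((y + window) * scale))
```

Each yielded box would be cropped from the image, resized to the network input size and classified; a stride smaller than the window makes neighbouring windows overlap, which gives objects that straddle a window boundary a chance to appear whole in another window.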
This project is licensed under the MIT License - see the LICENSE file for details.