The low accuracies in the experimental notebook from #3 are worrying:
| Page Detector | `random` | `siamese` | `imagehash` | `vgg16` | `annotated` |
|---|---|---|---|---|---|
| Accuracy | 0.03% | 3.95% | 6.58% | 61.84% | 100.00% |
The `vgg16` page detector currently uses features produced by the last hidden layer of a VGG16 model trained on ImageNet. Fine-tuning may not be an option, since our dataset is ill-suited for classification (too many document pages/classes, too few examples of each class). However, since our dataset differs significantly from ImageNet, we may have better luck using an earlier hidden layer of VGG16, whose features are less specific to the ImageNet classes.
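One way to choose among VGG16 layers is to compare nearest-neighbour retrieval accuracy on features taken from each candidate layer. The following is a minimal sketch under stated assumptions, not the detector's actual code: the function names, the cosine-similarity matching, and the toy data layout are all hypothetical, and the feature vectors are assumed to have been extracted beforehand (in Keras' VGG16, for example, the fully connected hidden layers are named `fc1` and `fc2`, and the last pooling layer is `block5_pool`).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top1_accuracy(page_features, screen_features, true_pages):
    """Fraction of screens whose most similar page is the annotated one.

    page_features:   dict mapping a page id to its feature vector
    screen_features: list of feature vectors, one per screen image
    true_pages:      list of annotated page ids, aligned with screen_features
    """
    correct = 0
    for screen, truth in zip(screen_features, true_pages):
        predicted = max(
            page_features,
            key=lambda page: cosine_similarity(screen, page_features[page]),
        )
        correct += predicted == truth
    return correct / len(true_pages)


def best_layer(features_by_layer, true_pages):
    """Return the layer name with the highest top-1 accuracy and all accuracies.

    features_by_layer: dict mapping a layer name to a
                       (page_features, screen_features) pair.
    """
    accuracies = {
        layer: top1_accuracy(pages, screens, true_pages)
        for layer, (pages, screens) in features_by_layer.items()
    }
    return max(accuracies, key=accuracies.get), accuracies
```

The comparison should be run on held-out screens rather than on the data used to pick the layer, so that the chosen layer's accuracy is not overestimated.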
The `siamese` page detector uses position-dependent samples to normalize the input screen images, which may explain why its performance degrades on new document pages and screen images (86% accuracy on the training set versus 3.95% accuracy on the test set).
If we manage to improve the `siamese` page detector, we may benefit from ensembling `siamese` with `vgg16`:
- … `ensemble` page detector to `video699.__main__`.
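Ensembling two page detectors could be done in several ways; one simple option is score-level fusion. The sketch below is an illustration only, assuming each detector returns a higher-is-better score for the same set of candidate pages. The function names, the min-max normalization, and the weights are hypothetical and not part of the existing code.

```python
def min_max_normalize(scores):
    """Rescale one detector's scores to [0, 1] so that detectors are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # The detector does not discriminate between pages; contribute nothing.
        return {page: 0.0 for page in scores}
    return {page: (score - lo) / (hi - lo) for page, score in scores.items()}


def ensemble_predict(scores_by_detector, weights=None):
    """Pick the page with the highest weighted sum of normalized scores.

    scores_by_detector: dict mapping a detector name to a dict of
                        page id -> score (higher means a better match)
    weights:            optional dict mapping a detector name to its weight;
                        defaults to equal weights
    """
    if weights is None:
        weights = {name: 1.0 for name in scores_by_detector}
    combined = {}
    for name, scores in scores_by_detector.items():
        for page, score in min_max_normalize(scores).items():
            combined[page] = combined.get(page, 0.0) + weights[name] * score
    return max(combined, key=combined.get)


# Toy scores for a single screen image and three candidate pages.
scores = {
    "siamese": {"p1": 0.2, "p2": 0.9, "p3": 0.8},
    "vgg16": {"p1": 10.0, "p2": 30.0, "p3": 50.0},
}
prediction = ensemble_predict(scores)
```

With the current accuracy gap between the two detectors, unequal weights tuned on a validation set would probably be needed before the ensemble could outperform `vgg16` alone.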