Project Information

  • Credits - 6 - about 120+ hours of work per group member
  • Timeline - 3.5 months
  • Report - Scientific format - 8 pages with introduction, SOTA, experimental setup, experimental results, etc., using the ICML template
  • Defence - An online presentation of what was done

Introduction

  • Sequence Mining - it's an MLDM project, hence we'll implement both machine learning and data mining approaches. For the mining part, sequence mining algorithms must be used. The algorithm must track the main patterns which are representative of each class of digits. This is done in an unsupervised way: we don't want to learn a classifier, only to extract the most important patterns.
  • Metric Learning - Based on the representation we learn from sequence mining, a metric learning algorithm should be implemented, e.g. LMNN - implement at least one algorithm (implementations exist in MATLAB and Python). During the defence, methodological questions will also be posed.
  • Deep Learning - Used only to generate features / learn a latent space. DO NOT learn a classifier. The new features can then be used with the KNN algorithm.
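To make the metric learning idea concrete, here is a minimal NumPy sketch (not the course's reference implementation) of how a learned linear map L changes KNN's notion of distance, computing d(x, y) = ||L(x - y)||: LMNN and similar algorithms learn exactly such an L so that same-class neighbours end up closer than impostors. The helper name `knn_predict` and the toy data below are our own assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3, L=None):
    """KNN classification; if L is given, distances are computed as
    ||L(x - y)||, i.e. a Mahalanobis-style metric parameterised by a
    linear map L (the kind of map LMNN learns)."""
    if L is not None:
        X_train = X_train @ L.T
        X_test = X_test @ L.T
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)   # Euclidean in the mapped space
        nn = np.argsort(d)[:k]                    # indices of the k nearest
        preds.append(np.bincount(y_train[nn]).argmax())  # majority vote
    return np.array(preds)
```

With L equal to the identity this reduces to plain Euclidean KNN; a learned L that down-weights noisy feature directions can flip neighbourhoods and improve the vote.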

Dataset Creation

  • The dataset ought to be created using one's own environment for drawing digits.
  • You can also merge the created dataset with a SOTA dataset like MNIST.
  • You can also generate your own data using GANs - adversarial methods.

Dataset Representation

  • Structured Method - in the form of sequences. We will have to implement an algorithm based on Freeman's Code (FC).

    • The Idea - you have a matrix of pixels and you begin at the top left of the matrix. Once you meet the first foreground pixel, you seek the second one, then look up the directional primitive of the Freeman code which takes you from the first pixel to the second. As you advance through the pixels you keep appending Freeman codes, so after processing every sample we'll have an FC for each instance in the dataset. On these sequences we'll then run the Nearest Neighbours algorithm using the edit distance. (If instead we use a DNN/CNN/GAN to create new examples, we get a numerical representation.)
      • Attention 1 - be careful if the digit has two different connected components, or if, for example, the user falters while drawing: we must be able to recognize the faulty pixels as noise and ignore them, capturing only the principal digit being expressed (a filtering approach is suggested to keep the single principal component and ignore the noise).
      • Attention 2 - a digit can be drawn at multiple sizes (big, small, medium). A digit '1' written in a 1024x1024 image should have the same edit distance to an image of the digit '1' in a 460x460 format (scaling or normalization approaches are suggested for this).
      • Attention 3 - if the dataset is not representative of digits written in different formats, your accuracy is going to suffer, so try reproducing/creating images of different sizes/formats (use GANs, handcrafted data, etc.).
      • Attention 4 - implementing the standard edit distance won't work well. It assigns a cost of exactly 1 to every edit, delete and add operation, so substituting one directional primitive of the FC for another costs the same whether the two directions are adjacent or point in very different (e.g. orthogonal or opposite) directions, as FC symbols '0' and '4' do. Hence we need to penalise more heavily the substitution of two very different directional primitives. In the general case one has 8 symbols + the empty letter (the empty symbol is required for insert/delete), giving a 9x9 matrix, i.e. 81 possible edit costs (there are plenty of scientific papers trying different techniques).
  • Numerical Method - using Deep Learning.

    • The Idea - If one uses a DNN/CNN/GAN to create new examples, a numerical representation of the image is obtainable. On top of this, a standard KNN with Euclidean distance can be applied. It would be ideal to test this numerical representation with the standard Euclidean distance first, then improve its quality by applying a metric learning algorithm like LMNN to the numerical representation.
      • Comment 1 - Since the numerical features are learnt using DNN techniques, non-linearity will already be captured, and with these features we'll be able to apply LMNN - a simple linear method. Since the non-linearity has already been captured by deep learning, the pipeline as a whole should work well.
      • Comment 2 - There are two different methods to learn representations: deep learning and metric learning. Here we merge the two and check whether doing both actually gives better results than the baseline (10% accuracy - the probability of randomly guessing one of the ten digits).
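Attention 4 above can be sketched as a standard dynamic-programming edit distance over Freeman codes in which the substitution cost grows with the angular difference between the two directional primitives instead of being a flat 1. The particular cost function below (angular difference scaled to [0, 1]) is one illustrative choice among the many studied in the literature, not a prescribed one.

```python
import numpy as np

def sub_cost(a, b):
    """Cost of substituting Freeman direction a for b (symbols 0..7):
    proportional to the angular difference, so orthogonal or opposite
    primitives cost more than adjacent ones."""
    d = abs(a - b) % 8
    return min(d, 8 - d) / 4.0   # 0 for same direction, 1 for opposite

def weighted_edit_distance(s, t, indel=1.0):
    """Edit distance between two Freeman code sequences with a
    direction-aware substitution cost; insert/delete keep a flat cost."""
    n, m = len(s), len(t)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * indel   # delete everything
    D[0, :] = np.arange(m + 1) * indel   # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j] + indel,              # delete s[i-1]
                          D[i, j - 1] + indel,              # insert t[j-1]
                          D[i - 1, j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return D[n, m]
```

Replacing `sub_cost` with a learned or hand-tuned 9x9 cost matrix (8 directions + the empty symbol) recovers the general formulation described above.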

Sequence Mining

  • The Idea - The objective is to apply a sequence mining algorithm to each category of digits. This should in turn help extract the most interesting patterns characterizing each class of digits.
    • Attention 1 - imagine we have two sequence pattern samples - 1222224... and 1216224... - for a digit, say '3'. For some reason there is a small noise in one of the threes, resulting in a slight change in sequence values. One must think of a way of filtering this noise so as to predict the right digit - either replace the '16' with '2's, or simply remove it.
    • Comment 1 - A strategy to deal with Attention 1: consider two patterns to be the same if the edit distance between them is bounded by some threshold. This way we can allow for some edit errors or noise.
    • Comment 2 - One could use a CNN to learn features, to benefit from the filters that learn to extract patterns: the convolutional kernels of a CNN can be seen as sub-patterns of the digit (filters extract low- to high-level features of images), so deep learning could also be used to mine for digit patterns. This method can't be used as a replacement for Comment 1, as implementing data mining approaches is imperative; deep learning is just the cherry on the cake.
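Comment 1 above can be sketched with a greedy grouping pass: each mined pattern joins an existing group if its edit distance to that group's representative is within a threshold, otherwise it starts a new group. The greedy first-match strategy and the threshold value are our own illustrative choices, not part of the assignment.

```python
def edit_distance(s, t):
    """Plain Levenshtein distance between two pattern strings."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1,                       # delete
                          D[i][j - 1] + 1,                       # insert
                          D[i - 1][j - 1] + (s[i - 1] != t[j - 1]))  # substitute
    return D[n][m]

def group_patterns(patterns, threshold=2):
    """Greedy grouping: a mined pattern joins the first group whose
    representative is within `threshold` edit operations, so small
    noise (e.g. '16' instead of '2' in one sequence) collapses into
    the same pattern; otherwise it starts a new group."""
    groups = []  # list of (representative, members)
    for p in patterns:
        for rep, members in groups:
            if edit_distance(p, rep) <= threshold:
                members.append(p)
                break
        else:
            groups.append((p, [p]))
    return groups
```

On the example from Attention 1, '1222224' and '1216224' differ by two substitutions, so with a threshold of 2 they fall into the same group.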

Going Further

  • Implement transfer learning methods. For example, if we are two in a group, the first one draws digits for training and the second draws digits for testing. There is likely going to be a shift in distribution between the two, and one will need to automatically adapt the model to this shift.
  • KNN is not perfect: we have storage and complexity problems. So we'll have to implement different strategies to overcome these disadvantages (condensed nearest neighbours, radial/tree-based speed-up methods, etc.). There exist plenty of algorithms to help with the disadvantages of KNN, so one should evaluate different strategies and compare their performance - in terms of accuracy and time/space complexity - with the baseline (standard KNN).
  • Create something from the designed classifier (possibly the optimal one), like a game of sorts - a calculator, for example.
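One of the KNN speed-ups mentioned above, condensed nearest neighbours (Hart's algorithm), can be sketched in a few lines: it keeps only the training prototypes that are actually needed for 1-NN to classify the rest of the training set correctly, cutting both storage and query time. This is a basic sketch assuming numerical feature vectors; tie-breaking and point ordering effects are ignored.

```python
import numpy as np

def condense(X, y):
    """Hart's condensed nearest neighbour (a sketch): start from one
    prototype and repeatedly add any point that the current prototype
    set misclassifies under 1-NN, until the set is consistent."""
    keep = [0]                      # indices of kept prototypes
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # 1-NN prediction of point i using the current prototypes
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            if y[keep][int(np.argmin(d))] != y[i]:
                keep.append(i)      # misclassified: must be kept
                changed = True
    return X[keep], y[keep]
```

On well-separated classes the condensed set can be a small fraction of the training data, which is exactly the storage/complexity comparison against standard KNN suggested above.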

General Comments

  • Ideally, both methods - Structured and Numerical - must lead to very good accuracy rates.
  • The objective is to get the best results; it's a competition between groups. Bonus to the group with the best accuracy.
  • The platform could have an option for downloading one's own dataset for comparison.
  • You don't have to reimplement LMNN from scratch.
  • The report must be done in LaTeX.

References and Misc. Material

Questions and Comments (Please don't change the above text, if you have any questions, comments or suggestions, please put them below)

  • Make the drawing pointer adjustable, to increase or decrease the thickness of the written digit.