MathVisionNet is a machine learning model that converts handwritten math expressions into LaTeX. It is a personal project of mine designed to teach me how these systems work. I plan to build several different architectures to compare them, and to try different data augmentation techniques to see how they affect training.
After training, I want to explore ways to optimize the models so they run efficiently on limited hardware.
- CNN + LSTM Encoder-Decoder model (currently in progress)
- CNN + LSTM with Attention model
- CNN + Transformer Decoder model
- Vision Transformer + Transformer Decoder model
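The first architecture on the list can be sketched roughly as follows. This is a minimal illustration assuming PyTorch; the class names, layer sizes, and the choice of conditioning the decoder's initial hidden state on a pooled image feature are all my own placeholder choices, not a fixed design for the project.

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Small CNN that maps a grayscale expression image to a feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),  # global pool to (B, 64, 1, 1)
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)  # (B, 64)
        return self.fc(h)            # (B, feat_dim)

class LSTMDecoder(nn.Module):
    """LSTM that predicts LaTeX tokens, conditioned on the image feature
    through its initial hidden and cell states (no attention yet)."""
    def __init__(self, vocab_size, feat_dim=256, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.init_h = nn.Linear(feat_dim, hid_dim)
        self.init_c = nn.Linear(feat_dim, hid_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, feat, tokens):
        h0 = torch.tanh(self.init_h(feat)).unsqueeze(0)  # (1, B, hid_dim)
        c0 = torch.tanh(self.init_c(feat)).unsqueeze(0)
        emb = self.embed(tokens)                         # (B, T, emb_dim)
        out, _ = self.lstm(emb, (h0, c0))
        return self.out(out)                             # (B, T, vocab_size)

# Toy forward pass: batch of 2 images, target length 5, vocab of 100 tokens.
enc, dec = CNNEncoder(), LSTMDecoder(vocab_size=100)
imgs = torch.randn(2, 1, 64, 256)
toks = torch.randint(0, 100, (2, 5))
logits = dec(enc(imgs), toks)
print(logits.shape)  # torch.Size([2, 5, 100])
```

The attention variant would differ mainly in keeping the encoder's spatial feature map instead of pooling it away, so the decoder can look at different image regions per token.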
To start, I am going to train these models on the MathWriting 2024 dataset. All of them will ultimately need much more data than that, so I plan to combine it with other handwritten-math datasets and see what I can do.