sachin-101/Multilingual-ASR-and-MT

Multilingual Speech Recognition and Machine Translation

This project began as part of the finals of the Defense Innovation Challenge, Shaastra 2020, IIT Madras.

Click here to see the Problem statement

The competition is over, and I am now continuing this as a personal project. All neural network architectures in this project are written from scratch using NumPy and PyTorch.

Current Status

Approach

The problem statement, translating incoming speech into a chosen output language, can be broken down into two subtasks:

  1. Speech Recognition: this system takes the input audio and outputs text in the same language in which it was spoken.
  2. Machine Translation: this system takes the output text of the speech recognition model and translates it into the required output language.
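The two-stage split can be sketched as a simple function composition. This is only an illustration of the data flow: `make_pipeline`, `fake_recognize` and `fake_translate` are hypothetical names that do not come from this repository, and the stand-in stages return fixed strings instead of running real models.

```python
from typing import Callable


def make_pipeline(
    recognize: Callable[[bytes], str],
    translate: Callable[[str, str], str],
) -> Callable[[bytes, str], str]:
    """Chain a speech recognizer and a translator into one function."""

    def speech_to_translation(audio: bytes, target_lang: str) -> str:
        # Stage 1: audio -> text in the spoken language.
        transcript = recognize(audio)
        # Stage 2: text -> text in the requested output language.
        return translate(transcript, target_lang)

    return speech_to_translation


# Hypothetical stand-in stages so the sketch runs without trained models.
def fake_recognize(audio: bytes) -> str:
    return "hello"


def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


pipeline = make_pipeline(fake_recognize, fake_translate)
result = pipeline(b"\x00\x01", "hi")
```

Because the stages only share a text interface, either one can be retrained or replaced without touching the other.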

Benefits of dividing into subtasks

  • I am using data-driven deep learning methods to build both systems, and speech-to-text and text-to-text datasets for training them separately are available online.

ASR system

  • A deep-learning-based speech recognition model: I have implemented an attention-based encoder-decoder network for the ASR system.
  • It is based on the paper Listen, Attend and Spell.
  • The input audio is converted to a mel spectrogram, which is fed to the model.
  • It is a character-based model, so the decoder outputs character sequences.
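To make "character-based" concrete, here is a minimal sketch of how transcripts could be mapped to integer targets for the decoder and back. The `<sos>`/`<eos>` markers and the helper names `build_vocab`, `encode` and `decode` are assumptions for illustration, not the vocabulary code used in this repository.

```python
SOS, EOS = "<sos>", "<eos>"  # assumed start/end-of-sequence markers


def build_vocab(transcripts):
    """Collect every character seen in the transcripts into an index table."""
    chars = sorted({ch for text in transcripts for ch in text})
    itos = [SOS, EOS] + chars                   # index -> symbol
    stoi = {sym: i for i, sym in enumerate(itos)}  # symbol -> index
    return stoi, itos


def encode(text, stoi):
    """Turn a transcript into decoder target ids, wrapped in SOS/EOS."""
    return [stoi[SOS]] + [stoi[ch] for ch in text] + [stoi[EOS]]


def decode(ids, itos):
    """Turn predicted ids back into text, stopping at the first EOS."""
    out = []
    for i in ids:
        sym = itos[i]
        if sym == EOS:
            break
        if sym != SOS:
            out.append(sym)
    return "".join(out)


stoi, itos = build_vocab(["hello world", "speech"])
ids = encode("hello", stoi)
text = decode(ids, itos)
```

A character vocabulary stays small and has no out-of-vocabulary words, at the cost of longer output sequences than a word or sub-word model would produce.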

Spectrogram extracted from input audio and corresponding attention map
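An attention map of this kind has one row per decoded character, and each row is a set of weights over the encoder frames that sum to one. The sketch below computes one such row, assuming plain dot-product scoring; the scoring function actually used in this project is not stated here, and the code is written in dependency-free Python rather than the NumPy/PyTorch used in the repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for a single decoding step."""
    # Score each encoder frame by its dot product with the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, frame))
        for frame in encoder_states
    ]
    weights = softmax(scores)  # one row of the attention map
    dim = len(encoder_states[0])
    # Context vector: weighted sum of the encoder frames.
    context = [
        sum(w * frame[k] for w, frame in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Three toy encoder frames of dimension 2 (illustrative values only).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
```

Stacking the `weights` rows from every decoding step gives the two-dimensional map plotted against the spectrogram.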

Machine Translation system

  • The model is based on an encoder-decoder architecture with attention.
  • It takes the ASR output as input and translates it into the selected output language.
  • Before being fed to the model, the sentences are pre-processed: normalized and tokenized.
  • The attention mechanism helps here with translating longer sentences.
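The pre-processing step could look roughly like the sketch below. The exact normalization rules used in this project are not listed in this README, so the choices here (Unicode NFKC normalization, lowercasing, splitting punctuation into separate tokens, whitespace tokenization) are assumptions, and `normalize` and `tokenize` are hypothetical helper names.

```python
import re
import unicodedata


def normalize(sentence: str) -> str:
    """Normalize Unicode, lowercase, and separate punctuation from words."""
    s = unicodedata.normalize("NFKC", sentence).lower().strip()
    s = re.sub(r"([.!?,])", r" \1 ", s)  # put spaces around punctuation
    s = re.sub(r"\s+", " ", s)           # collapse runs of whitespace
    return s.strip()


def tokenize(sentence: str) -> list:
    """Split a normalized sentence into whitespace-delimited tokens."""
    return normalize(sentence).split()


tokens = tokenize("  Hello,   World!  ")
```

Lowercasing only affects scripts that have letter case, so for other languages the normalization step would mostly reduce to Unicode and whitespace cleanup.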

Some Training Logs

Progress

  • Implement CTC model
  • Implement attention based Encoder-Decoder model
  • Add Language model
  • Add SpecAugment on input Spectrograms
  • Add Beam Search
  • Implement joint Attention-CTC model
  • Implement word and sub-word models

Acknowledgements

References

  1. Listen, Attend and Spell, W Chan et al.
  2. SpecAugment: A Simple Data Augmentation Method for Speech Recognition, Park et al.
  3. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A Graves et al.
  4. Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning, S Kim et al.
