This project began as part of the finals of the Defense Innovation Challenge at Shaastra 2020, IIT Madras.
Click here to see the problem statement
The competition is over, and I have continued this as a personal project. All the neural network architectures in this project are written from scratch using NumPy and PyTorch.
The given problem of translating an incoming speech signal into another language can be broken down into two subtasks:
- Speech Recognition – takes the input audio and outputs text in the same language in which it was spoken.
- Machine Translation – takes the output text of the speech recognition model and translates it into the required output language.
- I am using data-driven deep learning methods to build these systems; suitable speech-to-text and text-to-text datasets are available online.
- For the ASR system, I have implemented an attention-based encoder-decoder network, based on the paper Listen, Attend and Spell.
- The audio input is converted to a mel spectrogram, which is fed to the model.
- It is a character-based model, so the decoder outputs character sequences.
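The mel spectrogram conversion mentioned above can be sketched in plain NumPy. The window size, hop length, and 40-mel configuration below are illustrative assumptions, not the project's actual settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel_spectrogram(wave, sr=16000, n_fft=400, hop=160, n_mels=40):
    # Frame the signal, apply a Hann window, take the power FFT,
    # then project onto the mel filterbank and take the log.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2   # (frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T  # (frames, n_mels)
    return np.log(mel + 1e-10)

# One second of a 440 Hz tone sampled at 16 kHz.
t = np.linspace(0, 1, 16000, endpoint=False)
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (98, 40): 98 frames, 40 mel bands
```

In practice a library routine (e.g. torchaudio's `MelSpectrogram`) would be used instead; the sketch just shows what the model actually receives.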
- The translation model is also based on an encoder-decoder architecture with attention.
- It takes the output text from the ASR system and translates it into the selected output language.
- Before being fed to the model, the sentences are pre-processed, normalized and tokenized.
- The attention mechanism here helps the model translate longer sentences.
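The pre-processing step above can be sketched as follows. The `normalize` and `tokenize` helpers and the `<sos>`/`<eos>` markers are illustrative assumptions, not the project's actual code:

```python
import re
import unicodedata

def normalize(sentence):
    # Lowercase, strip accents, separate punctuation, collapse whitespace.
    s = unicodedata.normalize("NFD", sentence.lower().strip())
    s = "".join(c for c in s if unicodedata.category(c) != "Mn")
    s = re.sub(r"([.!?,])", r" \1", s)
    return re.sub(r"\s+", " ", s).strip()

def tokenize(sentence):
    # Wrap with start/end tokens so the decoder knows when to stop.
    return ["<sos>"] + normalize(sentence).split() + ["<eos>"]

print(tokenize("How are you?"))
# ['<sos>', 'how', 'are', 'you', '?', '<eos>']
```

The resulting tokens would then be mapped to integer indices via a vocabulary before being embedded by the encoder.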
- Implement CTC model
- Implement attention-based encoder-decoder model
- Add language model
- Add SpecAugment on input spectrograms
- Add beam search
- Implement joint CTC-Attention model
- Implement word- and subword-level models
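Among the items above, SpecAugment (Park et al.) amounts to zeroing out random frequency bands and time steps of the input spectrogram during training. A minimal NumPy sketch, with illustrative mask counts and sizes:

```python
import numpy as np

def spec_augment(spec, n_freq_masks=2, n_time_masks=2, F=8, T=10, rng=None):
    # Mask random frequency bands and time spans of a (time, n_mels) spectrogram.
    if rng is None:
        rng = np.random.default_rng(0)
    spec = spec.copy()  # leave the caller's array untouched
    n_t, n_f = spec.shape
    for _ in range(n_freq_masks):
        f = rng.integers(0, F + 1)               # mask width, up to F bins
        f0 = rng.integers(0, max(n_f - f, 1))    # mask start
        spec[:, f0:f0 + f] = 0.0
    for _ in range(n_time_masks):
        t = rng.integers(0, T + 1)               # mask length, up to T frames
        t0 = rng.integers(0, max(n_t - t, 1))
        spec[t0:t0 + t, :] = 0.0
    return spec

x = np.ones((100, 40))          # stand-in for a log-mel spectrogram
aug = spec_augment(x)
print(aug.shape)                # same shape as the input
```

Because it operates directly on the spectrogram, it plugs into the input pipeline without changing the model itself.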
- Thanks to Yash Patel for guiding me during this project.
- Alexander's End-to-end-ASR-Pytorch repository has been a great help while developing this project.
- Listen, Attend and Spell, W Chan et al.
- SpecAugment: A Simple Data Augmentation Method for Speech Recognition, Park et al.
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A Graves et al.
- Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning, S Kim et al.

