RohanBabbar/ASR_Research

🎙️ Hinglish Speech Processing Pipeline

Comprehensive Speech & Audio Processing, Disfluency Detection, Spell Checking, and Model Evaluation

📖 Overview

This repository contains solutions and methodologies for conversational Hindi with English code-mixing (Hinglish). The pipeline covers Whisper fine-tuning, disfluency detection and clipping, spell checking of the transcribed vocabulary, and consensus-based WER evaluation.


🚀 Module 1: Whisper Fine-Tuning

Fine-tuned openai/whisper-small on the Hindi subset (hi_in) of the FLEURS dataset.

⚙️ Optimization & Setup

  • Bypassed default HuggingFace audio casting on Apple Silicon by decoding audio chunks manually with soundfile.
  • Alleviated Out-Of-Memory errors on the local mps backend by setting PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0, which disables the MPS allocator's upper memory limit.
  • Used a batch size of 1 with 16 gradient accumulation steps (effective batch size 16) to keep training stable.
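
The memory-related settings above can be sketched as follows. This is a minimal illustration, not the repository's training script: the `training_kwargs` dict and the `effective_batch_size` helper are hypothetical, and the key names merely mirror the `per_device_train_batch_size` and `gradient_accumulation_steps` arguments of transformers' `TrainingArguments`, on the assumption that a HuggingFace trainer was used.

```python
import os

# Set early (before torch is imported) so the MPS allocator is sure to see it.
# 0.0 disables the allocator's upper memory limit.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0"

# Hypothetical keyword arguments; values are the ones stated in the README.
training_kwargs = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,
}


def effective_batch_size(kwargs, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(training_kwargs))
```

Accumulating gradients over 16 single-sample steps keeps peak memory at the level of batch size 1 while each optimizer update still averages over 16 samples.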

📊 Results

| Model | WER | Improvement |
| --- | --- | --- |
| Whisper-small (Baseline) | 84.16% | - |
| Whisper-small (Fine-tuned) | 47.50% | **~43% (relative)** |
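
The improvement figure corresponds to a relative WER reduction rather than a difference in percentage points. A quick check of the arithmetic, using only the two WER values reported above (the function name is illustrative):

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction in WER, expressed as a percentage of the baseline."""
    return (baseline_wer - new_wer) / baseline_wer * 100


baseline, fine_tuned = 84.16, 47.50
absolute_points = baseline - fine_tuned
relative_percent = relative_wer_reduction(baseline, fine_tuned)
print(f"{absolute_points:.2f} points absolute, {relative_percent:.1f}% relative")
```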

✂️ Module 2: Disfluency Detection

Identified and isolated speech disfluencies (fillers and stutters) across a 10-hour conversational dataset.

🔍 Methodology

  1. Detection: Scanned the Google Cloud STT .json transcriptions for common Hindi fillers ("हम्म", "आह", "उम", "अह", "ओह") and applied custom backtracking logic to catch the word repetitions typical of stuttering. Total: 907 segments detected.
  2. Precision Clipping: Loaded each .wav recording into memory once with soundfile to avoid repeated slow I/O, then converted start and end timestamps into sample indices for exact clipping.
  3. Audio Fidelity: Preserved the original audio quality by applying no normalization or other destructive processing to the clips.
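
The detection and clipping steps above can be sketched as below. This is a simplified illustration under stated assumptions: the `(token, start_sec, end_sec)` tuple layout is a hypothetical stand-in for the word-level entries of the STT output, and the adjacent-repeat check stands in for the repository's custom backtracking logic, which is not shown in the README.

```python
FILLERS = {"हम्म", "आह", "उम", "अह", "ओह"}


def find_disfluencies(words):
    """Return (kind, start_sec, end_sec) for fillers and immediate repetitions.

    `words` is a list of (token, start_sec, end_sec) tuples (assumed layout).
    """
    segments = []
    i = 0
    while i < len(words):
        token, start, end = words[i]
        if token in FILLERS:
            segments.append(("filler", start, end))
            i += 1
            continue
        # Extend j over consecutive copies of the same token.
        j = i
        while j + 1 < len(words) and words[j + 1][0] == token:
            j += 1
        if j > i:
            segments.append(("repetition", start, words[j][2]))
        i = j + 1
    return segments


def to_sample_range(start_sec, end_sec, sample_rate):
    """Convert timestamps in seconds to integer sample indices."""
    return round(start_sec * sample_rate), round(end_sec * sample_rate)


def clip(samples, start_sec, end_sec, sample_rate):
    """Slice the in-memory audio; no gain or normalization is applied."""
    first, last = to_sample_range(start_sec, end_sec, sample_rate)
    return samples[first:last]
```

The same slicing works on the NumPy array returned by `soundfile.read`, so each recording only has to be read from disk once regardless of how many segments are cut from it.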

📝 Module 3: NLP Spell Checking

Evaluated ~177,000 unique transcribed Hinglish words to separate correctly spelled terms from misspellings, while correctly handling English loanwords written in Devanagari script (e.g., "कंप्यूटर").

🧠 Validation Pipeline

Designed a two-tier validation engine, supported by a data-cleaning filter:

  1. Morphological Validation: Used spylls (a pure-Python port of Hunspell) loaded with the official LibreOffice hi_IN dictionary (hi_IN.dic / hi_IN.aff). This allowed strict evaluation against native Hindi prefix and suffix rules.
  2. Frequency Corpus Check: Cross-referenced the words rejected in Step 1 against a large conversational Hindi subtitle frequency corpus. A word that occurred frequently in real-world usage was accepted as a correctly spelled conversational loanword.
  3. Data Cleaning: Filtered out Latin alphabetic characters (A-Z), empty strings, numbers, and boundary punctuation.
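
A minimal sketch of how the cleaning filter and the two tiers could fit together. Several details are assumptions made for illustration: tokens containing Latin letters or digits are dropped whole, `dictionary_lookup` is any callable returning a bool (with spylls this could be the `lookup` method of a `Dictionary` loaded via `Dictionary.from_files`), `frequency` is a plain word-to-count mapping, and the `min_count` threshold is hypothetical since the README does not state one.

```python
import re
import string

_LATIN = re.compile(r"[A-Za-z]")
_BOUNDARY = string.punctuation + "।॥"  # ASCII punctuation plus Devanagari dandas


def clean_token(token):
    """Strip boundary punctuation; return None for tokens that are filtered out."""
    token = token.strip().strip(_BOUNDARY)
    if not token:
        return None
    if _LATIN.search(token) or any(ch.isdigit() for ch in token):
        return None
    return token


def classify(word, dictionary_lookup, frequency, min_count=5):
    """Tier 1: dictionary and affix rules. Tier 2: corpus frequency fallback."""
    if dictionary_lookup(word):
        return "dictionary"
    if frequency.get(word, 0) >= min_count:
        return "frequency"
    return "misspelled"


# Toy stand-ins for the Hunspell dictionary and the subtitle corpus.
toy_lookup = {"घर"}.__contains__
toy_frequency = {"कंप्यूटर": 120, "कमपूटर": 1}

for raw in ["घर।", "कंप्यूटर", "कमपूटर", "hello", "123"]:
    word = clean_token(raw)
    if word is not None:
        print(word, classify(word, toy_lookup, toy_frequency))
```

Running the frequency check only on Tier 1 failures keeps the dictionary as the primary authority and uses the corpus purely as a fallback for loanwords the dictionary does not cover.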

🧮 Module 4: Consensus Architecture & Evaluation

Built a ROVER-style transcription consensus across 6 distinct ASR models to correct typos in the human reference transcripts, so that models are not unfairly penalized for disagreeing with a mistyped reference.

🛠️ Implementation Details

  • Alignment: Stripped punctuation so that alignment and scoring operate purely at the word level.
  • Lattice Construction: Aligned the 6 candidate models (Model H, i, k, l, m, n) against the human reference baseline using edit-style sequence alignment from difflib. Note that difflib's SequenceMatcher aligns by longest matching blocks and does not compute a true Levenshtein distance.
  • Majority Voting: Evaluated every word slot in the lattice alignment. Replaced the human reference word only if at least 3 of the 6 models agreed on the same different word.
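
The alignment and voting steps above can be sketched with `difflib.SequenceMatcher`, which yields a longest-matching-block alignment rather than a strict Levenshtein alignment. This is an illustrative simplification, not the repository's code: only slots where a model aligns one-to-one with the reference cast a vote, insertions and deletions are ignored, and the function names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def slot_votes(reference, hypothesis):
    """Map each reference index to the hypothesis word aligned with it."""
    votes = {}
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        one_to_one = tag == "replace" and (i2 - i1) == (j2 - j1)
        if tag == "equal" or one_to_one:
            for offset in range(i2 - i1):
                votes[i1 + offset] = hypothesis[j1 + offset]
    return votes


def consensus_reference(reference, hypotheses, min_votes=3):
    """Replace a reference word when >= min_votes models agree on another word."""
    per_model = [slot_votes(reference, hyp) for hyp in hypotheses]
    corrected = list(reference)
    for idx, ref_word in enumerate(reference):
        counts = Counter(
            votes[idx]
            for votes in per_model
            if idx in votes and votes[idx] != ref_word
        )
        if counts:
            word, n = counts.most_common(1)[0]
            if n >= min_votes:
                corrected[idx] = word
    return corrected


reference = "i went to teh market".split()
hypotheses = [
    "i went to the market".split(),
    "i went to the market".split(),
    "i went to the market".split(),
    "i went to teh market".split(),
    "i went to a market".split(),
    "i went to teh market".split(),
]
print(" ".join(consensus_reference(reference, hypotheses)))
```

With a threshold of 3 out of 6, a reference word is overridden only when half of the models independently produce the same alternative, which guards against a single model's error rewriting the reference.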

📉 Updated WER Results

| Model | Original WER | Consensus WER | Status |
| --- | --- | --- | --- |
| Model H | 2.81% | 3.18% | Matched human typos closely |
| Model i | 0.37% | 1.47% | Matched human typos closely |
| Model k | 8.07% | 7.33% | Improved |
| Model l | 8.68% | 8.07% | Improved |
| Model m | 15.77% | 14.67% | Improved |
| Model n | 9.90% | 9.05% | Improved |

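Note: Models with higher initial error rates (k, l, m, n) saw their scores improve because they were no longer penalized for disagreeing with human typos. Models H and i, which had matched the human typos closely, scored slightly worse against the corrected reference.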
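
The effect described in the note can be reproduced on a toy sentence with a small word-level WER function. The sketch below uses a standard dynamic-programming edit distance over words; it is a generic illustration with made-up sentences, not the scoring code or data behind the table above.

```python
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation + "।")


def normalize(text):
    """Strip punctuation and split into words for word-level scoring."""
    return text.translate(_PUNCT_TABLE).split()


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(reference)


typo_reference = normalize("i went to teh market.")
fixed_reference = normalize("i went to the market.")
model_output = normalize("i went to the market")

print(wer(typo_reference, model_output))   # penalized by the reference typo
print(wer(fixed_reference, model_output))  # no penalty after correction
```

A model that reproduces the typo shows the opposite pattern: it scores perfectly against the original reference and picks up an error once the reference is corrected.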
