🎙️ Hinglish Speech Processing Pipeline

Comprehensive Speech & Audio Processing, Disfluency Detection, Spell Checking, and Model Evaluation

📋 Table of Contents

Overview
Module 1: Whisper Fine-Tuning
Module 2: Disfluency Detection
Module 3: NLP Spell Checking
Module 4: Consensus Architecture & Evaluation

📖 Overview

This repository contains four modules for processing conversational Hindi (Hinglish) speech: fine-tuning Whisper, detecting and clipping disfluencies, spell-checking transcribed words, and re-scoring ASR models against a consensus reference.

🚀 Module 1: Whisper Fine-Tuning

Fine-tunes openai/whisper-small on the Hindi subset (hi_in) of the FLEURS dataset.

⚙️ Optimization & Setup

Bypassed the default Hugging Face audio casting on Apple Silicon by decoding audio chunks manually with soundfile.
Worked around out-of-memory errors on the local mps backend by setting PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0, which removes PyTorch's upper limit on MPS memory allocation.
Used a batch size of 1 with 16 gradient accumulation steps (an effective batch size of 16) to keep training stable.
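The memory settings above can be illustrated with a minimal, framework-free sketch. It is not the repository's training loop: the toy squared-error model, the example data, and all function names are assumptions made for illustration. It shows why 16 micro-batches of size 1, each contributing a gradient scaled by 1/16, give the same update as one batch of 16.

```python
import os

# Assumption: the variable is set before PyTorch initialises the MPS backend.
os.environ.setdefault("PYTORCH_MPS_HIGH_WATERMARK_RATIO", "0.0")

MICRO_BATCH_SIZE = 1
ACCUMULATION_STEPS = 16
effective_batch_size = MICRO_BATCH_SIZE * ACCUMULATION_STEPS


def example_gradient(weight, x, y):
    """Gradient of the squared error (weight * x - y) ** 2 with respect to weight."""
    return 2.0 * (weight * x - y) * x


def accumulated_gradient(weight, examples, steps):
    """Sum per-example gradients, each scaled by 1 / steps, before one update."""
    total = 0.0
    for x, y in examples[:steps]:
        total += example_gradient(weight, x, y) / steps
    return total


def full_batch_gradient(weight, examples):
    """Mean gradient over the whole batch, computed in one pass."""
    return sum(example_gradient(weight, x, y) for x, y in examples) / len(examples)


# Hypothetical toy data: 16 (x, y) pairs.
examples = [(float(i), 2.0 * i + 1.0) for i in range(1, 17)]
accumulated = accumulated_gradient(0.5, examples, ACCUMULATION_STEPS)
full = full_batch_gradient(0.5, examples)
```

In a real training loop the same idea appears as dividing each micro-batch loss by the number of accumulation steps and calling the optimizer only every 16th step.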

📊 Results

Model	WER	Relative WER reduction
Whisper-small (Baseline)	84.16%	-
Whisper-small (Fine-tuned)	47.50%	~43.6%
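The improvement figure is a relative reduction, not a drop in percentage points. A short sketch of the arithmetic, using only the two WER values from the table (the function name is an illustrative assumption):

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Percentage of the baseline error rate that was removed."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0


absolute_drop = 84.16 - 47.50                       # in percentage points
relative_drop = relative_wer_reduction(84.16, 47.50)  # in percent of the baseline

print(f"absolute: {absolute_drop:.2f} points, relative: {relative_drop:.2f}%")
```

The absolute drop is about 36.66 points, which corresponds to roughly 43.6% of the baseline error.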

✂️ Module 2: Disfluency Detection

Identified and isolated speech disfluencies (fillers and stutters) across a 10-hour conversational dataset.

🔍 Methodology

Detection: Ran text-based detection over Google Cloud STT .json transcriptions. Matched common Hindi fillers ("हम्म", "आह", "उम", "अह", "ओह") and used custom backtracking logic to catch the non-lexical word repetitions typical of stuttering. Total: 907 segments detected.
Precision Clipping: Loaded each .wav recording fully into memory with soundfile to avoid slow repeated I/O, then converted start and end timestamps into sample indices for exact clipping.
Audio Fidelity: Clips keep the original audio quality; no destructive dynamic-range normalization was applied.
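The detection and clipping steps above can be sketched as follows. This is a simplified illustration, not the repository's code: the word-entry layout (`word`, `startTime`, `endTime` with values like `"0.600s"`) is an assumption about the transcription JSON, the repetition check looks back only one word, and all function names are hypothetical.

```python
FILLERS = {"हम्म", "आह", "उम", "अह", "ओह"}


def parse_seconds(stamp):
    """Convert a duration string such as '1.300s' into float seconds."""
    return float(stamp.rstrip("s"))


def find_disfluencies(words):
    """Return (kind, start_s, end_s, text) tuples for fillers and immediate repeats."""
    segments = []
    previous = None
    for entry in words:
        token = entry["word"]
        start = parse_seconds(entry["startTime"])
        end = parse_seconds(entry["endTime"])
        if token in FILLERS:
            segments.append(("filler", start, end, token))
        elif previous is not None and token == previous["word"]:
            # Look back one word: an immediate repeat is flagged as a repetition,
            # spanning from the first occurrence to the end of the second.
            segments.append(
                ("repetition", parse_seconds(previous["startTime"]), end, token)
            )
        previous = entry
    return segments


def to_sample_range(start_s, end_s, sample_rate):
    """Map timestamps in seconds to integer sample indices."""
    return int(round(start_s * sample_rate)), int(round(end_s * sample_rate))


# Hypothetical word list in the assumed layout.
words = [
    {"word": "मैं", "startTime": "0.000s", "endTime": "0.300s"},
    {"word": "मैं", "startTime": "0.300s", "endTime": "0.600s"},
    {"word": "उम", "startTime": "0.600s", "endTime": "0.900s"},
    {"word": "जा", "startTime": "0.900s", "endTime": "1.100s"},
]
segments = find_disfluencies(words)
first_clip = to_sample_range(segments[0][1], segments[0][2], sample_rate=16000)
```

With the audio held in memory as an array, a clip is then just the slice `audio[start:end]`, which leaves the sample values untouched. Legitimate Hindi reduplication (for example "धीरे धीरे") would also match this one-word lookback, so a real detector needs more context than this sketch uses.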

📝 Module 3: NLP Spell Checking

Evaluated ~177,000 unique transcribed Hinglish words, classifying each as correctly or incorrectly spelled, including English loanwords written in Devanagari script (e.g., "कंप्यूटर").

🧠 Validation Pipeline

Designed a two-tiered validation engine, plus a data-cleaning filter:

Morphological Validation: Used spylls (a pure-Python Hunspell implementation) with the official LibreOffice hi_IN dictionary (hi_IN.dic / hi_IN.aff), which allows strict evaluation of native Hindi prefix and suffix affixation rules.
Frequency Corpus Check: Cross-referenced words that failed the dictionary check against a large conversational Hindi subtitle frequency corpus. A word that occurs frequently in real-world usage was accepted as a correctly spelled conversational loanword.
Data Cleaning: Filtered out Latin alphabetic characters (A-Z), empty strings, numbers, and boundary punctuation.
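The two tiers and the cleaning filter can be sketched as below. This is an illustration under several assumptions: the dictionary is passed in as any callable (with spylls, a loaded dictionary's `lookup` method could play that role), the `min_count` threshold of 50 is invented, the toy word lists are hypothetical, and dropping a whole token when it contains a Latin letter or digit is one possible reading of the cleaning rule.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)
BOUNDARY_PUNCTUATION = string.punctuation + "।"  # ASCII punctuation plus the danda


def clean(token):
    """Strip boundary punctuation; return None for tokens that should be dropped."""
    token = token.strip(BOUNDARY_PUNCTUATION)
    if not token:
        return None
    if any(ch in ASCII_LETTERS or ch.isdigit() for ch in token):
        return None
    return token


def classify(token, dictionary_lookup, frequencies, min_count=50):
    """Tier 1: dictionary check. Tier 2: frequency fallback for failures."""
    word = clean(token)
    if word is None:
        return "filtered"
    if dictionary_lookup(word):
        return "correct (dictionary)"
    if frequencies.get(word, 0) >= min_count:
        return "correct (frequency)"
    return "incorrect"


# Hypothetical stand-ins for the hi_IN dictionary and the subtitle corpus counts.
toy_dictionary = {"घर"}
toy_frequencies = {"कंप्यूटर": 1200, "कंपूटर": 2}

labels = {
    token: classify(token, toy_dictionary.__contains__, toy_frequencies)
    for token in ["घर", "कंप्यूटर,", "कंपूटर", "hello", "123", ""]
}
```

Running the frequency check only on dictionary failures keeps the stricter morphological result authoritative, while still accepting loanwords that a Hindi dictionary does not list.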

🧮 Module 4: Consensus Architecture & Evaluation

Built a ROVER-style consensus over the transcriptions of 6 distinct ASR models so that models are not unfairly penalized for typos in the human reference transcription.

🛠️ Implementation Details

Alignment: Stripped punctuation so that all transcripts align consistently, and ran every evaluation purely at the word level.
Lattice Construction: Aligned the 6 candidate models (Model H, i, k, l, m, n) against the human reference using difflib. Note that difflib's SequenceMatcher aligns by longest matching blocks (a Ratcliff/Obershelp-style method) rather than by a true Levenshtein distance.
Majority Voting: Evaluated every word slot in the aligned lattice. Replaced the human reference word only if at least 3 of the 6 models agreed on the same alternative word.
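The three steps above can be sketched with `difflib.SequenceMatcher`. This is a simplified illustration rather than the repository's implementation: only one-to-one substitutions are counted as votes (insertions and deletions are ignored), the function names are hypothetical, and the example sentences, including the misspelling "बजार", are invented.

```python
import string
from collections import Counter
from difflib import SequenceMatcher

PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation + "।")


def tokenize(text):
    """Remove punctuation (including the danda) and split into words."""
    return text.translate(PUNCTUATION_TABLE).split()


def consensus_reference(reference, hypotheses, min_votes=3):
    """Replace a reference word when at least min_votes models agree on another."""
    ref = tokenize(reference)
    votes = [Counter() for _ in ref]
    for hypothesis in hypotheses:
        hyp = tokenize(hypothesis)
        matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Count a vote only where the model substitutes word for word.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for offset in range(i2 - i1):
                    votes[i1 + offset][hyp[j1 + offset]] += 1
    result = list(ref)
    for slot, counter in enumerate(votes):
        if counter:
            word, count = counter.most_common(1)[0]
            if count >= min_votes:
                result[slot] = word
    return result


# Hypothetical example: the human reference contains a typo ("बजार").
reference = "मैं कल बजार गया था।"
agreeing = ["मैं कल बाजार गया था"] * 4 + ["मैं कल बजार गया था"] * 2
too_few = ["मैं कल बाजार गया था"] * 2 + ["मैं कल बजार गया था"] * 4

corrected = consensus_reference(reference, agreeing)
unchanged = consensus_reference(reference, too_few)
```

With a threshold of 3 out of 6, two different alternatives could each reach the threshold in the same slot; this sketch resolves that by taking the most frequent one.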

📉 Updated WER Results

Model	Original WER	Consensus WER	Status
Model H	2.81%	3.18%	↑ Worsened (had matched human typos closely)
Model i	0.37%	1.47%	↑ Worsened (had matched human typos closely)
Model k	8.07%	7.33%	↓ Improved
Model l	8.68%	8.07%	↓ Improved
Model m	15.77%	14.67%	↓ Improved
Model n	9.90%	9.05%	↓ Improved

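Note: The four models with higher original error rates (k, l, m, n) improved under the consensus reference because they are no longer penalized for disagreeing with human typos. The two lowest-error models (H and i) scored worse because they had reproduced those typos.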
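The direction of these shifts can be reproduced on a toy case. The sketch below uses a standard word-level edit distance for WER; the sentences and the two model outputs are invented for illustration and are not taken from the evaluation data.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            current.append(
                min(
                    previous[j] + 1,                          # deletion
                    current[j - 1] + 1,                       # insertion
                    previous[j - 1] + (ref_word != hyp_word),  # substitution or match
                )
            )
        previous = current
    return previous[-1] / len(reference)


human = "मैं कल बजार गया था".split()       # hypothetical reference with a typo
consensus = "मैं कल बाजार गया था".split()  # the same reference after correction

copies_typo = list(human)        # a model that reproduced the typo
spells_correctly = list(consensus)  # a model that disagreed with the typo

scores = {
    "copies_typo": (
        word_error_rate(human, copies_typo),
        word_error_rate(consensus, copies_typo),
    ),
    "spells_correctly": (
        word_error_rate(human, spells_correctly),
        word_error_rate(consensus, spells_correctly),
    ),
}
```

Each tuple holds (original WER, consensus WER): the model that copied the typo moves from 0.0 to 0.2, while the model that spelled the word correctly moves from 0.2 to 0.0.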
