Prosody Modeling

This repository contains the source files of an automatic prosody annotation tool designed by the team at Speech Lab, Shiv Nadar University Chennai, as part of the Prosody Modeling module of the Bashini: NLTM Speech Technologies in Indian Languages Project, funded by the Ministry of Electronics and Information Technology, Govt. of India. The tool is designed to provide rich prosodic annotations for Indian languages. It takes a speech signal and the corresponding orthographic transcription as inputs and provides the phoneme, syllable, and word boundaries, along with the pitch contour labels and the intensity index at the syllable level, and the break indices. The phoneme boundaries are estimated using hidden Markov models (HMMs) trained on 5 hours of data (from a male and a female speaker) in each language; the remaining annotations are derived from rules formulated after extensive analyses. The tool currently supports English, Tamil, and Hindi, but can be extended to other languages by including the appropriate letter-to-sound rules and training phoneme HMMs.

The data used to train the models can be accessed here

The performance of the tool's modules is evaluated by comparing the annotations it produces for 50 audio files per language with the corresponding manual annotations. For most phoneme segments, the segmentation error is under 10 ms. The overall accuracy for break indices across the three languages (Tamil, Hindi, and Indian English) is 95%, while the pitch contour model achieves an accuracy of 99% relative to manual annotations.
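The comparison described above can be illustrated with a minimal Python sketch. It is not the evaluation script used for the reported figures; the function names, the 10 ms tolerance argument, and the assumption that predicted and manual boundaries are already paired one-to-one are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def boundary_errors_ms(predicted: Sequence[float], manual: Sequence[float]) -> List[float]:
    """Absolute error (in ms) between paired boundary times given in seconds.

    Assumes the two lists are already aligned one-to-one (hypothetical setup).
    """
    if len(predicted) != len(manual):
        raise ValueError("boundary lists must have the same length")
    return [abs(p - m) * 1000.0 for p, m in zip(predicted, manual)]


def fraction_within(errors_ms: Sequence[float], tolerance_ms: float = 10.0) -> float:
    """Fraction of boundaries whose error does not exceed the tolerance."""
    if not errors_ms:
        return 0.0
    return sum(e <= tolerance_ms for e in errors_ms) / len(errors_ms)


def label_accuracy(predicted: Sequence, manual: Sequence) -> float:
    """Fraction of positions where the predicted label equals the manual one.

    Usable for break indices or pitch contour labels alike.
    """
    if len(predicted) != len(manual):
        raise ValueError("label lists must have the same length")
    if not manual:
        return 0.0
    return sum(p == m for p, m in zip(predicted, manual)) / len(manual)
```

For example, `fraction_within(boundary_errors_ms(tool_times, manual_times))` would give the share of phoneme boundaries within 10 ms of the manual annotation, where `tool_times` and `manual_times` are placeholder lists of boundary times in seconds.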

Tool Demo Screenshot

Installation

To use this project, you will need to clone the repository and install the required dependencies as follows:

Clone the Repository

git clone https://github.com/speech-lab-snuchennai/Prosody_Modelling
cd Prosody_Modelling

Installing Dependencies

pip install -r requirements.txt

Usage

Update the "filename" field (the path to the audio file to be annotated) and the "language" field in main.py. Provide the corresponding input text in the te.txt file. Then run the main.py file. This should display an image like the one shown above and generate label files in the directory containing your audio file.

python main.py
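To inspect the generated label files programmatically, a small reader along the following lines may help. This is a sketch under a stated assumption: that the label files follow the standard HTK label layout (one segment per line, `start end label`, with times as integers in units of 100 ns). This README does not specify the output format, so check one of your generated files before relying on it. The function name `parse_htk_label_lines` is hypothetical.

```python
from typing import Iterable, List, Tuple

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


def parse_htk_label_lines(lines: Iterable[str]) -> List[Tuple[float, float, str]]:
    """Convert HTK-style label lines into (start_s, end_s, label) tuples.

    ASSUMPTION: each line looks like "<start> <end> <label>"; lines with
    fewer than three fields are skipped.
    """
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        start = int(parts[0]) / HTK_UNITS_PER_SECOND
        end = int(parts[1]) / HTK_UNITS_PER_SECOND
        segments.append((start, end, parts[2]))
    return segments
```

A file can be read with `parse_htk_label_lines(open(path, encoding="utf-8"))`, where `path` points to one of the generated label files.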

Training a New Model for Segmentation

The HMM toolkit HTK has been employed to train the phoneme HMMs. In order to train models for new English, Tamil, or Hindi data, download and install HTK as described here.

After installing HTK, create a wav folder (containing the audio files in wav format) and a lab folder (containing a set of initial lab files, which could be generated using main.py). Then run the run.sh file to train the phoneme models.

./run.sh
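Before running the training script, it can be useful to confirm that every audio file has a matching initial label file. The sketch below assumes that label files carry a `.lab` extension and share their base name with the corresponding `.wav` file; neither convention is stated in this README, and the helper name `find_unpaired` is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def find_unpaired(wav_dir, lab_dir) -> Tuple[List[str], List[str]]:
    """Return (wav files without a lab file, lab files without a wav file).

    ASSUMPTION: files are paired by base name, e.g. utt01.wav <-> utt01.lab.
    """
    wav_stems = {p.stem for p in Path(wav_dir).glob("*.wav")}
    lab_stems = {p.stem for p in Path(lab_dir).glob("*.lab")}
    return sorted(wav_stems - lab_stems), sorted(lab_stems - wav_stems)
```

Calling `find_unpaired("wav", "lab")` from the directory that holds the two folders returns two lists of base names; both should be empty before training starts.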

Demo

🌐 Project Demo – SpeechLab, SNU Chennai

License

If you use these Prosody Modeling features in your research or work, please consider citing:

📄 Read our paper on arXiv
