- Author: Yashish Maduwantha
- Email: yashish@terpmail.umd.edu
The two saved checkpoints in this repo have been removed. Please reach out to Yashish (yashish@terpmail.umd.edu) if you plan to use this repo for research purposes.
The SSL-SI-tool implements a pipeline that estimates articulatory features (6 TVs, or 9 TVs + source features) directly from speech utterances (.wav files).
This repository holds two Acoustic-to-Articulatory Speech Inversion (SI) systems, trained on the Wisconsin XRMB dataset and the HPRC dataset respectively. The model architecture and training are based on the papers "Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables", "Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion", and "Acoustic-to-articulatory Speech Inversion with Multi-task Learning". Unlike the 13 MFCCs used in those papers, the pretrained SI systems in this repository were trained with self-supervised features (HuBERT and wavLM) as acoustic inputs. Refer to the papers above for more information on the types of TVs estimated by each model.
- Model trained on XRMB dataset: estimates 6 TVs
- Model trained on HPRC dataset: trained with an MTL (multi-task learning) framework; estimates 9 TVs + source features (Aperiodicity, Periodicity, and Pitch)
Follow the steps in run_instructions.txt to get started quickly!
The SI systems were trained in a conda environment with Python 3.8.13 and tensorflow==2.10.0. The HuBERT pretrained models used to extract acoustic features have been trained in PyTorch.
- Installation method 1:
First install TensorFlow; we recommend doing so in a conda environment, following the steps here.
We also use a number of off-the-shelf libraries, which are listed in requirements.txt. Follow the steps below to install them.
$ pip install speechbrain
$ pip install librosa
$ pip install transformers
- Installation method 2: install the libraries from the requirements.txt file.
$ pip install -r requirements.txt
We recommend method 1, since it will automatically pull in compatible versions of dependencies in case new releases of the respective libraries have come out.
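After installing, a quick sanity check (a minimal sketch, not part of the pipeline) confirms that the main dependencies import; the version prints are just informational:

```python
# Sanity check that the main dependencies are importable.
# The expected TensorFlow version comes from the training environment noted above.
import tensorflow as tf
import speechbrain
import librosa
import transformers

print("tensorflow  ", tf.__version__)      # SI systems were trained with 2.10.0
print("speechbrain ", speechbrain.__version__)
print("librosa     ", librosa.__version__)
print("transformers", transformers.__version__)
```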
Note: If you run the SI system on GPUs to extract TVs (recommended for larger datasets), make sure the cuDNN version used by PyTorch (installed by speechbrain) and the one installed with TensorFlow are compatible.
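One way to compare the two cuDNN builds from Python is sketched below; note that tf.sysconfig.get_build_info() reports a cuDNN version only for GPU builds of TensorFlow:

```python
# Compare the cuDNN version TensorFlow was built against with the one
# PyTorch loads at runtime; large mismatches can cause GPU-side failures.
import tensorflow as tf
import torch

print("TensorFlow cuDNN:", tf.sysconfig.get_build_info().get("cudnn_version"))
print("PyTorch cuDNN:   ", torch.backends.cudnn.version())
```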
Execute the run_SSL_SI_pipeline.py script to run the SI pipeline, which performs the following steps (a code sketch follows the list):
- Run the feature_extract.py script to segment the audio and extract the specified SSL features using the speechbrain library
- Load the pre-trained SSL-SI model and evaluate on the extracted SSL feature data generated in step 1
- Save the predicted Tract Variables (TVs)
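For orientation, here is a minimal sketch of what the three steps amount to. It assumes the speechbrain 0.5.x HuggingFaceWav2Vec2 wrapper (which also loads HuBERT checkpoints) and a hypothetical checkpoint path; run_SSL_SI_pipeline.py remains the supported entry point:

```python
# A hedged sketch of the three pipeline steps; the checkpoint path
# "model_checkpoints/hubert_xrmb.h5" is hypothetical.
import numpy as np
import tensorflow as tf
import torch
import torchaudio
from speechbrain.lobes.models.huggingface_wav2vec import HuggingFaceWav2Vec2

# Step 1: extract SSL features (HuBERT-large via the speechbrain wrapper).
ssl = HuggingFaceWav2Vec2("facebook/hubert-large-ll60k", save_path="pretrained/")
wav, sr = torchaudio.load("test_audio/sample.wav")  # HuBERT expects 16 kHz mono
with torch.no_grad():
    feats = ssl(wav)  # shape: (1, frames, 1024)

# Step 2: load the pre-trained SSL-SI model and predict on the features.
si_model = tf.keras.models.load_model("model_checkpoints/hubert_xrmb.h5")
tvs = si_model.predict(feats.numpy())

# Step 3: save the predicted Tract Variables.
np.save("output/sample_tvs.npy", tvs)
```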
The tract variables can be saved as either numpy (.npy) or MATLAB (.mat) files for convenience. The TVs and source features are saved in the following order in the output files (a read-back example follows the list):
- 6 TVs with XRMB: LA, LP, TBCL, TBCD, TTCL, TTCD
- 12 outputs with HPRC (9 TVs + 3 source features): LA, LP, TBCL, TBCD, TTCL, TTCD, JA, TMCL, TMCD, Periodicity, Aperiodicity, Pitch (normalized to 0 to 1 range)
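As an example, the saved outputs can be read back with numpy or scipy; the file names, the orientation (frames along the first axis), and the variable names inside the .mat file are assumptions here:

```python
# Read back pipeline outputs; channel order follows the HPRC list above.
import numpy as np
from scipy.io import loadmat

HPRC_ORDER = ["LA", "LP", "TBCL", "TBCD", "TTCL", "TTCD",
              "JA", "TMCL", "TMCD", "Periodicity", "Aperiodicity", "Pitch"]

tvs = np.load("output/sample_tvs.npy")       # assumed shape: (frames, 12)
pitch = tvs[:, HPRC_ORDER.index("Pitch")]    # normalized to the 0 to 1 range

mat = loadmat("output/sample_tvs.mat")       # keys depend on how the file was written
```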
usage: run_SSL_SI_pipeline.py [-h] [-m MODEL] [-f FEATS] [-i PATH]
[-o OUT_FORMAT]
Run the SI pipeline
optional arguments:
-h, --help show this help message and exit
-m MODEL, --model MODEL
set which SI system to run, xrmb trained (xrmb) or
hprc trained (hprc)
-f FEATS, --feats FEATS
set which SSL pretrained model to be used to extract
features, hubert to use HuBERT-large and wavlm to use
wavLM-large pretrained models
-i PATH, --path PATH path to directory with audio files
-o OUT_FORMAT, --out_format OUT_FORMAT
output TV file format (mat or npy)
- Run the pipeline from end to end (executes all 3 steps)
python run_SSL_SI_pipeline.py -m xrmb -f hubert -i test_audio/ -o 'mat'
The SI systems trained with wavLM features will be added in the future; for now, set the -f parameter only to 'hubert' to run the models.
This project is licensed under CC BY-NC-ND 4.0 - see the LICENSE file for details.