rwth-i6/LoquaciousAdditionalResources

Additional Loquacious Resources

This repository contains scripts to create pronunciation lexica and a count-based language model in ARPA format for the Loquacious dataset.

Download Files

The files created by the scripts are hosted in the official Loquacious HuggingFace repository. You can download the final LM, vocab and lexicon files from there. Intermediate files are not available.

Usage

This repository uses Apptainer to provide a containerized environment, so that the same build process can be run on any machine. Apptainer is fully compatible with Singularity. If you use Singularity, replace apptainer with singularity in each command.

Run 00_create_apptainer_and_kenlm.sh to create the Apptainer image and compile KenLM with it. Afterwards, call each script via apptainer, e.g.:

apptainer run --bind <current_filesystem_root> 01_prepare_cmudict.sh

The bind parameter is necessary if you are not operating within your user folder. For more information on binds, see the Apptainer documentation on bind paths.
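
Because the same commands work with either Apptainer or Singularity, a small wrapper can pick whichever runtime is installed. The following is only a sketch: pick_runtime and run_step are hypothetical helper names that are not part of this repository, and the bind path is passed in as an argument rather than taken from any repository default.

```shell
# Sketch only: pick_runtime and run_step are hypothetical helpers,
# they are not shipped with this repository.

# Print the name of the first container runtime found in PATH.
pick_runtime() {
    local candidate
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "neither apptainer nor singularity found in PATH" >&2
    return 1
}

# Run one build script with the given bind path.
# $1 = path to bind, remaining arguments = script (and its arguments).
run_step() {
    local runtime bind_path
    runtime="$(pick_runtime)" || return 1
    bind_path="$1"
    shift
    "$runtime" run --bind "$bind_path" "$@"
}

# Example call (replace /path/to/filesystem_root with your own path):
# run_step /path/to/filesystem_root 01_prepare_cmudict.sh
```

The wrapper assembles the same run --bind call shown above, so it should behave identically to typing the command by hand; it only removes the need to edit each command when switching between the two runtimes.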
