TimestampAudio CLI by Waha

TimeStampAudio generates timing data from any audio and corresponding text file combination, in the over 1,100 languages supported by Meta's MMS ASR model, outputting the results in JSON.

Requirements:

We have successfully run this tool on both MacOS and Linux laptops with only CPUs by simply running the command below.

However, the MMS model works most efficiently with a CUDA enabled GPU.

While an Mac M1 Pro could timestamp a 10 minute audio file in about 2.5 minutes, that same file could be timestamped in just a few seconds when processed with a GPU.

Installation:

Installation is fairly straight-forward. Simply:

Clone the repository.
Install the required Python libraries:

pip install -r requirement

Install ffmpeg and sox using your preferred method, e.g. brew install ffmpeg sox.
You're ready to go!

Usage:

To start timestamping, the first step is to organize your files.

The script will process an entire directory, expecting to find pairs of files with matching names, in .mp3 and .txt format. So GEN.1.mp3 and GEN.1.txt

When run on a directory, every pair of files with matching names will be processed.

$ tree .
.
├── GEN.1.mp3
├── GEN.1.txt
├── GEN.2.mp3
├── GEN.2.txt
├── GEN.3.mp3
├── GEN.3.txt
├── GEN.4.mp3
├── GEN.4.txt
├── GEN.5.mp3
└── GEN.5.txt

You can run the CLI tool with the following arguments:

python main.py -i <input_folder> -o <output_folder>

The script will output a JSON and a SRT file in the output directory for each audio-text file-pair.

Arguments

-i, --input (required): The path to a folder containing audio and text files.
-o, --output (required): The path to a folder to write JSON and SRT files to.
-s, --separator (optional): The location to timestamp within a text file. Options are lineBreak, leftBracket ([), or downArrow (⬇️). Default is lineBreak.
-l, --language (optional): The language of the text and audio files. If not provided, the app will automatically detect the language using MMS's lid API.
-m, --max-silence-padding-ms (optional): The maximum amount of silence padding (in ms) to offset the start and end timestamps of each text span. Default is -1 (equally distribute silence). 0 will remove all silence. 500 (for example) will add up to 500ms of silence to the start and end of each text span.

Example

python main.py -i ./input-dir -o ./output-dir -s lineBreak -l eng -m 0

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
data		data
mms		mms
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
align_bible.py		align_bible.py
bibles.py		bibles.py
constants.py		constants.py
lid.py		lid.py
logo.png		logo.png
logo.svg		logo.svg
main.py		main.py
model.py		model.py
requirements.txt		requirements.txt
timestamp_types.py		timestamp_types.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TimestampAudio CLI by Waha

Requirements:

Installation:

Usage:

Arguments

Example

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TimestampAudio CLI by Waha

Requirements:

Installation:

Usage:

Arguments

Example

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages