NeMoASR

Automatic speech recognition with speaker diarisation.

Based on:

NVIDIA NeMo Parakeet TDT 0.6b V3: Multilingual Speech-to-Text Model for automatic speech recognition
NVIDIA NeMo Sortformer Diarizer 4spk v1 for speaker diarisation

Requirements

Setup

Linux:

sudo apt install ffmpeg

conda create -n nemoasr python=3.12
conda activate nemoasr

pip install git+https://github.com/HanBnrd/NeMoASR.git

macOS:

brew install ffmpeg

conda create -n nemoasr python=3.12
conda activate nemoasr

pip install git+https://github.com/HanBnrd/NeMoASR.git

Update NeMoASR

pip install --upgrade git+https://github.com/HanBnrd/NeMoASR.git

Usage

To transcribe a WAV or MPEG file:

nemoasr myfile.mp3

Note: the first run may take a while because the models need to be downloaded.
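To transcribe several files in one go, the command above can be wrapped in a small script. The following is only a sketch: the helper names (find_audio_files, build_command, transcribe_folder) and the list of file suffixes are assumptions made for illustration, not part of NeMoASR. It relies solely on the documented `nemoasr <file>` invocation.

```python
import subprocess
from pathlib import Path

# Assumption: only these suffixes are treated as audio; adjust as needed.
AUDIO_SUFFIXES = {".wav", ".mp3"}


def find_audio_files(folder):
    """Return the audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def build_command(audio_path):
    """Build the documented `nemoasr <file>` command as an argument list."""
    return ["nemoasr", str(audio_path)]


def transcribe_folder(folder):
    """Run nemoasr on each audio file in turn, stopping at the first failure."""
    for audio_path in find_audio_files(folder):
        subprocess.run(build_command(audio_path), check=True)
```

Calling transcribe_folder("recordings") from within the activated nemoasr environment would then process each file sequentially; "recordings" is a placeholder folder name.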

By default, long audio files are cut into 7-minute chunks, which should work well on machines with limited RAM or VRAM. The chunk duration can be adjusted if needed. For example, on a machine with more RAM or VRAM:

nemoasr myfile.mp3 --max-duration=12

This cuts a long audio file into chunks of at most 12 minutes.
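To get a feel for how the maximum chunk duration affects the number of chunks, here is a minimal sketch that splits a recording into consecutive spans of at most a given number of minutes. It is an illustration only: the function name chunk_boundaries is hypothetical, and NeMoASR's actual chunking may differ (for example in where exactly it places the cuts).

```python
def chunk_boundaries(total_seconds, max_minutes=7.0):
    """Split [0, total_seconds) into consecutive spans of at most max_minutes.

    Returns a list of (start, end) pairs in seconds. This is an illustrative
    fixed-length split, not NeMoASR's own implementation.
    """
    if total_seconds < 0 or max_minutes <= 0:
        raise ValueError("total_seconds must be >= 0 and max_minutes > 0")

    max_len = max_minutes * 60.0
    spans = []
    start = 0.0
    while start < total_seconds:
        # The last span is shortened so it ends with the recording.
        end = min(start + max_len, float(total_seconds))
        spans.append((start, end))
        start = end
    return spans


# A 25-minute recording (1500 s) with the default and with a 12-minute limit.
default_spans = chunk_boundaries(1500)
longer_spans = chunk_boundaries(1500, max_minutes=12)
print(len(default_spans), len(longer_spans))
```

Under this simple scheme, a 25-minute file yields four chunks at the 7-minute default and three chunks at 12 minutes, with the final chunk shorter than the rest.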
