Add asynchronous audio processing  #2

@ManziBryan

Description

It would be really cool to process the audio in real time using something like NeMo's cache-aware streaming inference example: https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py

At the moment, even though the /transcribe endpoint is marked as async, the event loop is completely blocked until the self.transcribe() call completes (see the sketch below for one way around that). The endpoint also only accepts an entire file, as opposed to a stream of audio bytes, so we can only process a whole file at once, and we probably need extra resources to read the entire file into memory and transfer it to the model. If we could stream, say, 10 s at a time, we could probably transcribe longer files, and with the cache-aware streaming linked above we might be able to do this without a significant degradation in performance.
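For the blocking issue specifically, here is a minimal sketch of one fix, assuming the service is FastAPI-based (the `TranscriptionService` class and names are hypothetical stand-ins for whatever we have now): offload the synchronous `transcribe()` call to a worker thread with `asyncio.to_thread`, so the event loop keeps serving other requests while inference runs.

```python
import asyncio

from fastapi import FastAPI, UploadFile

app = FastAPI()


class TranscriptionService:
    """Stand-in for the existing service; transcribe() is the blocking call."""

    def transcribe(self, audio: bytes) -> str:
        ...  # existing synchronous model inference


service = TranscriptionService()


@app.post("/transcribe")
async def transcribe(file: UploadFile):
    audio = await file.read()
    # Run the blocking model call in a worker thread so the event loop
    # can keep serving other requests while inference is in flight.
    text = await asyncio.to_thread(service.transcribe, audio)
    return {"transcript": text}
```

This alone doesn't get us streaming, but it makes the endpoint actually behave like an async one.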

Does this require changing the model to accept a collection of bytes instead of an entire file?
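On the streaming side, a rough sketch of what a chunked endpoint could look like, again assuming FastAPI. `transcribe_chunk` and the PCM format constants are assumptions; a real implementation would need a streaming-capable model (like the cache-aware NeMo example above) that keeps encoder state between chunks rather than treating each window independently.

```python
import asyncio

from fastapi import FastAPI, Request

app = FastAPI()

# ~10 s of 16 kHz, 16-bit mono PCM -- the audio format here is an assumption.
CHUNK_BYTES = 16000 * 2 * 10


def transcribe_chunk(window: bytes) -> str:
    ...  # hypothetical: a cache-aware streaming model would keep state across calls


@app.post("/transcribe-stream")
async def transcribe_stream(request: Request):
    # Consume raw audio bytes as they arrive instead of buffering the whole file.
    buffer = bytearray()
    partials: list[str] = []
    async for chunk in request.stream():
        buffer.extend(chunk)
        while len(buffer) >= CHUNK_BYTES:
            window = bytes(buffer[:CHUNK_BYTES])
            del buffer[:CHUNK_BYTES]
            partials.append(await asyncio.to_thread(transcribe_chunk, window))
    if buffer:  # flush whatever is left when the stream ends
        partials.append(await asyncio.to_thread(transcribe_chunk, bytes(buffer)))
    return {"transcript": " ".join(partials)}
```

With this shape, memory stays bounded by the chunk size instead of the file size, which is the main win for long recordings.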
