Description
It would be really cool to process the audio in real time using something like this https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py
At the moment, even though the /transcribe endpoint is marked as async, it blocks the event loop until the self.transcribe() call completes. It also only accepts an entire file rather than a stream of audio bytes, so we can only process a whole file at once, and we probably need extra resources to read the full file into memory and hand it to the model.

If we could stream, say, 10 s at a time, we could probably transcribe longer files, and with something like the cache-aware approach linked above we might be able to do that without a significant degradation in performance.
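As a low-risk first step (independent of real streaming), the blocking call could be moved off the event loop. A minimal sketch, assuming the service is a FastAPI app and that `transcribe()` is a blocking method on some model wrapper; the names below are placeholders, not the project's actual classes:

```python
import asyncio
import tempfile

from fastapi import FastAPI, UploadFile

app = FastAPI()


class DummyTranscriber:
    """Stand-in for the real model wrapper; transcribe() is assumed to be blocking."""

    def transcribe(self, path: str) -> str:
        return f"(transcript of {path})"


transcriber = DummyTranscriber()


@app.post("/transcribe")
async def transcribe_endpoint(file: UploadFile):
    # Persist the upload to a temp file, since the current code expects a whole file.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(await file.read())
        path = tmp.name

    # Offload the blocking inference call to a worker thread so the event
    # loop can keep serving other requests while the model runs.
    text = await asyncio.to_thread(transcriber.transcribe, path)
    return {"text": text}
```

This doesn't reduce memory use or enable streaming by itself, but it stops one long transcription from holding up every other request.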
Does this require changing the model to accept a stream of audio bytes instead of an entire file?
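One option that might avoid changing the model's input type is to cut the incoming byte stream into ~10 s windows on the server and hand each window to the model as an ordinary audio array. A rough sketch of the windowing side, assuming 16 kHz, 16-bit mono PCM; the format and the `iter_windows` helper are assumptions, not anything the repo currently has:

```python
from typing import Iterable, Iterator

import numpy as np

SAMPLE_RATE = 16_000          # assumed input sample rate
BYTES_PER_SAMPLE = 2          # 16-bit PCM
WINDOW_SECONDS = 10
WINDOW_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * WINDOW_SECONDS


def iter_windows(chunks: Iterable[bytes]) -> Iterator[np.ndarray]:
    """Accumulate arbitrary-size byte chunks and yield ~10 s float32 windows."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        while len(buf) >= WINDOW_BYTES:
            window, buf = buf[:WINDOW_BYTES], buf[WINDOW_BYTES:]
            pcm = np.frombuffer(bytes(window), dtype=np.int16)
            yield pcm.astype(np.float32) / 32768.0
    if buf:
        # Trailing partial window at end of stream.
        pcm = np.frombuffer(bytes(buf), dtype=np.int16)
        yield pcm.astype(np.float32) / 32768.0
```

Each yielded window could then be passed to whatever per-chunk inference we end up with; with the cache-aware approach from the linked NeMo script, the encoder state would be carried between windows, which is what should keep the chunk boundaries from costing much accuracy.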