tutorial whisper #3760
Open: SC-Samir wants to merge 3 commits into master from whisper-ai
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| --- | ||
| title: Building Speech to Text with Whisper | ||
| logo: openai | ||
| category: ai | ||
| permalink: /tutorials/whisper | ||
| modified_at: 2026-05-26 | ||
| --- | ||
|
|
||
| [Whisper] is an automatic speech recognition model that converts speech to text. It was trained on a large, multilingual audio corpus, which makes it robust to different accents, background noise, and real-world conditions. As an open source model, it is well suited for developers who want to integrate speech to text without depending entirely on a proprietary API. | ||
|
|
||
| Instead of relying on an external SaaS API, Whisper can run directly inside a web application using [faster-whisper], an optimized implementation of the Whisper model that improves inference speed on CPU. | ||
|
|
||
| In this tutorial, a small speech to text demo is deployed on Scalingo. It combines a backend built with [FastAPI] (a Python web framework), a minimal HTML/JavaScript frontend that records audio in the browser, and `faster-whisper` running on CPU in a single web container. | ||
|
|
||
| ## Planning your Deployment | ||
|
|
||
| For this kind of application, it is recommended to start with an M container and move to a larger size if startup time or inference latency becomes an issue. | ||
|
|
||
| The application supports one environment variable: `MODEL_USE`. A good starting point is to set `MODEL_USE=small`, then move to a larger model only if better accuracy is required. The possible values of `MODEL_USE` are listed in the [faster-whisper] repository. | ||
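As an illustration of how an application might consume `MODEL_USE`, here is a minimal Python sketch. It is not taken from the demo: `resolve_model_name`, `DEFAULT_MODEL`, and `KNOWN_MODELS` are hypothetical names introduced for this example, and the tuple only lists the model names mentioned in this tutorial, not everything `faster-whisper` may accept.

```python
import os

# Model names mentioned in this tutorial. The authoritative list lives in the
# faster-whisper repository, so treat this tuple as illustrative only.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large-v3", "turbo")
DEFAULT_MODEL = "small"  # assumption: the demo's real default may differ


def resolve_model_name(environ=None):
    """Return the model name from MODEL_USE, falling back to DEFAULT_MODEL."""
    environ = os.environ if environ is None else environ
    # An unset or blank variable falls back to the default.
    name = environ.get("MODEL_USE", "").strip() or DEFAULT_MODEL
    if name not in KNOWN_MODELS:
        raise ValueError(f"Unsupported MODEL_USE value: {name!r}")
    return name
```

Failing fast on an unknown name is a design choice of this sketch: a typo surfaces at startup instead of as a failed model download later.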
|
|
||
| ## Deploying the Application | ||
|
|
||
| ### Using the Command Line | ||
|
|
||
| 1. Clone the repository: | ||
|
|
||
| ```bash | ||
| git clone https://github.com/Scalingo/whisper-speech-to-text | ||
| cd whisper-speech-to-text | ||
| ``` | ||
|
|
||
| 2. Create the application on Scalingo: | ||
|
|
||
| ```bash | ||
| scalingo create mywhisper | ||
| ``` | ||
|
|
||
| The Scalingo command line automatically detects the Git repository and | ||
| adds a Git remote pointing to Scalingo: | ||
|
|
||
| ```bash | ||
| git remote -v | ||
|
|
||
| origin https://github.com/Scalingo/whisper-speech-to-text (fetch) | ||
| origin https://github.com/Scalingo/whisper-speech-to-text (push) | ||
| scalingo git@ssh.osc-fr1.scalingo.com:mywhisper.git (fetch) | ||
| scalingo git@ssh.osc-fr1.scalingo.com:mywhisper.git (push) | ||
| ``` | ||
|
|
||
| 3. Configure the application: | ||
|
|
||
| ```bash | ||
| scalingo --app mywhisper env-set MODEL_USE=small | ||
| ``` | ||
|
|
||
| 4. Deploy to Scalingo: | ||
|
|
||
| ```bash | ||
| git push scalingo main | ||
| ``` | ||
|
|
||
| Scalingo detects the Python environment, installs the dependencies declared by the project, and starts the application using the [Procfile]. The speech to text demo is now deployed. | ||
|
|
||
| ## Testing the Deployment | ||
|
|
||
| Before using the application, query the health endpoint to check that the model is loaded: | ||
|
|
||
| ```bash | ||
| curl https://mywhisper.osc-fr1.scalingo.io/health | ||
| ``` | ||
|
|
||
| Since the model is downloaded the first time the container starts, wait until the `status` field indicates that the model is ready before opening the application in a browser and testing the recording from the HTML interface. | ||
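The wait can be automated with a small polling loop. The following Python sketch uses only the standard library and assumes that the health endpoint returns JSON with a `status` field whose value becomes `"ready"`. That exact value, like the `fetch_status` and `wait_until_ready` helper names, is an assumption made for this example, so check the actual response before relying on it.

```python
import json
import time
import urllib.request


def fetch_status(url):
    """Fetch the health endpoint and return its `status` field (or None)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    return payload.get("status")


def wait_until_ready(url, fetch=fetch_status, attempts=30, delay=5.0,
                     ready_value="ready", sleep=time.sleep):
    """Poll `url` until the status equals `ready_value`.

    Returns True once ready, or False after `attempts` unsuccessful polls.
    `ready_value` is an assumption about the demo's health payload.
    """
    for attempt in range(attempts):
        try:
            if fetch(url) == ready_value:
                return True
        except (OSError, ValueError):
            # Network errors or non-JSON bodies can occur while the container
            # is still booting; treat them as "not ready yet".
            pass
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready("https://mywhisper.osc-fr1.scalingo.io/health")` polls every five seconds for up to thirty attempts.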
|
|
||
| The transcription endpoint can also be tested directly with `curl`. For example, with an audio file named `sample.webm` in the current directory: | ||
|
|
||
| ```bash | ||
| curl --request POST https://mywhisper.osc-fr1.scalingo.io/transcribe \ | ||
| --form "file=@sample.webm" | ||
| ``` | ||
|
|
||
| The backend writes the uploaded file to `/tmp`, transcribes it, then returns a JSON response containing the transcript and model metadata. | ||
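The write, transcribe, respond flow can be sketched in plain Python as follows. This is not the demo's code: `handle_upload` and its `transcribe` parameter are hypothetical stand-ins for the real endpoint and model call, and the `text` and `model` field names are illustrative, since the actual JSON schema is not described here.

```python
import os
import tempfile


def handle_upload(data, filename, transcribe, model_name="small"):
    """Persist uploaded bytes to a temporary file, transcribe it, clean up.

    `transcribe` is a hypothetical callable taking a file path and returning
    the transcript as a string.
    """
    suffix = os.path.splitext(filename)[1]  # keep the extension, e.g. ".webm"
    # gettempdir() usually resolves to /tmp on Linux containers.
    fd, path = tempfile.mkstemp(suffix=suffix, dir=tempfile.gettempdir())
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        text = transcribe(path)
    finally:
        # Remove the temporary file whether or not transcription succeeded.
        os.remove(path)
    # Illustrative response shape; the demo's real fields may differ.
    return {"text": text, "model": model_name}
```

Deleting the file in a `finally` block keeps `/tmp` from filling up when a transcription fails, which matters in a long-running container.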
|
|
||
| In this demo, the transcription runs synchronously: the request returns only once the transcript is complete. The demo can be adapted to an asynchronous workflow, for example by offloading the transcription to a background job. | ||
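One possible shape for such an asynchronous workflow is sketched below with Python's standard `concurrent.futures` module. The `TranscriptionJobs` class, its method names, and the state labels are hypothetical and are not part of the demo. The registry lives in memory, so it only suits a single container; a setup with several containers would more likely need an external queue.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class TranscriptionJobs:
    """In-memory job registry: submit work now, fetch the result later."""

    def __init__(self, transcribe, max_workers=1):
        self._transcribe = transcribe  # hypothetical callable: path -> text
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}

    def submit(self, path):
        """Queue a transcription and return a job id immediately."""
        job_id = uuid.uuid4().hex
        self._futures[job_id] = self._pool.submit(self._transcribe, path)
        return job_id

    def status(self, job_id):
        """Report the job state without blocking the caller."""
        future = self._futures.get(job_id)
        if future is None:
            return {"state": "unknown"}
        if not future.done():
            return {"state": "pending"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "error": str(error)}
        return {"state": "done", "text": future.result()}
```

With this pattern, an upload endpoint would call `submit` and return the job id, and a second endpoint would expose `status` so that the browser can poll for the transcript.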
|
|
||
| ## Updating the Model | ||
|
|
||
| The application reads the Whisper model name from the `MODEL_USE` environment variable, so changing the model size does not require any code change. | ||
|
|
||
| To switch the deployed application to another model, update the variable from the command line: | ||
|
|
||
| ```bash | ||
| scalingo --app mywhisper env-set MODEL_USE=medium | ||
| ``` | ||
|
|
||
| Model names such as `tiny`, `base`, `small`, `medium`, `large-v3`, or `turbo` can be used, depending on the balance required between accuracy, startup time, and CPU usage. | ||
|
|
||
| After changing the variable, restart the application so a new container is started with the updated configuration and the selected model is loaded again at startup: | ||
|
|
||
| ```bash | ||
| scalingo --app mywhisper restart | ||
| ``` | ||
|
|
||
| At the next startup, the application downloads the selected model into the cache directory and warms it in the background before serving transcription requests. | ||
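Background loading of this kind can be sketched with a worker thread and a status flag. The `ModelHolder` class, the `loader` callable, and the `loading`, `ready`, and `error` status values are assumptions made for this example; the demo's own implementation and health payload may differ.

```python
import threading


class ModelHolder:
    """Load a model in a background thread and expose a readiness status."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical callable that returns a model
        self._lock = threading.Lock()
        self._model = None
        self._error = None

    def start(self):
        """Kick off loading without blocking application startup."""
        thread = threading.Thread(target=self._load, daemon=True)
        thread.start()
        return thread

    def _load(self):
        try:
            model = self._loader()  # download and warm-up would happen here
        except Exception as exc:  # surface the failure instead of hiding it
            with self._lock:
                self._error = str(exc)
            return
        with self._lock:
            self._model = model

    def health(self):
        """Payload that a health endpoint could return."""
        with self._lock:
            if self._error is not None:
                return {"status": "error", "detail": self._error}
            if self._model is None:
                return {"status": "loading"}
            return {"status": "ready"}
```

Loading in the background lets the web process start answering health checks right away, while transcription requests can be refused until `health()` reports `ready`.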
|
|
||
| ## Updating your Application | ||
|
|
||
| To deploy a new version, commit the changes and push again to the Scalingo remote: | ||
|
|
||
| ```bash | ||
| git add . | ||
| git commit -m "Update Whisper demo" | ||
| git push scalingo main | ||
| ``` | ||
|
|
||
| [whisper]: https://github.com/openai/whisper | ||
| [faster-whisper]: https://github.com/SYSTRAN/faster-whisper | ||
| [fastapi]: https://fastapi.tiangolo.com | ||
| [procfile]: {% post_url platform/app/2000-01-01-procfile %} | ||
issue: the repository does not exist.
I am waiting for the tutorial to be validated before creating the repository. In the meantime, it is available here: https://github.com/SC-Samir/whisper-scalingo