From e04885bef7176ea365016e32a01af68ce3b7397a Mon Sep 17 00:00:00 2001 From: SC-Samir Date: Tue, 26 May 2026 15:02:21 +0200 Subject: [PATCH 1/3] tutorial whisper --- src/_includes/icons/openai.svg | 1 + src/_tutorials/whisper/index.md | 112 ++++++++++++++++++++++++++++++++ 2 files changed, 113 insertions(+) create mode 100644 src/_includes/icons/openai.svg create mode 100644 src/_tutorials/whisper/index.md diff --git a/src/_includes/icons/openai.svg b/src/_includes/icons/openai.svg new file mode 100644 index 000000000..eea5a3aa8 --- /dev/null +++ b/src/_includes/icons/openai.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/src/_tutorials/whisper/index.md b/src/_tutorials/whisper/index.md new file mode 100644 index 000000000..490bc9b77 --- /dev/null +++ b/src/_tutorials/whisper/index.md @@ -0,0 +1,112 @@ +--- +title: Building speech to text with Whisper +logo: openai +category: ai +permalink: /tutorials/whisper +modified_at: 2026-05-26 +--- + +Whisper is an automatic speech recognition model that converts speech to text. It was trained on a large, multilingual audio corpus, which makes it robust to different accents, background noise, and real-world conditions. As an open-source model, it is well suited for developers who want to integrate speech-to-text without depending entirely on a proprietary API. + +Instead of relying on an external SaaS API, Whisper can run directly inside a web application using `faster-whisper`. This implementation keeps the same model family while improving inference speed and reducing resource usage. + +In this tutorial, a small speech-to-text demo is deployed on Scalingo using a FastAPI backend, a minimal HTML/JavaScript frontend that records audio in the browser, and `faster-whisper` running on CPU in a single web container. + +## Planning your deployment + +For this kind of application, it is recommended to start with an M container and move to a larger size if startup time or inference latency becomes an issue. 
The application warms the model in the background at startup and stores downloaded model files under `/tmp/models`. + +The application supports two environment variables: `MODEL_USE`, which defaults to `small`, and `MODEL_CACHE_DIR`, which defaults to `/tmp/models`. Starting with `MODEL_USE=small` is a good default, then moving to a larger model only if better accuracy is required. + +## Deploying the application + +### Using the command line + +1. Clone the repository: + + ```bash + git clone https://github.com/Scalingo/whisper-speech-to-text + cd whisper-speech-to-text + ``` + +2. Create the application on Scalingo: + + ```bash + scalingo create whisper-speech-to-text + ``` + + The Scalingo command line automatically detects the Git repository and + adds a Git remote pointing to Scalingo: + + ```bash + git remote -v + + origin https://github.com/Scalingo/whisper-speech-to-text (fetch) + origin https://github.com/Scalingo/whisper-speech-to-text (push) + scalingo git@ssh.osc-fr1.scalingo.com:whisper-speech-to-text.git (fetch) + scalingo git@ssh.osc-fr1.scalingo.com:whisper-speech-to-text.git (push) + ``` + +3. Configure the application: + + ```bash + scalingo --app whisper-speech-to-text env-set MODEL_USE=small + scalingo --app whisper-speech-to-text env-set MODEL_CACHE_DIR=/tmp/models + ``` + +4. Deploy to Scalingo: + + ```bash + git push scalingo main + ``` + + Scalingo detects the Python environment, installs the dependencies declared by the project, and starts the application using the `Procfile`. The speech-to-text demo is now deployed. + +## Testing the deployment + +Before using the application, check the health endpoint to verify that the model is loaded: + + ```bash + curl https://whisper-speech-to-text.osc-fr1.scalingo.io/health + ``` + +Once the model is ready, open the application in a browser and test recording from the HTML interface. 
The transcription endpoint can also be tested directly with `curl`: + + ```bash + curl -X POST https://whisper-speech-to-text.osc-fr1.scalingo.io/transcribe \ + -F "file=@sample.webm" + ``` + +The backend writes the uploaded file to `/tmp`, transcribes it, then returns a JSON response containing the transcript and model metadata. + +## Updating the model + +The application reads the Whisper model name from the `MODEL_USE` environment variable, so changing model size does not require code changes. + +To switch the deployed application to another model, update the variable from the command line: + + ```bash + scalingo --app whisper-speech-to-text env-set MODEL_USE=medium + ``` + +Model names such as `tiny`, `base`, `small`, `medium`, `large-v3`, or `turbo` can be used, depending on the balance required between accuracy, startup time, and CPU usage. + +After changing the variable, restart the application so the web process reloads the selected model: + + ```bash + scalingo --app whisper-speech-to-text restart + ``` + +At the next startup, the application downloads or reloads the selected model into the cache directory and warms it in the background before serving transcription requests. + +## Updating your application + +To deploy a new version, commit the changes and push again to the Scalingo remote: + + ```bash + git add . + git commit -m "Update Whisper demo" + git push scalingo main + ``` + +If the frontend template, model settings, or Python dependencies change, redeploying is enough for Scalingo to rebuild and restart the application with the new version. 
From 631eaaca5d6d01128ecdd289cef06ca8f7e59ce2 Mon Sep 17 00:00:00 2001 From: SC-Samir Date: Thu, 28 May 2026 12:56:43 +0200 Subject: [PATCH 2/3] Update the tutorial with Etienne advice --- src/_tutorials/whisper/index.md | 64 ++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 29 deletions(-) diff --git a/src/_tutorials/whisper/index.md b/src/_tutorials/whisper/index.md index 490bc9b77..78d2c1977 100644 --- a/src/_tutorials/whisper/index.md +++ b/src/_tutorials/whisper/index.md @@ -1,26 +1,26 @@ --- -title: Building speech to text with Whisper +title: Building Speech to Text with Whisper logo: openai category: ai permalink: /tutorials/whisper modified_at: 2026-05-26 --- -Whisper is an automatic speech recognition model that converts speech to text. It was trained on a large, multilingual audio corpus, which makes it robust to different accents, background noise, and real-world conditions. As an open-source model, it is well suited for developers who want to integrate speech-to-text without depending entirely on a proprietary API. +[Whisper] is an automatic speech recognition model that converts speech to text. It was trained on a large, multilingual audio corpus, which makes it robust to different accents, background noise, and real-world conditions. As an open source model, it is well suited for developers who want to integrate speech to text without depending entirely on a proprietary API. -Instead of relying on an external SaaS API, Whisper can run directly inside a web application using `faster-whisper`. This implementation keeps the same model family while improving inference speed and reducing resource usage. +Instead of relying on an external SaaS API, Whisper can run directly inside a web application using [faster-whisper], an optimized implementation of the Whisper model that improving inference speed on CPU. 
-In this tutorial, a small speech-to-text demo is deployed on Scalingo using a FastAPI backend, a minimal HTML/JavaScript frontend that records audio in the browser, and `faster-whisper` running on CPU in a single web container. +In this tutorial, a small speech to text demo is deployed on Scalingo using a [FastAPI] backend, a Python web framework, a minimal HTML/JavaScript frontend that records audio in the browser, and `faster-whisper` running on CPU in a single web container. -## Planning your deployment +## Planning your Deployment -For this kind of application, it is recommended to start with an M container and move to a larger size if startup time or inference latency becomes an issue. The application warms the model in the background at startup and stores downloaded model files under `/tmp/models`. +For this kind of application, it is recommended to start with an M container and move to a larger size if startup time or inference latency becomes an issue. -The application supports two environment variables: `MODEL_USE`, which defaults to `small`, and `MODEL_CACHE_DIR`, which defaults to `/tmp/models`. Starting with `MODEL_USE=small` is a good default, then moving to a larger model only if better accuracy is required. +The application supports two environment variables: `MODEL_USE` and `MODEL_CACHE_DIR`. A good starting point is to set `MODEL_USE=small` and `MODEL_CACHE_DIR=/tmp/models`, then move to a larger model only if better accuracy is required. You can view the possibles values of `MODEL_USE` on [faster-whisper] repository. -## Deploying the application +## Deploying the Application -### Using the command line +### Using the Command Line 1. Clone the repository: @@ -32,7 +32,7 @@ The application supports two environment variables: `MODEL_USE`, which defaults 2. 
Create the application on Scalingo: ```bash - scalingo create whisper-speech-to-text + scalingo create mywhisper ``` The Scalingo command line automatically detects the Git repository and @@ -43,15 +43,15 @@ The application supports two environment variables: `MODEL_USE`, which defaults origin https://github.com/Scalingo/whisper-speech-to-text (fetch) origin https://github.com/Scalingo/whisper-speech-to-text (push) - scalingo git@ssh.osc-fr1.scalingo.com:whisper-speech-to-text.git (fetch) - scalingo git@ssh.osc-fr1.scalingo.com:whisper-speech-to-text.git (push) + scalingo git@ssh.osc-fr1.scalingo.com:mywhisper.git (fetch) + scalingo git@ssh.osc-fr1.scalingo.com:mywhisper.git (push) ``` 3. Configure the application: ```bash - scalingo --app whisper-speech-to-text env-set MODEL_USE=small - scalingo --app whisper-speech-to-text env-set MODEL_CACHE_DIR=/tmp/models + scalingo --app mywhisper env-set MODEL_USE=small + scalingo --app mywhisper env-set MODEL_CACHE_DIR=/tmp/models ``` 4. Deploy to Scalingo: @@ -60,46 +60,50 @@ The application supports two environment variables: `MODEL_USE`, which defaults git push scalingo main ``` - Scalingo detects the Python environment, installs the dependencies declared by the project, and starts the application using the `Procfile`. The speech-to-text demo is now deployed. + Scalingo detects the Python environment, installs the dependencies declared by the project, and starts the application using the `Procfile`. The speech to text demo is now deployed. -## Testing the deployment +## Testing the Deployment -Before using the application, check the health endpoint to verify that the model is loaded: +Before using the application, query the health endpoint to check that the model is loaded: ```bash - curl https://whisper-speech-to-text.osc-fr1.scalingo.io/health + curl https://mywhisper.osc-fr1.scalingo.io/health ``` -Once the model is ready, open the application in a browser and test recording from the HTML interface. 
The transcription endpoint can also be tested directly with `curl`: +Since the model is downloaded the first time the container starts, wait until the `status` field is ready before opening the application in a browser and testing recording from the HTML interface. + +The transcription endpoint can also be tested directly with **curl**.For example, if the audio file is in the current directory of your computer: ```bash - curl -X POST https://whisper-speech-to-text.osc-fr1.scalingo.io/transcribe \ - -F "file=@sample.webm" + curl --request POST https://mywhisper.osc-fr1.scalingo.io/transcribe \ + --form "file=@sample.webm" ``` -The backend writes the uploaded file to `/tmp`, transcribes it, then returns a JSON response containing the transcript and model metadata. +The backend writes the uploaded file to `/tmp`, transcribes it, then returns a JSON response containing the transcript and model metadata. + +In this demo the transcription runs synchronously, but this demo can be adapted to an asynchronous workflow, for example by offloading the transcription to a background job. -## Updating the model +## Updating the Model The application reads the Whisper model name from the `MODEL_USE` environment variable, so changing model size does not require code changes. To switch the deployed application to another model, update the variable from the command line: ```bash - scalingo --app whisper-speech-to-text env-set MODEL_USE=medium + scalingo --app mywhisper env-set MODEL_USE=medium ``` Model names such as `tiny`, `base`, `small`, `medium`, `large-v3`, or `turbo` can be used, depending on the balance required between accuracy, startup time, and CPU usage. 
-After changing the variable, restart the application so the web process reloads the selected model: +After changing the variable, restart the application so a new container is started with the updated configuration and the selected model is loaded again at startup: ```bash - scalingo --app whisper-speech-to-text restart + scalingo --app mywhisper restart ``` -At the next startup, the application downloads or reloads the selected model into the cache directory and warms it in the background before serving transcription requests. +At the next startup, the application downloads the selected model into the cache directory and warms it in the background before serving transcription requests. -## Updating your application +## Updating your Application To deploy a new version, commit the changes and push again to the Scalingo remote: @@ -109,4 +113,6 @@ To deploy a new version, commit the changes and push again to the Scalingo remot git push scalingo main ``` -If the frontend template, model settings, or Python dependencies change, redeploying is enough for Scalingo to rebuild and restart the application with the new version. +[whisper]: https://github.com/openai/whisper +[faster-whisper]: https://github.com/SYSTRAN/faster-whisper +[fastapi]: https://fastapi.tiangolo.com From 9411c94b2bcec168e512629f83b8723df0cc1a78 Mon Sep 17 00:00:00 2001 From: SC-Samir Date: Thu, 28 May 2026 14:57:26 +0200 Subject: [PATCH 3/3] Solve issue --- src/_tutorials/whisper/index.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/_tutorials/whisper/index.md b/src/_tutorials/whisper/index.md index 78d2c1977..6df08157e 100644 --- a/src/_tutorials/whisper/index.md +++ b/src/_tutorials/whisper/index.md @@ -8,7 +8,7 @@ modified_at: 2026-05-26 [Whisper] is an automatic speech recognition model that converts speech to text. It was trained on a large, multilingual audio corpus, which makes it robust to different accents, background noise, and real-world conditions. 
As an open source model, it is well suited for developers who want to integrate speech to text without depending entirely on a proprietary API. -Instead of relying on an external SaaS API, Whisper can run directly inside a web application using [faster-whisper], an optimized implementation of the Whisper model that improving inference speed on CPU. +Instead of relying on an external SaaS API, Whisper can run directly inside a web application using [faster-whisper], an optimized implementation of the Whisper model that improves inference speed on CPU. In this tutorial, a small speech to text demo is deployed on Scalingo using a [FastAPI] backend, a Python web framework, a minimal HTML/JavaScript frontend that records audio in the browser, and `faster-whisper` running on CPU in a single web container. @@ -16,7 +16,7 @@ In this tutorial, a small speech to text demo is deployed on Scalingo using a [F For this kind of application, it is recommended to start with an M container and move to a larger size if startup time or inference latency becomes an issue. -The application supports two environment variables: `MODEL_USE` and `MODEL_CACHE_DIR`. A good starting point is to set `MODEL_USE=small` and `MODEL_CACHE_DIR=/tmp/models`, then move to a larger model only if better accuracy is required. You can view the possibles values of `MODEL_USE` on [faster-whisper] repository. +The application supports one environment variable: `MODEL_USE`. A good starting point is to set `MODEL_USE=small`, then move to a larger model only if better accuracy is required. The possible values of `MODEL_USE` are listed in the [faster-whisper] repository. ## Deploying the Application @@ -51,7 +51,6 @@ The application supports two environment variables: `MODEL_USE` and `MODEL_CACHE ```bash scalingo --app mywhisper env-set MODEL_USE=small - scalingo --app mywhisper env-set MODEL_CACHE_DIR=/tmp/models ``` 4.
Deploy to Scalingo: @@ -60,7 +59,7 @@ The application supports two environment variables: `MODEL_USE` and `MODEL_CACHE git push scalingo main ``` - Scalingo detects the Python environment, installs the dependencies declared by the project, and starts the application using the `Procfile`. The speech to text demo is now deployed. + Scalingo detects the Python environment, installs the dependencies declared by the project, and starts the application using the [Procfile]. The speech to text demo is now deployed. ## Testing the Deployment @@ -70,9 +69,9 @@ Before using the application, query the health endpoint to check that the model curl https://mywhisper.osc-fr1.scalingo.io/health ``` -Since the model is downloaded the first time the container starts, wait until the `status` field is ready before opening the application in a browser and testing recording from the HTML interface. +Since the model is downloaded the first time the container starts, wait until the `status` field is ready before opening the application in a browser and testing the recording from the HTML interface. -The transcription endpoint can also be tested directly with **curl**.For example, if the audio file is in the current directory of your computer: +The transcription endpoint can also be tested directly with `curl`. For example, if the audio file is in the current directory of your computer: ```bash curl --request POST https://mywhisper.osc-fr1.scalingo.io/transcribe \ @@ -81,7 +80,7 @@ The transcription endpoint can also be tested directly with **curl**.For example The backend writes the uploaded file to `/tmp`, transcribes it, then returns a JSON response containing the transcript and model metadata. -In this demo the transcription runs synchronously, but this demo can be adapted to an asynchronous workflow, for example by offloading the transcription to a background job. +In this demo the transcription runs synchronously.
This demo can be adapted to an asynchronous workflow, for example by offloading the transcription to a background job. ## Updating the Model @@ -116,3 +115,4 @@ To deploy a new version, commit the changes and push again to the Scalingo remot [whisper]: https://github.com/openai/whisper [faster-whisper]: https://github.com/SYSTRAN/faster-whisper [fastapi]: https://fastapi.tiangolo.com +[procfile]: {% post_url platform/app/2000-01-01-procfile %}
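
The tutorial asks the reader to wait until the health endpoint reports that the model is loaded before testing. As an illustration only, that wait could be scripted along the following lines. This is a minimal sketch, not part of the patch: it assumes the endpoint returns a JSON object whose `status` field equals the string `ready`, and the helper names (`fetch_health`, `is_ready`, `wait_until_ready`), the retry count, and the delay are all hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: the example URL used in the tutorial; replace with your own app URL.
HEALTH_URL = "https://mywhisper.osc-fr1.scalingo.io/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 10.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_ready(payload: object) -> bool:
    """Assumption: readiness is reported as {"status": "ready"}."""
    return isinstance(payload, dict) and payload.get("status") == "ready"


def wait_until_ready(
    fetch: Callable[[], object] = fetch_health,
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint; return False if every attempt fails."""
    for attempt in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            # Network error or non-JSON body: the container may still be starting.
            pass
        if attempt < attempts - 1:
            sleep(delay)  # no pause after the final attempt
    return False
```

Calling `wait_until_ready()` with its defaults polls the example URL up to 30 times, 10 seconds apart. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access or real delays.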