WASdev · jfmartinez · Dec 11, 2025
diff --git a/speech-adapter-samples/text-to-speech/.env.sample b/speech-adapter-samples/text-to-speech/.env.sample
@@ -0,0 +1,6 @@
+## ElevenLabs
+ELEVENLABS_API_KEY='XXXXX'
+
+## Watson TTS Config
+WATSON_TTS_URL='<TTS URL>'
+WATSON_TTS_API_KEY='XXXXXXX'
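Review note: since the sample file ships placeholder values, a startup check could catch a `.env` that was copied but never filled in. The sketch below is only an illustration; `findMissingEnv` is a hypothetical helper that does not exist in this PR, and treating all-`X` values as placeholders is an assumption based on the sample above.

```javascript
// Hypothetical helper (not part of the PR): report which required
// environment variables are missing or still set to a placeholder.
function findMissingEnv(env, names) {
  return names.filter((name) => {
    const value = env[name];
    // Treat unset, empty, and the sample's all-X placeholders as missing.
    return !value || /^X+$/.test(value);
  });
}

const missing = findMissingEnv(process.env, ['ELEVENLABS_API_KEY']);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```

Calling this right after `require('dotenv').config()` would surface the problem before the first call arrives rather than on the first synthesis request.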
diff --git a/speech-adapter-samples/text-to-speech/Dockerfile b/speech-adapter-samples/text-to-speech/Dockerfile
@@ -0,0 +1,16 @@
+FROM node:20
+
+# Create app directory
+WORKDIR /usr/src/app
+
+# Install app dependencies
+COPY package.json ./
+
+RUN npm install --omit=dev
+
+# Bundle app source
+COPY . .
+
+EXPOSE 8010
+
+CMD [ "npm", "start" ]
diff --git a/speech-adapter-samples/text-to-speech/README.md b/speech-adapter-samples/text-to-speech/README.md
@@ -7,55 +7,45 @@ This sample text to speech adapter uses the Watson SDK for Text To Speech found
 By default IBM Voice Gateway uses the Watson Speech services for Text To Speech synthesis, the purpose of this project is to show how a developer can integrate a third party Text To Speech engine with IBM Voice Gateway. This project uses the Watson SDK for Text To Speech as the example for text synthesis.
 
 ## Requires
-- [NodeJS v6 and higher](https://nodejs.org/en/download/)
+- [NodeJS v20 and higher](https://nodejs.org/en/download/)
 - [IBM Voice Gateway](https://www.ibm.com/support/knowledgecenter/SS4U29/deploydocker.html) Setup
 
-## Setup with Watson Text To Speech
+
+## Setup
 1. Clone the Samples Repository
     ```
-    git clone https://github.com/WASdev/sample.voice.gateway.git
-    cd speech-adapter-samples/text-to-speech/
+    git clone https://github.com/jfmartinez/sample.voice.gateway.git -b elevenlabs-tts
+    cd sample.voice.gateway/speech-adapter-samples/text-to-speech/
     ```
 1. Install dependencies
     ```
     npm install
     ```
-1. Add in your credentials, under `config/default.json`:
-    ```json
-    {
-        "Server": {
-            "port": 8010
-        },
-        "WatsonTextToSpeech": {
-            "credentials": {
-                "username": "<username>",
-                "password": "<password>"
-            }
-        }
-    }
+
+1. (Optional) If working with a remote Voice Gateway you can use [ngrok](https://ngrok.com/) to expose your service:
+    ```
+    ngrok http 8010
     ```
 
-    You can also set environment variables, WATSON_TTS_USERNAME and WATSON_TTS_PASSWORD like so:
-    ```bash
-    WATSON_TTS_USERNAME=<username> WATSON_TTS_PASSWORD=<password> npm start
+1. Copy the `.env.sample` file to `.env`.
+    ```
+    cp .env.sample .env
     ```
 
-1. Run the test cases to validate it's working:
+1. Set `ELEVENLABS_API_KEY` in your `.env` file.
 
-    ```bash
-    npm test
-    ```
+1. Run the server with `npm start`
 
-1. Connect the Voice Gateway to this proxy, set the `WATSON_TTS_URL` under the media.relay to point to this sample proxy
+1. Configure the Voice Gateway to connect to the adapter by setting `WATSON_TTS_URL` under the media.relay to point to this sample proxy, for example the forwarding URL that ngrok printed:
     ```
-    - WATSON_TTS_URL=http://{hostname}:8010
+    - WATSON_TTS_URL=https://fcea70235af5.ngrok-free.app
     ```
 
 1. Make a call
 
 ### Implement your own Text To Speech Engine
 
-  Currently, this sample only demonstrates how to use Watson Text To Speech as the Text To Speech engine for the Voice Gateway. You can use the `lib/WatsonTextToSpeechEngine.js` as a guideline on how to implement your own Text To Speech Engine. Essentially, you'll be implementing a [Readable NodeJS Stream](http://nodejs.org/api/stream.html#stream_class_stream_readable). Once you implement your own class, you can modify the `lib/TextToSpeechAdapter.js` to `require` it.
+  Currently, this sample only demonstrates how to use Watson Text To Speech as the Text To Speech engine for the Voice Gateway. You can use the `lib/services/WatsonTextToSpeechEngine.js` as a guideline on how to implement your own Text To Speech Engine. Essentially, you'll be implementing a [Readable NodeJS Stream](http://nodejs.org/api/stream.html#stream_class_stream_readable). Once you implement your own class, you can modify the `lib/TextToSpeechAdapter.js` to `require` it.
 
   For example,
 
@@ -142,6 +132,31 @@ By default IBM Voice Gateway uses the Watson Speech services for Text To Speech
   ```
   npm test
   ```
+
+## IBM Cloud Code Engine Deployment
+
+**TBD (Work in Progress)**
+See [Deploying your app from local source code with the CLI](https://cloud.ibm.com/docs/codeengine?topic=codeengine-app-local-source-code)
+
+Before you begin
+
+1. Set up your [Code Engine CLI](https://cloud.ibm.com/docs/codeengine?topic=codeengine-install-cli) environment.
+2. [Create and work with a project.](https://cloud.ibm.com/docs/codeengine?topic=codeengine-manage-project)
+
+The server comes with the Dockerfile for Code Engine deployment. To deploy the server to Code Engine, please follow the steps below:
+
+**Note:** The steps guide how to push with a `.env` file, but
+1. Build the docker image
+  1. Copy the `.env.sample` file to `.env` and fill in the required information.
+  2. Build the image with `docker build -t <image-name> .`. The image name should follow the format `<registry>/<namespace>/<image-name>:<tag>`, for example `us.icr.io/testitall_ns/testitall_server:latest`.
+2. Push the image to the container registry with:
+    ```
+    docker push <image-name>
+    ```
+3. Create a Code Engine project and deploy the image
+
 ## License
 
 Licensed under [Apache 2.0 License](https://github.com/WASdev/sample.voice.gateway/blob/master/LICENSE)
diff --git a/speech-adapter-samples/text-to-speech/app.js b/speech-adapter-samples/text-to-speech/app.js
@@ -1,5 +1,5 @@
-const Config = require('config');
+require('dotenv').config();
 
-const PORT = Config.get('Server.port');
+const PORT = process.env.PORT || 8010;
 
-require('./lib/TextToSpeechAdapter').start({ port: PORT });
+require('./lib/ConnectionHandler').start({ port: PORT });
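Review note: `process.env.PORT` is always a string, so `process.env.PORT || 8010` passes either a string or a number to the server. A small normalization step could make that explicit. This is a sketch under assumptions; `resolvePort` is a hypothetical name and not part of the PR.

```javascript
// Hypothetical helper (not part of the PR): turn the PORT string from the
// environment into a valid TCP port, falling back to the sample's default.
const DEFAULT_PORT = 8010;

function resolvePort(rawPort, fallback = DEFAULT_PORT) {
  const port = Number.parseInt(rawPort, 10);
  const isCleanNumber = String(port) === String(rawPort).trim();
  // Reject NaN, trailing garbage such as "80abc", and out-of-range values.
  if (!Number.isInteger(port) || !isCleanNumber || port < 1 || port > 65535) {
    return fallback;
  }
  return port;
}

const PORT = resolvePort(process.env.PORT);
console.log(`adapter would listen on port ${PORT}`);
```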
diff --git a/speech-adapter-samples/text-to-speech/config/custom-environment-variables.json b/speech-adapter-samples/text-to-speech/config/custom-environment-variables.json
diff --git a/speech-adapter-samples/text-to-speech/config/default.json b/speech-adapter-samples/text-to-speech/config/default.json
diff --git a/...text-to-speech/lib/TextToSpeechAdapter.js → ...s/text-to-speech/lib/ConnectionHandler.js b/...text-to-speech/lib/TextToSpeechAdapter.js → ...s/text-to-speech/lib/ConnectionHandler.js
@@ -17,14 +17,17 @@ const WebSocketServer = require('ws').Server;
 
 // Change to your own Text to Speech Engine implementation, you can use
 // the WatsonTextToSpeechEngine.js for guidance
-const TextToSpeechEngine = require('./WatsonTextToSpeechEngine');
+const TextToSpeechEngine = require('./services/ElevenLabs');
+
+// Uncomment to enable Watson Text-To-Speech
+// const TextToSpeechEngine = require('./services/WatsonTextToSpeechEngine');
+
 
 const url = require('url');
-const Config = require('config');
 
 const DEFAULT_PORT = 8010;
-const LOG_LEVEL = Config.get('LogLevel');
-const logger = require('pino')({ level: LOG_LEVEL, name: 'TextToSpeechAdapter' });
+// const LOG_LEVEL = Config.get('LogLevel');
+const logger = require('pino')({ level: 'debug', name: 'TextToSpeechAdapter' });
 
 function handleTextToSpeechConnection(webSocket, incomingMessage) {
   logger.debug('connection received');
@@ -35,16 +38,18 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {
 
   // Get headers
   const { headers } = incomingMessage;
-  logger.trace(headers, 'headers on websocket connection:');
+  logger.debug(headers, 'headers on websocket connection:');
 
   const sessionID = headers['vgw-session-id'];
 
   logger.debug(`connection with session-id: ${sessionID}`);
   let textToSpeechEngine;
-  webSocket.on('message', (data) => {
+  let audioStream;
+  webSocket.on('message', async (data) => {
     if (typeof data === 'string') {
       try {
         const message = JSON.parse(data);
+        logger.info('message starting');
         // Message contains, text and accept
         // Combine the start message with query parameters to generate a config
         const config = Object.assign(queryParams, message);
@@ -54,24 +59,27 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {
         // NodeJS Stream API
         textToSpeechEngine = new TextToSpeechEngine(config);
 
-        textToSpeechEngine.on('data', (ttsData) => {
+        audioStream = await textToSpeechEngine.synthesize();
+
+        audioStream.on('data', (ttsData) => {
           logger.trace(`data from engine ${ttsData.length}`);
           webSocket.send(ttsData);
         });
 
-        textToSpeechEngine.on('error', (error) => {
+        audioStream.on('error', (error) => {
           logger.error(error, 'TextToSpeechEngine encountered an error: ');
           const errorMessage = {
             error: error.message,
           };
           webSocket.send(JSON.stringify(errorMessage));
         });
 
-        textToSpeechEngine.on('end', (reason = 'No close reason defined') => {
+        audioStream.on('end', (reason = 'No close reason defined') => {
           logger.debug('TextToSpeechEngine closed');
           webSocket.close(1000, reason);
         });
       } catch (e) {
+        // TODO send Error back
         logger.error(e);
         webSocket.close(1000, 'Invalid start message');
       }
@@ -83,9 +91,6 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {
   // Close event
   webSocket.on('close', (code, reason) => {
     logger.debug(`onClose, code = ${code}, reason = ${reason}`);
-    if (textToSpeechEngine) {
-      textToSpeechEngine.destroy();
-    }
   });
 }
 let wsServer = null;
@@ -95,6 +100,7 @@ function startServer(options = { port: DEFAULT_PORT }) {
     try {
       wsServer = new WebSocketServer({ port: options.port });
     } catch (e) {
+      // eslint-disable-next-line no-promise-executor-return
       return reject(e);
     }
 
@@ -108,7 +114,6 @@ function startServer(options = { port: DEFAULT_PORT }) {
     });
 
     wsServer.on('connection', handleTextToSpeechConnection);
-    return wsServer;
   });
 }
 module.exports.start = startServer;
@@ -128,4 +133,3 @@ function stopServer() {
   });
 }
 module.exports.stop = stopServer;
-
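Review note: this change removes the `destroy()` call from the `close` handler, so nothing stops the audio stream when the caller hangs up mid-synthesis. One possible way to restore that cleanup is sketched below, using only Node's standard library. `pipeAudioToSocket` and `FakeSocket` are hypothetical names used for illustration; `FakeSocket` stands in for a `ws` connection and mimics only `send`, `close`, and the `close` event.

```javascript
const { EventEmitter } = require('node:events');
const { Readable } = require('node:stream');

// Hypothetical helper (not part of the PR): forward audio chunks to the
// socket and stop the stream as soon as the socket closes.
function pipeAudioToSocket(audioStream, webSocket) {
  audioStream.on('data', (chunk) => webSocket.send(chunk));
  audioStream.on('error', (error) => {
    webSocket.send(JSON.stringify({ error: error.message }));
  });
  audioStream.on('end', () => webSocket.close(1000, 'synthesis complete'));

  webSocket.on('close', () => {
    // Readable#destroy() releases the underlying resource; skip it when the
    // stream has already been torn down.
    if (!audioStream.destroyed) {
      audioStream.destroy();
    }
  });
}

// Stand-in for a `ws` connection, used only to exercise the helper.
class FakeSocket extends EventEmitter {
  constructor() {
    super();
    this.sent = [];
  }

  send(data) {
    this.sent.push(data);
  }

  close(code, reason) {
    this.emit('close', code, reason);
  }
}

const socket = new FakeSocket();
const audio = new Readable({ read() {} }); // never ends on its own
pipeAudioToSocket(audio, socket);

// Simulate the caller hanging up before synthesis finishes.
socket.close(1000, 'caller hung up');
console.log(`stream destroyed: ${audio.destroyed}`);
```

In `handleTextToSpeechConnection` the same idea would amount to checking `audioStream` in the existing `close` handler, since `audioStream` is already declared in the enclosing scope.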
diff --git a/speech-adapter-samples/text-to-speech/lib/services/ElevenLabs.js b/speech-adapter-samples/text-to-speech/lib/services/ElevenLabs.js
@@ -0,0 +1,49 @@
+/**
+* (C) Copyright IBM Corporation 2025.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at
+*
+* http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+const { Readable } = require('stream');
+
+const { ElevenLabsClient } = require('elevenlabs');
+const TextToSpeechAdapter = require('./TextToSpeechAdapter');
+
+const elevenlabs = new ElevenLabsClient({
+  apiKey: process.env.ELEVENLABS_API_KEY,
+});
+
+class ElevenLabsTextToSpeechEngine extends TextToSpeechAdapter {
+  constructor(config = {}) {
+    super();
+    this.config = config;
+  }
+
+  async synthesize() {
+    const audioStream = await elevenlabs.generate({
+      stream: true,
+      voice_id: this.config.voice_id,
+      voice: this.config.voice,
+      text: this.config.text,
+      model_id: this.config.model_id,
+      voice_settings: this.config.voice_settings,
+      // TODO - We need to dynamically pick the output format from the config,
+      // but for now it's likely going to be mulaw
+      output_format: 'ulaw_8000',
+    });
+    const nodeStream = Readable.fromWeb(audioStream);
+
+    return nodeStream;
+  }
+}
+module.exports = ElevenLabsTextToSpeechEngine;
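Review note on the `output_format` TODO: the start message carries an `accept` value (see the comment in `ConnectionHandler.js`), so the format could be derived from it. The mapping below is an assumption for illustration only. `pickOutputFormat` is a hypothetical helper, and the pairing of media types with the `ulaw_8000` and `pcm_16000` format names would need to be checked against what the Voice Gateway actually sends and what the ElevenLabs account supports.

```javascript
// Hypothetical mapping (an assumption, not taken from the PR): translate
// the `accept` value of the start message into an ElevenLabs output format.
const DEFAULT_OUTPUT_FORMAT = 'ulaw_8000';

function pickOutputFormat(accept) {
  if (typeof accept !== 'string') {
    return DEFAULT_OUTPUT_FORMAT;
  }
  // Split "audio/l16;rate=16000" into the media type and its parameters.
  const [type, ...params] = accept
    .toLowerCase()
    .split(';')
    .map((part) => part.trim());
  const rate = params
    .map((param) => param.split('='))
    .filter(([key]) => key === 'rate')
    .map(([, value]) => Number(value))[0];

  if (type === 'audio/basic' || type === 'audio/mulaw') {
    return 'ulaw_8000';
  }
  if (type === 'audio/l16' && rate === 16000) {
    return 'pcm_16000';
  }
  // Unknown or unsupported request: keep the value the PR hardcodes today.
  return DEFAULT_OUTPUT_FORMAT;
}

console.log(pickOutputFormat('audio/l16;rate=16000'));
```

With such a helper, `synthesize()` could pass `output_format: pickOutputFormat(this.config.accept)` instead of the hardcoded string, while unrecognized values keep today's behavior.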
diff --git a/.../text-to-speech/lib/TextToSpeechEngine.js → ...peech/lib/services/TextToSpeechAdapter.js b/.../text-to-speech/lib/TextToSpeechEngine.js → ...peech/lib/services/TextToSpeechAdapter.js
@@ -1,5 +1,5 @@
 /**
-* (C) Copyright IBM Corporation 2018.
+* (C) Copyright IBM Corporation 2025.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
@@ -13,18 +13,15 @@
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-const { Readable } = require('stream');
-
-class TextToSpeechEngine extends Readable {
-  /* eslint-disable class-methods-use-this */
-  _read() {}
+class TextToSpeechAdapter {
+  constructor(config) {
+    this.config = config;
+  }
 
-  /**
-   * Destroys the Text To Speech Engine if a close from the other side occurs
-   */
   // eslint-disable-next-line class-methods-use-this
-  destroy() {
-    throw new Error('not implemented');
+  async synthesize() {
+    throw new Error('Not implemented');
   }
 }
-module.exports = TextToSpeechEngine;
+
+module.exports = TextToSpeechAdapter;
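Review note: with this refactor a custom engine no longer extends `Readable`; it extends `TextToSpeechAdapter` and returns a readable stream from `synthesize()`. A minimal sketch of that contract follows. `SilenceEngine` and `collect` are hypothetical names, and the base class is copied inline only so the example runs on its own.

```javascript
const { Readable } = require('node:stream');

// Minimal copy of the base class from this diff so the sketch runs alone.
class TextToSpeechAdapter {
  constructor(config) {
    this.config = config;
  }

  async synthesize() {
    throw new Error('Not implemented');
  }
}

// Hypothetical engine: emits mu-law silence (0xff), one frame per character.
class SilenceEngine extends TextToSpeechAdapter {
  async synthesize() {
    const text = (this.config && this.config.text) || '';
    // 160 bytes is 20 ms of 8 kHz mu-law audio.
    const frames = Array.from(text, () => Buffer.alloc(160, 0xff));
    return Readable.from(frames, { objectMode: false });
  }
}

// Drain a readable stream into a single Buffer.
async function collect(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

const demo = new SilenceEngine({ text: 'hello' })
  .synthesize()
  .then(collect)
  .then((audio) => {
    console.log(`synthesized ${audio.length} bytes`);
    return audio;
  });
```

An engine written this way can be swapped into `ConnectionHandler.js` by changing the `require` for `TextToSpeechEngine`, because the handler only relies on `synthesize()` and the stream's `data`, `error`, and `end` events.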