6 changes: 6 additions & 0 deletions speech-adapter-samples/text-to-speech/.env.sample
@@ -0,0 +1,6 @@
## ElevenLabs
ELEVENLABS_API_KEY='XXXXX'

## Watson TTS Config
WATSON_TTS_URL='<TTS URL>'
WATSON_TTS_API_KEY='XXXXXXX'
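
A startup check can catch a missing or placeholder key before the first call arrives. The sketch below is one possible way to do that. `missingEnvVars` is a hypothetical helper that is not part of this sample, and it assumes the variable names and the `XXXXX` placeholder style shown in `.env.sample`.

```javascript
// Hypothetical helper (not part of the sample): report which required
// variables are unset, empty, or still hold an all-X placeholder.
function missingEnvVars(names, env = process.env) {
  return names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '' || /^X+$/.test(value.trim());
  });
}

// Example with an explicit object instead of process.env.
const missing = missingEnvVars(['ELEVENLABS_API_KEY'], { ELEVENLABS_API_KEY: 'XXXXX' });
if (missing.length > 0) {
  console.warn(`Missing configuration: ${missing.join(', ')}`);
}
```

Passing the environment as a parameter keeps the check easy to test without touching `process.env`.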
16 changes: 16 additions & 0 deletions speech-adapter-samples/text-to-speech/Dockerfile
@@ -0,0 +1,16 @@
FROM node:20

# Create app directory
WORKDIR /usr/src/app

# Install app dependencies
COPY package.json ./

RUN npm install --omit=dev

# Bundle app source
COPY . .

EXPOSE 8010

CMD [ "npm", "start" ]
69 changes: 42 additions & 27 deletions speech-adapter-samples/text-to-speech/README.md
@@ -7,55 +7,45 @@ This sample text to speech adapter uses the Watson SDK for Text To Speech found
By default, IBM Voice Gateway uses the Watson Speech services for Text To Speech synthesis. The purpose of this project is to show how a developer can integrate a third-party Text To Speech engine with IBM Voice Gateway. This project uses the Watson SDK for Text To Speech as the example for text synthesis.

## Requires
- [NodeJS v6 and higher](https://nodejs.org/en/download/)
- [NodeJS v20 and higher](https://nodejs.org/en/download/)
- [IBM Voice Gateway](https://www.ibm.com/support/knowledgecenter/SS4U29/deploydocker.html) Setup

## Setup with Watson Text To Speech

## Setup
1. Clone the Samples Repository
```
git clone https://github.com/WASdev/sample.voice.gateway.git
cd speech-adapter-samples/text-to-speech/
git clone https://github.com/jfmartinez/sample.voice.gateway.git -b elevenlabs-tts
cd sample.voice.gateway/speech-adapter-samples/text-to-speech/
```
1. Install dependencies
```
npm install
```
1. Add in your credentials, under `config/default.json`:
```json
{
"Server": {
"port": 8010
},
"WatsonTextToSpeech": {
"credentials": {
"username": "<username>",
"password": "<password>"
}
}
}

1. (Optional) If working with a remote Voice Gateway you can use [ngrok](https://ngrok.com/) to expose your service:
```
ngrok http 8010
```

You can also set environment variables, WATSON_TTS_USERNAME and WATSON_TTS_PASSWORD like so:
```bash
WATSON_TTS_USERNAME=<username> WATSON_TTS_PASSWORD=<password> npm start
1. Copy the `.env.sample` file to `.env`.
```
cp .env.sample .env
```

1. Run the test cases to validate it's working:
1. Set `ELEVENLABS_API_KEY` in your `.env` file.

```bash
npm test
```
1. Run the server with `npm start`

1. Connect the Voice Gateway to this proxy, set the `WATSON_TTS_URL` under the media.relay to point to this sample proxy
1. Configure the Voice Gateway to connect to the adapter by setting `WATSON_TTS_URL` under `media.relay` to point to this sample proxy:
```
- WATSON_TTS_URL=http://{hostname}:8010
- WATSON_TTS_URL=https://fcea70235af5.ngrok-free.app
```

1. Make a call

### Implement your own Text To Speech Engine

Currently, this sample only demonstrates how to use Watson Text To Speech as the Text To Speech engine for the Voice Gateway. You can use the `lib/WatsonTextToSpeechEngine.js` as a guideline on how to implement your own Text To Speech Engine. Essentially, you'll be implementing a [Readable NodeJS Stream](http://nodejs.org/api/stream.html#stream_class_stream_readable). Once you implement your own class, you can modify the `lib/TextToSpeechAdapter.js` to `require` it.
This sample demonstrates how to use ElevenLabs or Watson Text To Speech as the Text To Speech engine for the Voice Gateway. You can use `lib/services/ElevenLabs.js` or `lib/services/WatsonTextToSpeechEngine.js` as a guideline for implementing your own Text To Speech engine. Essentially, you implement a class whose async `synthesize()` method returns a [Readable NodeJS Stream](http://nodejs.org/api/stream.html#stream_class_stream_readable). Once you implement your own class, you can modify `lib/TextToSpeechAdapter.js` to `require` it.

For example,

@@ -142,6 +132,31 @@ By default IBM Voice Gateway uses the Watson Speech services for Text To Speech
```
npm test
```

## IBM Cloud Code Engine Deployment

**TBD (Work in Progress)**
See [Deploying your app from local source code with the CLI](https://cloud.ibm.com/docs/codeengine?topic=codeengine-app-local-source-code)

Before you begin

1. Set up your [Code Engine CLI](https://cloud.ibm.com/docs/codeengine?topic=codeengine-install-cli) environment.
2. [Create and work with a project.](https://cloud.ibm.com/docs/codeengine?topic=codeengine-manage-project)


The server comes with the Dockerfile for Code Engine deployment. To deploy the server to Code Engine, please follow the steps below:

**Note:** The steps guide how to push with a `.env` file, but
1. Build the Docker image
   1. Copy the `.env.sample` file to `.env` and fill in the required information.
   2. Build the image with `docker build -t <image-name> .`. The image name should follow the format `<registry>/<namespace>/<image-name>:<tag>`, for example `us.icr.io/testitall_ns/testitall_server:latest`.
2. Push the image to the container registry with:
```
docker push <image-name>
```
3. Create a Code Engine project and deploy the image

## License

Licensed under [Apache 2.0 License](https://github.com/WASdev/sample.voice.gateway/blob/master/LICENSE)
6 changes: 3 additions & 3 deletions speech-adapter-samples/text-to-speech/app.js
@@ -1,5 +1,5 @@
const Config = require('config');
require('dotenv').config();

const PORT = Config.get('Server.port');
const PORT = process.env.PORT || 8010;

require('./lib/TextToSpeechAdapter').start({ port: PORT });
require('./lib/ConnectionHandler').start({ port: PORT });
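
Because `process.env.PORT` is a string whenever it is set, `PORT` ends up as either a string or the number `8010`, depending on the environment. The following is a minimal sketch of one way to normalize it. `resolvePort` is a hypothetical helper, not part of this change, and the range check is an assumption about what counts as a usable port.

```javascript
// Hypothetical helper: parse PORT into an integer, falling back to the
// sample's default of 8010 when it is unset or not a valid TCP port.
const DEFAULT_PORT = 8010;

function resolvePort(env = process.env) {
  const parsed = Number.parseInt(env.PORT, 10);
  return Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : DEFAULT_PORT;
}

console.log(resolvePort({ PORT: '9000' })); // 9000
console.log(resolvePort({})); // 8010
```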

This file was deleted.

12 changes: 0 additions & 12 deletions speech-adapter-samples/text-to-speech/config/default.json

This file was deleted.

@@ -17,14 +17,17 @@ const WebSocketServer = require('ws').Server;

// Change to your own Text to Speech Engine implementation, you can use
// the WatsonTextToSpeechEngine.js for guidance
const TextToSpeechEngine = require('./WatsonTextToSpeechEngine');
const TextToSpeechEngine = require('./services/ElevenLabs');

// Uncomment to enable Watson Text-To-Speech
// const TextToSpeechEngine = require('./services/WatsonTextToSpeechEngine');


const url = require('url');
const Config = require('config');

const DEFAULT_PORT = 8010;
const LOG_LEVEL = Config.get('LogLevel');
const logger = require('pino')({ level: LOG_LEVEL, name: 'TextToSpeechAdapter' });
// const LOG_LEVEL = Config.get('LogLevel');
const logger = require('pino')({ level: 'debug', name: 'TextToSpeechAdapter' });

function handleTextToSpeechConnection(webSocket, incomingMessage) {
logger.debug('connection received');
@@ -35,16 +38,18 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {

// Get headers
const { headers } = incomingMessage;
logger.trace(headers, 'headers on websocket connection:');
logger.debug(headers, 'headers on websocket connection:');

const sessionID = headers['vgw-session-id'];

logger.debug(`connection with session-id: ${sessionID}`);
let textToSpeechEngine;
webSocket.on('message', (data) => {
let audioStream;
webSocket.on('message', async (data) => {
if (typeof data === 'string') {
try {
const message = JSON.parse(data);
logger.info('message starting');
// Message contains, text and accept
// Combine the start message with query parameters to generate a config
const config = Object.assign(queryParams, message);
@@ -54,24 +59,27 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {
// NodeJS Stream API
textToSpeechEngine = new TextToSpeechEngine(config);

textToSpeechEngine.on('data', (ttsData) => {
audioStream = await textToSpeechEngine.synthesize();

audioStream.on('data', (ttsData) => {
logger.trace(`data from engine ${ttsData.length}`);
webSocket.send(ttsData);
});

textToSpeechEngine.on('error', (error) => {
audioStream.on('error', (error) => {
logger.error(error, 'TextToSpeechEngine encountered an error: ');
const errorMessage = {
error: error.message,
};
webSocket.send(JSON.stringify(errorMessage));
});

textToSpeechEngine.on('end', (reason = 'No close reason defined') => {
audioStream.on('end', (reason = 'No close reason defined') => {
logger.debug('TextToSpeechEngine closed');
webSocket.close(1000, reason);
});
} catch (e) {
// TODO send Error back
logger.error(e);
webSocket.close(1000, 'Invalid start message');
}
@@ -83,9 +91,6 @@ function handleTextToSpeechConnection(webSocket, incomingMessage) {
// Close event
webSocket.on('close', (code, reason) => {
logger.debug(`onClose, code = ${code}, reason = ${reason}`);
if (textToSpeechEngine) {
textToSpeechEngine.destroy();
}
});
}
let wsServer = null;
@@ -95,6 +100,7 @@ function startServer(options = { port: DEFAULT_PORT }) {
try {
wsServer = new WebSocketServer({ port: options.port });
} catch (e) {
// eslint-disable-next-line no-promise-executor-return
return reject(e);
}

@@ -108,7 +114,6 @@ function startServer(options = { port: DEFAULT_PORT }) {
});

wsServer.on('connection', handleTextToSpeechConnection);
return wsServer;
});
}
module.exports.start = startServer;
@@ -128,4 +133,3 @@ function stopServer() {
});
}
module.exports.stop = stopServer;
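
The `close` handler in this file no longer calls `destroy()`, so a stream returned by `synthesize()` may keep producing audio after the caller hangs up. The sketch below shows one way to tie the stream's lifetime to the socket. `destroyOnClose` is a hypothetical helper, and the `EventEmitter` and empty `Readable` are stand-ins for the real WebSocket and audio stream.

```javascript
const { EventEmitter } = require('node:events');
const { Readable } = require('node:stream');

// Hypothetical helper: destroy the audio stream when the socket-like
// emitter reports that the peer closed the connection.
function destroyOnClose(socket, audioStream) {
  socket.once('close', () => {
    if (!audioStream.destroyed) {
      audioStream.destroy();
    }
  });
}

// Stand-ins for the real WebSocket and the stream returned by synthesize().
const fakeSocket = new EventEmitter();
const fakeAudio = new Readable({ read() {} });

destroyOnClose(fakeSocket, fakeAudio);
fakeSocket.emit('close', 1000, 'caller hung up');
console.log(fakeAudio.destroyed); // true
```

In the handler itself, the call would go right after `synthesize()` resolves, once both the socket and the stream exist.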

49 changes: 49 additions & 0 deletions speech-adapter-samples/text-to-speech/lib/services/ElevenLabs.js
@@ -0,0 +1,49 @@
/**
* (C) Copyright IBM Corporation 2025.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

const { Readable } = require('stream');

const { ElevenLabsClient } = require('elevenlabs');
const TextToSpeechAdapter = require('./TextToSpeechAdapter');

const elevenlabs = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY,
});

class ElevenLabsTextToSpeechEngine extends TextToSpeechAdapter {
constructor(config = {}) {
// The base class constructor stores config on this.config.
super(config);
}

async synthesize() {
const audioStream = await elevenlabs.generate({
stream: true,
voice_id: this.config.voice_id,
voice: this.config.voice,
text: this.config.text,
model_id: this.config.model_id,
voice_settings: this.config.voice_settings,
// TODO - We need to dynamically pick the output format from the config,
// but for now it's likely going to be mulaw
output_format: 'ulaw_8000',
});
const nodeStream = Readable.fromWeb(audioStream);

return nodeStream;
}
}
module.exports = ElevenLabsTextToSpeechEngine;
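
The TODO in `synthesize()` mentions picking the output format from the config. One possible shape for that lookup is sketched below, assuming the start message's `accept` value is available on the config. `pickOutputFormat` and the mapping table are hypothetical, and the non-mu-law entry is an unverified assumption that should be checked against both services before use.

```javascript
// Hypothetical mapping from the `accept` value in the start message to an
// ElevenLabs `output_format`. Only the mu-law entries mirror this sample.
const OUTPUT_FORMATS = new Map([
  ['audio/basic', 'ulaw_8000'],
  ['audio/mulaw;rate=8000', 'ulaw_8000'],
  // Assumption: sample format and byte order match on both sides.
  ['audio/l16;rate=16000', 'pcm_16000'],
]);

function pickOutputFormat(accept, fallback = 'ulaw_8000') {
  if (typeof accept !== 'string') return fallback;
  // Normalize case and whitespace so "audio/L16; rate=16000" still matches.
  const key = accept.toLowerCase().replace(/\s+/g, '');
  return OUTPUT_FORMATS.get(key) ?? fallback;
}

console.log(pickOutputFormat('audio/L16; rate=16000')); // pcm_16000
console.log(pickOutputFormat(undefined)); // ulaw_8000
```

Falling back to `ulaw_8000` keeps the current behavior for any `accept` value the table does not know.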
@@ -1,5 +1,5 @@
/**
* (C) Copyright IBM Corporation 2018.
* (C) Copyright IBM Corporation 2025.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -13,18 +13,15 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
const { Readable } = require('stream');

class TextToSpeechEngine extends Readable {
/* eslint-disable class-methods-use-this */
_read() {}
class TextToSpeechAdapter {
constructor(config) {
this.config = config;
}

/**
* Destroys the Text To Speech Engine if a close from the other side occurs
*/
// eslint-disable-next-line class-methods-use-this
destroy() {
throw new Error('not implemented');
async synthesize() {
throw new Error('Not implemented');
}
}
module.exports = TextToSpeechEngine;

module.exports = TextToSpeechAdapter;
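
With this base class, a custom engine only needs an async `synthesize()` that resolves to a readable stream. The following is a minimal sketch under that assumption. The base class is copied in as a stand-in so the example runs on its own, and `SilenceEngine` and `collect` are hypothetical names that do not exist in this repository.

```javascript
const { Readable } = require('node:stream');

// Stand-in for lib/services/TextToSpeechAdapter.js so the sketch runs alone.
class TextToSpeechAdapter {
  constructor(config) {
    this.config = config;
  }

  async synthesize() {
    throw new Error('Not implemented');
  }
}

// Hypothetical engine: returns one second of mu-law silence (0xFF bytes at
// 8 kHz) instead of calling a real synthesis service.
class SilenceEngine extends TextToSpeechAdapter {
  async synthesize() {
    const chunk = Buffer.alloc(8000, 0xff);
    return Readable.from([chunk]);
  }
}

// Drain the stream the same way a connection handler would consume it.
async function collect(engine) {
  const chunks = [];
  for await (const chunk of await engine.synthesize()) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

const done = collect(new SilenceEngine({ text: 'hello' })).then((audio) => {
  console.log(audio.length); // 8000
  return audio;
});
```

A real engine would replace the silence buffer with the stream returned by its synthesis service.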