Advanced Text-to-Speech (TTS) Service

The Advanced TTS Service is a Node.js service that converts text into speech using the Azure Cognitive Services Speech SDK. It accepts text from the clipboard or over HTTP, converts markdown to plain text, escapes XML characters for SSML, queues the input, and plays the synthesized audio through the system speakers. The codebase is split into small modules so that individual stages can be extended or replaced.

Core Features

Clipboard Monitoring: Watches the clipboard and converts any copied text that starts with a configurable trigger word.
HTTP Server Integration: Accepts text over HTTP POST requests so that other programs can request speech.
Markdown Preprocessing: Converts markdown-formatted text to plain text before synthesis.
XML Character Escaping: Escapes XML special characters so that input text is safe to embed in SSML.
Queue Management: Queues incoming text and processes it in sequence.
Azure Cognitive Speech Service Integration: Generates SSML and sends it to Azure's TTS service for synthesis.
Audio Playback: Plays the synthesized audio stream through the system speakers.
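
To illustrate the two preprocessing steps, here is a minimal sketch of markdown stripping followed by XML escaping. The function names `stripMarkdown` and `escapeXml` are hypothetical, and the actual modules in src/preprocessors may work differently; the markdown pass handles only a few common constructs.

```javascript
// Hypothetical helpers for illustration; not the project's actual code.

// Escape the five characters that are special in XML so the text can be
// embedded safely inside SSML.
function escapeXml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&apos;' };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Small markdown-to-plain-text pass covering a few common constructs.
function stripMarkdown(md) {
  return md
    .replace(/`{3}[\s\S]*?`{3}/g, '')          // drop fenced code blocks
    .replace(/`([^`]+)`/g, '$1')               // inline code -> its text
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, '$1')  // images -> alt text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, '$1')   // links -> link text
    .replace(/^#{1,6}\s+/gm, '')               // heading markers
    .replace(/(\*\*|__)(.+?)\1/g, '$2')        // bold
    .replace(/\*(.+?)\*/g, '$1')               // italics (asterisk form only)
    .trim();
}

// Strip markdown first, then escape, so entities are not touched by the markdown pass.
const speechText = escapeXml(stripMarkdown('**Tom & Jerry**'));
console.log(speechText);
```

Escaping last keeps the order simple: the markdown pass sees the raw text, and everything that survives is made XML-safe in one place.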

Project Structure

.
├── README.md
├── app.js
├── config
│   └── voiceSettings.json
├── package.json
├── pnpm-lock.yaml
└── src
    ├── controllers
    │   └── ttsController.js
    ├── listeners
    │   └── clipboardListener.js
    ├── preprocessors
    │   ├── markdownPreprocessor.js
    │   └── xmlEscaper.js
    ├── services
    │   ├── textExtractor.js
    │   ├── ttsQueue.js
    │   └── ttsServiceAzureAI.js
    └── utils
        ├── logger.js
        ├── notifier.js
        ├── playAudioStream.js
        └── ssmlGenerator.js

Getting Started

Prerequisites

Node.js (v18.5 or newer recommended)
An Azure account with an active Cognitive Services Speech subscription

Installation

Clone the Repository

git clone https://github.com/TheGreenJosip/TTS-package
cd TTS-package

Install Dependencies

Using pnpm:
```
pnpm install
```

Configure Environment Variables

Create a .env file in the project root containing your Azure Speech subscription key and region. The remaining settings are optional:

SPEECH_KEY=your_subscription_key_here
SPEECH_REGION=your_region_here
# Optional (defaults shown)
TRIGGER_WORD=TTS
PORT=4753

# Optional queue hardening
MAX_QUEUE_LENGTH=100
TTS_MAX_RETRIES=2
TTS_RETRY_BASE_DELAY_MS=500
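
As a rough illustration of how these variables could be read, the sketch below parses them with fallbacks. `loadConfig` and `retryDelayMs` are hypothetical names; treating the sample queue values above as fallbacks and using exponential backoff for retries are both assumptions, not confirmed behavior of the service.

```javascript
// Hypothetical config loader; the service's real parsing may differ.
function loadConfig(env = process.env) {
  // Parse an integer setting, falling back when it is missing or not numeric.
  const toInt = (value, fallback) => {
    const n = Number.parseInt(value, 10);
    return Number.isFinite(n) ? n : fallback;
  };
  return {
    speechKey: env.SPEECH_KEY,
    speechRegion: env.SPEECH_REGION,
    triggerWord: env.TRIGGER_WORD || 'TTS',
    port: toInt(env.PORT, 4753),
    // Assumption: the sample values from the .env listing are used as fallbacks.
    maxQueueLength: toInt(env.MAX_QUEUE_LENGTH, 100),
    maxRetries: toInt(env.TTS_MAX_RETRIES, 2),
    retryBaseDelayMs: toInt(env.TTS_RETRY_BASE_DELAY_MS, 500),
  };
}

// Assumed retry policy: the delay doubles on each attempt (attempt starts at 0).
function retryDelayMs(attempt, baseDelayMs) {
  return baseDelayMs * 2 ** attempt;
}
```

Under that assumed policy, a base delay of 500 ms gives waits of 500 ms and 1000 ms for the two retries.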

Usage

Starting the Service

Execute the following command to start the TTS service:

node app.js

This starts both the clipboard listener and the HTTP server.

Default base URL:

http://localhost:4753

HTTP API

All endpoints are served from the same base URL (default: http://localhost:4753).

`POST /tts`

Enqueue text for speech.

curl -X POST http://localhost:4753/tts \
   -H "Content-Type: application/json" \
   -d '{"text":"Hello, world!"}'

Optional voice override (use GET /voices to discover values):

curl -X POST http://localhost:4753/tts \
   -H "Content-Type: application/json" \
   -d '{"text":"Hello, world!","voice":"en-US-AvaNeural"}'
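
The same request can be sent from Node.js. The sketch below assumes Node 18 or newer, where `fetch` is available globally; `buildTtsRequest` and `speak` are hypothetical helper names. The response body format is not documented here, so only the HTTP status is checked.

```javascript
// Build the URL and fetch options for POST /tts. The voice field is optional.
function buildTtsRequest(text, voice, baseUrl = 'http://localhost:4753') {
  const body = voice ? { text, voice } : { text };
  return {
    url: `${baseUrl}/tts`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and fail loudly on a non-2xx status.
async function speak(text, voice) {
  const { url, options } = buildTtsRequest(text, voice);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`POST /tts failed with HTTP ${res.status}`);
  }
  return res.status;
}

// Usage (requires the service to be running):
// speak('Hello, world!', 'en-US-AvaNeural').catch(console.error);
```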

`POST /pause`

Pause queue processing.

curl -X POST http://localhost:4753/pause

`POST /resume`

Resume queue processing.

curl -X POST http://localhost:4753/resume

`POST /stop-tts`

Clears any queued items. (In-flight audio playback may continue until it finishes.)

curl -X POST http://localhost:4753/stop-tts
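
The pause, resume, and stop semantics described above can be pictured with a small queue sketch. `SpeechQueue` is a hypothetical class written for illustration, not the contents of src/services/ttsQueue.js. It shows why clearing the queue does not interrupt an item that is already being spoken.

```javascript
// Hypothetical queue illustrating pause / resume / clear semantics.
class SpeechQueue {
  constructor(speak, maxLength = 100) {
    this.speak = speak;         // async function that synthesizes and plays one item
    this.maxLength = maxLength; // reject new items once this many are waiting
    this.items = [];
    this.paused = false;
    this.busy = false;
  }

  // Returns false when the queue is full, true when the item was accepted.
  enqueue(text) {
    if (this.items.length >= this.maxLength) return false;
    this.items.push(text);
    this.drain();
    return true;
  }

  pause() { this.paused = true; }

  resume() {
    this.paused = false;
    this.drain();
  }

  // Drops waiting items only; an item already passed to speak() keeps playing.
  clear() { this.items.length = 0; }

  // Process items one at a time until the queue is empty or paused.
  async drain() {
    if (this.busy) return;
    this.busy = true;
    try {
      while (!this.paused && this.items.length > 0) {
        const text = this.items.shift();
        try {
          await this.speak(text);
        } catch (err) {
          console.error('TTS item failed:', err.message);
        }
      }
    } finally {
      this.busy = false;
    }
  }
}
```

In this sketch, pausing takes effect between items: the current item finishes, and nothing further is started until `resume()` is called.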

`GET /voices`

Returns available voices.

curl http://localhost:4753/voices

`POST /extract-text`

Fetches the visible text from a URL and enqueues it.

curl -X POST http://localhost:4753/extract-text \
   -H "Content-Type: application/json" \
   -d '{"url":"https://example.com"}'

Clipboard Interaction

Copy any text prefixed with the trigger word (default: "TTS") to the clipboard. The service will automatically detect, process, and convert the text to speech.
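
A minimal sketch of the trigger check might look like the following. `extractTriggeredText` is a hypothetical name, and the exact matching rules (case sensitivity, and requiring whitespace after the trigger word) are assumptions rather than the listener's documented behavior.

```javascript
// Return the text that follows the trigger word, or null if the clipboard
// content should be ignored. Matching is case-sensitive in this sketch.
function extractTriggeredText(clipboardText, triggerWord = 'TTS') {
  if (typeof clipboardText !== 'string') return null;

  const trimmed = clipboardText.trimStart();
  if (!trimmed.startsWith(triggerWord)) return null;

  const rest = trimmed.slice(triggerWord.length);
  // Assumption: the trigger must be a whole word, so "TTSomething" is ignored.
  if (rest.length > 0 && !/^\s/.test(rest)) return null;

  const text = rest.trim();
  return text.length > 0 ? text : null;
}
```

With the default trigger word, copying `TTS Hello there` would yield `Hello there`, while ordinary clipboard content is left alone.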

Advanced Configuration

The config/voiceSettings.json file allows for detailed customization of voice and speech patterns. Adjust settings here to tailor the TTS output to your preferences.
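
To give a sense of how voice settings could feed into SSML, here is a hedged sketch. The settings keys (`voiceName`, `language`, `rate`, `pitch`) and the `buildSsml` function are hypothetical; check config/voiceSettings.json and src/utils/ssmlGenerator.js for the real schema and logic.

```javascript
// Hypothetical settings shape; the real voiceSettings.json may use different keys.
const exampleSettings = {
  voiceName: 'en-US-AvaNeural',
  language: 'en-US',
  rate: '+10%',
  pitch: 'default',
};

// Escape XML special characters so the text cannot break the SSML document.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wrap the text in speak / voice / prosody elements using the given settings.
function buildSsml(text, settings) {
  const { voiceName, language, rate, pitch } = settings;
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${language}">` +
    `<voice name="${voiceName}">` +
    `<prosody rate="${rate}" pitch="${pitch}">${escapeXml(text)}</prosody>` +
    `</voice></speak>`
  );
}

console.log(buildSsml('Hello, world!', exampleSettings));
```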

Optional Frontend Dashboard

The frontend/ folder contains a small React/Vite dashboard for convenience.

Create frontend/.env based on frontend/.env.example
Set:
- VITE_API_BASE_URL=http://localhost:4753

Contributing

Contributions are welcome! Please refer to the contributing guidelines for more details on how to participate in the project's development.

Running in Background

To run the service in the background, consider using pm2:

pm2 start app.js --name tts-service

Manage the service with pm2 stop tts-service and pm2 start tts-service.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Key Points

Sophisticated Introduction: The introduction now clearly outlines the service's capabilities and its integration with Azure Cognitive Services, setting a professional tone.
Detailed Feature Descriptions: Each feature is described in detail, highlighting the service's functionality and technical sophistication.
Comprehensive Project Structure: The updated project structure reflects the latest changes, providing clarity on the organization and modularity of the codebase.
Streamlined Getting Started Section: The installation and usage instructions are concise, making it easy for users to get the service up and running.
Advanced Configuration: A brief mention of advanced configuration options encourages users to explore and customize the service further.
Professional Tone: Throughout the document, the language and structure aim to communicate a high level of professionalism and attention to detail, targeting an audience of senior developers and technical users.
