The Advanced TTS Service is a Node.js service that converts text into lifelike speech using the Azure Cognitive Services Speech SDK. It provides clipboard monitoring, an HTTP server, markdown preprocessing, and XML character escaping, and is built with extensibility, maintainability, and straightforward Azure integration in mind.
- Clipboard Monitoring: Listens for text copied to the clipboard and, when it is prefixed with a customizable trigger word, sends it for immediate TTS conversion.
- HTTP Server Integration: Runs an HTTP server that accepts text via POST requests, enabling programmatic text-to-speech conversion.
- Markdown Preprocessing: Converts markdown-formatted text into plain text before speech synthesis.
- XML Character Escaping: Escapes XML special characters so that text is safe to embed in XML/SSML.
- Queue Management: Sequences text inputs in a TTS queue so that speech is synthesized in order.
- Azure Cognitive Speech Service Integration: Integrates with Azure's TTS service and supports SSML generation.
- Audio Playback: Plays synthesized speech audio streams through the system speakers, so processed text is heard immediately.
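To illustrate why XML escaping matters before SSML generation, here is a minimal sketch. The function names `escapeXml` and `buildSsml` are hypothetical and are not taken from `xmlEscaper.js` or `ssmlGenerator.js`; the project's actual implementations may differ. The `<speak>`/`<voice>` envelope follows the general SSML shape that Azure's speech service accepts.

```javascript
// Hypothetical sketch: not the project's actual xmlEscaper.js / ssmlGenerator.js.
const XML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;",
};

// Replace the five XML special characters in a single pass,
// so an "&" produced by one replacement is never escaped twice.
function escapeXml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch]);
}

// Wrap escaped text in a minimal SSML document.
// The default voice and language are illustrative assumptions.
function buildSsml(text, voice = "en-US-AvaNeural", lang = "en-US") {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

console.log(escapeXml('Tom & "Jerry" <3'));
// prints: Tom &amp; &quot;Jerry&quot; &lt;3
```

Without the escaping step, input such as `x < y` would produce malformed SSML and the synthesis request would likely be rejected.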
    .
    ├── README.md
    ├── app.js
    ├── config
    │   └── voiceSettings.json
    ├── package.json
    ├── pnpm-lock.yaml
    └── src
        ├── controllers
        │   └── ttsController.js
        ├── listeners
        │   └── clipboardListener.js
        ├── preprocessors
        │   ├── markdownPreprocessor.js
        │   └── xmlEscaper.js
        ├── services
        │   ├── textExtractor.js
        │   ├── ttsQueue.js
        │   └── ttsServiceAzureAI.js
        └── utils
            ├── logger.js
            ├── notifier.js
            ├── playAudioStream.js
            └── ssmlGenerator.js
- Node.js (v18.5 or newer recommended)
- An Azure account with an active Cognitive Services Speech subscription
- Clone the Repository

      git clone https://github.com/TheGreenJosip/TTS-package
      cd TTS-package

- Install Dependencies

  Using pnpm:

      pnpm install

- Configure Environment Variables

  Populate a .env file in the project root with your Azure subscription key, region, and other configurations:

      SPEECH_KEY=your_subscription_key_here
      SPEECH_REGION=your_region_here
      # Optional (defaults shown)
      TRIGGER_WORD=TTS
      PORT=4753
      # Optional queue hardening
      MAX_QUEUE_LENGTH=100
      TTS_MAX_RETRIES=2
      TTS_RETRY_BASE_DELAY_MS=500
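As a rough sketch of how these variables might be consumed, the snippet below reads them from an environment object and applies fallbacks. The helper names (`intFromEnv`, `loadConfig`, `retryDelayMs`) are hypothetical. Treating the queue-hardening values as defaults, and using exponential backoff for retries, are assumptions suggested by the variable names rather than confirmed behavior of the service.

```javascript
// Hypothetical sketch: helper names and retry policy are assumptions.

// Parse an integer variable, falling back when it is unset or not a number.
function intFromEnv(env, name, fallback) {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Build a config object; SPEECH_KEY and SPEECH_REGION are required.
function loadConfig(env = process.env) {
  if (!env.SPEECH_KEY || !env.SPEECH_REGION) {
    throw new Error("SPEECH_KEY and SPEECH_REGION must be set");
  }
  return {
    speechKey: env.SPEECH_KEY,
    speechRegion: env.SPEECH_REGION,
    triggerWord: env.TRIGGER_WORD || "TTS",
    port: intFromEnv(env, "PORT", 4753),
    // Assumption: the values shown in the .env example act as defaults.
    maxQueueLength: intFromEnv(env, "MAX_QUEUE_LENGTH", 100),
    maxRetries: intFromEnv(env, "TTS_MAX_RETRIES", 2),
    retryBaseDelayMs: intFromEnv(env, "TTS_RETRY_BASE_DELAY_MS", 500),
  };
}

// Assumed policy: exponential backoff, doubling the base delay per attempt
// (attempt 0 -> base, attempt 1 -> 2 * base, attempt 2 -> 4 * base).
function retryDelayMs(attempt, baseDelayMs) {
  return baseDelayMs * 2 ** attempt;
}
```

Failing fast on a missing key or region surfaces configuration mistakes at startup instead of on the first synthesis request.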
Execute the following command to start the TTS service:
    node app.js

This initiates the clipboard monitoring and HTTP server, ready to process text for speech synthesis.
Default base URL:
http://localhost:4753
All endpoints are served from the same base URL (default: http://localhost:4753).
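For callers that prefer Node.js over curl, a request to the POST /tts endpoint could be assembled roughly as follows. This is a sketch under assumptions: `buildTtsRequest` and `speak` are hypothetical names, the global `fetch` is the one built into Node.js 18 and newer, and the shape of the server's response is not documented here, so the response is returned unparsed.

```javascript
// Hypothetical client sketch: function names are illustrative only.
const BASE_URL = "http://localhost:4753";

// Build the URL and fetch options for POST /tts.
// The voice field is included only when a voice is given.
function buildTtsRequest(text, voice, baseUrl = BASE_URL) {
  const payload = voice ? { text, voice } : { text };
  return {
    url: `${baseUrl}/tts`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Send the request; requires the service to be running locally.
async function speak(text, voice) {
  const { url, options } = buildTtsRequest(text, voice);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  return response;
}
```

Calling `speak("Hello, world!", "en-US-AvaNeural")` would then mirror the curl request that carries a voice override.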
Enqueue text for speech.

    curl -X POST http://localhost:4753/tts \
      -H "Content-Type: application/json" \
      -d '{"text":"Hello, world!"}'

Optional voice override (use GET /voices to discover values):

    curl -X POST http://localhost:4753/tts \
      -H "Content-Type: application/json" \
      -d '{"text":"Hello, world!","voice":"en-US-AvaNeural"}'

Pause queue processing.

    curl -X POST http://localhost:4753/pause

Resume queue processing.

    curl -X POST http://localhost:4753/resume

Clears any queued items. (In-flight audio playback may continue until it finishes.)

    curl -X POST http://localhost:4753/stop-tts

Returns available voices.

    curl http://localhost:4753/voices

Fetches the visible text from a URL and enqueues it.

    curl -X POST http://localhost:4753/extract-text \
      -H "Content-Type: application/json" \
      -d '{"url":"https://example.com"}'

Copy any text prefixed with the trigger word (default: "TTS") to the clipboard. The service will automatically detect, process, and convert the text to speech.
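The trigger-word check on clipboard content can be pictured with the sketch below. `extractTriggeredText` is a hypothetical function, not the code in `clipboardListener.js`. The exact matching rules are assumptions: this version is case-sensitive, ignores leading whitespace, and requires whitespace or a colon after the trigger word.

```javascript
// Hypothetical sketch: the real clipboardListener.js may match differently.

// Return the text to speak, or null when the clipboard content
// does not start with the trigger word or has nothing after it.
function extractTriggeredText(clipboardText, triggerWord = "TTS") {
  const trimmed = String(clipboardText).trimStart();
  if (!trimmed.startsWith(triggerWord)) return null;

  const rest = trimmed.slice(triggerWord.length);
  // Assumption: the trigger must be followed by whitespace or a colon,
  // so that a word like "TTSX" does not count as a match.
  if (rest !== "" && !/^[\s:]/.test(rest)) return null;

  const text = rest.replace(/^[\s:]+/, "").trim();
  return text === "" ? null : text;
}

console.log(extractTriggeredText("TTS Hello, world!"));
// prints: Hello, world!
```

Returning null for non-matching content lets the listener ignore ordinary clipboard activity without further checks.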
The config/voiceSettings.json file allows for detailed customization of voice and speech patterns. Adjust settings here to tailor the TTS output to your preferences.
The frontend/ folder contains a small React/Vite dashboard for convenience.
- Create frontend/.env based on frontend/.env.example
- Set:

      VITE_API_BASE_URL=http://localhost:4753
Contributions are welcome! Please refer to the contributing guidelines for more details on how to participate in the project's development.
To run the service in the background, consider using pm2:
    pm2 start app.js --name tts-service

Manage the service with pm2 stop tts-service and pm2 start tts-service.
This project is licensed under the MIT License. See the LICENSE file for details.