VesselWave/go-voice-app
Go Voice App

A voice agent application that integrates SIP/RTP telephony with Large Language Models (LLMs), Text-to-Speech (TTS), and Speech-to-Text (STT).

Features

  • SIP/RTP Integration: Handles VoIP calls using sipgo and pion/sdp.
  • Speech-to-Text (STT): Uses Whisper for high-accuracy speech recognition.
  • Text-to-Speech (TTS): Uses Piper for fast, neural text-to-speech.
  • LLM Integration: (Planned) Connects to LLMs for conversational intelligence.
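The features above form a per-call pipeline: caller audio goes through STT, the transcript goes to an LLM, and the reply is synthesized back to audio by TTS. A minimal Go sketch of how the stages could be wired together follows; the interface and type names (STT, LLM, TTS, Pipeline, the fake stand-ins) are illustrative, not the repository's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical interfaces for the three stages; the repository's real types
// are not shown in this README, so these names are illustrative only.
type STT interface{ Transcribe(pcm []byte) (string, error) }
type LLM interface{ Reply(prompt string) (string, error) }
type TTS interface{ Synthesize(text string) ([]byte, error) }

// Pipeline chains one conversational turn: caller audio -> text -> reply -> audio.
type Pipeline struct {
	stt STT
	llm LLM
	tts TTS
}

func (p *Pipeline) HandleTurn(pcm []byte) ([]byte, error) {
	text, err := p.stt.Transcribe(pcm)
	if err != nil {
		return nil, err
	}
	reply, err := p.llm.Reply(text)
	if err != nil {
		return nil, err
	}
	return p.tts.Synthesize(reply)
}

// Trivial stand-ins so the sketch runs without Whisper, an LLM, or Piper.
type fakeSTT struct{}

func (fakeSTT) Transcribe(pcm []byte) (string, error) { return string(pcm), nil }

type fakeLLM struct{}

func (fakeLLM) Reply(prompt string) (string, error) {
	return "echo: " + strings.ToUpper(prompt), nil
}

type fakeTTS struct{}

func (fakeTTS) Synthesize(text string) ([]byte, error) { return []byte(text), nil }

func main() {
	p := &Pipeline{stt: fakeSTT{}, llm: fakeLLM{}, tts: fakeTTS{}}
	out, err := p.HandleTurn([]byte("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // echo: HELLO
}
```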

Prerequisites

1. Build Whisper with GPU Support

The application relies on shared libraries from the whisper.cpp project. The source code is included as a git submodule in third_party/whisper.cpp.

git submodule update --init --recursive
cd third_party/whisper.cpp
cmake -B build -DGGML_CUDA=1
cmake --build build -j $(nproc) --config Release

The application is configured to look for libraries in third_party/whisper.cpp/build when run via run.sh.

2. Download Whisper Model

Download a Whisper model (e.g., base.en) to models/ggml-base.en.bin.

mkdir -p models
wget -O models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

3. Install Piper TTS

Download and install the Piper binary and a voice model.

  1. Download Piper binary release from Piper GitHub Releases.
  2. Extract it to a location (e.g., /opt/piper or locally).
  3. Download a voice model (ONNX + JSON config) from Piper Voices.
    • Example: en_US-lessac-medium

Update config.json with the paths to the Piper binary and model.
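Piper runs as an external process: it reads the text to speak on stdin and writes a WAV file. As a sketch of how the app could shell out to it from Go (the function name piperCmd and the example paths are assumptions; --model and --output_file are Piper's documented CLI flags):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// piperCmd builds the command used to synthesize speech with the Piper CLI.
// The binary and model paths would come from config.json.
func piperCmd(binary, model, text, outWav string) *exec.Cmd {
	cmd := exec.Command(binary, "--model", model, "--output_file", outWav)
	// Piper reads the text to speak from stdin.
	cmd.Stdin = strings.NewReader(text)
	return cmd
}

func main() {
	cmd := piperCmd("/opt/piper/piper", "models/en_US-lessac-medium.onnx",
		"Hello, world!", "out.wav")
	fmt.Println(cmd.Args)
}
```

Running cmd.Run() on the result would produce out.wav, assuming the binary and model paths are valid.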

Configuration

Copy config.json.example to config.json and update the values:

{
  "sip_port": 5060,
  "rtp_start_port": 40000,
  "rtp_end_port": 50000,
  "whisper_model_path": "models/ggml-base.en.bin",
  "piper_binary_path": "/path/to/piper/piper",
  "piper_model_path": "/path/to/piper/models/en_US-lessac-medium.onnx",
  "http_port": 3000
}

Build and Run

The easiest way to run the application with all dependencies correctly linked is to use the run.sh script:

./run.sh

This script sets up CGO_LDFLAGS and LD_LIBRARY_PATH to point to the whisper.cpp build directory and the CUDA libraries. All arguments passed to run.sh are forwarded to the agent.

Example for verbose logging:

./run.sh -verbose

Research

The vendor-repos directory contains research checkpoints and references. It is not used for the build process.

API Usage

Text-to-Speech (TTS)

You can trigger the TTS engine to speak a phrase using the /api/speak endpoint:

curl -X POST http://localhost:3000/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}'
