A voice agent application that integrates SIP/RTP with LLMs (Large Language Models), TTS (Text-to-Speech), and STT (Speech-to-Text).
- SIP/RTP Integration: Handles VoIP calls using `sipgo` and `pion/sdp`.
- Speech-to-Text (STT): Uses Whisper for high-accuracy speech recognition.
- Text-to-Speech (TTS): Uses Piper for fast, neural text-to-speech.
- LLM Integration: (Planned) Connects to LLMs for conversational intelligence.
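The components above form a pipeline: caller audio goes to STT, the transcript goes to the LLM, and the reply goes to TTS. A minimal sketch of that flow, with stub components standing in for Whisper, the LLM, and Piper (all type and function names here are illustrative, not the project's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical pipeline stages; the real project wires these to
// whisper.cpp, an LLM backend, and Piper respectively.
type (
	STT func(audio []byte) (string, error)  // speech -> text
	LLM func(prompt string) (string, error) // text -> reply
	TTS func(text string) ([]byte, error)   // text -> speech
)

// handleUtterance runs one caller utterance through the pipeline.
func handleUtterance(audio []byte, stt STT, llm LLM, tts TTS) ([]byte, error) {
	text, err := stt(audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := llm(text)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	return tts(reply)
}

func main() {
	// Stubs stand in for the real engines.
	stt := func(audio []byte) (string, error) { return string(audio), nil }
	llm := func(prompt string) (string, error) { return "echo: " + prompt, nil }
	tts := func(text string) ([]byte, error) { return []byte(strings.ToUpper(text)), nil }

	out, _ := handleUtterance([]byte("hello"), stt, llm, tts)
	fmt.Println(string(out)) // ECHO: HELLO
}
```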
The application relies on shared libraries from the whisper.cpp project.
The source code is included as a git submodule in third_party/whisper.cpp.
```sh
git submodule update --init --recursive
cd third_party/whisper.cpp
cmake -B build -DGGML_CUDA=1
cmake --build build -j $(nproc) --config Release
```

The application is configured to look for libraries in `third_party/whisper.cpp/build` when run via `run.sh`.
Download a Whisper model (e.g., `base.en`) to `models/ggml-base.en.bin`.
```sh
mkdir -p models
wget -O models/ggml-base.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

Download and install the Piper binary and a voice model.
- Download the Piper binary release from Piper GitHub Releases.
- Extract it to a location (e.g., `/opt/piper` or locally).
- Download a voice model (ONNX + JSON config) from Piper Voices.
  - Example: `en_US-lessac-medium`
- Update config.json with the paths to the Piper binary and model.
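Since Piper runs as an external binary, the agent can invoke it with `os/exec`, piping the text to speak on stdin. A sketch of building that command (the binary and model paths are examples, and `piperCmd` is an illustrative helper, not the project's actual code):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// piperCmd builds a command that synthesizes text with the external
// piper binary; piper reads the text to speak from stdin.
func piperCmd(binary, model, text, outWav string) *exec.Cmd {
	cmd := exec.Command(binary, "--model", model, "--output_file", outWav)
	cmd.Stdin = strings.NewReader(text)
	return cmd
}

func main() {
	cmd := piperCmd("/opt/piper/piper",
		"/opt/piper/models/en_US-lessac-medium.onnx",
		"Hello, world!", "/tmp/hello.wav")
	fmt.Println(strings.Join(cmd.Args, " "))
	// Actually synthesizing would be: err := cmd.Run()
}
```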
Copy config.json.example to config.json and update the values:
```json
{
  "sip_port": 5060,
  "rtp_start_port": 40000,
  "rtp_end_port": 50000,
  "whisper_model_path": "models/ggml-base.en.bin",
  "piper_binary_path": "/path/to/piper/piper",
  "piper_model_path": "/path/to/piper/models/en_US-lessac-medium.onnx",
  "http_port": 3000
}
```

The easiest way to run the application with all dependencies correctly linked is to use the run.sh script:
```sh
./run.sh
```

This script sets up `CGO_LDFLAGS` and `LD_LIBRARY_PATH` to point to the whisper.cpp build directory and the CUDA libraries.
All arguments passed to run.sh are forwarded to the agent.
Example for verbose logging:

```sh
./run.sh -verbose
```

The vendor-repos directory contains research checkpoints and references. It is not used for the build process.
You can trigger the TTS engine to speak a phrase using the /api/speak endpoint:
```sh
curl -X POST http://localhost:3000/api/speak \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}'
```