
Offline Voice Assistant

A fully offline voice assistant built with Python that combines speech-to-text (Vosk), a local large language model (Gemma3 via Ollama), and text-to-speech (pyttsx3).

Features

  • 🎤 Offline Speech Recognition - Uses Vosk for local speech-to-text processing
  • 🧠 AI-Powered Responses - Gemma3 model for intent recognition and natural conversation
  • 🔊 Text-to-Speech - pyttsx3 for voice responses
  • ⚙️ System Commands - Execute system-level tasks through voice commands
  • 🔧 Configurable - Easy configuration through JSON files
  • 🚫 No Internet Required - Completely offline operation

Components

Speech-to-Text (STT)

  • Engine: Vosk
  • Model: Small English model (~40MB)
  • Features: Real-time transcription, offline processing
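The real-time loop above can be sketched as follows. This is a minimal illustration, not the project's actual implementation in src/stt/vosk_stt.py; the function names `listen` and `transcript_text` are hypothetical, and it assumes the pyaudio dependency from requirements.txt:

```python
import json

def transcript_text(result_json):
    # Vosk recognizers return results as JSON strings; the words are under "text".
    return json.loads(result_json).get("text", "")

def listen(model_path="models/vosk-model-en", sample_rate=16000):
    # Imported lazily so transcript_text() is usable without the audio stack.
    import pyaudio
    from vosk import KaldiRecognizer, Model

    rec = KaldiRecognizer(Model(model_path), sample_rate)
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=sample_rate,
                     input=True, frames_per_buffer=4000)
    try:
        while True:
            data = stream.read(4000, exception_on_overflow=False)
            if rec.AcceptWaveform(data):  # True when Vosk detects an utterance end
                text = transcript_text(rec.Result())
                if text:
                    yield text
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```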

Large Language Model (LLM)

  • Engine: Ollama
  • Model: Gemma3 (local installation via Ollama)
  • Features: Intent recognition, conversational responses, command extraction
  • API: Local REST API (http://localhost:11434)
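Talking to that local REST API amounts to one POST per prompt. A sketch, assuming Ollama's standard `/api/generate` endpoint; `build_payload` and `ask_gemma` are illustrative helpers, not the project's actual interface in src/llm/gemma3_llm.py:

```python
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt, model="gemma3", temperature=0.7):
    # JSON body for /api/generate; stream=False returns one complete response.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }

def ask_gemma(prompt):
    import requests  # imported here so build_payload stays dependency-free
    r = requests.post(OLLAMA_URL, json=build_payload(prompt), timeout=60)
    r.raise_for_status()
    return r.json()["response"]  # generated text is returned under "response"
```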

Text-to-Speech (TTS)

  • Engine: pyttsx3
  • Features: Multiple voice options, adjustable rate and volume
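Rate and volume map directly onto pyttsx3 engine properties. A minimal sketch; `speak` and `clamp_volume` are hypothetical helpers, not the wrapper in src/tts/pyttsx3_tts.py:

```python
def clamp_volume(v):
    # pyttsx3 expects volume in [0.0, 1.0]; clamp out-of-range config values.
    return max(0.0, min(1.0, float(v)))

def speak(text, rate=200, volume=0.8):
    import pyttsx3  # imported lazily so clamp_volume works without audio deps
    engine = pyttsx3.init()
    engine.setProperty("rate", rate)              # words per minute
    engine.setProperty("volume", clamp_volume(volume))
    engine.say(text)
    engine.runAndWait()                           # blocks until playback finishes
```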

Command Execution

  • System Commands: Time, date, file management
  • Applications: Notepad, calculator, browser
  • Volume Control: System volume adjustment
  • Web Search: Google and YouTube search
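Command routing of this kind is typically a keyword match before falling through to the LLM. A toy sketch of the idea; the real table lives in src/commands/command_executor.py, and `dispatch`, `tell_time`, and `web_search` here are illustrative names:

```python
import datetime
import webbrowser

def tell_time():
    return datetime.datetime.now().strftime("It's %I:%M %p")

def web_search(query):
    webbrowser.open(f"https://www.google.com/search?q={query}")
    return f"Searching for {query}"

def dispatch(utterance):
    text = utterance.lower()
    if "time" in text:
        return tell_time()
    if text.startswith("search for "):
        return web_search(text[len("search for "):])
    return None  # no command matched: hand the utterance to the LLM instead
```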

Installation

Prerequisites

  • Python 3.8 or higher
  • Ollama installed and running
  • Gemma3 model installed via Ollama (ollama pull gemma3)
  • Microphone and speakers
  • Windows/macOS/Linux

Setup

  1. Clone or download the project

    cd speechrecog
  2. Install dependencies

    pip install -r requirements.txt
  3. Start Ollama and ensure Gemma3 is available

    ollama serve
    ollama pull gemma3  # if not already installed
  4. Verify installation

    python run_tests.py

Usage

Quick Start

python start_assistant.py

Or use the test interface:

python run_tests.py

Direct Launch

python src/main.py

Voice Commands

The assistant recognizes various types of commands:

Time & Date

  • "What time is it?"
  • "What's the date today?"

Applications

  • "Open notepad"
  • "Open calculator"
  • "Open browser"

File Management

  • "Open files" / "Open file manager"
  • "Open documents"
  • "Open downloads"

Volume Control

  • "Volume up" / "Increase volume"
  • "Volume down" / "Decrease volume"
  • "Mute" / "Silence"

Web Search

  • "Search for [query]"
  • "YouTube [query]"
  • "Look up [query]"

Conversational

  • Ask questions and have natural conversations
  • The assistant will provide helpful responses

Configuration

The system uses a configuration file located at config/voice_assistant_config.json. Key settings:

{
  "stt": {
    "model_path": "models/vosk-model-en",
    "sample_rate": 16000
  },
  "llm": {
    "model_name": "gemma3",
    "api_url": "http://localhost:11434",
    "temperature": 0.7
  },
  "tts": {
    "rate": 200,
    "volume": 0.8
  }
}
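Loading these settings can be as simple as merging the file over built-in defaults. A sketch with a hypothetical `load_config` helper; the project's actual loader is config/config.py:

```python
import json
from pathlib import Path

DEFAULTS = {
    "stt": {"model_path": "models/vosk-model-en", "sample_rate": 16000},
    "llm": {"model_name": "gemma3", "api_url": "http://localhost:11434",
            "temperature": 0.7},
    "tts": {"rate": 200, "volume": 0.8},
}

def load_config(path="config/voice_assistant_config.json"):
    # Start from defaults, then shallow-merge each section from the file
    # so a partial config file only overrides the keys it mentions.
    cfg = {k: dict(v) for k, v in DEFAULTS.items()}
    p = Path(path)
    if p.exists():
        for section, values in json.loads(p.read_text()).items():
            cfg.setdefault(section, {}).update(values)
    return cfg
```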

Project Structure

speechrecog/
├── src/
│   ├── stt/vosk_stt.py          # Speech-to-text implementation
│   ├── llm/gemma3_llm.py        # LLM integration
│   ├── tts/pyttsx3_tts.py       # Text-to-speech
│   ├── commands/command_executor.py  # Command execution
│   └── main.py                  # Main application
├── models/
│   └── vosk-model-en/           # Vosk language model
├── config/
│   ├── config.py                # Configuration management
│   └── voice_assistant_config.json  # Settings
├── start_assistant.py           # Launcher script
├── run_tests.py                 # Test runner
├── test_assistant.py            # Test suite
├── requirements.txt             # Python dependencies
└── README.md                    # This file

Testing

Component Testing

python test_assistant.py
# Choose option 1 to test all components

Interactive Text Mode

python test_assistant.py
# Choose option 2 for text-based interaction

Individual Components

You can test components individually:

from src.stt.vosk_stt import VoskSTT
from src.llm.gemma3_llm import Gemma3LLM
from src.tts.pyttsx3_tts import Pyttsx3TTS
from src.commands.command_executor import CommandExecutor

Troubleshooting

Common Issues

  1. Microphone not working

    • Check system microphone permissions
    • Verify microphone is not used by other applications
  2. Ollama connection errors

    • Make sure Ollama is running: ollama serve
    • Check if Gemma3 model is installed: ollama list
    • Install if missing: ollama pull gemma3
  3. Speech recognition not accurate

    • Speak clearly and at a moderate pace
    • Reduce background noise
    • Check microphone positioning
  4. TTS not working

    • Verify speakers/headphones are connected
    • Check system audio settings
  5. Commands not executing

    • Some commands may be platform-specific
    • Check logs for detailed error messages

Logs

Check voice_assistant.log for detailed error information.

Customization

Adding Custom Commands

Edit src/commands/command_executor.py to add new commands:

def _my_custom_command(self):
    # Your implementation
    return "Command executed successfully"

# Register the command
self.register_command("custom", self._my_custom_command, "My custom command")

Changing Voice Settings

Modify TTS settings in the configuration file or programmatically:

tts.set_rate(150)  # Slower speech
tts.set_volume(0.9)  # Louder volume

LLM Customization

Adjust the LLM parameters for different behavior:

llm.set_parameters(
    temperature=0.5,  # More deterministic
    top_p=0.8,        # Different sampling
    max_length=1024   # Shorter responses
)

Performance Notes

  • First run: The assistant itself starts quickly; Ollama may take a few extra seconds to load Gemma3 into memory on the first request
  • Memory usage: ~2-4GB RAM for Ollama + Gemma3
  • API latency: ~100-500ms for local Ollama API calls
  • Response time: 1-2 seconds for most commands

Requirements

See requirements.txt for complete list. Key dependencies:

  • vosk>=0.3.45
  • pyttsx3>=2.90
  • requests>=2.28.0
  • pyaudio>=0.2.11
  • numpy>=1.21.0

External dependencies:

  • Ollama (for LLM functionality)

License

This project is for educational and personal use. Please respect the licenses of individual components:

  • Vosk: Apache 2.0
  • Gemma3: Gemma Terms of Use
  • pyttsx3: MPL 2.0
