
Offline Voice Assistant

A fully offline voice assistant built with Python that combines speech-to-text (Vosk), a local large language model (Gemma3 via Ollama), and text-to-speech (pyttsx3).

Features

  • 🎤 Offline Speech Recognition - Uses Vosk for local speech-to-text processing
  • 🧠 AI-Powered Responses - Gemma3 model for intent recognition and natural conversation
  • 🔊 Text-to-Speech - pyttsx3 for voice responses
  • ⚙️ System Commands - Execute system-level tasks through voice commands
  • 🔧 Configurable - Easy configuration through JSON files
  • 🚫 No Internet Required - Completely offline operation

Components

Speech-to-Text (STT)

  • Engine: Vosk
  • Model: Small English model (~40MB)
  • Features: Real-time transcription, offline processing
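The real-time loop above can be sketched as follows. This is a minimal illustration, not the project's actual implementation in src/stt/vosk_stt.py; the function names `listen` and `transcript_text` are hypothetical, and it assumes the pyaudio dependency from requirements.txt:

```python
import json

def transcript_text(result_json):
    # Vosk recognizers return results as JSON strings; the words are under "text".
    return json.loads(result_json).get("text", "")

def listen(model_path="models/vosk-model-en", sample_rate=16000):
    # Imported lazily so transcript_text() is usable without the audio stack.
    import pyaudio
    from vosk import KaldiRecognizer, Model

    rec = KaldiRecognizer(Model(model_path), sample_rate)
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=sample_rate,
                     input=True, frames_per_buffer=4000)
    try:
        while True:
            data = stream.read(4000, exception_on_overflow=False)
            if rec.AcceptWaveform(data):  # True when Vosk detects an utterance end
                text = transcript_text(rec.Result())
                if text:
                    yield text
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
```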

Large Language Model (LLM)

  • Engine: Ollama
  • Model: Gemma3 (local installation via Ollama)
  • Features: Intent recognition, conversational responses, command extraction
  • API: Local REST API (http://localhost:11434)
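Talking to that local REST API amounts to one POST per prompt. A sketch, assuming Ollama's standard `/api/generate` endpoint; `build_payload` and `ask_gemma` are illustrative helpers, not the project's actual interface in src/llm/gemma3_llm.py:

```python
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt, model="gemma3", temperature=0.7):
    # JSON body for /api/generate; stream=False returns one complete response.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }

def ask_gemma(prompt):
    import requests  # imported here so build_payload stays dependency-free
    r = requests.post(OLLAMA_URL, json=build_payload(prompt), timeout=60)
    r.raise_for_status()
    return r.json()["response"]  # generated text is returned under "response"
```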

Text-to-Speech (TTS)

  • Engine: pyttsx3
  • Features: Multiple voice options, adjustable rate and volume
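Rate and volume map directly onto pyttsx3 engine properties. A minimal sketch; `speak` and `clamp_volume` are hypothetical helpers, not the wrapper in src/tts/pyttsx3_tts.py:

```python
def clamp_volume(v):
    # pyttsx3 expects volume in [0.0, 1.0]; clamp out-of-range config values.
    return max(0.0, min(1.0, float(v)))

def speak(text, rate=200, volume=0.8):
    import pyttsx3  # imported lazily so clamp_volume works without audio deps
    engine = pyttsx3.init()
    engine.setProperty("rate", rate)              # words per minute
    engine.setProperty("volume", clamp_volume(volume))
    engine.say(text)
    engine.runAndWait()                           # blocks until playback finishes
```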

Command Execution

  • System Commands: Time, date, file management
  • Applications: Notepad, calculator, browser
  • Volume Control: System volume adjustment
  • Web Search: Google and YouTube search
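Command routing of this kind is typically a keyword match before falling through to the LLM. A toy sketch of the idea; the real table lives in src/commands/command_executor.py, and `dispatch`, `tell_time`, and `web_search` here are illustrative names:

```python
import datetime
import webbrowser

def tell_time():
    return datetime.datetime.now().strftime("It's %I:%M %p")

def web_search(query):
    webbrowser.open(f"https://www.google.com/search?q={query}")
    return f"Searching for {query}"

def dispatch(utterance):
    text = utterance.lower()
    if "time" in text:
        return tell_time()
    if text.startswith("search for "):
        return web_search(text[len("search for "):])
    return None  # no command matched: hand the utterance to the LLM instead
```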

Installation

Prerequisites

  • Python 3.8 or higher
  • Ollama installed and running
  • Gemma3 model installed via Ollama (ollama pull gemma3)
  • Microphone and speakers
  • Windows/macOS/Linux

Setup

  1. Clone or download the project

    cd speechrecog
  2. Install dependencies

    pip install -r requirements.txt
  3. Start Ollama and ensure Gemma3 is available

    ollama serve
    ollama pull gemma3  # if not already installed
  4. Verify installation

    python run_tests.py

Usage

Quick Start

python start_assistant.py

Or use the test interface:

python run_tests.py

Direct Launch

python src/main.py

Voice Commands

The assistant recognizes various types of commands:

Time & Date

  • "What time is it?"
  • "What's the date today?"

Applications

  • "Open notepad"
  • "Open calculator"
  • "Open browser"

File Management

  • "Open files" / "Open file manager"
  • "Open documents"
  • "Open downloads"

Volume Control

  • "Volume up" / "Increase volume"
  • "Volume down" / "Decrease volume"
  • "Mute" / "Silence"

Web Search

  • "Search for [query]"
  • "YouTube [query]"
  • "Look up [query]"

Conversational

  • Ask questions and have natural conversations
  • The assistant will provide helpful responses

Configuration

The system uses a configuration file located at config/voice_assistant_config.json. Key settings:

{
  "stt": {
    "model_path": "models/vosk-model-en",
    "sample_rate": 16000
  },
  "llm": {
    "model_name": "gemma3",
    "api_url": "http://localhost:11434",
    "temperature": 0.7
  },
  "tts": {
    "rate": 200,
    "volume": 0.8
  }
}
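Loading these settings can be as simple as merging the file over built-in defaults. A sketch with a hypothetical `load_config` helper; the project's actual loader is config/config.py:

```python
import json
from pathlib import Path

DEFAULTS = {
    "stt": {"model_path": "models/vosk-model-en", "sample_rate": 16000},
    "llm": {"model_name": "gemma3", "api_url": "http://localhost:11434",
            "temperature": 0.7},
    "tts": {"rate": 200, "volume": 0.8},
}

def load_config(path="config/voice_assistant_config.json"):
    # Start from defaults, then shallow-merge each section from the file
    # so a partial config file only overrides the keys it mentions.
    cfg = {k: dict(v) for k, v in DEFAULTS.items()}
    p = Path(path)
    if p.exists():
        for section, values in json.loads(p.read_text()).items():
            cfg.setdefault(section, {}).update(values)
    return cfg
```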

Project Structure

speechrecog/
├── src/
│   ├── stt/vosk_stt.py          # Speech-to-text implementation
│   ├── llm/gemma3_llm.py        # LLM integration
│   ├── tts/pyttsx3_tts.py       # Text-to-speech
│   ├── commands/command_executor.py  # Command execution
│   └── main.py                  # Main application
├── models/
│   └── vosk-model-en/           # Vosk language model
├── config/
│   ├── config.py                # Configuration management
│   └── voice_assistant_config.json  # Settings
├── start_assistant.py           # Launcher script
├── run_tests.py                 # Test runner
├── test_assistant.py            # Test suite
├── requirements.txt             # Python dependencies
└── README.md                    # This file

Testing

Component Testing

python test_assistant.py
# Choose option 1 to test all components

Interactive Text Mode

python test_assistant.py
# Choose option 2 for text-based interaction

Individual Components

You can test components individually:

from src.stt.vosk_stt import VoskSTT
from src.llm.gemma3_llm import Gemma3LLM
from src.tts.pyttsx3_tts import Pyttsx3TTS
from src.commands.command_executor import CommandExecutor

Troubleshooting

Common Issues

  1. Microphone not working

    • Check system microphone permissions
    • Verify microphone is not used by other applications
  2. Ollama connection errors

    • Make sure Ollama is running: ollama serve
    • Check if Gemma3 model is installed: ollama list
    • Install if missing: ollama pull gemma3
  3. Speech recognition not accurate

    • Speak clearly and at a moderate pace
    • Reduce background noise
    • Check microphone positioning
  4. TTS not working

    • Verify speakers/headphones are connected
    • Check system audio settings
  5. Commands not executing

    • Some commands may be platform-specific
    • Check logs for detailed error messages

Logs

Check voice_assistant.log for detailed error information.

Customization

Adding Custom Commands

Edit src/commands/command_executor.py to add new commands:

def _my_custom_command(self):
    # Your implementation
    return "Command executed successfully"

# Register the command
self.register_command("custom", self._my_custom_command, "My custom command")

Changing Voice Settings

Modify TTS settings in the configuration file or programmatically:

tts.set_rate(150)  # Slower speech
tts.set_volume(0.9)  # Louder volume

LLM Customization

Adjust the LLM parameters for different behavior:

llm.set_parameters(
    temperature=0.5,  # More deterministic
    top_p=0.8,        # Different sampling
    max_length=1024   # Shorter responses
)

Performance Notes

  • First run: The assistant itself starts quickly; Ollama may take a few extra seconds to load Gemma3 into memory on the first request
  • Memory usage: ~2-4GB RAM for Ollama + Gemma3
  • API latency: ~100-500ms for local Ollama API calls
  • Response time: 1-2 seconds for most commands

Requirements

See requirements.txt for complete list. Key dependencies:

  • vosk>=0.3.45
  • pyttsx3>=2.90
  • requests>=2.28.0
  • pyaudio>=0.2.11
  • numpy>=1.21.0

External dependencies:

  • Ollama (for LLM functionality)

License

This project is for educational and personal use. Please respect the licenses of individual components:

  • Vosk: Apache 2.0
  • Gemma3: Gemma Terms of Use
  • pyttsx3: MPL 2.0
