A fully offline voice assistant built with Python that combines speech-to-text (Vosk), a local large language model (Gemma3 via Ollama), and text-to-speech (pyttsx3).
- 🎤 Offline Speech Recognition - Uses Vosk for local speech-to-text processing
- 🧠 AI-Powered Responses - Gemma3 model for intent recognition and natural conversation
- 🔊 Text-to-Speech - pyttsx3 for voice responses
- ⚙️ System Commands - Execute system-level tasks through voice commands
- 🔧 Configurable - Easy configuration through JSON files
- 🚫 No Internet Required - Completely offline operation
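The pipeline behind these features is listen → think → speak. A toy sketch of that data flow, with stub functions standing in for the real Vosk, Ollama/Gemma3, and pyttsx3 components (the actual classes live in `src/`):

```python
# Toy end-to-end loop; these stubs only illustrate the data flow
# that src/main.py wires together with the real components.
def listen():
    """Stand-in for Vosk streaming speech recognition."""
    return "what time is it"

def think(text):
    """Stand-in for the Gemma3 call through Ollama."""
    return f"You said: {text}"

def speak(reply):
    """Stand-in for pyttsx3 voice output."""
    print(reply)

text = listen()
reply = think(text)
speak(reply)  # prints: You said: what time is it
```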
- Speech-to-Text (Vosk)
  - Model: Small English model (~40MB)
  - Features: Real-time transcription, offline processing
- Language Model (Gemma3 via Ollama)
  - Features: Intent recognition, conversational responses, command extraction
  - API: Local REST API (http://localhost:11434)
- Text-to-Speech (pyttsx3)
  - Features: Multiple voice options, adjustable rate and volume
- Supported Commands
  - System: Time, date, file management
  - Applications: Notepad, calculator, browser
  - Volume control: System volume adjustment
  - Web search: Google and YouTube search
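The local Ollama endpoint above is called over plain HTTP. A minimal sketch of building a request for it (the `/api/generate` route and the `model`/`prompt`/`stream`/`options` fields follow Ollama's documented REST API; `build_request` is an illustrative helper, not this project's code):

```python
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # local Ollama REST endpoint

def build_request(prompt, model="gemma3", temperature=0.7):
    """Payload for a non-streaming generate call.

    Send it with requests.post(OLLAMA_URL, json=payload) when Ollama is running.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # one JSON response instead of a token stream
        "options": {"temperature": temperature},
    }

payload = build_request("What time is it?")
print(json.dumps(payload, indent=2))
```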
- Python 3.8 or higher
- Ollama installed and running
- Gemma3 model installed via Ollama (`ollama pull gemma3`)
- Microphone and speakers
- Windows/macOS/Linux

1. Clone or download the project

   ```bash
   cd speechrecog
   ```

2. Install dependencies (already done if you followed the setup)

   ```bash
   pip install -r requirements.txt
   ```

3. Start Ollama and ensure Gemma3 is available

   ```bash
   ollama serve
   ollama pull gemma3  # if not already installed
   ```

4. Verify installation

   ```bash
   python run_tests.py
   ```

Start the assistant:

```bash
python start_assistant.py
```

Or use the test interface:

```bash
python run_tests.py
```

You can also run the main module directly:

```bash
python src/main.py
```

The assistant recognizes various types of commands:
- "What time is it?"
- "What's the date today?"
- "Open notepad"
- "Open calculator"
- "Open browser"
- "Open files" / "Open file manager"
- "Open documents"
- "Open downloads"
- "Volume up" / "Increase volume"
- "Volume down" / "Decrease volume"
- "Mute" / "Silence"
- "Search for [query]"
- "YouTube [query]"
- "Look up [query]"
- Ask questions and have natural conversations
- The assistant will provide helpful responses
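Internally, recognized text must be routed either to a system command or to the LLM for conversation. A simplified keyword-based sketch of that decision (the real project uses Gemma3 for intent recognition, so this is only illustrative):

```python
# Hypothetical routing step: map recognized text to a command name,
# or fall through to the LLM for open-ended conversation.
def route(text):
    text = text.lower()
    if "time" in text or "date" in text:
        return ("command", "time_or_date")
    if text.startswith("open "):
        return ("command", "open_app")
    if text.startswith(("search for ", "look up ", "youtube ")):
        return ("command", "web_search")
    return ("chat", text)  # hand off to the LLM

print(route("What time is it?"))  # ('command', 'time_or_date')
print(route("Open notepad"))      # ('command', 'open_app')
print(route("Tell me a joke"))    # ('chat', 'tell me a joke')
```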
The system uses a configuration file located at `config/voice_assistant_config.json`. Key settings:
```json
{
  "stt": {
    "model_path": "models/vosk-model-en",
    "sample_rate": 16000
  },
  "llm": {
    "model_name": "gemma3",
    "api_url": "http://localhost:11434",
    "temperature": 0.7
  },
  "tts": {
    "rate": 200,
    "volume": 0.8
  }
}
```

```
speechrecog/
├── src/
│   ├── stt/vosk_stt.py              # Speech-to-text implementation
│   ├── llm/gemma3_llm.py            # LLM integration
│   ├── tts/pyttsx3_tts.py           # Text-to-speech
│   ├── commands/command_executor.py # Command execution
│   └── main.py                      # Main application
├── models/
│   └── vosk-model-en/               # Vosk language model
├── config/
│   ├── config.py                    # Configuration management
│   └── voice_assistant_config.json  # Settings
├── requirements.txt                 # Python dependencies
├── test_assistant.py                # Test suite
└── README.md                        # This file
```
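One way the JSON settings above can be loaded is by overlaying them onto built-in defaults. A hedged sketch (`DEFAULTS` and `load_config` are illustrative names, not the actual API of the project's `config.py`):

```python
import json
from pathlib import Path

# Assumed defaults mirroring the JSON settings shown above.
DEFAULTS = {
    "stt": {"model_path": "models/vosk-model-en", "sample_rate": 16000},
    "llm": {"model_name": "gemma3", "api_url": "http://localhost:11434",
            "temperature": 0.7},
    "tts": {"rate": 200, "volume": 0.8},
}

def load_config(path="config/voice_assistant_config.json"):
    """Overlay on-disk settings onto the defaults, section by section."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    p = Path(path)
    if p.exists():
        for section, values in json.loads(p.read_text()).items():
            config.setdefault(section, {}).update(values)
    return config

cfg = load_config()
print(cfg["tts"]["rate"])  # 200 unless overridden in the JSON file
```

Missing sections and keys fall back to the defaults, so a partial config file is enough to tweak one setting.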
Run the test suite:

```bash
python test_assistant.py
# Choose option 1 to test all components
```

For text-based interaction:

```bash
python test_assistant.py
# Choose option 2 for text-based interaction
```

You can test components individually:

```python
from src.stt.vosk_stt import VoskSTT
from src.llm.gemma3_llm import Gemma3LLM
from src.tts.pyttsx3_tts import Pyttsx3TTS
from src.commands.command_executor import CommandExecutor
```
- Microphone not working
  - Check system microphone permissions
  - Verify the microphone is not in use by another application
- Ollama connection errors
  - Make sure Ollama is running: `ollama serve`
  - Check whether the Gemma3 model is installed: `ollama list`
  - Install it if missing: `ollama pull gemma3`
- Speech recognition not accurate
  - Speak clearly and at a moderate pace
  - Reduce background noise
  - Check microphone positioning
- TTS not working
  - Verify speakers/headphones are connected
  - Check system audio settings
- Commands not executing
  - Some commands may be platform-specific
  - Check the logs for detailed error messages
Check `voice_assistant.log` for detailed error information.
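A log file like this is typically produced with the standard library's `logging` module; a sketch of a setup that writes to `voice_assistant.log` (the project's actual handler configuration may differ):

```python
import logging

# Assumed logging setup writing to voice_assistant.log.
logging.basicConfig(
    filename="voice_assistant.log",
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    force=True,  # replace any previously installed handlers (Python 3.8+)
)
log = logging.getLogger("assistant")
log.info("assistant started")
log.error("microphone not found")  # recorded with level ERROR
```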
Edit `src/commands/command_executor.py` to add new commands:

```python
def _my_custom_command(self):
    # Your implementation
    return "Command executed successfully"

# Register the command
self.register_command("custom", self._my_custom_command, "My custom command")
```

Modify TTS settings in the configuration file or programmatically:

```python
tts.set_rate(150)    # Slower speech
tts.set_volume(0.9)  # Louder volume
```

Adjust the LLM parameters for different behavior:

```python
llm.set_parameters(
    temperature=0.5,  # More deterministic
    top_p=0.8,        # Different sampling
    max_length=1024   # Shorter responses
)
```

- First run: Fast startup since Gemma3 is already installed via Ollama
- Memory usage: ~2-4GB RAM for Ollama + Gemma3
- API latency: ~100-500ms for local Ollama API calls
- Response time: 1-2 seconds for most commands
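The `register_command` pattern used for custom commands above can be backed by a small name-to-handler registry. A self-contained sketch (this is a hypothetical minimal version, not the project's actual `CommandExecutor`):

```python
# Hypothetical minimal command registry illustrating the
# register_command/execute pattern described above.
class CommandExecutor:
    def __init__(self):
        self.commands = {}  # name -> (handler, description)

    def register_command(self, name, handler, description=""):
        self.commands[name] = (handler, description)

    def execute(self, name):
        if name not in self.commands:
            return f"Unknown command: {name}"
        handler, _ = self.commands[name]
        return handler()

executor = CommandExecutor()
executor.register_command("custom", lambda: "Command executed successfully",
                          "My custom command")
print(executor.execute("custom"))   # Command executed successfully
print(executor.execute("missing"))  # Unknown command: missing
```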
See `requirements.txt` for the complete list. Key dependencies:
- vosk>=0.3.45
- pyttsx3>=2.90
- requests>=2.28.0
- pyaudio>=0.2.11
- numpy>=1.21.0
External dependencies:
- Ollama (for LLM functionality)
This project is for educational and personal use. Please respect the licenses of individual components:
- Vosk: Apache 2.0
- Gemma3: Gemma Terms of Use
- pyttsx3: MPL 2.0