Jarvish is a modular, voice-activated AI assistant integrated with local LLMs (Ollama) and high-quality Text-to-Speech (Kokoro). It features both a command-line interface and a modern web dashboard.
- 🗣️ Voice Interaction: Seamless Speech-to-Text and Text-to-Speech loop.
- 🧠 Local Intelligence: Powered by Ollama (Llama 3, Mistral, etc.).
- 👁️ Vision & Screen Reading:
  - Analyze images.
  - "Read my screen": Takes a screenshot of your active monitor and analyzes it.
- 🔊 Natural Voice: Uses Kokoro TTS for realistic speech synthesis.
- 💻 Dual Interface:
  - CLI: Lightweight terminal-based interaction.
  - Web UI: Streamlit-based dashboard with chat history and voice input (mobile compatible).
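The "Read my screen" feature combines a screenshot with a vision model. Below is a hypothetical, standard-library-only illustration. It is not the code in `utils.py` or `ollama_client.py`. It assumes `scrot` (installed during setup) captures the screen and that the request goes to Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` field for multimodal models. The function names are illustrative. Note that `scrot` as called here grabs the whole screen, not only the active monitor.

```python
import base64
import json
import subprocess
import tempfile
import urllib.request
from pathlib import Path

OLLAMA_HOST = "http://localhost:11434"  # default from the configuration table


def capture_screen() -> bytes:
    """Grab the whole screen as PNG bytes via the scrot CLI (assumed installed)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "screen.png"
        subprocess.run(["scrot", str(path)], check=True)
        return path.read_bytes()


def build_vision_payload(model: str, prompt: str, image: bytes) -> dict:
    """Request body for Ollama's /api/generate with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def describe_image(model: str, prompt: str, image: bytes, host: str = OLLAMA_HOST) -> str:
    """Send the image to Ollama and return the model's text answer."""
    body = json.dumps(build_vision_payload(model, prompt, image)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Hypothetical usage: needs a running Ollama server with a vision model pulled.
    print(describe_image("gemma3:latest", "Describe what is on this screen.", capture_screen()))
```

Keeping `build_vision_payload` separate from the network call lets you check the request format without a running Ollama server.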
- Python 3.8+
- Ollama running locally (`http://localhost:11434`)
- Kokoro TTS API (https://github.com/remsky/Kokoro-FastAPI) running locally (`http://localhost:8880`)
- A MySQL server for the Jarvish database (see the database setup steps)
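Before launching Jarvish, you can confirm that both local services answer. The following is a minimal preflight sketch using only the standard library. It treats any HTTP response, even an error status, as "reachable", and it assumes nothing about specific routes on either server. The script is hypothetical and not part of the repository.

```python
import urllib.error
import urllib.request

SERVICES = {
    "Ollama": "http://localhost:11434",
    "Kokoro TTS": "http://localhost:8880",
}

# Local services: ignore any HTTP(S)_PROXY environment settings.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`, whatever the status code."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded (e.g. 404 on "/"), so it is up.
        return True
    except OSError:
        # URLError (connection refused, DNS failure) and timeouts land here.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "OK" if is_reachable(url) else "NOT REACHABLE"
        print(f"{name:<12} {url:<28} {status}")
```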
Using a Conda environment is recommended to keep dependencies isolated (tested on Ubuntu/Lubuntu 24).
```bash
# 1. Create a new environment
conda create -n jarvish python=3.10
conda activate jarvish

# 2. Install Audio & System Dependencies
# Note: Lubuntu/Ubuntu might require these for PyAudio and Screenshot tools
conda install -c conda-forge portaudio pyaudio alsa-lib alsa-plugins -y
# conda run -n screen_app python debug_audio.py
sudo apt-get update
sudo apt-get install ffmpeg scrot

# 3. Install Python Packages
pip install -r requirements.txt
```

* Ensure you have a MySQL server running (e.g., via XAMPP, Docker, or a local install).
* Create a database (default name: `jarvish_db`) or let the setup script do it for you.
* Initialize the database tables:
  ```bash
  python setup_db.py
  ```
* (Optional) Update `config.py` or set environment variables `DB_HOST`, `DB_USER`, `DB_PASSWORD` if your MySQL configuration differs from default.
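`config.py` is not shown here. Below is a minimal sketch of how the `DB_HOST`, `DB_USER`, and `DB_PASSWORD` overrides could be resolved. It assumes defaults of `localhost`, `root`, and an empty password, which are typical for a local XAMPP install but are not confirmed by this README. It also assumes the default database name `jarvish_db`. The helper name `db_settings` is hypothetical.

```python
import os
from typing import Mapping

# Assumed defaults for a local MySQL install; the real values live in config.py.
DB_DEFAULTS = {"host": "localhost", "user": "root", "password": ""}
DB_NAME = "jarvish_db"  # default database name from the setup notes


def db_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Merge DB_* environment variables over the assumed defaults."""
    return {
        "host": env.get("DB_HOST", DB_DEFAULTS["host"]),
        "user": env.get("DB_USER", DB_DEFAULTS["user"]),
        "password": env.get("DB_PASSWORD", DB_DEFAULTS["password"]),
        "database": DB_NAME,
    }


if __name__ == "__main__":
    settings = db_settings()
    # Mask the password when printing.
    print({k: ("***" if k == "password" and v else v) for k, v in settings.items()})
```

The keys match the keyword arguments of drivers such as `mysql.connector.connect(**settings)`, if that is the driver in use. Overrides would then be passed on the command line, e.g. `DB_HOST=127.0.0.1 DB_PASSWORD=secret python setup_db.py`.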
- Configuration (Optional): You can modify `config.py` to change models (e.g., `gemma3:latest`), voices, or database credentials.
To control Jarvish from your phone:
- Run `streamlit run app.py` on your desktop.
- Note the Network URL (e.g., `http://192.168.1.5:8501`) displayed in the terminal.
- Open this URL in your mobile browser.
- Use the sidebar to set Audio Output to "Desktop Speakers".
- Speak into your mobile device. Jarvish will execute the task on your desktop (e.g., read the screen) and reply through your desktop speakers.
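If the terminal output has scrolled away, you can rebuild the Network URL from the desktop's LAN address. Below is a best-effort sketch (hypothetical helpers, not part of Jarvish) that assumes Streamlit's default port 8501. It uses the common trick of "connecting" a UDP socket: this sends no packets, but it makes the OS choose the outgoing interface, whose address is then read back.

```python
import socket


def guess_lan_ip() -> str:
    """Best-effort guess of this machine's LAN IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # No traffic is sent; connect() only selects a route and local address.
            s.connect(("10.255.255.255", 1))
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no usable route: fall back to loopback


def network_url(ip: str, port: int = 8501) -> str:
    """URL to open on the phone; 8501 is Streamlit's default port."""
    return f"http://{ip}:{port}"


if __name__ == "__main__":
    print("Open on your phone:", network_url(guess_lan_ip()))
```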
For the classic terminal experience, run the CLI:
```bash
python3 main.py
```
- Speaks out loud using the system speakers.
- Listens via the default microphone.
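The CLI uses an activation word (`WAKE_WORD`, default `jarvis`). This README does not document how that word is matched. Below is a minimal sketch that assumes the check runs on the speech-to-text transcript and that everything after the wake word is treated as the command. The function `extract_command` is hypothetical.

```python
import re
from typing import Optional


def extract_command(transcript: str, wake_word: str = "jarvis") -> Optional[str]:
    """Return the text after the wake word, or None if the wake word is absent.

    Matching is case-insensitive and on whole words, so "Jarvis, read my screen"
    yields "read my screen" while "jarvisville" is ignored. An empty string
    means the wake word was heard without a command.
    """
    # Whole-word wake word, then any spaces or punctuation that follow it.
    pattern = rf"\b{re.escape(wake_word)}\b[\s,.!?:]*"
    match = re.search(pattern, transcript, flags=re.IGNORECASE)
    if match is None:
        return None
    return transcript[match.end():].strip()


if __name__ == "__main__":
    for heard in ["Hey Jarvis, what's on my screen?", "just talking"]:
        print(repr(heard), "->", repr(extract_command(heard)))
```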
Modern browsers block microphone access on "insecure" origins, such as plain HTTP served from a remote IP address (the Network URL opened on your phone). To fix this:

- Option A (Recommended): Use ngrok to tunnel your localhost to an HTTPS URL:
  ```bash
  ngrok http 8501
  ```
- Option B (Chrome Flags):
  - Go to `chrome://flags/#unsafely-treat-insecure-origin-as-secure` in your mobile browser.
  - Add your computer's IP and port (e.g., `http://192.168.1.5:8501`).
  - Enable the flag and restart Chrome.

Autoplay Note: Mobile browsers often require one user interaction (tap anywhere) before allowing auto-playing audio. If audio doesn't play automatically, try interacting with the page first.
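The desktop works on `http://localhost:8501` while the phone fails on `http://192.168.1.5:8501` because of the browser's secure-context rules. Below is a simplified Python rendering of the W3C Secure Contexts "potentially trustworthy origin" check. It ignores `file:` URLs, trailing-dot hostnames, and per-browser overrides such as the Chrome flag in Option B.

```python
import ipaddress
from urllib.parse import urlsplit


def is_secure_context(url: str) -> bool:
    """Simplified 'potentially trustworthy origin' check.

    HTTPS/WSS origins, localhost names, and loopback IPs qualify;
    plain HTTP on a LAN IP does not, which is why the phone is blocked.
    """
    parts = urlsplit(url)
    if parts.scheme in ("https", "wss"):
        return True
    host = (parts.hostname or "").lower()
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback  # 127.0.0.0/8 or ::1
    except ValueError:
        return False  # ordinary hostname or LAN name over plain HTTP


if __name__ == "__main__":
    for u in ["http://localhost:8501", "http://192.168.1.5:8501", "https://example.com"]:
        print(u, "->", is_secure_context(u))
```

An ngrok tunnel (Option A) turns the phone's origin into an `https://` one, so it passes the first check.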
Edit `config.py` or set environment variables:

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama API URL |
| `OLLAMA_MODEL` | `llama3` | Text Model |
| `IMAGE_MODEL` | `gemma` | Vision Model |
| `TTS_ENDPOINT` | `.../v1/audio/speech` | Kokoro TTS Endpoint |
| `WAKE_WORD` | `jarvis` | Activation word (CLI) |
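Below is a minimal sketch of the "edit `config.py` or set environment variables" pattern. Each setting reads its environment variable and falls back to the default from the table. This is an illustration, not the project's actual `config.py`. Because the `TTS_ENDPOINT` default is abbreviated above, the sketch leaves it unset (`None`) unless the variable is provided. The `Settings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_host: str
    ollama_model: str
    image_model: str
    tts_endpoint: Optional[str]  # full default URL is not spelled out in the README
    wake_word: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Environment variables override the documented defaults."""
    return Settings(
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3"),
        image_model=env.get("IMAGE_MODEL", "gemma"),
        tts_endpoint=env.get("TTS_ENDPOINT"),  # no default assumed here
        wake_word=env.get("WAKE_WORD", "jarvis"),
    )


if __name__ == "__main__":
    print(load_settings())
```

With this pattern, a one-off model switch would look like `OLLAMA_MODEL=gemma3:latest python3 main.py`, with no file edits.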
- `core.py`: Central logic for LLM/TTS/Audio orchestration.
- `app.py`: Streamlit Web Application.
- `main.py`: CLI entry point.
- `ollama_client.py`: Ollama API wrapper.
- `tts_client.py`: Kokoro TTS wrapper.
- `audio_manager.py`: Audio I/O utilities.
- `utils.py`: System utilities (screen capture).
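The module split suggests a simple pipeline: `core.py` takes a prompt, asks the Ollama wrapper for a reply, and hands the text to the TTS wrapper. Below is a hypothetical single-file sketch of one such turn, not the project's code. It assumes Ollama's non-streaming `/api/generate` endpoint. It also assumes the Kokoro server accepts an OpenAI-style speech request, where the `model` value `kokoro` and the voice `af_bella` are assumptions. The TTS URL is passed in explicitly (e.g. your `TTS_ENDPOINT` value) rather than guessed.

```python
import json
import os
import urllib.request

OLLAMA_HOST = "http://localhost:11434"


def _post_json(url: str, payload: dict, timeout: float = 120) -> bytes:
    """POST a JSON body and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def chat_payload(model: str, prompt: str) -> dict:
    """Body for Ollama's /api/generate, asking for a single non-streamed reply."""
    return {"model": model, "prompt": prompt, "stream": False}


def speech_payload(text: str, voice: str = "af_bella") -> dict:
    """OpenAI-style speech request; model name and voice are assumptions."""
    return {"model": "kokoro", "input": text, "voice": voice, "response_format": "mp3"}


def answer_and_speak(prompt: str, model: str, tts_url: str) -> bytes:
    """One turn: prompt -> Ollama text reply -> synthesized MP3 bytes."""
    raw = _post_json(f"{OLLAMA_HOST}/api/generate", chat_payload(model, prompt))
    reply = json.loads(raw)["response"]
    return _post_json(tts_url, speech_payload(reply))


if __name__ == "__main__":
    # Requires both servers running; TTS_ENDPOINT must hold the full speech URL.
    audio = answer_and_speak("Say hello in one sentence.", "llama3", os.environ["TTS_ENDPOINT"])
    with open("reply.mp3", "wb") as f:
        f.write(audio)
```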