🤖 Jarvish Assistant

Jarvish is a modular, voice-activated AI assistant integrated with local LLMs (Ollama) and high-quality Text-to-Speech (Kokoro). It features both a command-line interface and a modern web dashboard.

✨ Features

🗣️ Voice Interaction: Seamless Speech-to-Text and Text-to-Speech loop.
🧠 Local Intelligence: Powered by Ollama (Llama 3, Mistral, etc.).
👁️ Vision & Screen Reading:
- Analyze images.
- "Read my screen": Takes a screenshot of your active monitor and analyzes it.
🔊 Natural Voice: Uses Kokoro TTS for realistic speech synthesis.
💻 Dual Interface:
- CLI: Terminal-based lightweight interaction.
- Web UI: Streamlit-based dashboard with chat history and voice input (mobile compatible).
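
The voice loop listed above can be pictured as three stages chained together. The sketch below is illustrative only: `run_turn`, `listen`, `think`, and `speak` are hypothetical names, not functions from this repository, and the stages are passed in as plain callables so the example runs without a microphone, Ollama, or Kokoro.

```python
from typing import Callable, Optional


def run_turn(
    listen: Callable[[], Optional[str]],
    think: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one listen -> think -> speak turn and return the reply, if any."""
    heard = listen()
    if not heard or not heard.strip():
        # Nothing was recognised, so skip the LLM and TTS stages.
        return None
    reply = think(heard.strip())
    speak(reply)
    return reply


if __name__ == "__main__":
    spoken = []  # stands in for the speakers
    run_turn(
        listen=lambda: "what time is it",          # stands in for speech-to-text
        think=lambda text: f"You asked: {text}",   # stands in for the LLM call
        speak=spoken.append,
    )
    print(spoken)
```

Keeping the stages as injected callables is one way to let a CLI and a web UI share the same turn logic while swapping only the audio input and output.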

🚀 Installation

Prerequisites

Python 3.8+
Ollama running locally (http://localhost:11434)
Kokoro TTS API (https://github.com/remsky/Kokoro-FastAPI) running locally (http://localhost:8880)
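
Before installing anything else, it can help to confirm that both local services are actually listening. The following is a minimal sketch that only tests whether a TCP connection succeeds on the two ports listed above; it does not verify that the service behind the port is really Ollama or Kokoro. The helper name `is_listening` is made up for this example.

```python
import socket
from urllib.parse import urlparse


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    services = {
        "Ollama": "http://localhost:11434",
        "Kokoro TTS": "http://localhost:8880",
    }
    for name, url in services.items():
        status = "reachable" if is_listening(url) else "NOT reachable"
        print(f"{name} ({url}): {status}")
```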

1. Environment Setup (Miniconda)

Recommended for maintaining clean dependencies (tested on Ubuntu/Lubuntu 24).

# 1. Create a new environment
conda create -n jarvish python=3.10
conda activate jarvish

# 2. Install Audio & System Dependencies
# Note: Lubuntu/Ubuntu might require these for PyAudio and Screenshot tools
conda install -c conda-forge portaudio pyaudio alsa-lib alsa-plugins -y
# conda run -n screen_app python debug_audio.py
sudo apt-get update
sudo apt-get install ffmpeg scrot

# 3. Install Python Packages
pip install -r requirements.txt

2. Database Setup (MySQL)

*   Ensure you have a MySQL server running (e.g., via XAMPP, Docker, or local install).
*   Create a database (default name: `jarvish_db`) or let the setup script do it for you.
*   Initialize the database tables:
    ```bash
    python setup_db.py
    ```
*   (Optional) Update `config.py` or set environment variables `DB_HOST`, `DB_USER`, `DB_PASSWORD` if your MySQL configuration differs from default.

Configuration (Optional): You can modify config.py to change models (e.g., gemma3:latest), voices, or database credentials.
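
As a rough illustration of the environment-variable override described above, the sketch below reads `DB_HOST`, `DB_USER`, and `DB_PASSWORD` with fallbacks. The fallback values for host, user, and password are assumptions for a typical local MySQL install and may not match what `config.py` actually uses; only the database name `jarvish_db` comes from this README. `load_db_settings` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping


def load_db_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read MySQL settings from the environment, falling back to defaults."""
    return {
        "host": env.get("DB_HOST", "localhost"),   # assumed default
        "user": env.get("DB_USER", "root"),        # assumed default
        "password": env.get("DB_PASSWORD", ""),    # assumed default
        "database": "jarvish_db",                  # default name from this README
    }


if __name__ == "__main__":
    settings = load_db_settings()
    # Avoid printing the password itself.
    print({key: value for key, value in settings.items() if key != "password"})
```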

🎮 Usage

Option A: Web UI (Mobile Input / Desktop Output)

1. Run `streamlit run app.py` on your desktop.
2. Note the Network URL (e.g., http://192.168.1.5:8501) displayed in the terminal.
3. Open this URL in your mobile browser.
4. Use the sidebar to set Audio Output to "Desktop Speakers".
5. Speak into your mobile device. Jarvish runs the task on your desktop (e.g., reading the screen) and replies through your desktop speakers.
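
Streamlit prints the Network URL for you, but if the terminal output has scrolled away, the address can usually be recovered from the desktop itself. This is a best-effort sketch using a common UDP socket trick; `lan_url` is a hypothetical helper and the result may be wrong on machines with several network interfaces or a VPN.

```python
import socket


def lan_url(port: int = 8501) -> str:
    """Best-effort guess at the URL other devices on the LAN can use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS
        # pick an outgoing interface, whose address is then read back.
        sock.connect(("192.0.2.1", 80))  # documentation-only address
        ip = sock.getsockname()[0]
    except OSError:
        ip = "127.0.0.1"  # no route available, fall back to loopback
    finally:
        sock.close()
    return f"http://{ip}:{port}"


if __name__ == "__main__":
    print(lan_url())
```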

Option B: Command Line

The classic terminal experience.

python3 main.py

In this mode Jarvish listens through the default microphone and speaks through the system speakers.
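
The CLI is activated by a wake word (`jarvis` by default, see Configuration). How `main.py` detects it is not documented here, so the following is only a sketch of one plausible approach: search the transcribed text for the wake word and treat whatever follows as the command. `extract_command` is a hypothetical helper.

```python
import re
from typing import Optional


def extract_command(transcript: str, wake_word: str = "jarvis") -> Optional[str]:
    """Return the text after the wake word, or None if it was not spoken."""
    # Match the wake word as a whole word, then skip trailing punctuation.
    pattern = rf"\b{re.escape(wake_word)}\b[\s,.!?:]*(.*)"
    match = re.search(pattern, transcript, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    print(extract_command("Hey Jarvis, read my screen"))
```

An empty string means the wake word was heard with no command after it, which a caller could use as a cue to keep listening.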

Troubleshooting Mobile Audio

Modern browsers block microphone access on insecure origins, meaning any page served over plain HTTP from an address other than localhost (such as your desktop's LAN IP). To fix this:

Option A (Recommended): Use ngrok to tunnel your localhost to an HTTPS URL.
```
ngrok http 8501
```
Autoplay Note: Mobile browsers often require one user interaction (tap anywhere) before allowing auto-playing audio. If audio doesn't play automatically, try interacting with the page first.
Option B (Chrome Flags):
- Go to chrome://flags/#unsafely-treat-insecure-origin-as-secure on your mobile browser.
- Add your computer's IP (e.g., http://192.168.1.5:8501).
- Enable the flag and restart Chrome.
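
To see why the LAN URL fails while localhost and an ngrok URL work, the rule can be approximated in a few lines. This sketch is a simplification of the browser's secure-context check, not a complete implementation of it, and `mic_allowed_origin` is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlparse


def mic_allowed_origin(url: str) -> bool:
    """Approximate the browser rule: HTTPS or a loopback host counts as secure."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme == "https":
        return True
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        # 127.0.0.0/8 and ::1 are treated as trustworthy even over HTTP.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # an ordinary hostname over plain HTTP


if __name__ == "__main__":
    for candidate in ("http://localhost:8501", "http://192.168.1.5:8501"):
        print(candidate, mic_allowed_origin(candidate))
```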

⚙️ Configuration

Edit config.py or set environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| `OLLAMA_HOST` | `http://localhost:11434` | Ollama API URL |
| `OLLAMA_MODEL` | `llama3` | Text model |
| `IMAGE_MODEL` | `gemma` | Vision model |
| `TTS_ENDPOINT` | `.../v1/audio/speech` | Kokoro TTS endpoint |
| `WAKE_WORD` | `jarvis` | Activation word (CLI) |
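
To show how these settings might feed a request, here is a sketch that builds a call to Ollama's `/api/generate` endpoint, picking `OLLAMA_MODEL` for text prompts and `IMAGE_MODEL` when an image (for example a screenshot) is attached. It is not taken from `ollama_client.py`; `build_generate_request` is a hypothetical helper, and the defaults simply mirror the table above.

```python
import base64
import json
import os
import urllib.request
from typing import Mapping, Optional


def build_generate_request(
    prompt: str,
    image: Optional[bytes] = None,
    env: Mapping[str, str] = os.environ,
) -> urllib.request.Request:
    """Build a POST to Ollama's /api/generate from the documented settings."""
    host = env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    payload = {"prompt": prompt, "stream": False}
    if image is None:
        payload["model"] = env.get("OLLAMA_MODEL", "llama3")
    else:
        # Vision requests carry base64-encoded images and use IMAGE_MODEL.
        payload["model"] = env.get("IMAGE_MODEL", "gemma")
        payload["images"] = [base64.b64encode(image).decode("ascii")]
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. With `stream` set to false, Ollama replies with a single JSON object whose `response` field holds the generated text.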

📁 Project Structure

core.py: Central logic for LLM/TTS/Audio orchestration.
app.py: Streamlit Web Application.
main.py: CLI Entry point.
ollama_client.py: Ollama API wrapper.
tts_client.py: Kokoro TTS wrapper.
audio_manager.py: Audio I/O utilities.
utils.py: System utilities (Screen capture).
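
As an illustration of what a wrapper such as `tts_client.py` might do, the sketch below posts text to an OpenAI-style `/v1/audio/speech` endpoint and saves the returned audio. It is not the repository's code. The payload fields, the model name `kokoro`, and the voice `af_bella` are assumptions about the Kokoro-FastAPI server, so check `openapi-kokorotts.json` for the actual schema. The endpoint is passed in explicitly, using whatever value `TTS_ENDPOINT` holds.

```python
import json
import urllib.request


def build_tts_request(endpoint: str, text: str, voice: str = "af_bella") -> urllib.request.Request:
    """Build a POST for an OpenAI-style speech endpoint."""
    payload = {
        "model": "kokoro",         # assumed model name
        "input": text,
        "voice": voice,            # assumed voice id
        "response_format": "mp3",  # assumed to be supported
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(endpoint: str, text: str, out_path: str = "reply.mp3") -> str:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(endpoint, text)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```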
