Self-hosted WebUI/CLI that brings state-of-the-art language, vision, and speech models to local hardware.
Now updated with web search and microphone support.
- 100% Local & Private: Runs entirely on your GPU; no API keys or subscriptions required.
- VRAM Hot-Swapping: A built-in `ModelManager` hot-swaps models in and out of GPU memory.
- Web Search: Built-in web search using DuckDuckGo that bypasses the model's training cutoff date.
- Text-to-Speech: Realistic voice responses powered by the `Kokoro-82M` engine.
- Conversational AI: Chat intelligently using the SOTA `Phi-4-mini-instruct` model.
- Image Captioning: Upload images and have them analyzed using `Qwen3-VL-2B-Instruct`.
- Dual Interfaces: Sleek WebUI using Gradio and a lightweight Python CLI.
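The hot-swapping idea above can be sketched in a few lines. This is an illustrative sketch, not the project's actual `ModelManager`: it keeps at most one model resident and drops the previous one before loading the next. The `loaders` dict and the string "models" are placeholders; with real PyTorch models you would also call `torch.cuda.empty_cache()` after releasing the old reference.

```python
import gc


class ModelManager:
    """Keep at most one model resident; swap on demand (illustrative sketch)."""

    def __init__(self, loaders):
        # loaders: dict mapping model name -> zero-arg callable that loads it
        self.loaders = loaders
        self.current_name = None
        self.current_model = None

    def get(self, name):
        """Return the requested model, evicting the previous one first."""
        if name == self.current_name:
            return self.current_model
        # Drop the old model so its (V)RAM can be reclaimed.
        self.current_model = None
        gc.collect()  # with PyTorch: also torch.cuda.empty_cache()
        self.current_model = self.loaders[name]()
        self.current_name = name
        return self.current_model


# Usage with dummy loaders standing in for real model-loading code:
mgr = ModelManager({"chat": lambda: "phi-model", "vision": lambda: "qwen-model"})
print(mgr.get("chat"))    # loads the chat model
print(mgr.get("vision"))  # evicts the chat model, loads the vision model
```

The point of the pattern is that only one large model occupies VRAM at a time, which is what makes 6 GB-class GPUs workable for multi-model setups.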
- NVIDIA GPU (6 GB+ VRAM recommended)
- Python 3.11+ installed on your system
- PyTorch with CUDA 12.1
```shell
python -m venv venv
venv\Scripts\activate       # On Windows
source venv/bin/activate    # On macOS/Linux
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
```
Note: start the server before launching the WebUI or CLI.

```shell
# Start the backend server first
uvicorn server:app

# Then, in a separate terminal, launch the WebUI...
python app.py

# ...or the CLI
python cli.py
```
- `/search [query]` to make a web search.
- `/image path/to/image.jpg` to analyze an image.
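Slash-command handling like the above usually comes down to a small prefix dispatcher. The helper below is a hypothetical sketch, not the project's actual CLI code: anything that is not a recognized `/command` is treated as an ordinary chat turn.

```python
def parse_command(line: str):
    """Split a CLI line into (command, argument); plain text becomes a chat turn."""
    line = line.strip()
    if line.startswith("/search "):
        return ("search", line[len("/search "):].strip())
    if line.startswith("/image "):
        return ("image", line[len("/image "):].strip())
    return ("chat", line)


print(parse_command("/search local LLM tools"))  # → ('search', 'local LLM tools')
print(parse_command("/image cat.jpg"))           # → ('image', 'cat.jpg')
print(parse_command("hello there"))              # → ('chat', 'hello there')
```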
If you find this project valuable, consider supporting my work:
Ville Pakarinen (@vpakarinen2)

