A Fully Offline, Local-AI Powered OS Extension & Personal Assistant
Jarvis-Style. Multimodal. Multi-Language. Low-Resource. Zero APIs.
AstraOS is a local AI layer that sits on top of your operating system and acts as a:
- Personal assistant
- Automation engine
- Voice-controlled AI agent
- File/indexing system
- Photo organizer
- Document processor
- Email/message manager
- Generative tools provider
It runs 100% offline, ships with all AI models inside the .exe, and supports English, Hindi, and French voice interaction, including mixed-language input (Hinglish, Franglais, and general code-switching).
You own your data. No cloud. No APIs. No external installs.
AstraOS comes with:
- ✅ Local LLM (Llama 3 / Mistral / Phi / Gemma — GGUF)
- ✅ Local Vision (LLaVA / SigLIP)
- ✅ Local Embeddings (BGE / LaBSE / CLIP)
- ✅ Local Image Generation (Stable Diffusion Turbo)
- ✅ Local Speech Recognition (Whisper)
- ✅ Local Speech Synthesis (VITS / Piper)
- ✅ Local Vector Search Database (FAISS)
- ✅ Automation Engine (OS-level control)
- ✅ Web Scraper (Rust-based, safe-mode)
- ✅ Fully Configurable Settings UI (Tauri)
| Layer | Technology |
|---|---|
| Core Runtime | Rust |
| UI | Tauri + React/Svelte |
| Local LLM Engine | llama.cpp (statically linked) |
| STT | Whisper.cpp |
| TTS | Piper / VITS Local |
| Image Generation | diffusion.cpp |
| Vision / OCR | LLaVA.cpp / Tesseract |
| Vector Database | FAISS (local) |
| Metadata DB | SQLite |
| Filesystem Indexer | Rust async walkers |
| Task Automation | Windows APIs via winapi or Linux syscalls |
AstraOS/
│
├── Core Runtime (Rust)
│   ├── Event Loop
│   ├── Intent Parser
│   ├── Skill Engine
│   ├── Memory Engine
│   └── Scheduler
│
├── AI Layer
│   ├── LLM (llama.cpp)
│   ├── Vision (llava.cpp)
│   ├── Diffusion (sd.cpp)
│   ├── Embeddings (bge / clip / labse)
│   ├── STT (whisper.cpp)
│   └── TTS (piper)
│
├── Storage Layer
│   ├── SQLite (metadata)
│   ├── FAISS (vector index)
│   ├── Cache (json)
│   └── File Registry
│
├── Modules
│   ├── Photo Organizer
│   ├── File Search
│   ├── Email Manager
│   ├── Docs Parser
│   ├── Automation Tools
│   ├── Browser Agent
│   └── Settings & Profiles
│
└── UI Layer (Tauri)
Text embeddings — used for:
- Intent recognition
- Semantic search
- Memory lookups
- File search
Model: bge-small-en-v1.5.gguf (60–120 MB)
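As a minimal sketch of what this lookup does (FAISS handles it at scale over real 384-d BGE vectors), the snippet below ranks stored embeddings by cosine similarity to a query. The 3-d vectors and the intent labels in the comments are toy stand-ins, not real model output.

```rust
// Toy nearest-neighbour search: rank stored embedding vectors by cosine
// similarity to a query vector, as FAISS does for intent/file/memory lookup.

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Return the indices of the `k` stored vectors most similar to `query`.
fn top_k(query: &[f32], store: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = store
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let store = vec![
        vec![1.0, 0.0, 0.0], // "open file"
        vec![0.0, 1.0, 0.0], // "play music"
        vec![0.9, 0.1, 0.0], // "open document"
    ];
    let query = vec![1.0, 0.05, 0.0];
    println!("{:?}", top_k(&query, &store, 2)); // → [0, 2]
}
```

The same routine backs every text index here; only the source of the vectors (BGE for text, CLIP for images) changes.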
Image embeddings — used for:
- Photo clustering
- Similar photo search
- OCR + relevance ranking
- Deduplication
Model: clip-ViT-B-32.gguf
Audio embeddings — used for:
- Speaker identity
- Voice command segmentation
- Voice memory
Model: Whisper encoder embeddings
Tables included:
/db/app.db
├── user_settings
├── voice_profiles
├── automation_rules
├── task_history
├── scrape_cache
├── email_index
├── file_registry
└── photo_metadata
Each with normalized schemas.
Folder: /vector/
Contains:
| Index | Purpose | Embedding Type |
|---|---|---|
| memory.index | Long-term AI memory | text |
| files.index | Document search | text |
| photos.index | Image similarity | image |
| speech.index | Speaker embeddings | audio |
| skills.index | Intent → skill mapping | text |
All indexes load at boot in streaming mode.
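"Streaming mode" can be sketched as chunked reads: the loader touches only a fixed-size buffer instead of pulling a whole multi-GB index into one allocation. `stream_load` and the chunk size below are illustrative, not the actual loader; file I/O is simulated with an in-memory reader.

```rust
// Sketch of streaming index loading: consume a reader in fixed-size
// chunks so peak memory at boot stays flat regardless of index size.

use std::io::{BufReader, Read};

/// Read everything from `reader` in `chunk`-sized pieces.
/// Returns (total bytes, number of chunks read).
fn stream_load<R: Read>(reader: R, chunk: usize) -> (usize, usize) {
    let mut r = BufReader::new(reader);
    let mut buf = vec![0u8; chunk];
    let (mut total, mut chunks) = (0usize, 0usize);
    loop {
        let n = r.read(&mut buf).expect("read failed");
        if n == 0 {
            break; // EOF
        }
        // Real code would hand &buf[..n] to the index builder here.
        total += n;
        chunks += 1;
    }
    (total, chunks)
}

fn main() {
    let fake_index = vec![1u8; 10_000]; // stand-in for vector/files.index
    let (bytes, chunks) = stream_load(&fake_index[..], 4096);
    println!("{} bytes in {} chunks", bytes, chunks);
}
```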
Folder: /cache/
- preprocessed OCR
- STT partial segments
- temp embeddings
- web-scraped DOM snapshots
- active conversation state
English + Hindi + French
Simultaneously. No switching.
- VAD (Voice Activity Detection)
- Whisper.cpp (medium or small)
- Language auto-detect
- Code-switch detection
- Sentence reconstruction
- Punctuation
- Intent classification
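The code-switch detection step can be approximated by Unicode script: Devanagari tokens are reliably Hindi, while Latin-script tokens (English or French share an alphabet) must be left to the model. This is a hypothetical heuristic sketch of that stage, not Whisper's actual mechanism.

```rust
// Naive code-switch segmentation: split a transcript into runs of the
// same script. Devanagari marks Hindi; Latin-script runs (English or
// French) are passed on for model-based language identification.

#[derive(Debug, PartialEq)]
enum Script {
    Devanagari, // Hindi
    Latin,      // English or French — model disambiguates
    Other,
}

fn script_of(token: &str) -> Script {
    if token.chars().any(|c| ('\u{0900}'..='\u{097F}').contains(&c)) {
        Script::Devanagari
    } else if token.chars().any(|c| c.is_ascii_alphabetic()) {
        Script::Latin
    } else {
        Script::Other
    }
}

/// Group consecutive same-script tokens into runs — the unit a
/// downstream intent classifier would receive.
fn segment(transcript: &str) -> Vec<(Script, Vec<&str>)> {
    let mut runs: Vec<(Script, Vec<&str>)> = Vec::new();
    for tok in transcript.split_whitespace() {
        let s = script_of(tok);
        match runs.last_mut() {
            Some((last, toks)) if *last == s => toks.push(tok),
            _ => runs.push((s, vec![tok])),
        }
    }
    runs
}

fn main() {
    // Hinglish input: English and Hindi mixed in one utterance.
    for (script, toks) in segment("lights बंद कर do please") {
        println!("{:?}: {:?}", script, toks);
    }
}
```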
Model: Piper FastVITS Multilingual
Voices included:
- English (US/Neutral)
- Hindi (Delhi/Neutral)
- French (Paris/Neutral)
Speed: real-time or faster
Features:
- auto-scan entire system
- EXIF extraction
- people clustering
- location-based grouping
- duplicate removal
- object tags via vision model
- timeline view
- semantic search: "Show photos where I'm wearing a red hoodie with friends at night"
Uses:
- CLIP embeddings
- FAISS photos.index
- SQLite photo metadata
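For the duplicate-removal step, a sketch under simple assumptions: byte-identical copies can be caught by content hashing before any embedding work, while resized or re-encoded near-duplicates would fall to CLIP-distance in photos.index. The function names and sample bytes below are illustrative.

```rust
// Exact-duplicate detection: hash each photo's bytes and group identical
// hashes. Any bucket with more than one path is a duplicate set.

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Bucket paths by content hash; return only the duplicate sets.
fn find_duplicates<'a>(photos: &[(&'a str, &[u8])]) -> Vec<Vec<&'a str>> {
    let mut buckets: HashMap<u64, Vec<&'a str>> = HashMap::new();
    for &(path, bytes) in photos {
        buckets.entry(content_hash(bytes)).or_default().push(path);
    }
    buckets.into_values().filter(|b| b.len() > 1).collect()
}

fn main() {
    let photos: [(&str, &[u8]); 3] = [
        ("a.jpg", b"\xFF\xD8\x01"),
        ("b.jpg", b"\xFF\xD8\x01"), // byte-identical copy of a.jpg
        ("c.jpg", b"\xFF\xD8\x02"),
    ];
    println!("{:?}", find_duplicates(&photos)); // → [["a.jpg", "b.jpg"]]
}
```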
Supports:
- DOCX
- TXT
- PPTX
- Markdown
- Images
- Audio transcription
- Code files
Extracts:
- text content
- embeddings
- key metadata
- summaries
- timeline clusters
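The first step of any such pipeline is routing a file to the right handler. A sketch of that dispatch by extension — the `Parser` variants and extension table are illustrative; real handlers (DOCX unzip, Whisper transcription, OCR) are stubbed out:

```rust
// Route a file path to a parser by extension. Only the routing logic is
// shown; each variant stands in for a real extraction backend.

#[derive(Debug, PartialEq)]
enum Parser {
    PlainText,   // .txt, .md
    OfficeXml,   // .docx, .pptx
    Image,       // OCR + vision tags
    Audio,       // Whisper transcription
    Code,        // language-aware chunking
    Unsupported,
}

fn parser_for(path: &str) -> Parser {
    // rsplit('.') yields the extension first (or the whole name if none).
    match path.rsplit('.').next().unwrap_or("").to_ascii_lowercase().as_str() {
        "txt" | "md" => Parser::PlainText,
        "docx" | "pptx" => Parser::OfficeXml,
        "png" | "jpg" | "jpeg" => Parser::Image,
        "wav" | "mp3" | "flac" => Parser::Audio,
        "rs" | "py" | "js" | "ts" => Parser::Code,
        _ => Parser::Unsupported,
    }
}

fn main() {
    println!("{:?}", parser_for("notes/meeting.docx")); // OfficeXml
    println!("{:?}", parser_for("lecture.mp3"));        // Audio
}
```

Whatever the handler, its output feeds the same downstream steps: text, embeddings, metadata, summaries, timeline clusters.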
You can say things like:
- "Turn off my PC at 11."
- "When a new email arrives from professor, notify me."
- "Download all PDFs from this site."
- "Sort all of my desktop files."
Backend uses:
- OS APIs
- Node bindings inside Tauri
- Rust automation drivers
- A plugin-based skill system
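A plugin-based skill system like this usually reduces to a trait plus a registry. The sketch below routes a spoken command to the first matching skill by keyword; the trait, skill structs, and return strings are hypothetical, and production matching would go through skills.index embeddings rather than `contains` checks.

```rust
// Toy skill registry: each skill is a trait object; the dispatcher asks
// each in turn whether it claims the command, then runs the first match.

trait Skill {
    /// Keyword claim check; first match wins in this toy router.
    fn matches(&self, command: &str) -> bool;
    fn run(&self, command: &str) -> String;
}

struct Shutdown;
impl Skill for Shutdown {
    fn matches(&self, c: &str) -> bool { c.contains("turn off") }
    fn run(&self, _c: &str) -> String {
        "scheduled: system shutdown".into() // would call OS APIs here
    }
}

struct SortFiles;
impl Skill for SortFiles {
    fn matches(&self, c: &str) -> bool { c.contains("sort") }
    fn run(&self, _c: &str) -> String { "sorting desktop files".into() }
}

fn dispatch(skills: &[Box<dyn Skill>], command: &str) -> Option<String> {
    let cmd = command.to_lowercase();
    skills.iter().find(|s| s.matches(&cmd)).map(|s| s.run(&cmd))
}

fn main() {
    let skills: Vec<Box<dyn Skill>> = vec![Box::new(Shutdown), Box::new(SortFiles)];
    println!("{:?}", dispatch(&skills, "Turn off my PC at 11"));
}
```

New skills ship as plugins simply by registering another `Box<dyn Skill>`.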
Models:
- sd-turbo.gguf (fast)
- sdxl-lightning.gguf (optional)
Templates stored in /templates/.
For React, Python, JS, etc.
Exposed Options:
- choose LLM model
- choose voice model
- GPU/CPU toggle
- resource/priority mode
- background permissions
- task scheduling
- privacy controls
- memory wipe
- vector reindex
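One way these options could map onto a config struct, with privacy-first defaults — field names here are illustrative, not AstraOS's actual config keys:

```rust
// Illustrative settings struct behind the options UI, with safe defaults.

#[derive(Debug, Clone, PartialEq)]
struct Settings {
    llm_model: String,
    voice_model: String,
    use_gpu: bool,
    low_resource_mode: bool,
    allow_background: bool,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            llm_model: "llama-3-8b.gguf".into(),
            voice_model: "piper-multilingual.onnx".into(),
            use_gpu: false,          // CPU by default; GPU is opt-in
            low_resource_mode: true, // favour low-resource machines
            allow_background: false, // background permissions off by default
        }
    }
}

fn main() {
    let mut s = Settings::default();
    s.use_gpu = true; // user flips the GPU/CPU toggle in the UI
    println!("{:?}", s);
}
```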
Bundler: Tauri → NSIS → final .exe
Included in build:
- Rust runtime
- Tauri frontend
- AI engines (llama.cpp, whisper.cpp, sd.cpp)
- All GGUF models
- SQLite DB
- FAISS indexes
- Voice models
- Resource folder
Single .exe output size: 1.8–3.5 GB depending on model choices.
AstraOS/
│
├── app.exe
├── README.md
├── models/
│   ├── llm/
│   │   └── llama-3-8b.gguf
│   ├── vision/
│   │   └── llava-1.6.gguf
│   ├── stt/
│   │   └── whisper-medium.gguf
│   ├── tts/
│   │   └── piper-multilingual.onnx
│   ├── embeddings/
│   │   ├── bge-small.gguf
│   │   └── clip-ViT-B-32.gguf
│   └── sd/
│       └── sd-turbo.gguf
│
├── db/
│   └── app.db
├── vector/
│   ├── memory.index
│   ├── files.index
│   ├── photos.index
│   └── speech.index
│
├── cache/
├── logs/
├── plugins/
└── templates/
- No internet calls (unless user enables web scraping)
- All data stored locally
- User-controlled memory wipe
- Password-protected profile
- Hardware-bound encryption option
- lazy model loading
- tensor caching
- quantized GGUF
- streaming inference
- async Rust runtime
- CPU/GPU configurable load
- auto-sleep mode
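Lazy model loading can be sketched with Rust's `std::sync::OnceLock`: no model leaves disk until the first request that needs it, after which the weights are cached for the process lifetime. `load_weights` below is a stand-in for the real llama.cpp loader.

```rust
// Lazy, once-only model loading: the first call pays the load cost,
// every later call returns the cached weights.

use std::sync::OnceLock;

static LLM_WEIGHTS: OnceLock<Vec<u8>> = OnceLock::new();

fn load_weights() -> Vec<u8> {
    // Real code would mmap the GGUF file from models/llm/ here.
    println!("loading model from disk…");
    vec![0u8; 4] // placeholder weights
}

/// Returns the cached weights, loading them on first use.
fn llm() -> &'static [u8] {
    LLM_WEIGHTS.get_or_init(load_weights)
}

fn main() {
    let a = llm(); // triggers the load
    let b = llm(); // cache hit — no second load
    assert!(std::ptr::eq(a, b)); // same allocation both times
}
```

`OnceLock` is also thread-safe, so concurrent skills racing for the model trigger exactly one load.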
- mobile companion app
- face recognition & tagging
- full browser automation
- plugin marketplace
- smart scheduler
- multimodal memory graphs
- multi-agent architecture