RaghavSethi006/VortexAI

🚀 VortexAI - OS Extender

A Fully Offline, Local-AI Powered OS Extension & Personal Assistant

Jarvis-Style. Multimodal. Multi-Language. Low-Resource. Zero APIs.

🧠 Overview

VortexAI is a local AI layer that sits on top of your operating system and acts as a:

  • Personal assistant
  • Automation engine
  • Voice-controlled AI agent
  • File/indexing system
  • Photo organizer
  • Document processor
  • Email/message manager
  • Generative tools provider

It runs 100% offline, ships with all AI models inside the .exe, and supports English, Hindi, and French voice interaction, including mixed-language input (Hinglish, Frenglish, Code-Switching).

You own your data. No cloud. No APIs. No external installs.

🌐 Core Capabilities

VortexAI comes with:

  • ✔ Local LLM (Llama 3 / Mistral / Phi / Gemma – GGUF)
  • ✔ Local Vision (LLaVA / SigLIP)
  • ✔ Local Embeddings (BGE / LaBSE / CLIP)
  • ✔ Local Image Generation (Stable Diffusion Turbo)
  • ✔ Local Speech Recognition (Whisper)
  • ✔ Local Speech Synthesis (VITS / Piper)
  • ✔ Local Vector Search Database (FAISS)
  • ✔ Automation Engine (OS-level control)
  • ✔ Web Scraper (Rust-based, safe-mode)
  • ✔ Fully Configurable Settings UI (Tauri)

🗂 Tech Stack

| Layer | Technology |
| --- | --- |
| Core Runtime | Rust |
| UI | Tauri + React/Svelte |
| Local LLM Engine | llama.cpp (statically linked) |
| STT | Whisper.cpp |
| TTS | Piper / VITS Local |
| Image Generation | diffusion.cpp |
| Vision / OCR | LLaVA.cpp / Tesseract |
| Vector Database | FAISS (local) |
| Metadata DB | SQLite |
| Filesystem Indexer | Rust async walkers |
| Task Automation | Windows APIs via winapi or Linux syscalls |

πŸ— System Architecture

AstraOS/
│
├── Core Runtime (Rust)
│   ├── Event Loop
│   ├── Intent Parser
│   ├── Skill Engine
│   ├── Memory Engine
│   └── Scheduler
│
├── AI Layer
│   ├── LLM (llama.cpp)
│   ├── Vision (llava.cpp)
│   ├── Diffusion (sd.cpp)
│   ├── Embeddings (bge / clip / labse)
│   ├── STT (whisper.cpp)
│   └── TTS (piper)
│
├── Storage Layer
│   ├── SQLite (metadata)
│   ├── FAISS (vector index)
│   ├── Cache (json)
│   └── File Registry
│
├── Modules
│   ├── Photo Organizer
│   ├── File Search
│   ├── Email Manager
│   ├── Docs Parser
│   ├── Automation Tools
│   ├── Browser Agent
│   └── Settings & Profiles
│
└── UI Layer (Tauri)
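
The flow from Intent Parser to Skill Engine in the tree above could look roughly like the sketch below. It is an illustration only: the `Intent` struct, the `Skill` trait, the `FileSearch` skill, and the "first word names the skill" parsing rule are all assumptions, not the project's actual code.

```rust
use std::collections::HashMap;

/// A parsed user request: which skill to run and with what argument.
/// Hypothetical shape; the README does not define one.
#[derive(Debug, Clone, PartialEq)]
pub struct Intent {
    pub skill: String,
    pub argument: String,
}

/// Anything the Skill Engine can execute (hypothetical trait).
pub trait Skill {
    fn run(&self, argument: &str) -> String;
}

/// Example skill that only reports the query it received.
pub struct FileSearch;

impl Skill for FileSearch {
    fn run(&self, argument: &str) -> String {
        format!("searching files for '{argument}'")
    }
}

/// Maps skill names to their implementations.
#[derive(Default)]
pub struct SkillEngine {
    skills: HashMap<String, Box<dyn Skill>>,
}

impl SkillEngine {
    pub fn register(&mut self, name: &str, skill: Box<dyn Skill>) {
        self.skills.insert(name.to_string(), skill);
    }

    /// Runs the skill named by the intent, or returns None if it is unknown.
    pub fn dispatch(&self, intent: &Intent) -> Option<String> {
        self.skills.get(&intent.skill).map(|s| s.run(&intent.argument))
    }
}

/// Toy parser: treats the first word as the skill name and the rest as its argument.
/// A real parser would use the LLM or the skills index instead.
pub fn parse_intent(utterance: &str) -> Option<Intent> {
    let (skill, argument) = utterance.trim().split_once(' ')?;
    Some(Intent {
        skill: skill.to_lowercase(),
        argument: argument.trim().to_string(),
    })
}
```

Registering `FileSearch` under the name "search" and dispatching `parse_intent("Search tax returns 2023")` would route the request to that skill. Keeping skills behind a trait object is one way to leave room for the plugin-based skill system mentioned later.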

🔍 Embedding Architecture (Text + Image + Audio)

Text Embeddings

Used for:

  • Intent recognition
  • Semantic search
  • Memory lookups
  • File search

Model: bge-small-en-v1.5.gguf (60–120 MB)

Image Embeddings

Used for:

  • Photo clustering
  • Similar photo search
  • OCR + relevance ranking
  • Deduplication

Model: clip-ViT-B-32.gguf

Audio Embeddings

Used for:

  • Speaker identity
  • Voice command segmentation
  • Voice memory

Model: Whisper encoder embeddings

🗃 Database Architecture

1. SQLite (Metadata Layer)

Tables included:

/db/app.db
├── user_settings
├── voice_profiles
├── automation_rules
├── task_history
├── scrape_cache
├── email_index
├── file_registry
└── photo_metadata

Each table uses a normalized schema.

2. FAISS Vector Index (Semantic Layer)

Folder: /vector/

Contains:

| Index | Purpose | Embedding Type |
| --- | --- | --- |
| memory.index | Long-term AI memory | text |
| files.index | Document search | text |
| photos.index | Image similarity | image |
| speech.index | Speaker embeddings | audio |
| skills.index | Intent → skill mapping | text |

All indexes load at boot in streaming mode.
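
To make the role of these indexes concrete, here is a minimal brute-force stand-in for a vector lookup. It is not FAISS and does not use the FAISS API; it only illustrates the idea of ranking stored embeddings by cosine similarity to a query. The function names and the `(id, vector)` layout are assumptions.

```rust
/// Cosine similarity between two vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Scores every stored vector against the query and returns the `k` best
/// (id, score) pairs, highest score first. Linear scan, so only suitable
/// for small collections.
pub fn top_k(index: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = index
        .iter()
        .map(|(id, vector)| (*id, cosine_similarity(vector, query)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A real index would replace the linear scan with an approximate nearest-neighbour structure, but the contract stays the same: an embedding goes in, ranked ids come out, and the ids are then resolved against the SQLite metadata.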

3. Cache Layer

Folder: /cache/

  • preprocessed OCR
  • STT partial segments
  • temp embeddings
  • web-scraped DOM snapshots
  • active conversation state

🗣 Voice Engine

✔ Multilingual Mixed Input

English + Hindi + French

Simultaneously. No switching.

STT Pipeline

  1. VAD (Voice Activity Detection)
  2. Whisper.cpp (medium or small)
  3. Language auto-detect
  4. Code-switch detection
  5. Sentence reconstruction
  6. Punctuation
  7. Intent classification
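
As an illustration of step 1, the sketch below implements a very simple energy-based VAD: a frame counts as speech when its RMS energy exceeds a threshold. This is an assumed technique for demonstration; the README does not say which VAD the project uses, and the function names, frame length, and threshold are hypothetical.

```rust
/// Root-mean-square energy of one frame of PCM samples in [-1.0, 1.0].
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}

/// Splits `samples` into frames of `frame_len` and returns (start, end) sample
/// ranges covering runs of consecutive frames whose RMS exceeds `threshold`.
/// `frame_len` must be greater than zero.
pub fn detect_speech(samples: &[f32], frame_len: usize, threshold: f32) -> Vec<(usize, usize)> {
    assert!(frame_len > 0, "frame_len must be greater than zero");
    let mut segments = Vec::new();
    let mut start: Option<usize> = None;
    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let offset = i * frame_len;
        let voiced = rms(frame) > threshold;
        match (voiced, start) {
            // Speech begins: remember where.
            (true, None) => start = Some(offset),
            // Speech ends: close the segment at the start of this silent frame.
            (false, Some(s)) => {
                segments.push((s, offset));
                start = None;
            }
            _ => {}
        }
    }
    // Audio ended while still in speech.
    if let Some(s) = start {
        segments.push((s, samples.len()));
    }
    segments
}
```

Only the returned segments would be handed to Whisper, which keeps silence out of the transcription step.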

TTS Pipeline

Model: Piper FastVITS Multilingual

Voices included:

  • English (US/Neutral)
  • Hindi (Delhi/Neutral)
  • French (Paris/Neutral)

Speed: real-time or faster than real-time

🖼 Photo Organizer Module

Features:

  • auto-scan entire system
  • EXIF extraction
  • people clustering
  • location-based grouping
  • duplicate removal
  • object tags via vision model
  • timeline view
  • semantic search: "Show photos where I'm wearing a red hoodie with friends at night"

Uses:

  • CLIP embeddings
  • FAISS photos.index
  • SQLite photo metadata
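
The timeline view could be built from EXIF timestamps roughly as follows. This is a sketch under assumptions: the `Photo` struct and function names are hypothetical, and the timestamp is assumed to use the standard EXIF "YYYY:MM:DD HH:MM:SS" layout.

```rust
use std::collections::BTreeMap;

/// Minimal photo record (hypothetical). `taken_at` holds an EXIF-style
/// timestamp such as "2023:07:14 18:30:00".
pub struct Photo {
    pub path: String,
    pub taken_at: String,
}

/// Extracts (year, month) from an EXIF-style timestamp, or None if malformed.
pub fn year_month(taken_at: &str) -> Option<(u16, u8)> {
    let date = taken_at.split(' ').next()?;
    let mut parts = date.split(':');
    let year: u16 = parts.next()?.parse().ok()?;
    let month: u8 = parts.next()?.parse().ok()?;
    if (1..=12).contains(&month) {
        Some((year, month))
    } else {
        None
    }
}

/// Buckets photo paths by (year, month). BTreeMap keeps the keys in
/// chronological order; photos without a usable timestamp are skipped.
pub fn timeline(photos: &[Photo]) -> BTreeMap<(u16, u8), Vec<&str>> {
    let mut buckets: BTreeMap<(u16, u8), Vec<&str>> = BTreeMap::new();
    for photo in photos {
        if let Some(key) = year_month(&photo.taken_at) {
            buckets.entry(key).or_default().push(photo.path.as_str());
        }
    }
    buckets
}
```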

📂 File & Document Manager

Supports:

  • PDF
  • DOCX
  • TXT
  • PPTX
  • Markdown
  • Images
  • Audio transcription
  • Code files

Extracts:

  • text content
  • embeddings
  • key metadata
  • summaries
  • timeline clusters
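
Before any extraction can happen, each file has to be routed to the right parser. A minimal sketch of that routing by file extension is shown below; the `DocKind` enum and the specific image, audio, and code extensions are assumptions chosen for illustration.

```rust
use std::path::Path;

/// The document categories listed above, plus a fallback (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocKind {
    Pdf,
    Docx,
    Text,
    Pptx,
    Markdown,
    Image,
    Audio,
    Code,
    Unsupported,
}

/// Classifies a path by its extension, case-insensitively.
/// The extension lists for images, audio, and code are illustrative, not exhaustive.
pub fn classify(path: &Path) -> DocKind {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("pdf") => DocKind::Pdf,
        Some("docx") => DocKind::Docx,
        Some("txt") => DocKind::Text,
        Some("pptx") => DocKind::Pptx,
        Some("md") | Some("markdown") => DocKind::Markdown,
        Some("png") | Some("jpg") | Some("jpeg") | Some("webp") => DocKind::Image,
        Some("wav") | Some("mp3") | Some("flac") => DocKind::Audio,
        Some("rs") | Some("py") | Some("js") | Some("ts") => DocKind::Code,
        _ => DocKind::Unsupported,
    }
}
```

Extension matching is cheap but easy to fool; a sturdier version would also inspect the file's leading bytes.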

🤖 Automation Engine

You can say things like:

  • "Turn off my PC at 11."
  • "When a new email arrives from professor, notify me."
  • "Download all PDFs from this site."
  • "Sort all of my desktop files."

Backend uses:

  • OS APIs
  • Node bindings inside Tauri
  • Rust automation drivers
  • A plugin-based skill system
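
One way to model commands like these is as trigger/action rules that are checked against incoming events. The sketch below is a hypothetical data model, not the project's schema for `automation_rules`: the `Event`, `Trigger`, `Action`, and `Rule` types are all assumptions, and "at 11" is assumed to mean 23:00.

```rust
/// Something that happened on the system (hypothetical event set).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    EmailReceived { from: String },
    ClockTick { hour: u8 },
}

/// Condition under which a rule fires.
#[derive(Debug, Clone, PartialEq)]
pub enum Trigger {
    EmailFrom(String),
    AtHour(u8),
}

/// What to do when a rule fires.
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Notify(String),
    Shutdown,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub trigger: Trigger,
    pub action: Action,
}

impl Trigger {
    /// True if this trigger applies to the given event.
    /// Sender addresses are compared case-insensitively.
    pub fn matches(&self, event: &Event) -> bool {
        match (self, event) {
            (Trigger::EmailFrom(sender), Event::EmailReceived { from }) => {
                from.eq_ignore_ascii_case(sender)
            }
            (Trigger::AtHour(h), Event::ClockTick { hour }) => h == hour,
            _ => false,
        }
    }
}

/// Returns the actions of every rule whose trigger matches the event.
pub fn actions_for<'a>(rules: &'a [Rule], event: &Event) -> Vec<&'a Action> {
    rules
        .iter()
        .filter(|rule| rule.trigger.matches(event))
        .map(|rule| &rule.action)
        .collect()
}
```

Under this model, "Turn off my PC at 11" becomes `Rule { trigger: Trigger::AtHour(23), action: Action::Shutdown }`, and the scheduler only needs to emit `ClockTick` events.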

🎨 Generative Tools

1. Local Image Generation

Models:

  • sd-turbo.gguf (fast)
  • sdxl-lightning.gguf (optional)

2. Local PPT/Document Generation

Templates stored in /templates/.

3. Local Code Templates

For React, Python, JS, etc.

βš™οΈ Settings & Profiles

Exposed Options:

  • choose LLM model
  • choose voice model
  • GPU/CPU toggle
  • resource/priority mode
  • background permissions
  • task scheduling
  • privacy controls
  • memory wipe
  • vector reindex
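
A few of these options could be represented as a typed settings struct with defaults, as in the sketch below. The field names, the default values, and the thread-count policy are assumptions; only the model file names are taken from the folder structure in this README.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Device {
    Cpu,
    Gpu,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResourceMode {
    Low,
    Balanced,
    Performance,
}

/// Subset of the exposed options (hypothetical field names).
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub llm_model: String,
    pub voice_model: String,
    pub device: Device,
    pub resource_mode: ResourceMode,
    pub allow_background: bool,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            llm_model: "llama-3-8b.gguf".to_string(),
            voice_model: "piper-multilingual.onnx".to_string(),
            device: Device::Cpu,
            resource_mode: ResourceMode::Balanced,
            allow_background: false,
        }
    }
}

impl Settings {
    /// Number of worker threads for the given core count.
    /// Illustrative policy: 1 thread in Low, half the cores in Balanced,
    /// all cores in Performance; never fewer than 1.
    pub fn worker_threads(&self, available_cores: usize) -> usize {
        let cores = available_cores.max(1);
        match self.resource_mode {
            ResourceMode::Low => 1,
            ResourceMode::Balanced => (cores / 2).max(1),
            ResourceMode::Performance => cores,
        }
    }
}
```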

📦 Packaging Into One EXE

Bundler: Tauri → NSIS → final .exe

Included in build:

  • Rust runtime
  • Tauri frontend
  • AI engines (llama.cpp, whisper.cpp, sd.cpp)
  • All GGUF models
  • SQLite DB
  • FAISS indexes
  • Voice models
  • Resource folder

Single .exe output size: 1.8 GB – 3.5 GB, depending on model choices.

πŸ“ Final Folder Structure

AstraOS/
β”‚
β”œβ”€β”€ app.exe
β”œβ”€β”€ README.md
β”œβ”€β”€ models/
β”‚   β”œβ”€β”€ llm/
β”‚   β”‚   └── llama-3-8b.gguf
β”‚   β”œβ”€β”€ vision/
β”‚   β”‚   └── llava-1.6.gguf
β”‚   β”œβ”€β”€ stt/
β”‚   β”‚   └── whisper-medium.gguf
β”‚   β”œβ”€β”€ tts/
β”‚   β”‚   └── piper-multilingual.onnx
β”‚   β”œβ”€β”€ embeddings/
β”‚   β”‚   β”œβ”€β”€ bge-small.gguf
β”‚   β”‚   └── clip-ViT-B-32.gguf
β”‚   └── sd/
β”‚       └── sd-turbo.gguf
β”‚
β”œβ”€β”€ db/
β”‚   └── app.db
β”œβ”€β”€ vector/
β”‚   β”œβ”€β”€ memory.index
β”‚   β”œβ”€β”€ files.index
β”‚   β”œβ”€β”€ photos.index
β”‚   └── speech.index
β”‚
β”œβ”€β”€ cache/
β”œβ”€β”€ logs/
β”œβ”€β”€ plugins/
└── templates/

🔐 Privacy & Security

  • No internet calls (unless user enables web scraping)
  • All data stored locally
  • User-controlled memory wipe
  • Password-protected profile
  • Hardware-bound encryption option

🏎 Performance Optimizations

  • lazy model loading
  • tensor caching
  • quantized GGUF
  • streaming inference
  • async Rust runtime
  • CPU/GPU configurable load
  • auto-sleep mode
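
Lazy model loading, the first item above, can be sketched with the standard library's `OnceLock`: the model is loaded on first use and reused afterwards. The `Model` and `LazyModel` types and the `load_model` function are placeholders; a real loader would map the GGUF file into memory instead of building a struct.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for a loaded model; a real one would hold weights.
pub struct Model {
    pub name: String,
}

/// Counts how many times the expensive load actually ran (for demonstration).
pub static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Placeholder for the expensive step of reading a model from disk.
fn load_model(name: &str) -> Model {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    Model { name: name.to_string() }
}

/// A model slot that is filled on first access and reused afterwards.
pub struct LazyModel {
    name: &'static str,
    slot: OnceLock<Model>,
}

impl LazyModel {
    pub const fn new(name: &'static str) -> Self {
        Self { name, slot: OnceLock::new() }
    }

    /// Returns the model, loading it only if this is the first call.
    pub fn get(&self) -> &Model {
        self.slot.get_or_init(|| load_model(self.name))
    }

    /// True once the model has been loaded.
    pub fn is_loaded(&self) -> bool {
        self.slot.get().is_some()
    }
}
```

With this shape, launching the app costs nothing for models the user never touches; the vision model, for example, would only be read from disk the first time a photo query arrives.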

📜 Roadmap

  • mobile companion app
  • face recognition & tagging
  • full browser automation
  • plugin marketplace
  • smart scheduler
  • multimodal memory graphs
  • multi-agent architecture
