Skip to content

Latest commit

 

History

History
502 lines (382 loc) · 19 KB

File metadata and controls

502 lines (382 loc) · 19 KB

NeonVoice-Core

Core repository for NeonAI
Local-first AI assistant with mode-based routing, tool calling, voice control, and a Flask web UI.

Python Flask Ollama Offline First

Important

NeonVoice-Core powers NeonAI with scored routing across tools, web lookup, local LLM flows, and voice assistant features.

Quick Snapshot

Area Highlights
Core AI scored routing, confidence gate, local-first pipeline
Modes casual, coding, movie, exam, voice assistant
Tools weather, notes, calculator, browser, music, web reader
Stack Flask, Ollama, Whisper, ChromaDB, GPT-SoVITS

Showcase

NeonAI primary interface

Tip

Add your next two screenshots later by replacing the placeholder URLs below.

Two-screenshot showcase block
<table>
  <tr>
    <td width="50%" align="center">
      <strong>Main Chat UI</strong>
      <br><br>
      <img src="PASTE_SCREENSHOT_1_URL_HERE" width="100%" alt="NeonVoice-Core screenshot 1" />
    </td>
    <td width="50%" align="center">
      <strong>Voice Assistant / Mode View</strong>
      <br><br>
      <img src="PASTE_SCREENSHOT_2_URL_HERE" width="100%" alt="NeonVoice-Core screenshot 2" />
    </td>
  </tr>
</table>

Overview

NeonVoice-Core powers NeonAI and routes requests through a small local AI system instead of sending everything straight to a single model.

  • direct tools for weather, notes, calculator, browser actions, system info, and music
  • scored routing between system commands, tools, web lookup, and local LLM generation
  • separate modes for chat, coding, movies, exam/RAG, and voice assistant behavior
  • local-first operation with Ollama, Whisper, and optional GPT-SoVITS

Features

Modes

  • casual: general chat with tool-first routing and optional web lookup
  • coding: coding-focused responses using a dedicated code model
  • movie: TMDB-powered movie cards, recommendations, and summaries
  • exam: PDF-only retrieval mode backed by ChromaDB
  • voice_assistant: speech input, command routing, and TTS output

Built-in tools

  • weather
  • calculator and conversions
  • system information
  • notes
  • browser and search control
  • webpage reader
  • music lookup and YouTube handoff

Quick Start

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
Copy-Item .env.example .env
python server.py

Required local services:

  • Install Ollama
  • Pull llama3.2:3b
  • Pull qwen2.5-coder

Optional voice setup:

  • install GPT-SoVITS
  • set GPT_SOVITS_DIR or explicit GPT-SoVITS model paths in .env

Open http://localhost:5000

Repo Hygiene

This repo intentionally keeps local-only files out of GitHub:

  • .env and private tokens
  • user_data/ databases, uploads, notes, and runtime files
  • local embedding cache files under models/embeddings/
  • temp audio, test artifacts, and editor caches

If the local embedding cache is missing, NeonAI now falls back to the official sentence-transformers/all-MiniLM-L6-v2 model name so clones do not need your local cache committed to GitHub.

Testing

python -m pytest -q

Legacy Notes

Open the original project write-up
NeonAI V5



⚡ Local-First AI System with Voice Assistant & Tool Calling

Python Flask Ollama Whisper TTS Offline

Mode-Driven Intelligence • Voice Assistant • Tool Calling • Privacy Focused


NeonAI Screenshot


🧠 What Is NeonAI?

NeonAI V5 is a fully local AI system with mode-driven intelligence, tool calling, voice assistant, and a premium UI — running entirely on your machine.

⚠️ This is not a chatbot wrapper.
NeonAI is an AI system — with modes, rules, confidence gates, tool calling, memory, voice control, and decision pipelines. The LLM is a component, not the decision-maker.


✨ What Makes NeonAI Different

Principle Description
🧠 System > Model AI logic governs the LLM, not the other way around
🔒 Privacy First Everything runs locally — your data never leaves your machine
🎯 Mode-Driven Each mode has its own rules, memory, tools, and constraints
🧭 Scored Router Deterministic routing (system/tool/web/LLM) with confidence + guards
🧩 Clarification Layer N-best routing asks when top-2 intents are close
🧠 Context Memory Router Follow-ups like “pause” resolve correctly using recent context
🛠️ Tool Calling Real tools (weather, calculator, browser, notes, music) — instant, no LLM needed
🎤 Voice Control Full voice assistant with system commands, TTS, and tool access
🔐 Secure Session-isolated users, hashed passwords, safe math eval, auth-guarded endpoints
🧪 Experimental Built to explore controlled AI design, not to be a product

🎮 Modes

🤖 NEON AIGeneral chat with smart web search + local LLM hybrid. Calculator & weather tools built-in.
💻 NEON CODECopy-paste ready code generation. Auto-switches from casual on coding intent.
🎬 NEON MOVIESTrending carousel, movie details with genres/director/trailer/recommendations via TMDB.
📚 NEON STUDYPDF-based RAG pipeline. Internet blocked. If the answer isn't in the PDF → AI refuses.
🎤 VOICE ASSISTANTFull voice control — talk to Neon, use tools, control your PC. 20+ command types.

💡 Each mode has isolated chat history — switching modes keeps each mode's conversation separate.


🛠️ Tool Calling

NeonAI has built-in tools that respond instantly without waiting for the LLM. Tool routing uses a Semantic Router (SentenceTransformers) for natural-language intent matching + actionability gates to reduce false triggers.

Tool calls also return structured data for the UI:

{
  "type": "tool",
  "tool": "weather",
  "action": "get_weather",
  "args": { "city": "Delhi" }
}
Tool Trigger Examples Available In
🌤️ Weather "Weather in Delhi", "Temperature in New York" Chat + Voice
🧮 Calculator "Calculate 25 × 4 + 10", "Convert 100 km to miles" Chat + Voice
💻 System Info "Battery level", "RAM usage", "CPU status" Chat + Voice
📝 Notes "Save note: buy groceries", "Show my notes" Chat + Voice
🌐 Web Reader "Read this https://example.com" Chat + Voice
🎵 Music "Top 10 songs", "Play Drake", "Recommend some hip-hop" Chat + Voice
🔍 Browser "Search on YouTube", "Google machine learning" Chat + Voice
User: "Weather in Delhi"
  → Semantic Router detects intent → weather tool
  → Instant response: 🌤️ 28°C, Partly Cloudy
  → No LLM call needed (< 1 second)

🎤 Voice Assistant

Talk to Neon using Whisper (STT) + Llama 3.2 (brain) + GPT-SoVITS (TTS).

CategoryExamples
🖥️ Apps"Open Chrome", "Launch Spotify", "Open VS Code"
🌐 Web"Open YouTube", "Go to GitHub"
🔍 Search"Search Python tutorials", "Google the news"
▶️ YouTube"Play lofi music on YouTube"
🎵 Media"Pause", "Next song", "Stop music"
🔊 Volume"Volume up", "Set volume to 50", "Mute"
💡 Brightness"Increase brightness", "Set brightness to 70"
📶 Connectivity"Turn on Bluetooth", "WiFi off", "Airplane mode"
⚡ System"Shutdown", "Restart", "Lock screen", "Sleep"
🌤️ Tools"What's the weather?", "System info", "Save a note"

🏗️ Architecture

graph TD;
    Text_Query-->intent_score_router;
    Voice_Audio-->whisper_engine_STT;
    
    intent_score_router-->Is_it_a_System_or_Tool_or_Web;
    whisper_engine_STT-->Is_it_a_Command;
    
    Is_it_a_System_or_Tool_or_Web-- SYSTEM -->system_command_executor;
    Is_it_a_System_or_Tool_or_Web-- TOOL -->Execute_Local_Tool;
    Is_it_a_System_or_Tool_or_Web-- WEB -->search_adapter;
    Is_it_a_Command-- YES -->command_router_OS_Actions;
    
    Is_it_a_System_or_Tool_or_Web-- LLM -->waterfall;
    Is_it_a_Command-- NO -->waterfall;
    
    waterfall-->Need_Web_Search;
    Need_Web_Search-- YES -->search_adapter;
    Need_Web_Search-- NO -->Local_LLM_Llama3_Qwen;
    search_adapter-->Local_LLM_Llama3_Qwen;
    
    Local_LLM_Llama3_Qwen-->confidence_gate;
    confidence_gate-->Pass_Threshold;
    Pass_Threshold-- NO -->Block_Regenerate;
    Pass_Threshold-- YES -->Return_Text_TTS_GPT_SoVITS;
Loading

🎨 Premium UI Features

  • 🚀 Animated Splash Screen — Spinning ring, progress bar, "NEON AI" reveal on startup
  • 🎨 15+ Neon Themes + Light/Dark mode with physics-based liquid toggle
  • 💬 Rich Message Rendering — Bold, headers, numbered lists as glass cards, rating badges
  • 📊 Confidence Scoring Badges — AI self-evaluates (0-100%) and displays a confidence metric badge under every answer
  • 🎥 Voice Customization — Upload your own looping background video for the Voice UI panel
  • 🖼️ Wallpaper Upload (Image/Video) — Drag & drop + progress bar + remove button
  • 📦 Upload Limits — Background video up to 50MB, image up to 10MB
  • 🎵 Music Cards — Rich, clickable YouTube-linked gradient cards natively rendered in chat
  • 📋 Code Blocks — Syntax highlighted with copy-to-clipboard button
  • 🌐 Web Source Icons — Favicon pills show which websites sourced the answer
  • 🎬 Movie Detail Cards — Genre tags, director, runtime, trailer button, recommendation carousel
  • 🎙️ Draggable Voice Button — GSAP Draggable, saves position
  • 📱 Fully Responsive — Desktop + Mobile

📂 Project Structure

NeonAI/
│
├── server.py                  # Flask backend + API routing + /health endpoint
├── START_NEON.bat             # One-click launcher (Windows)
├── .env                       # Environment config (NEON_SECRET, NGROK_TOKEN)
│
├── brain/                     # Core AI logic
│   ├── waterfall.py           # Decision flow & smart routing
│   ├── intent_score_router.py # Deterministic scored routing + N-best clarification
│   ├── router_state.py        # Per-user clarification + context memory state
│   ├── confidence_gate.py     # Hallucination control (0-100%)
│   ├── gk_engine.py           # General knowledge evaluation
│   └── memory.py              # Session & preference memory
│
├── models/                    # LLM layer
│   ├── local_llm.py           # Llama 3.2 (chat) + Qwen 2.5 (coding) via Ollama
│   ├── hybrid_llm.py          # Web + LLM fusion
│   └── assistant_llm.py       # Llama 3.2 (voice) via Ollama
│
├── tools/                     # Tool Calling System (Semantic Router)
│   ├── tool_router.py         # SentenceTransformer intent detection
│   ├── weather.py             # Weather via Open-Meteo (free, no key)
│   ├── calculator.py          # Safe AST math + unit conversions
│   ├── system_info.py         # CPU/RAM/disk/battery/GPU
│   ├── notes.py               # Thread-safe CRUD notes (JSON)
│   ├── music.py               # YouTube Music search + curated lists
│   ├── web_reader.py          # Fetch & summarize URLs
│   └── browser_control.py     # Google/YouTube/URL opener
│   └── vision_offline.py      # Offline resume/image analysis via Ollama vision + PDF extraction
│
├── voice/                     # Voice Assistant
│   ├── whisper_engine.py      # Speech-to-text (Whisper)
│   ├── tts_engine.py          # Text-to-speech (GPT-SoVITS) — env-configurable
│   ├── command_router.py      # Semantic NLP → action routing (per-user state)
│   ├── llm_command_executor.py # System command execution (volume, apps, etc.)
│   ├── model_loader.py        # Voice model management — env-configurable
│   └── reference_loader.py    # TTS reference audio — env-configurable
│
├── exam/                      # NEON STUDY (PDF RAG)
│   ├── indexer.py             # PDF → ChromaDB vector indexing
│   └── retriever.py           # Strict PDF-only retrieval
│
├── web/                       # Web adapters
│   ├── search_adapter.py      # Tavily / DuckDuckGo
│   └── movie_adapter.py       # TMDB (genres, trailer, recs)
│
├── utils/                     # Utilities
│   ├── auth_db.py             # SQLite auth (hashed passwords, try/finally)
│   ├── movie_db.py            # Movie cache (SQLite, try/finally)
│   ├── network.py             # Internet policy & connectivity check
│   └── storage_paths.py       # Centralized path management
│
├── scripts/                   # Dev tools
│   ├── command_tester.py      # Test command routing
│   ├── edge_case_tester.py    # Test edge cases
│   ├── generate_flow.py       # Generate architecture diagram
│   └── movie_updater.py       # Batch movie cache updates
│   └── add_one_line_headers.py # Bulk-add one-line file purpose headers
│
├── tests/
│   └── test_routing.py        # Pytest test suite
│   └── test_false_triggers.py # Regression tests for routing false triggers
│
├── templates/
│   ├── index.html             # Main chat UI
│   └── login.html             # Authentication page
│
└── static/
    ├── app.js                 # Frontend logic (1500+ lines)
    ├── styles.css             # Premium styling (2500+ lines)
    └── wallpapers/            # Custom backgrounds

▶️ Quick Start

Requirements

Software:

  • Python 3.10+
  • Ollama installed and running
  • Models: ollama pull llama3.2:3b + ollama pull qwen2.5-coder
  • (Optional) GPT-SoVITS for voice TTS

Hardware:

  • CPU: Multi-core processor (Intel i5/Ryzen 5 or better)
  • RAM: Minimum 8GB (16GB recommended)
  • GPU (Optional): NVIDIA GPU with 6GB+ VRAM for Whisper & GPT-SoVITS acceleration
  • Storage: Minimum 10GB free (SSD preferred)

Install & Run

pip install -r requirements.txt
python server.py

Or double-click START_NEON.bat

Open: http://localhost:5000

Health Check

Visit http://localhost:5000/health to verify system status (Ollama, TTS, Internet).

Optional API Keys (in Settings ⚙️)

  • TMDB — Movie posters, details, recommendations
  • Tavily — Higher quality web search (free tier available)

Environment Variables (.env)

Variable Purpose Required
NEON_SECRET Flask session signing key ✅ (auto-generated default)
NGROK_TOKEN ngrok tunnel for remote access Optional
TTS_REF_AUDIO Custom TTS reference audio path Optional
GPT_SOVITS_GPT_MODEL GPT-SoVITS GPT model path Optional
GPT_SOVITS_SOVITS_MODEL GPT-SoVITS SoVITS model path Optional

🔐 Security

  • No eval() — Math uses safe AST-based evaluation
  • Hashed passwords — PBKDF2 via Werkzeug
  • Auth-guarded endpoints — All write/reset routes require login
  • Session rotation — Regenerated on login to prevent fixation
  • HTTPOnly cookies — Session cookies not accessible via JavaScript
  • CORS locked — Only localhost origins accepted
  • Per-user isolation — Separate history, notes, media, and pending commands

✅ Check Code (Sanity Checks)

Run these from the project root:

python -m compileall -q .
python -m pytest -q

🧪 Status

  • ✅ Multi-mode AI system with isolated history
  • ✅ Semantic Router tool calling (weather, calculator, notes, system, browser, music, web reader)
  • ✅ Deterministic scored routing (system/tool/web/LLM) + N-best clarification + context memory
  • ✅ Structured tool outputs (tool_data) for UI/voice
  • ✅ Voice assistant with 20+ command types and Smart Browser Control
  • ✅ Premium UI with splash screen, 15+ themes, animations, microinteractions
  • ✅ Confidence Gate scoring (0-100% evaluation metric)
  • ✅ Smart web search + local LLM hybrid
  • ✅ Movie mode with trailer, genres, recommendations
  • ✅ Code blocks syntax highlighted with copy-to-clipboard button
  • ✅ Rich markdown rendering (lists, headers, ratings)
  • ✅ Ollama lazy reconnect (auto-recovers if started late)
  • ✅ Thread-safe notes and SQLite connection management
  • ⚠️ Experimental — Architecture locked for iteration

🚀 Future Enhancements

  1. Vision (Realtime camera): Webcam/screenshot analysis
  2. Long-Term Vector Memory: Cross-session preference/knowledge memory
  3. Autonomous Agents: Chained multi-tool workflows (search → summarize → save to notes)

⚠️ Disclaimer

This is an experimental project built for learning, research, and AI system design exploration. Not a commercial product.


🧠 Built by Ansh

B.Tech CSE



AI Systems • Voice Assistants • Tool Calling • Offline-First Architecture



"NeonAI is not about how smart the model is. It's about how controlled, safe, and purposeful AI should be."