
Pucho AI

High-Performance Local Multilingual LLM Chatbot

Pucho AI is a lightweight, production-ready framework for running Large Language Models (LLMs) locally behind a modern web-based chat interface. It supports both GPU and CPU execution, runs efficiently on systems with as little as 8 GB of RAM, and delivers response speeds comparable to cloud-hosted chatbots, all while keeping your data fully private.

Designed for developers, AI engineers, researchers, and privacy-focused deployments.

Core Capabilities

  • 🖥️ Fully Local Execution – No external APIs, complete data privacy
  • ⚡ Optimized Inference Engine – Powered by FastAPI + vLLM
  • 🎮 GPU & CPU Support – Automatically adaptable to available hardware
  • 💾 Low Resource Friendly – Runs smoothly on 8GB RAM systems
  • 🌍 Multilingual Model Compatibility – Supports Hugging Face models
  • 💬 Modern Web Chat UI – Clean, responsive interface with Markdown rendering
  • 🧠 Reasoning Trace Support – Displays <think> outputs when enabled
  • 🎨 Dark / Light Mode Support
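
Reasoning traces, the second-to-last capability above, work because models such as Qwen3 emit their chain of thought inside `<think>...</think>` tags, which a UI can split away from the visible answer. A minimal sketch of that split; `split_reasoning` is an illustrative helper, not a function from Pucho AI's actual code:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate <think> reasoning blocks from the visible answer."""
    thoughts = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return thoughts, answer

raw = "<think>User asked 2+2. Add them.</think>The answer is 4."
thoughts, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```

A UI can then render `thoughts` in a collapsible panel and `answer` as the chat reply.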

System Architecture (Data Flow)

Pucho AI follows a simple and modular data flow:

User Input
→ Frontend (HTML Chat Interface)
→ FastAPI Backend
→ vLLM Inference Engine
→ Locally Downloaded Hugging Face Model
→ Response returned to Frontend

All components run entirely on your local machine, ensuring maximum performance, full privacy, and offline capability.

Project Structure

Pucho-AI/
├── download_model.py        # Script to download models from Hugging Face
├── run_llm_server.sh        # Script to start the LLM server
├── index.html               # Web-based chat UI
├── llm_models/              # Directory for downloaded models
└── requirements.txt         # Project dependencies

Installation & Setup

1. Clone the Repository

git clone https://github.com/shib1111111/Pucho-AI.git
cd Pucho-AI

2. Create & Activate Virtual Environment

python3 -m venv venv

Activate:

Linux / macOS:

source venv/bin/activate

Windows (PowerShell):

.\venv\Scripts\Activate.ps1

3. Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Download a Model

Download any compatible Hugging Face model:

python download_model.py --model_name Qwen/Qwen3-0.6B --cache_dir ./llm_models

Arguments:

  • --model_name → Model ID from Hugging Face
  • --cache_dir → Directory for storing models (default: ./llm_models)

Models will be stored locally inside:

llm_models/<model_name>/
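
The on-disk folder name appears to be the Hugging Face repo ID with `/` replaced by `_` (the frontend default `./llm_models/Qwen_Qwen3-0.6B` suggests this convention). A minimal sketch of that mapping; `local_model_dir` is a hypothetical helper, not a function from download_model.py:

```python
from pathlib import Path

def local_model_dir(model_name: str, cache_dir: str = "./llm_models") -> Path:
    """Map a Hugging Face repo ID like 'Qwen/Qwen3-0.6B' to its local folder."""
    return Path(cache_dir) / model_name.replace("/", "_")

print(local_model_dir("Qwen/Qwen3-0.6B"))
```

If a model fails to load later, check that the path you configure matches this folder exactly.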

Running the LLM Server

Make the script executable (Linux/macOS):

chmod +x run_llm_server.sh

Start the server:

./run_llm_server.sh

Default endpoint:

http://0.0.0.0:8000

The server binds to all interfaces; from the same machine, reach it at http://localhost:8000.
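
The contents of run_llm_server.sh are not reproduced here, but a typical vLLM launch for this layout would resemble the following sketch; the model path, host, and port come from defaults elsewhere in this README, and the actual script may use different flags:

```python
import subprocess

def serve_command(model_dir: str = "./llm_models/Qwen_Qwen3-0.6B",
                  host: str = "0.0.0.0", port: int = 8000) -> list[str]:
    """Build a vLLM OpenAI-compatible server launch command."""
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model_dir,
        "--host", host,
        "--port", str(port),
    ]

cmd = serve_command()
print(" ".join(cmd))
# To actually start the server:
# subprocess.run(cmd, check=True)
```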

Launch the Chat Interface

Option 1: Open index.html directly in a browser

Option 2 (Recommended): Serve via local HTTP server

python3 -m http.server 8001 --bind 0.0.0.0

Open:

http://localhost:8001/index.html

Backend Configuration

If required, update the following inside index.html:

const API_URL = "http://127.0.0.1:8000/v1/chat/completions";
const MODEL_NAME = "./llm_models/Qwen_Qwen3-0.6B";

Ensure:

  • Backend server is running
  • Model path matches your downloaded model
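
With the backend running, the same endpoint the chat UI calls can be exercised directly from Python. A minimal standard-library sketch, assuming the OpenAI-compatible chat schema that vLLM exposes (the `ask` helper and prompt are illustrative):

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/v1/chat/completions"
MODEL_NAME = "./llm_models/Qwen_Qwen3-0.6B"

def build_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str) -> str:
    """Send the payload to the local server and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

print(json.dumps(build_payload("Hello!"), indent=2))
# With the server running:
# print(ask("Hello! Introduce yourself briefly."))
```

This is also a quick way to verify the backend independently of the browser UI.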

Example Workflow

1️⃣ Download Model

python download_model.py --model_name Qwen/Qwen3-0.6B

2️⃣ Start Backend

./run_llm_server.sh

3️⃣ Start Frontend

python3 -m http.server 8001

4️⃣ Open in Browser and Start Chatting 🎉

Performance Overview

  • Optimized for lightweight models (0.5B – 3B parameters)
  • GPU acceleration significantly improves generation speed
  • CPU mode performs reliably on 8GB RAM systems
  • Comparable response latency to hosted chatbot platforms (model-dependent)

Privacy & Security

Pucho AI ensures:

  • No data leaves your machine
  • No API keys required
  • No third-party tracking
  • Fully offline capability

Ideal for research labs, enterprise prototypes, and secure environments.

Use Cases

  • Local AI Assistants
  • Research & Model Evaluation
  • Enterprise AI Prototyping
  • Offline AI Systems
  • Educational AI Deployments

Roadmap

  • Dockerized deployment
  • Model selector within UI
  • Streaming token responses
  • Quantized model support (GGUF, AWQ)
  • Authentication & access control

Contributing

We welcome contributions to enhance Pucho AI. Feel free to open issues or submit pull requests.

License

This project is licensed under the MIT License.

Thank you for using Pucho AI! Feel free to reach out with any questions or feedback.

✨ Designed & made with love by Shib Kumar Saraf ✨
