Pucho AI is a lightweight, production-ready framework for running Large Language Models (LLMs) locally with a modern web-based chat interface. It enables GPU and CPU execution, works efficiently on systems with as little as 8GB RAM, and delivers response speeds comparable to cloud-hosted chatbots — while keeping all data fully private.
Designed for developers, AI engineers, researchers, and privacy-focused deployments.
- 🖥️ Fully Local Execution – No external APIs, complete data privacy
- ⚡ Optimized Inference Engine – Powered by FastAPI + vLLM
- 🎮 GPU & CPU Support – Automatically adapts to available hardware
- 💾 Low Resource Friendly – Runs smoothly on 8GB RAM systems
- 🌍 Multilingual Model Compatibility – Supports Hugging Face models
- 💬 Modern Web Chat UI – Clean, responsive interface with Markdown rendering
- 🧠 Reasoning Trace Support – Displays `<think>` outputs when enabled
- 🎨 Dark / Light Mode Support
Pucho AI follows a simple and modular data flow:
User Input
→ Frontend (HTML Chat Interface)
→ FastAPI Backend
→ vLLM Inference Engine
→ Locally Downloaded Hugging Face Model
→ Response returned to Frontend
All components run entirely on your local machine, ensuring maximum performance, full privacy, and offline capability.
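The backend speaks the OpenAI-compatible chat completions API that vLLM exposes, so any HTTP client can drive it the same way the bundled frontend does. A minimal stdlib-only sketch (the endpoint URL and model path below are the defaults used elsewhere in this README — adjust them to your setup):

```python
import json
from urllib import error, request

API_URL = "http://127.0.0.1:8000/v1/chat/completions"  # default backend endpoint
MODEL_NAME = "./llm_models/Qwen_Qwen3-0.6B"            # path of the downloaded model

# OpenAI-style chat payload: a model id plus a list of role/content messages
payload = {
    "model": MODEL_NAME,
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
}

req = request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
try:
    with request.urlopen(req, timeout=30) as resp:
        reply = json.load(resp)
        # The assistant's text lives in the first choice's message
        print(reply["choices"][0]["message"]["content"])
except (error.URLError, OSError) as exc:
    print(f"Backend not reachable: {exc}")
```

This is the same request shape the web UI sends, which is why pointing `API_URL` at the local server is all the frontend configuration needs.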
```
Pucho-AI/
├── download_model.py    # Script to download models from Hugging Face
├── run_llm_server.sh    # Script to start the LLM server
├── index.html           # Web-based chat UI
├── llm_models/          # Directory for downloaded models
└── requirements.txt     # Project dependencies
```
```bash
git clone https://github.com/shib1111111/Pucho-AI.git
cd Pucho-AI
```

Create a virtual environment:

```bash
python3 -m venv venv
```

Activate:

Linux / macOS:

```bash
source venv/bin/activate
```

Windows (PowerShell):

```powershell
venv\Scripts\activate
```

Install dependencies:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Download any compatible Hugging Face model:
```bash
python download_model.py --model_name Qwen/Qwen3-0.6B --cache_dir ./llm_models
```

Arguments:

- `--model_name` → Model ID from Hugging Face
- `--cache_dir` → Directory for storing models (default: `./llm_models`)

Models will be stored locally inside:

```
llm_models/<model_name>/
```
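The on-disk directory name replaces the `/` in the Hugging Face model ID with `_` (this is inferred from the `MODEL_NAME` value shown later in the `index.html` configuration; `download_model.py` itself may differ). A small helper to locate a downloaded model, assuming that convention:

```python
from pathlib import Path

def local_model_dir(model_name: str, cache_dir: str = "./llm_models") -> Path:
    """Map a Hugging Face model ID (org/name) to its assumed local directory."""
    # Hugging Face IDs contain a slash, which cannot appear in a directory name,
    # so it is replaced with an underscore on disk.
    return Path(cache_dir) / model_name.replace("/", "_")

print(local_model_dir("Qwen/Qwen3-0.6B"))
```

The resulting path is what you would paste into the frontend configuration.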
Make the script executable (Linux/macOS):

```bash
chmod +x run_llm_server.sh
```

Start the server:

```bash
./run_llm_server.sh
```

Default endpoint:

```
http://0.0.0.0:8000
```
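To verify the server came up, you can query the `GET /v1/models` route of vLLM's OpenAI-compatible API. A stdlib-only health-check sketch (the base URL assumes the default endpoint above):

```python
import json
from urllib import error, request

def list_models(base_url: str = "http://127.0.0.1:8000"):
    """Return the /v1/models payload if the server is up, else None."""
    try:
        with request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            return json.load(resp)
    except (error.URLError, OSError):
        return None

models = list_models()
print("server up" if models else "server not reachable")
```

A `None` result usually means the server is still loading the model or is bound to a different port.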
Option 1: Open `index.html` directly in a browser

Option 2 (Recommended): Serve via local HTTP server:

```bash
python3 -m http.server 8001 --bind 0.0.0.0
```

Open:

```
http://localhost:8001/index.html
```
If required, update the following inside `index.html`:

```js
const API_URL = "http://127.0.0.1:8000/v1/chat/completions";
const MODEL_NAME = "./llm_models/Qwen_Qwen3-0.6B";
```

Ensure:
- Backend server is running
- Model path matches your downloaded model
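A quick way to confirm the second point before opening the UI is to check that the `MODEL_NAME` path actually exists on disk. A small sketch (the path below is the default from the configuration above):

```python
from pathlib import Path

def model_available(model_path: str) -> bool:
    """True when the MODEL_NAME directory from index.html exists locally."""
    return Path(model_path).is_dir()

# Default path from the index.html configuration
if not model_available("./llm_models/Qwen_Qwen3-0.6B"):
    print("Model directory missing; run download_model.py first")
```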
1️⃣ Download Model

```bash
python download_model.py --model_name Qwen/Qwen3-0.6B
```

2️⃣ Start Backend

```bash
./run_llm_server.sh
```

3️⃣ Start Frontend

```bash
python3 -m http.server 8001
```

4️⃣ Open in Browser and Start Chatting 🎉
- Optimized for lightweight models (0.5B – 3B parameters)
- GPU acceleration significantly improves generation speed
- CPU mode performs reliably on 8GB RAM systems
- Comparable response latency to hosted chatbot platforms (model-dependent)
Pucho AI ensures:
- No data leaves your machine
- No API keys required
- No third-party tracking
- Fully offline capability
Ideal for research labs, enterprise prototypes, and secure environments.
- Local AI Assistants
- Research & Model Evaluation
- Enterprise AI Prototyping
- Offline AI Systems
- Educational AI Deployments
- Dockerized deployment
- Model selector within UI
- Streaming token responses
- Quantized model support (GGUF, AWQ)
- Authentication & access control
We welcome contributions to enhance Pucho AI. Feel free to open issues or submit pull requests.
This project is licensed under the MIT License.
Thank you for using Pucho AI! Feel free to reach out with any questions or feedback.
✨ --- Designed & made with Love by Shib Kumar Saraf ✨