Ellmo - Llama.cpp Edge Infrastructure

Infrastructure for optimizing and testing LLMs on edge devices using llama.cpp.

Project Status

🚧 Under Development - See IMPLEMENTATION_PLAN.md for the detailed roadmap.

Overview

Ellmo is a Python-based infrastructure project for:

Downloading models from HuggingFace
Converting and quantizing models for edge deployment
Training and applying LoRA adapters
Benchmarking model performance on CPU
Running optimized inference on edge devices
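As a rough illustration of the download step, the sketch below picks the GGUF file that matches a requested quantization variant from a repository file listing. The function name `pick_gguf_file` and the file listing are hypothetical and are not part of Ellmo. In practice the listing could come from `huggingface_hub.list_repo_files` and the file could be fetched with `huggingface_hub.hf_hub_download`, but how Ellmo will do this is not yet specified.

```python
from __future__ import annotations


def pick_gguf_file(filenames: list[str], variant: str) -> str:
    """Return the single .gguf filename whose name contains the variant tag.

    Hypothetical helper: the matching rule (case-insensitive substring)
    is an assumption made for this sketch.
    """
    tag = variant.lower()
    matches = [
        name for name in filenames
        if name.lower().endswith(".gguf") and tag in name.lower()
    ]
    if not matches:
        raise ValueError(f"no .gguf file matches variant {variant!r}")
    if len(matches) > 1:
        raise ValueError(f"variant {variant!r} is ambiguous: {matches}")
    return matches[0]


# Hypothetical repository listing, for illustration only.
listing = [
    "README.md",
    "tinyllama-Q4_K_M.gguf",
    "tinyllama-Q8_0.gguf",
]
chosen = pick_gguf_file(listing, "Q4_K_M")
print(chosen)
```

Failing loudly on zero or multiple matches keeps a mistyped variant from silently downloading the wrong file.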

Quick Start

# Setup (coming soon - Phase 0)
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Download a model (Phase 2)
ellmo download --model "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF" --variant Q4_K_M

# Run inference (Phase 3)
ellmo run --model tinyllama-Q4_K_M.gguf --prompt "Hello, world!"

# Quantize a model (Phase 5)
ellmo quantize --model tinyllama-f16.gguf --type Q4_K_M

# Benchmark performance (Phase 6)
ellmo benchmark --model tinyllama-Q4_K_M.gguf --runs 10
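To give a sense of what a multi-run benchmark such as `--runs 10` might measure, here is a minimal sketch that times a generation callable several times and summarizes throughput in tokens per second. The `benchmark` function, its report fields, and the `fake_generate` workload are assumptions made for illustration. They are not Ellmo's actual benchmarking code.

```python
import statistics
import time
from typing import Callable


def benchmark(generate: Callable[[], int], runs: int = 10) -> dict:
    """Time `generate` over several runs and summarize tokens per second.

    `generate` must return the number of tokens it produced.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate()
        elapsed = time.perf_counter() - start
        # Guard against a zero-length interval on very coarse clocks.
        rates.append(tokens / elapsed if elapsed > 0 else float("inf"))
    return {
        "runs": runs,
        "mean_tok_s": statistics.mean(rates),
        "median_tok_s": statistics.median(rates),
        "min_tok_s": min(rates),
        "max_tok_s": max(rates),
    }


def fake_generate() -> int:
    """Stand-in workload: pretend to emit 32 tokens after a short delay."""
    time.sleep(0.01)
    return 32


report = benchmark(fake_generate, runs=5)
print(report)
```

Reporting the median alongside the mean helps when one run is skewed by a cold cache or background load, which is common on small edge devices.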

Features (Planned)

Core Capabilities

✅ Model downloading from HuggingFace Hub
✅ Model format conversion (PyTorch → GGUF)
✅ Multiple quantization levels (Q2_K → Q8_0)
✅ LoRA training and inference
✅ CPU-optimized inference engine
✅ Comprehensive benchmarking suite

Advanced Features

✅ Model registry and management
✅ OpenAI-compatible API server
✅ Configuration presets for common scenarios
✅ Visual benchmark reports
✅ Device-specific optimization profiles

Technology Stack

Language: Python 3.10+
Inference: llama.cpp + llama-cpp-python
Models: HuggingFace Hub
Optimization: Quantization, LoRA/PEFT
Target: CPU (edge devices)
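As a sketch of how the stack above could fit together for CPU-only inference, the example below builds constructor arguments for `llama_cpp.Llama` and wraps a single completion call. The helper names `cpu_llama_kwargs` and `run_prompt`, the default context size, and the model filename are assumptions for illustration. Ellmo's own inference engine may be structured differently. `llama_cpp` is imported lazily so the helper that builds the settings can be used without the package installed.

```python
from __future__ import annotations

import os


def cpu_llama_kwargs(
    model_path: str,
    n_ctx: int = 2048,
    n_threads: int | None = None,
) -> dict:
    """Build keyword arguments for a CPU-only llama_cpp.Llama instance."""
    return {
        "model_path": model_path,
        "n_ctx": n_ctx,
        # Default to all logical cores; fall back to 1 if unknown.
        "n_threads": n_threads or os.cpu_count() or 1,
        # Keep every layer on the CPU.
        "n_gpu_layers": 0,
        "verbose": False,
    }


def run_prompt(model_path: str, prompt: str, max_tokens: int = 64) -> str:
    """Load a GGUF model and return the completion text for one prompt."""
    from llama_cpp import Llama  # requires llama-cpp-python

    llm = Llama(**cpu_llama_kwargs(model_path))
    result = llm(prompt, max_tokens=max_tokens)
    return result["choices"][0]["text"]


# Inspect the settings without loading a model.
kwargs = cpu_llama_kwargs("tinyllama-Q4_K_M.gguf", n_threads=4)
print(kwargs)
```

With a GGUF file on disk, `run_prompt("tinyllama-Q4_K_M.gguf", "Hello, world!")` would load the model and return the generated text. On small devices, setting `n_threads` to the number of physical cores is often a better starting point than the logical core count.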

Project Structure

ellmo/
├── src/                  # Source code
│   ├── models/          # Model downloading and conversion
│   ├── optimization/    # Quantization and LoRA
│   ├── inference/       # Inference engine
│   ├── benchmark/       # Performance testing
│   └── cli/             # Command-line interface
├── configs/             # Configuration files
├── models/              # Downloaded and optimized models
├── tests/               # Test suite
└── docs/                # Documentation

Development

See IMPLEMENTATION_PLAN.md for the detailed multi-phase implementation plan with testing checkpoints.

Current Phase

Phase 0: Project Bootstrap - Setting up basic infrastructure

Contributing

This project is currently in early development. Contribution guidelines will be added once the core infrastructure is complete.

Documentation

Core Documentation

Implementation Plan - Detailed 14-phase roadmap with testing strategy
Testing Strategy - Comprehensive testing approach and protocols

Architecture Decision Records (ADRs)

All major design decisions are documented with full context, rationale, and alternatives:

ADR Index - Complete list of all decisions
Decision Making Guide - How we make and document decisions
6 ADRs documenting Phase 0 decisions (~1,500 lines of documentation)

Key Decisions:

License

MIT License - see LICENSE file for details

Acknowledgments

llama.cpp - High-performance LLM inference
HuggingFace - Model hub and tools
PEFT - Parameter-efficient fine-tuning
