LAI is a local-first AI orchestration platform scaffold for running and routing between multiple models, from lightweight classifiers to AirLLM-backed 70B-class execution models. The repository is intentionally structured for serious long-running workloads, disk-heavy model sharding, and future expansion into a full platform rather than a single script.
- Small models for request classification, safety checks, summarization, and routing.
- Large models for deep execution, overnight jobs, and high-quality final outputs.
- AirLLM-backed inference for large Hugging Face models on constrained hardware.
- A model registry and routing policy layer so the platform can choose the right model for the right phase of work.
- Clear boundaries between product code, runtime configuration, research, evaluation, and operations.
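The registry and routing-policy layer is still to be built under `src/lai/`, but the idea can be sketched in a few lines. Everything here is hypothetical illustration, not the project's actual API: the class names, tier labels, and `backend` field are assumptions.

```python
from dataclasses import dataclass

# Hypothetical sketch of a model catalog keyed by execution tier.
# The real src/lai/ implementation may look quite different.

@dataclass(frozen=True)
class ModelEntry:
    name: str     # Hugging Face model id or local alias
    tier: str     # "small", "medium", or "large"
    backend: str  # e.g. "transformers" or "airllm" (assumed labels)

class ModelRegistry:
    def __init__(self) -> None:
        self._models: dict[str, ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        self._models[entry.name] = entry

    def for_tier(self, tier: str) -> list[ModelEntry]:
        return [m for m in self._models.values() if m.tier == tier]

registry = ModelRegistry()
registry.register(ModelEntry("tiny-router", tier="small", backend="transformers"))
registry.register(ModelEntry("llama-70b", tier="large", backend="airllm"))
print([m.name for m in registry.for_tier("large")])  # -> ['llama-70b']
```

A routing policy then only needs to map a request classification to a tier and ask the registry for candidates.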
```
.
|-- .github/          GitHub issue templates, CI, reviews, ownership
|-- apps/             Deployable applications and future service surfaces
|   |-- api/          Control plane and external API
|   |-- web/          Future frontend or dashboard
|   `-- worker/       Long-running local execution workers
|-- configs/          Model catalog, routing policies, prompt assets
|-- data/             Local-only caches, model shards, artifacts
|-- docs/             Architecture, setup guides, runbooks, ADRs
|-- evals/            Evaluation scenarios and saved benchmark outputs
|-- logs/             Local runtime logs
|-- notebooks/        Exploratory research notebooks
|-- scripts/          Bootstrap and developer automation
|-- src/lai/          Core Python package
|-- tests/            Unit, integration, and end-to-end validation
|-- CONTRIBUTING.md   Contribution workflow
|-- GOVERNANCE.md     Decision process and ownership model
|-- ROADMAP.md        Delivery phases and milestones
|-- SECURITY.md       Disclosure and hardening expectations
|-- SUPPORT.md        Support channels and expectations
|-- pyproject.toml    Python package and tooling entrypoint
`-- ruff.toml         Linting rules
```
The current repository is a foundation for the following request flow:
- A user request is received by the API or CLI.
- A small routing model classifies intent, complexity, urgency, and safety needs.
- The orchestration layer chooses an execution tier from the routing policy.
- The selected runtime executes the task:
  - a small or medium model for fast tasks
  - an AirLLM-backed large model for heavyweight generation, reasoning, or overnight jobs
- The platform stores artifacts, logs, and evaluation traces for later review.
This matches the project goal of spending cheap compute on planning and reserving the biggest models for the parts of the work that truly benefit from them.
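The tier-selection step in that flow can be sketched as a small pure function. The field names and thresholds below are illustrative assumptions, not the project's routing policy:

```python
# Illustrative tier selection: maps the routing model's classification of a
# request to an execution tier. Thresholds and semantics are assumptions.

def choose_tier(complexity: float, urgent: bool, needs_quality: bool) -> str:
    """Return "small", "medium", or "large" for a classified request."""
    if needs_quality and not urgent:
        return "large"   # reserve the biggest models for work that benefits
    if complexity < 0.3:
        return "small"   # cheap classification/summarization territory
    if urgent:
        return "medium"  # fast enough, still reasonably capable
    return "large"

print(choose_tier(complexity=0.1, urgent=True, needs_quality=False))  # -> small
print(choose_tier(complexity=0.8, urgent=False, needs_quality=True))  # -> large
```

Keeping this a pure function makes the policy trivial to unit-test and to swap for a config-driven version later.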
- Windows 11 or Linux
- Python 3.11 for project tooling stability
- Git and GitHub CLI
- NVIDIA drivers and CUDA-capable GPU when using large local inference
- 32 GB RAM minimum for comfortable local experimentation
- Large free disk budget for model downloads and AirLLM layer shards
- Hugging Face account and token for gated models
- Install AirLLM separately with `pip install airllm`.
- AirLLM can split a model into layer shards during first use, so the Hugging Face cache and shard directory must have substantial free disk space.
- Optional compression support may require `bitsandbytes`.
- CPU inference is possible, but the large-model path is designed around patience rather than interactivity.
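Because a shard download that runs out of disk midway is painful to recover from, it can be worth checking free space before the first large pull. A minimal stdlib sketch; the default threshold is an arbitrary illustration, tune it for the model you intend to shard:

```python
import shutil
from pathlib import Path

def check_shard_space(cache_dir: str, required_gb: float = 200.0) -> bool:
    """Return True if cache_dir has at least required_gb of free disk space."""
    path = Path(cache_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)  # disk_usage needs an existing path
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < required_gb:
        print(f"warning: only {free_gb:.0f} GB free in {path}, "
              f"need ~{required_gb:.0f} GB")
        return False
    return True

# Example: check the default Hugging Face cache location before downloading.
check_shard_space("~/.cache/huggingface", required_gb=200.0)
```

The same check fits naturally into a pre-flight command before kicking off an overnight job.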
```powershell
git clone <your-repo-url>
cd LAI
py -3.11 -m venv .venv
.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e .[dev]
Copy-Item .env.example .env
python -m lai.cli doctor
```

To add the large-model runtime later:

```powershell
python -m pip install -e .[dev,api]
python -m pip install airllm
```

- Pull request template and issue forms for consistent planning.
- CI that runs linting and unit tests on every push and pull request.
- `CODEOWNERS` so review responsibility is explicit from day one.
- `SECURITY.md`, `SUPPORT.md`, and contribution guidance for a public-ready repository.
- Dependabot updates for Python and GitHub Actions.
- Implement the model registry and routing engine under `src/lai/`.
- Add the first AirLLM runtime adapter and smoke-test workflows.
- Introduce an API surface in `apps/api`.
- Add evaluation scenarios that compare small-model routing against large-model final execution.
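The routing-versus-final-execution comparison could start as simple paired records saved under `evals/`. A hypothetical sketch of such a trace; the schema and field names are assumptions, not a defined project format:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical eval trace: pairs one scenario with outputs from two tiers so
# saved runs can be diffed later. Not the project's actual schema.

@dataclass
class EvalTrace:
    scenario: str
    routed_tier: str
    small_output: str
    large_output: str

trace = EvalTrace(
    scenario="summarize support ticket",
    routed_tier="small",
    small_output="short summary",
    large_output="detailed summary",
)
print(json.dumps(asdict(trace), indent=2))
```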
- AirLLM quickstart: https://github.com/lyogavin/airllm?tab=readme-ov-file#quickstart
- AirLLM requirements snapshot: https://raw.githubusercontent.com/lyogavin/airllm/main/requirements.txt