Run local LLMs for privacy-sensitive tasks and as cost-free fallbacks. Ties into Jay's existing work with local LLM integrations.
## What to Build
- Integration with Ollama or llama.cpp for local model serving
- Model management: download, serve, health check via skill
- Routing integration: cost tracker routes privacy-sensitive or simple tasks to local models
- Models to support: Llama 3.x (8B for fast tasks), Qwen 2.5 Coder (for code tasks), Phi-3 (for lightweight reasoning)
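The Ollama side of the provider layer could look something like the sketch below, assuming Ollama's default local endpoint (`http://localhost:11434`) and its standard REST routes (`/api/tags` to list models, `/api/generate` for completions). File and function names are illustrative, not an existing interface.

```javascript
// providers/ollama.js — minimal sketch against Ollama's REST API (assumed default host)
const BASE_URL = process.env.OLLAMA_HOST || "http://localhost:11434";

// Build the JSON body for a non-streaming /api/generate request.
function buildGenerateRequest(model, prompt, options = {}) {
  return { model, prompt, stream: false, options };
}

// Health check: the Ollama server answers GET / when it is up.
async function isHealthy() {
  try {
    const res = await fetch(BASE_URL);
    return res.ok;
  } catch {
    return false; // server not running
  }
}

// List locally downloaded models via /api/tags.
async function listModels() {
  const res = await fetch(`${BASE_URL}/api/tags`);
  const body = await res.json();
  return body.models.map((m) => m.name);
}

// Single non-streaming completion via /api/generate.
async function generate(model, prompt, options) {
  const res = await fetch(`${BASE_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(model, prompt, options)),
  });
  const body = await res.json();
  return body.response;
}

module.exports = { buildGenerateRequest, isHealthy, listModels, generate };
```

A `llamacpp.js` provider would expose the same four functions against llama.cpp's server, so the skill can swap backends behind one interface.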
## Use Cases
- Embedding generation (local = free, unlimited)
- Simple classification tasks (intent detection, sentiment)
- Privacy-sensitive document processing
- Offline fallback when API providers are down
- Reducing monthly API costs
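The routing hook for these cases could be a pure decision function the cost tracker calls before dispatch. The task shape (`privacySensitive`, `complexity`, `type`) and the set of locally-capable task types here are assumptions for illustration, not an existing interface.

```javascript
// Task types cheap enough that a local 8B model handles them well (assumed list).
const LOCAL_CAPABLE = new Set(["embedding", "classification", "sentiment"]);

// Decide whether a task runs on a local model or a paid API.
function routeTask(task, { apiAvailable = true } = {}) {
  // Privacy-sensitive work never leaves the machine.
  if (task.privacySensitive) return "local";
  // Offline fallback when API providers are down.
  if (!apiAvailable) return "local";
  // Simple, high-volume task types are free locally.
  if (LOCAL_CAPABLE.has(task.type) || task.complexity === "simple") return "local";
  return "api";
}
```

Keeping the decision pure (no I/O) makes it trivial to unit-test and to log alongside the cost tracker's existing accounting.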
## Architecture
```
skills/local-llm/
├── SKILL.md
├── index.js
├── providers/
│   ├── ollama.js
│   └── llamacpp.js
├── models.json        # Available models + capabilities
└── data/
    └── benchmark.json # Local vs API quality comparison
```
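`models.json` could map model tags to provider and capabilities so the router never picks a model the task type doesn't suit. The tags and fields below are illustrative; exact Ollama tag names should be checked against the registry.

```json
{
  "llama3.1:8b-instruct-q4_K_M": {
    "provider": "ollama",
    "capabilities": ["general", "classification", "sentiment"],
    "approxRamGb": 6
  },
  "qwen2.5-coder:7b-instruct-q4_K_M": {
    "provider": "ollama",
    "capabilities": ["code"],
    "approxRamGb": 5
  },
  "phi3:mini": {
    "provider": "ollama",
    "capabilities": ["lightweight-reasoning"],
    "approxRamGb": 3
  }
}
```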
## Hardware Constraint
Running on a MacBook Air, so be mindful of RAM and CPU: 8B-parameter models max, quantized (Q4_K_M). Apple Silicon's Neural Engine helps with inference speed.
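A back-of-envelope check of why 8B/Q4_K_M is the ceiling: weight memory is roughly parameters × bits-per-weight ÷ 8, plus overhead for KV cache and runtime buffers. The ~4.8 effective bits/weight for Q4_K_M and the 20% overhead factor are rough assumptions, not measured figures.

```javascript
// Rough RAM estimate for a quantized model, in GB.
// bitsPerWeight ≈ 4.8 for Q4_K_M (assumption); overhead covers KV cache/buffers.
function estimateRamGb(paramsBillion, bitsPerWeight = 4.8, overhead = 1.2) {
  const weightGb = (paramsBillion * 1e9 * bitsPerWeight) / 8 / 1e9;
  return weightGb * overhead;
}
```

By this estimate an 8B Q4_K_M model needs on the order of 6 GB, which is why anything larger is off the table on an 8–16 GB Air.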