Tiny-LLM is in a final hardening / archive-readiness phase. Treat this repository as a CUDA/C++ engine that should become smaller, sharper, and lower-noise over time.
- Prioritize correctness, coherence, and maintainability over new roadmap expansion.
- Prefer deleting, merging, or rewriting low-value content instead of preserving redundant guidance.
- Keep public claims aligned with what the repository actually implements and validates.
- Avoid over-engineering new automation, process, or tooling layers.
- Active OpenSpec change in
openspec/changes/<name>/ - Current capability specs in
openspec/specs/ - Repository code and configuration
- Project instruction files (
AGENTS.md,CLAUDE.md,.github/copilot-instructions.md)
Local-only overrides such as CLAUDE.local.md are allowed for personal use, but they are not authoritative and should not be committed.
Use /opsx:explore for open-ended investigation, architecture trade-offs, and requirement clarification.
Use /opsx:propose <change-name> for:
- repository-wide cleanup
- workflow changes
- public-surface repositioning
- feature or behavior changes
- multi-file refactors with architectural impact
Use /opsx:apply <change-name> and implement directly from tasks.md. Work in dependency order and keep each task tightly scoped.
After major workflow, documentation, or architecture changes, run a review-oriented pass (/review or equivalent) before considering the workstream complete.
Use /opsx:archive <change-name> only after specs, code, docs, and validation results all agree.
- Do not reintroduce legacy
/specsworkflow guidance. - Keep contributor docs aligned with
openspec/. - Preserve history through OpenSpec archive, not through duplicated meta-docs.
- Prefer one authoritative document per topic.
- Remove stale changelog pages, duplicated contribution docs, and generic AI boilerplate.
- Keep README and Pages useful for new users, not just comprehensive.
- CI should validate, not rewrite tracked files or commit back to the branch.
- Keep workflow triggers and jobs as small as possible while preserving confidence.
- README, GitHub Pages, release notes, and GitHub metadata must describe the same project.
- Remove unsupported roadmap promises and speculative positioning.
- Language/toolchain: C++17, CUDA C++17, CMake 3.18+
- GPU baseline: CUDA 11.0+, Compute Capability 7.0+
- Error handling:
Result<T>for fallible operations; no exceptions for control flow - CUDA safety: use the project CUDA utilities/macros instead of silent failures
- Resource management: prefer RAII wrappers over raw ownership
- Formatting:
clang-format-18with the repository.clang-format - Editor baseline:
clangd+compile_commands.json
| Component | Responsibility |
|---|---|
Result<T> |
No-exception error propagation |
ModelConfig |
Model hyperparameters (vocab_size, hidden_dim, etc.) |
QuantizedWeight |
INT8 weights with per-group scales |
TransformerLayer |
W8A16 quantized attention + FFN |
KVCacheManager |
Pre-allocated cache slots for sequences |
InferenceEngine |
Public API: load(), generate() |
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTS=ON
cmake --build build -j$(nproc)
ctest --test-dir build --output-on-failure --timeout 300
find . -type f \( -name '*.cpp' -o -name '*.h' -o -name '*.cu' -o -name '*.cuh' \) \
! -path './build/*' | xargs clang-format-18 --dry-run --Werrornvcc must be available for a real configure/build.
- AGENTS.md: repository-wide rules and architecture constraints
- CLAUDE.md: Claude/agent execution guidance for this repo
- Copilot instructions: concise code-generation and workflow constraints
Use a long-running single session for large repository hardening work. Avoid /fleet unless the work is clearly parallelizable and the higher token cost is justified.
Use subagents and research helpers selectively:
- good fit: independent audits, workflow inspection, broad doc inventories
- bad fit: tightly coupled refactors where one agent needs end-to-end context
Keep MCP usage narrow. Prefer built-in GitHub tooling and gh unless MCP gives a clear repository-specific win.
openspec/ OpenSpec config, active changes, archived changes, specs, schemas
include/tiny_llm/ Public headers
src/ Host-side C++ implementation
kernels/ CUDA kernels
tests/ Unit and property tests
website/ GitHub Pages site
.github/workflows/ CI, Pages, release automation
- Do not add large new roadmap features just because they are easy to imagine.
- Do not preserve outdated docs “for completeness”.
- Do not commit personal-only tool config.
- Do not rely on CI to auto-fix formatting or repository drift.
- Do not split work across many stale branches without quick review and merge.