-
Notifications
You must be signed in to change notification settings - Fork 0
Add AIDEN plan doc #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,252 @@ | ||
| # AIDEN — AI Dev EN | ||
|
|
||
| On-device AI assistant bundled alongside Prompd for internal generative features | ||
| (workflow scaffolding, file generation, explain/fix/rewrite, frontmatter suggestions). | ||
|
|
||
| Name is tentative: AIDEN = "AI Dev EN". Previously referred to as "AI Fox." | ||
|
|
||
| --- | ||
|
|
||
| ## Status | ||
|
|
||
| - **Phase:** 0 — Planning (this document) | ||
| - **Last updated:** 2026-04-19 | ||
| - **Blockers:** none | ||
| - **Next action:** confirm plan with Stephen, pick prototype platform, start Phase 1 | ||
|
|
||
| --- | ||
|
|
||
| ## Objective | ||
|
|
||
| Ship a local inference runtime that Prompd owns end-to-end, so AIDEN features | ||
| work with zero per-user cloud cost, zero required API keys, and offline-capable. | ||
| Runtime is shared across the Prompd ecosystem (desktop app, CLI, future VS Code | ||
| extension) via a local OpenAI-compatible HTTP endpoint. | ||
|
|
||
| --- | ||
|
|
||
| ## Non-goals | ||
|
|
||
| - **Not** a substitute for cloud providers (OpenAI / Anthropic / Groq) on | ||
| user-authored prmd executions. Those continue to use the existing provider | ||
| abstraction in `@prompd/core` with the user's own keys. | ||
| - **Not** a general-purpose LLM host. AIDEN is scoped to Prompd's internal | ||
| generative features only (scaffolding, explain, fix, generate, rewrite). | ||
| - **Not** a managed cloud service. Vertex AI path was evaluated and rejected | ||
| (cost: ~$3,780/mo/replica at the Model Garden one-click default). | ||
| - **Not** a required install. Fully opt-in; Prompd's base install and cloud- | ||
| provider features work without AIDEN. | ||
|
|
||
| --- | ||
|
|
||
| ## Architecture — Prompd-wide local daemon | ||
|
|
||
| ``` | ||
| ~/.prompd/aiden/ | ||
| ├── bin/ runtime binary (llama-server, platform-specific) | ||
| ├── models/ weight files (content-addressed by SHA) | ||
| ├── adapters/ LoRA adapters (prmd-tuned variants) | ||
| ├── config.json port, default model, mem limits, auth token | ||
| ├── aiden.pid running daemon PID | ||
| └── aiden.lock discovery file (port, pid, model, started-at) | ||
| ``` | ||
|
|
||
| - **Runtime:** vendored `llama-server` from llama.cpp (~15MB per platform). | ||
| Already speaks OpenAI-compatible HTTP — drops into `@prompd/core`'s provider | ||
| abstraction as just another endpoint. | ||
| - **Lifecycle owner:** `prompd-cli` — five-verb CLI surface: | ||
| - `prompd aiden install` — hardware probe, download binary, download weights, | ||
| write config. Idempotent. Re-runs handle model swaps and updates via | ||
| `--model <name>` (e.g., `--model e4b`, `--model prompd/aiden-prmd@0.2.0`). | ||
| - `prompd aiden uninstall` — wipe `~/.prompd/aiden/` entirely. | ||
| - `prompd aiden start` — launch daemon on local port. | ||
| - `prompd aiden stop` — graceful shutdown. | ||
| - `prompd aiden status` — running state, model loaded, version, memory, port, | ||
| installed models list. | ||
| - Interactive by default in a terminal; `-y` / `--yes` for headless/scripted | ||
| runs (also used by the desktop app when it triggers lifecycle via IPC). | ||
| - **Transport:** HTTP on `127.0.0.1:<port>` only (never binds to a public | ||
| interface). Optional bearer token in `config.json` to prevent other local | ||
| apps from snooping. | ||
| - **Discovery:** fixed default port with `aiden.lock` fallback so consumers | ||
| (desktop app, CLI, extension) find the running daemon reliably even if the | ||
| port was overridden. | ||
| - **Process model:** auto-start with desktop app, stop on app quit. CLI users | ||
| control explicitly via `prompd aiden start/stop`. | ||
|
|
||
| --- | ||
|
|
||
| ## Component map (cross-repo) | ||
|
|
||
| | Repo | What it owns | | ||
| |---|---| | ||
| | `prompd-cli` (typescript) | `aiden` subcommand, lifecycle, model catalog, download/verify, hardware probe, `aiden.lock` management | | ||
| | `prompd-app` (this repo) | Settings → AI panel, enable/disable flow, status bar indicator, IPC to CLI for lifecycle, AI feature call sites | | ||
| | `@prompd/core` | New `aiden` provider that targets `127.0.0.1:<port>` using existing OpenAI-compat client | | ||
| | `registry.prompdhub.ai` | Hosts AIDEN weight manifests + prmd-tuned adapter releases (CDN-backed) | | ||
|
|
||
| --- | ||
|
|
||
| ## UX flows | ||
|
|
||
| ### Opt-in | ||
|
|
||
| 1. User clicks **Enable AIDEN** (Settings → AI, or CTA on first AI-feature click). | ||
| 2. Modal: value prop + disk/RAM requirements. | ||
| 3. Hardware probe: RAM ≥ 6GB free, disk ≥ 3GB, GPU detected (Metal / CUDA / CPU). | ||
| 4. Pick model variant (defaulted by hardware): E2B Q4 for low-spec, E4B Q4 recommended. | ||
| 5. Download binary + weights with progress, pause/resume. | ||
| 6. `prompd aiden start` fires; daemon binds `127.0.0.1:<port>`. | ||
| 7. Status bar shows "AIDEN ready." AI features now work locally. | ||
|
|
||
| ### Opt-out | ||
|
|
||
| - Base install stays ~200MB. | ||
| - No background process, no GPU, no extra RAM. | ||
| - AI features either hidden or show the Enable CTA (decision deferred). | ||
| - Cloud providers still work via the user's own keys. | ||
|
|
||
| ### Uninstall | ||
|
|
||
| - Settings → AI → **Remove AIDEN** wipes `~/.prompd/aiden/`. | ||
| - Uninstalling Prompd itself prompts to keep or delete the directory (user may | ||
| reinstall and want tuned adapters preserved). | ||
|
|
||
| --- | ||
|
|
||
| ## Size & runtime targets | ||
|
|
||
| | Platform | Opt-in download | Runtime RAM (E4B Q4, idle) | | ||
| |---|---|---| | ||
| | macOS Apple Silicon | ~2.5GB (weights + 15MB binary, Metal built in) | ~4–6GB | | ||
| | Linux/Windows CPU-only | ~2.5GB | ~4–6GB | | ||
| | Linux/Windows + NVIDIA | ~2.7GB (+CUDA runtime ~150MB) | ~4–6GB VRAM preferred | | ||
| | Prompd-tuned variant | +100–200MB LoRA on top of base | negligible added | | ||
|
|
||
| Runtime is 0 RAM when app is closed or AIDEN is explicitly stopped. | ||
|
|
||
| --- | ||
|
|
||
| ## Phases | ||
|
|
||
| ### Phase 1 — Prototype (1–2 weeks) | ||
|
|
||
| Goal: prove the runtime + CLI wrapper feel right on one platform. | ||
|
|
||
| - Vendor `llama-server` for macOS Apple Silicon first | ||
| - Implement the five-verb CLI surface (`install / uninstall / start / stop / | ||
| status`) in `prompd-cli`, with `-y` headless flag | ||
| - Hardcode Gemma 4 E4B Q4 manifest for `install` in Phase 1 | ||
| - Smoke-test from the desktop app via an `aiden` provider in `@prompd/core` | ||
| - Decision gate: does the boot + first-token feel acceptable? | ||
|
|
||
| ### Phase 2 — Platform coverage (1–2 weeks) | ||
|
|
||
| - Add Linux (CPU + CUDA) and Windows (CPU + CUDA) binary bundles | ||
| - Hardware probe (RAM / disk / GPU detection) cross-platform | ||
| - Graceful shutdown, PID management, port discovery via `aiden.lock` | ||
| - Download progress, resume, integrity verification (SHA manifest) | ||
|
|
||
| ### Phase 3 — Desktop UX (1 week) | ||
|
|
||
| - Settings → AI panel with enable/disable, model switcher, disk usage, remove | ||
| - Status bar indicator | ||
| - First-time enable modal with hardware check | ||
| - CTA states for disabled AI features | ||
|
|
||
| ### Phase 4 — Fine-tuning pipeline (1 week) | ||
|
|
||
| - Assemble prmd corpus (public prmd files, docs, format spec, examples) | ||
| - LoRA-tune Gemma 4 E4B on one-off rented L4 (~$20 one-time) | ||
| - Publish adapter to registry, wire `prompd aiden model pull prompd/aiden-prmd` | ||
|
|
||
| ### Phase 5 — Polish & release | ||
|
|
||
| - Telemetry (opt-in): model boot time, token/sec, crash reports | ||
| - Release notes, docs page, blog post | ||
| - Enterprise story: offline mode, private deployment guidance | ||
|
|
||
| --- | ||
|
|
||
| ## Decisions log | ||
|
|
||
| All decisions below are **resolved** unless marked open. | ||
|
|
||
| - **2026-04-19 — Vertex AI rejected for this use case.** Model Garden one-click | ||
| default costs $3,780/mo/replica (g4-standard-48 + RTX PRO 6000). Even | ||
| minReplicaCount: 0 doesn't give real scale-to-zero on dedicated endpoints in | ||
| a way that fits a product feature. The existing `gemma-4-e2b-it-test` endpoint | ||
| on `prompd-io` has been undeployed and deleted. | ||
| - **2026-04-19 — Vertex serverless pay-per-token rejected.** Only Gemma 4 26B is | ||
| on the managed list; E2B/E4B aren't. Fine-tuning a serverless model isn't | ||
| possible either (tuned variants require dedicated endpoints), which kills the | ||
| "train it on prmd material" requirement. | ||
| - **2026-04-19 — Use case is AIDEN internal only.** Not for executing user- | ||
| authored prompts. That path continues to use cloud providers via the existing | ||
| abstraction. | ||
| - **2026-04-19 — Scope is Prompd-wide, not desktop-only.** Runtime lives in | ||
| `~/.prompd/aiden/` and is managed by `prompd-cli`. Desktop app, CLI, and | ||
| future consumers all connect to the same local endpoint. | ||
| - **2026-04-19 — Engine: vendor `llama-server`, not Ollama, not custom.** | ||
| `llama-server` is ~15MB, speaks OpenAI-compat natively, rides llama.cpp | ||
| upstream improvements for free. Ollama was rejected for bundle size and | ||
| "powered by" branding. A custom daemon was rejected as a 2–4 week + ongoing | ||
| maintenance trap when `llama-server` already does 90% of what's needed. | ||
| - **2026-04-19 — Transport: OpenAI-compatible HTTP on 127.0.0.1.** Drops into | ||
| the existing provider abstraction in `@prompd/core` with minimal code. | ||
| - **2026-04-19 — Default model: Gemma 4 E4B Q4 (recommended) / E2B Q4 (low- | ||
| spec).** Effective 4B is the sweet spot for structured codegen on prmd's | ||
| known schema. 26B unnecessary for this use case. | ||
| - **2026-04-19 — Opt-in only, never forced.** Base install stays lean; | ||
| AIDEN-dependent features are CTA'd, not hidden vs. shown TBD. | ||
| - **2026-04-19 — CLI surface locked at five verbs.** `install / uninstall / | ||
| start / stop / status`. No separate `model pull` / `update` verbs — `install` | ||
| is idempotent and handles model swaps / updates via `--model <name>`. | ||
| Interactive by default; `-y` / `--yes` for headless. | ||
|
|
||
| --- | ||
|
|
||
| ## Open questions | ||
|
|
||
| - **UX for disabled AI features** — hidden vs. visible-with-CTA? Leaning | ||
| CTA for discoverability, but that puts a "locked" icon all over the UI. | ||
| - **Default port number** — pick a stable Prompd-reserved port (like Ollama's | ||
| 11434). Needs to be unused, documented, and overridable. | ||
| - **Windows service vs. app-lifecycle daemon only** — Windows users might want | ||
| AIDEN always-on for CLI/VS Code extension usage. Systemd/launchd/service | ||
| installation is out of scope for Phase 1. | ||
| - **Weight hosting location** — registry CDN, HuggingFace Hub, or GCS? Leaning | ||
| registry-hosted for brand and control, with HuggingFace mirror for community | ||
| discovery. | ||
| - **Telemetry defaults** — opt-in only, definitely. Minimal envelope: boot time, | ||
| crash, token/sec aggregate. No prompt content ever. | ||
| - **prmd corpus for fine-tuning** — source of truth? Format spec, public | ||
| packages, docs? Need curation guidelines. | ||
|
|
||
| --- | ||
|
|
||
| ## Risks | ||
|
|
||
| - **User hardware variance.** Some users on 8GB RAM laptops will have a poor | ||
| experience even with E2B. Clear hardware gates + "CPU mode will be slow" | ||
| warnings are mandatory. | ||
| - **llama.cpp upstream churn.** New model archs land often and binary formats | ||
| occasionally break. Pin vendored versions, test before updating. | ||
| - **Weight-download bandwidth cost.** 2.5GB × N users can add up. CDN + resume | ||
| is mandatory. Consider torrent-style distribution later if scale warrants. | ||
| - **Windows + NVIDIA install pain.** CUDA driver version mismatches are a | ||
| support headache. Ship with a clear "CPU fallback if GPU init fails" path. | ||
| - **Branding: "AIDEN"** name collision search not done yet. Needs a quick IP | ||
| check before any public mention. | ||
|
|
||
| --- | ||
|
|
||
| ## Progress tracker | ||
|
|
||
| Update this as phases complete. Keep entries short. | ||
|
|
||
| - [ ] Phase 1 — Prototype on macOS Apple Silicon | ||
| - [ ] Phase 2 — Platform coverage (Linux, Windows, CUDA) | ||
| - [ ] Phase 3 — Desktop UX (Settings panel, status bar, first-run modal) | ||
| - [ ] Phase 4 — prmd corpus LoRA fine-tune + registry adapter release | ||
| - [ ] Phase 5 — Polish, telemetry, release | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model pullfrom the locked CLI command setThis Phase 4 task introduces a new
prompd aiden model pullcommand, but later in the same document the decisions log says the CLI surface is locked to exactly five verbs (install / uninstall / start / stop / status) and explicitly rejects separatemodel pull/updateverbs. Keeping both statements makes the plan internally inconsistent and can lead implementers to ship the wrong CLI contract.Useful? React with 👍 / 👎.