Merged
29 commits
b4b61c1
feat(pyramidize): implement full Pyramidize feature (ENG-01 through E…
0xMMA Mar 8, 2026
4defafa
fix(dev): start ng serve before Go build to avoid cold-start race con…
0xMMA Mar 8, 2026
708221f
feat(shell,pyramidize): UX pass — sidebar polish, model upgrades, cli…
0xMMA Mar 9, 2026
7d3be3a
fix(shell,enhance): logo unfold animation, nav drop, dark input, Edit…
0xMMA Mar 14, 2026
23548a0
chore(docs): note double-v bug in collapsed sidebar version row
0xMMA Mar 15, 2026
ee3146a
chore(test-data): add anonymized pyramidal email samples
0xMMA Mar 29, 2026
0f593e3
docs(spec): CLI + evaluation pipeline design for pyramidize quality
0xMMA Mar 29, 2026
94e8cfb
docs(plan): implementation plan for CLI + evaluation pipeline
0xMMA Mar 29, 2026
b10bc8b
feat(cli): add CLI dispatch skeleton with fix and pyramidize stubs
0xMMA Mar 29, 2026
0870b9e
feat(cli): add input reading helper (file, stdin, inline)
0xMMA Mar 29, 2026
27737c1
fix(cli): use bytes.Buffer for stderr in test to prevent nil panic
0xMMA Mar 29, 2026
213b971
feat(cli): implement -fix command with inline, file, and stdin input
0xMMA Mar 29, 2026
cace43e
feat(cli): implement -pyramidize command with JSON output and provide…
0xMMA Mar 29, 2026
d07b898
feat(eval): add deterministic checks — structure, info coverage, hall…
0xMMA Mar 29, 2026
d0448cb
feat(eval): add LLM-as-judge scoring with pyramid structure criteria
0xMMA Mar 29, 2026
dab4ec5
feat(eval): add build-tagged integration tests with eval run logging
0xMMA Mar 29, 2026
9fdaa66
feat(eval): add automated and human review eval scripts
0xMMA Mar 29, 2026
a99031b
docs: add CLI commands and eval build tag to CLAUDE.md, update .gitig…
0xMMA Mar 29, 2026
019085b
fix(eval): case-insensitive test-data parsing, .env key loading
0xMMA Mar 29, 2026
cb0971a
feat(eval): calibrate deterministic checks, improve email prompt quality
0xMMA Mar 29, 2026
27a5f90
fix(prompt): preserve mail thread context in email restructuring
0xMMA Mar 29, 2026
ac685d5
Revert "fix(prompt): preserve mail thread context in email restructur…
0xMMA Mar 29, 2026
6496fbc
feat(prompt): port v1 pyramid structure rules, update docs
0xMMA Mar 29, 2026
b980183
feat(prompt): port v1 pyramid structure rules to wiki, memo, powerpoint
0xMMA Mar 29, 2026
8827f1b
docs: update TODO.md to reflect current pyramidize implementation state
0xMMA Mar 29, 2026
410c56a
docs: consolidate pyramidize docs into docs/pyramidize/
0xMMA Mar 30, 2026
0c582cb
merge: resolve conflicts with main (CLAUDE.md, .gitignore)
0xMMA Mar 30, 2026
ef3210c
refactor: code review quick wins — ETC improvements
0xMMA Apr 1, 2026
e54236e
feat(prompt): add v2 email prompt variant with eval analysis
0xMMA Apr 1, 2026
165 changes: 165 additions & 0 deletions .claude/commands/refine-requirements.md
@@ -0,0 +1,165 @@
# Refine Requirements — Interactive Requirements Engineering

Take a plan file and refine its requirements through structured exploration, interactive Q&A with ASCII mockups, and play-pretend walkthroughs that surface gaps naturally.

**Input:** `$ARGUMENTS` is the path to the plan file (e.g. `PYRAMIDIZE.md`, `docs/plan.md`).

If `$ARGUMENTS` is empty, use `AskUserQuestion` to ask: "Which plan file should I refine? (relative path from project root)"

---

## Rules

- **Max 3-4 questions per round.** Never wall-of-text the user.
- **Never assume — ask when ambiguous.** A wrong assumption costs more than a question.
- **Never copy v1 blindly.** If there's prior art, question whether old decisions still apply.
- **Always show, don't tell.** Every question with a UI or layout implication gets an ASCII mockup. Abstract descriptions are not acceptable — make it concrete.
- **Stay in character during Phase 3.** Narrate as if the feature exists. Break character only to surface a gap, then resume.
- **Requirements and design only.** Never ask about implementation details (JSON parsing, HTTP clients, DI wiring, test scaffolding) — those are the developer's domain.
- **Capture decisions immediately.** After each round, note what was decided before moving on.

---

## Phase 1 — Deep Exploration (silent)

Do all of this silently. Do NOT output anything to the user yet.

1. Read the plan file at `$ARGUMENTS`.
2. Read `CLAUDE.md` and any architecture docs it references (`.claude/docs/architecture.md`, `.claude/docs/testing.md`, etc.).
3. Search the codebase for existing implementations related to the plan's feature area — look at the code, not just file names. Check for archived or previous versions if referenced.
4. Read any related Angular components, Go services, and shared utilities that the feature will touch or extend.
5. Build a mental model of:
   - What exists today that this feature builds on
   - What constraints the current architecture imposes
   - What patterns the codebase already uses (and should be followed)
   - Where the plan has gaps, ambiguities, or implicit assumptions

When done, output a single short message: "I've explored the codebase and the plan. Starting requirements review — Phase 2."

---

## Phase 2 — Structured Requirements Review

Go through the plan section by section. For each section that has decisions to make or ambiguities to resolve:

1. Present 3-4 questions (never more per round).
2. Every question MUST include:
   - **2-4 concrete options** (labelled A, B, C, D)
   - **ASCII preview mockup** for any option that affects layout, UI, or user-visible behaviour
   - **Your recommendation** with a brief rationale (1 sentence)
3. Use `AskUserQuestion` to collect the user's choices.
4. After each round, summarize decisions made in a compact list before moving to the next section.

Example question format:
```
**Q2: How should the error state appear?**

Option A — Inline below the action area:
┌─────────────────────────────────────┐
│ [Action Button]                     │
│ ❌ Step 2/3 failed: timeout.        │
│ [Retry] [Settings →]                │
└─────────────────────────────────────┘

Option B — Toast notification:
┌─────────────────────────────────────┐
│ [Action Button]                     │
│                    ┌──────────────┐ │
│                    │ ❌ Timeout   │ │
│                    │ [Retry]      │ │
│                    └──────────────┘ │
└─────────────────────────────────────┘

Recommendation: A — keeps error context near the action.
```

Continue until all sections have been reviewed. Then announce: "Requirements review complete. Moving to play-pretend walkthrough — Phase 3."

---

## Phase 3 — Play-Pretend Walkthrough

Walk through the feature as if it's already built and shipping. You narrate in present tense. The user is the product owner / architect — you ask them requirements and design questions, never code-level implementation details.

### How to narrate

Speak as if you're a QA tester or product reviewer using the finished feature for the first time:

> "I open the app and navigate to the Pyramidize page. The left panel shows a doc type selector set to AUTO, a style dropdown, and a relationship dropdown. Below them is a large Pyramidize button with a Ctrl+Enter hint. The canvas area is empty — I see a placeholder with ghost text showing a sample pyramidized email..."

### When to pause and ask

Pause the narration whenever:
- The spec doesn't say what should happen → surface the gap
- Two requirements seem to conflict → ask which takes priority
- A behaviour feels wrong from a UX perspective → propose an alternative
- An edge case isn't covered → ask for the desired behaviour

When pausing, break character briefly:

> "**Gap found:** The spec doesn't say what happens when the user clicks Pyramidize again while the canvas already has edits from a previous run. Should it:
> A) Overwrite canvasText with the new result (edits lost)
> B) Ask 'Re-pyramidize from original? Your canvas edits will be lost' with [Yes] [No]
> C) Create a new trace entry and overwrite silently (edits recoverable via trace log)
>
> Recommendation: C — edits are never truly lost thanks to the trace log."

Then use `AskUserQuestion` to get the decision, note it, and resume narrating.

### Minimum scenarios to walk through

Cover ALL of these angles (not just UI walkthroughs):

1. **Happy path** — the golden scenario, start to finish
2. **First-time user** — no config, no presets, empty state
3. **Returning user** — presets exist, muscle memory, what's faster now
4. **Error / timeout** — API fails mid-pipeline, what does the user see and do
5. **Interruption / cancel** — user cancels during processing, closes mid-edit
6. **Edge cases** — empty input, very long input, mixed languages, rapid repeated actions
7. **State & lifecycle** — navigate away and back, minimize to tray, close window
8. **Hotkey vs manual** — differences in flow, what's available vs hidden
9. **Architecture choices** — new service vs extending existing, separate route vs same page, shared state
10. **Scope boundaries** — "this could grow into X — is X in scope or deferred?"

When all scenarios are covered, announce: "Play-pretend walkthrough complete. Moving to gap resolution — Phase 4."

---

## Phase 4 — Gap Resolution

1. Compile all remaining gaps, open questions, and ambiguities discovered during Phases 2 and 3.
2. Present them as a numbered list, grouped by theme (UI, behaviour, state, error handling, scope).
3. Ask in batches of 3-4 using `AskUserQuestion`.
4. For each gap, either:
   - Resolve it with the user's decision, OR
   - Mark it explicitly as **out of scope** with a reason

Continue until all gaps are resolved or marked out-of-scope. Then announce: "All gaps resolved. Updating the plan — Phase 5."

---

## Phase 5 — Document Update

1. Update the plan file (`$ARGUMENTS`) with all decisions made during this session:
   - Add/modify requirement entries
   - Update scoping decisions table
   - Update out-of-scope table
   - Add any new sections needed (e.g., new user stories, new NFRs)
   - Add a "Last updated" timestamp
2. Present a completion checklist:

```
Requirements Refinement Complete
────────────────────────────────
✅ Plan explored and understood
✅ N questions resolved across M rounds
✅ K scenarios walked through
✅ J gaps resolved, L marked out-of-scope
✅ Plan file updated: [filename]

Want to do one more round? (e.g., "walk through the admin scenario" or "what about offline mode?")
```

3. Use `AskUserQuestion` to ask if the user wants one more round.
   - If yes: return to Phase 3 or Phase 4 as appropriate, then repeat Phase 5.
   - If no: end the session.
4 changes: 3 additions & 1 deletion .gitignore
@@ -5,5 +5,7 @@ frontend/dist
frontend/node_modules
build/linux/appimage/build
build/windows/nsis/MicrosoftEdgeWebview2Setup.exe
test-data/eval-runs/
.superpowers/
.claude/settings.local.json
frontend/test-results
frontend/test-results
34 changes: 33 additions & 1 deletion CLAUDE.md
@@ -6,7 +6,8 @@

**Key directories:**
- `main.go` — Wails entry point, service registration, event loop
- `internal/features/` — vertical slices: settings, shortcut, clipboard, tray, enhance, welcome, logger, updater
- `internal/features/` — vertical slices: settings, shortcut, clipboard, tray, enhance, welcome, logger, updater, pyramidize
- `internal/cli/` — headless CLI commands (`-fix`, `-pyramidize`), dispatched from `main.go` before Wails boots
- `internal/app/wire.go` + `wire_gen.go` — Wire DI (never edit `wire_gen.go` manually)
- `frontend/src/app/core/wails.service.ts` — sole RPC bridge; all Go calls go through here
- `frontend/src/app/features/` — folder-per-component, subcomponents in nested folders → See `.claude/rules/architecture.md#component-structure`
@@ -15,6 +16,33 @@

**Build / run / test:** `cd frontend && npm run build` (Angular), `go build -o bin/KeyLint .`, `wails3 dev` (hot-reload). Tests: `cd frontend && npm test` (Vitest), `go test ./internal/...` (Go), `cd frontend && npx playwright test` (E2E, needs `ng serve` on :4200). → See `.claude/rules/workflows.md` for post-change steps.

**CLI commands (headless, no GUI):**
```
./bin/KeyLint -fix "text to fix" # silent grammar fix
./bin/KeyLint -fix -f input.txt # fix from file
cat input.txt | ./bin/KeyLint -fix # fix from stdin
./bin/KeyLint -pyramidize -type email -f input.md # pyramidize from file
./bin/KeyLint -pyramidize --json -f input.md # JSON output with quality score
./bin/KeyLint -pyramidize --provider claude --model claude-sonnet-4-6 -f input.md
./bin/KeyLint -pyramidize --variant 1 -f input.md # use prompt variant v1
./bin/KeyLint -pyramidize --variant 2 -f input.md # use prompt variant v2 (0=latest)
```
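The three input modes shown above (inline argument, `-f` file, stdin) could be resolved by a single helper. The following is a minimal sketch, not KeyLint's actual code: the name `readInput`, its signature, and the precedence order (file, then inline text, then stdin) are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// readInput is a hypothetical helper. Assumed precedence:
// an explicit file path wins, then inline text, then stdin.
func readInput(inline, filePath string, stdin io.Reader) (string, error) {
	switch {
	case filePath != "":
		data, err := os.ReadFile(filePath)
		if err != nil {
			return "", fmt.Errorf("read %s: %w", filePath, err)
		}
		return string(data), nil
	case inline != "":
		return inline, nil
	case stdin != nil:
		data, err := io.ReadAll(stdin)
		if err != nil {
			return "", fmt.Errorf("read stdin: %w", err)
		}
		if strings.TrimSpace(string(data)) == "" {
			return "", errors.New("no input provided")
		}
		return string(data), nil
	default:
		return "", errors.New("no input provided")
	}
}

func main() {
	// Inline text is present, so the stdin reader is never consumed.
	text, err := readInput("text to fix", "", strings.NewReader("ignored"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(text)
}
```

Passing the reader in, rather than touching `os.Stdin` directly, keeps the helper easy to exercise in unit tests.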

**Evaluation tests (real API calls — NOT run by default):**
```
# Requires .env with ANTHROPIC_API_KEY (or OPENAI_API_KEY) in project root.
# Uses //go:build eval tag — never included in normal `go test` runs.
# Results are logged to test-data/eval-runs/<timestamp>/ with summary.json.
go test -tags eval ./internal/features/pyramidize/ -v -timeout 300s
EVAL_PROVIDER=claude go test -tags eval ./internal/features/pyramidize/ -v -timeout 600s
EVAL_PROVIDER=claude EVAL_MODEL=claude-sonnet-4-6 go test -tags eval ...
./scripts/eval.sh # automated eval with summary
./scripts/eval.sh --provider claude --model claude-sonnet-4-6
./scripts/eval.sh --variant 1 # compare v1 vs v2 prompts
EVAL_VARIANT=2 go test -tags eval ... # variant via env var
./scripts/eval-human.sh # interactive human review mode
```
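The `EVAL_PROVIDER`, `EVAL_MODEL`, and `EVAL_VARIANT` variables above could be read into a small config struct before the eval tests run. This is a sketch under assumptions: the type `evalConfig` and function `evalConfigFromEnv` are hypothetical names, and treating an unset variant as `0` only mirrors the "0=latest" note in the CLI examples.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// evalConfig is a hypothetical container for the eval settings.
type evalConfig struct {
	Provider string // empty means "use whatever default is configured"
	Model    string
	Variant  int // 0 is assumed to mean "latest prompt variant"
}

// evalConfigFromEnv reads the EVAL_* variables through getenv so the
// lookup can be replaced in tests.
func evalConfigFromEnv(getenv func(string) string) (evalConfig, error) {
	cfg := evalConfig{
		Provider: getenv("EVAL_PROVIDER"),
		Model:    getenv("EVAL_MODEL"),
	}
	if v := getenv("EVAL_VARIANT"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n < 0 {
			return evalConfig{}, fmt.Errorf("invalid EVAL_VARIANT %q", v)
		}
		cfg.Variant = n
	}
	return cfg, nil
}

func main() {
	cfg, err := evalConfigFromEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```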

## Why (The Context)

KeyLint is a desktop app that fixes/enhances clipboard text via AI (OpenAI, Anthropic, Ollama, Bedrock). A global hotkey silently grabs clipboard text, enhances it, and writes it back. The main UI provides manual fix and advanced enhancement modes.
@@ -23,6 +51,8 @@ KeyLint is a desktop app that fixes/enhances clipboard text via AI (OpenAI, Anth
- AI API calls go through the Go backend (`internal/features/enhance/service.go:1`) — WebKit2GTK on Linux blocks external HTTPS fetch from the webview
- API keys stored in OS keyring (`github.com/zalando/go-keyring`); env vars take priority over keyring — see `internal/features/settings/service.go`
- PrimeNG Stepper was replaced with a custom `@switch`-based wizard (`welcome-wizard.component.ts`) because PrimeNG v21 StepPanel animations broke DOM visibility
- CLI mode (`-fix`, `-pyramidize`) dispatches before Wails boots in `main.go`, uses the same service layer with manual wiring (no Wire/Wails). Prompts are identical between CLI and GUI — output formatting is the caller's concern.
- Evaluation tests use `//go:build eval` build tag to isolate from normal `go test` runs. They make real API calls and write results to `test-data/eval-runs/<timestamp>/`.
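The "dispatch before Wails boots" decision in the list above can be pictured as an early check on the first argument. The sketch below is illustrative only: `cliCommand` is a hypothetical function, and the real `main.go` and `internal/cli/` code may parse flags quite differently.

```go
package main

import (
	"fmt"
	"os"
)

// cliCommand reports which headless command, if any, was requested.
// Only the two flags named in the document are recognised here.
func cliCommand(args []string) (name string, ok bool) {
	if len(args) == 0 {
		return "", false
	}
	switch args[0] {
	case "-fix", "-pyramidize":
		return args[0][1:], true
	}
	return "", false
}

func main() {
	if name, ok := cliCommand(os.Args[1:]); ok {
		// Placeholder: services would be wired by hand here (no Wire, no Wails).
		fmt.Printf("headless %s: run the service layer and exit\n", name)
		return
	}
	// Placeholder: the GUI path would initialise Wire DI and start Wails.
	fmt.Println("no CLI command: the GUI would boot here")
}
```

Returning before any GUI setup means the headless path never pays the cost of starting a webview.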

**Component flow:** `main.go` → Wire DI initializes services → Wails registers them → `wails3 generate bindings` generates JS → `wails.service.ts` wraps bindings → Angular components call `WailsService`

@@ -54,3 +84,5 @@ KeyLint is a desktop app that fixes/enhances clipboard text via AI (OpenAI, Anth
**Rules (active steering):** `.claude/rules/architecture.md`, `.claude/rules/testing.md`, `.claude/rules/workflows.md`

**Reference docs:** `.claude/docs/architecture.md` (service wiring, RPC bridge, platform differences), `.claude/docs/testing.md` (detailed patterns), `.claude/docs/versioning.md` (release pipeline, CI)

**Pyramidize docs:** `docs/pyramidize/` (requirements, ADR, quality status, NLP/LangChain research, UX roadmap)
24 changes: 23 additions & 1 deletion README.md
@@ -101,6 +101,27 @@ go test ./internal/...
npx playwright test
```

### Evaluation (Pyramidize Quality)

The pyramidize feature has an automated eval pipeline that measures output quality against baseline test data using deterministic checks (structure, info coverage, hallucination detection) and LLM-as-judge scoring.

```bash
# Setup: create .env in project root with your API key
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env

# Run eval (uses //go:build eval tag — isolated from normal tests)
EVAL_PROVIDER=claude go test -tags eval ./internal/features/pyramidize/ -v -timeout 600s

# Or use the wrapper script (supports --provider / --model flags)
./scripts/eval.sh --provider claude

# Interactive human review mode (side-by-side comparison)
./scripts/eval-human.sh --provider claude

# Results are logged to test-data/eval-runs/<timestamp>/
# Each run produces: summary.json, results.jsonl, samples/
```
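To make the run layout concrete, here is a rough sketch of writing `summary.json` into a timestamped run directory with a `samples/` folder beside it. Everything specific is an assumption: the `runSummary` fields, the `writeRunSummary` name, and the timestamp format are hypothetical and not taken from the eval code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// runSummary holds hypothetical summary fields for one eval run.
type runSummary struct {
	Provider string `json:"provider"`
	Samples  int    `json:"samples"`
	Passed   int    `json:"passed"`
}

// writeRunSummary creates <root>/<stamp>/samples/ and writes
// <root>/<stamp>/summary.json, returning the summary path.
func writeRunSummary(root, stamp string, s runSummary) (string, error) {
	dir := filepath.Join(root, stamp)
	if err := os.MkdirAll(filepath.Join(dir, "samples"), 0o755); err != nil {
		return "", err
	}
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, "summary.json")
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return "", err
	}
	return path, nil
}

func main() {
	// A temp dir stands in for test-data/eval-runs/ in this sketch.
	root, err := os.MkdirTemp("", "eval-runs")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(root)

	stamp := time.Now().UTC().Format("20060102-150405") // assumed format
	path, err := writeRunSummary(root, stamp, runSummary{Provider: "claude", Samples: 3, Passed: 2})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote", path)
}
```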

### Wire DI Regeneration

```bash
@@ -116,7 +137,8 @@ KeyLint/
├── main.go # Entry point — CLI flags, Wails app setup
├── internal/
│ ├── app/ # Wire DI (wire.go + wire_gen.go)
│ └── features/ # Vertical slices: settings, shortcut, clipboard, tray, enhance, welcome
│ ├── cli/ # Headless CLI commands (-fix, -pyramidize)
│ └── features/ # Vertical slices: settings, shortcut, clipboard, tray, enhance, welcome, pyramidize
├── frontend/
│ ├── src/app/
│ │ ├── core/ # WailsService (bindings bridge), MessageBus, guards
93 changes: 15 additions & 78 deletions TODO.md
@@ -1,96 +1,33 @@
# KeyLint — Feature Parity TODO

Audit of gaps between v1 (Rust/Tauri) and v2 (Go/Wails).
Focus: the two core features — **Silent Fix** and **Pyramidize**.

---

## System Tray & Window Lifecycle

- [x] **Minimize to tray on close** — `ApplicationShouldTerminateAfterLastWindowClosed: false` set in
`main.go`; window-close event calls `window.Hide()`.

- [x] **Tray icon click / double-click brings window to front** — `tray.OnClick` and
`tray.OnDoubleClick` handlers added in `internal/features/tray/service.go`.
Remaining gaps between v1 (Rust/Tauri) and v2 (Go/Wails).
For Pyramidize-specific status, see `docs/pyramidize/`.

---

## Silent Fix

- [x] **Auto-paste to source app** — `PasteToForeground` implemented on both platforms:
Windows via Win32 `SendInput` (`paste_windows.go`), Linux via `xdotool` (`paste_linux.go`).

- [x] **Auto-paste to source app** — `PasteToForeground` on both platforms
- [ ] **Linux hotkey** — currently a no-op stub (`service_linux.go`). Wire up a real global
shortcut (e.g. `github.com/robotn/gohook` or `xbindkeys` integration).

- [ ] **HTML clipboard support** — detect foreground app (Outlook, Word, LibreOffice, etc.),
- [ ] **HTML clipboard support** — detect foreground app (Outlook, Word, LibreOffice),
convert Markdown output to HTML, write both CF_HTML and CF_TEXT to clipboard.
v1 had `HtmlClipboardService` with app-name regex matching.

---

## Version & Updates

- [x] **Version + update indicator in main nav** *(v4.0.0-alpha finding)* — display the app version
in small text at the bottom-left of the shell nav alongside a single icon that lights up when
an update is available. Clicking the icon (or version text) should navigate to Settings → About.
The version string is already available via `wails.getVersion()`; update status via
`wails.checkForUpdate()`. Currently only visible in Settings → About.

---

## Pyramidize (Advanced Mode)

The current `TextEnhancementComponent` is a single-pass generic fix with no pyramidal logic.
The entire v1 `PyramidalAgentService` pipeline needs to be rebuilt in Go + Angular.

### Pipeline (Generate → Specialists → QA)

- [ ] **Document type detection** — LLM classifies input as EMAIL / WIKI / POWERPOINT / MEMO
(or user selects manually). Returns `{type, language, confidence}`.

- [ ] **Oneshot foundation generator** — document-type-specific prompt templates (German + English)
that convert raw text into a structured document: subject + headers + body.
Output: `{subject, headers[], fullDocument, documentType, language}`.

- [ ] **Parallel specialist agents** — run concurrently after the foundation step:
- Subject Line Specialist — validates format + information density
- Header Structure Specialist — MECE principle + pyramidal hierarchy
- Information Completeness Specialist — detects info loss vs original
- Style & Language Specialist — tone, consistency, professional polish
- Each returns a confidence score (0.0–1.0).

- [ ] **Integration coordinator** — selectively applies specialist improvements where
confidence > 0.7; preserves baseline on low-confidence suggestions.

- [ ] **Quality assurance check** — final pass returns
`{informationLoss[], accuracyIssues[], missingElements[], overallScore, passed}`.
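The specialist and coordinator items above describe a fan-out followed by a confidence gate. A minimal Go sketch of that shape follows, assuming each specialist is a plain function; the types `suggestion` and `specialist` and both function names are hypothetical. How accepted suggestions are merged into the document is left out because the list does not specify it.

```go
package main

import (
	"fmt"
	"sync"
)

// suggestion is a hypothetical result returned by one specialist.
type suggestion struct {
	Specialist string
	Revised    string
	Confidence float64 // 0.0 to 1.0
}

// specialist reviews a draft and proposes a revision.
type specialist func(draft string) suggestion

// runSpecialists calls every specialist concurrently and waits for all of them.
// Each goroutine writes to its own slice index, so no mutex is needed.
func runSpecialists(draft string, specs []specialist) []suggestion {
	out := make([]suggestion, len(specs))
	var wg sync.WaitGroup
	for i, s := range specs {
		wg.Add(1)
		go func(i int, s specialist) {
			defer wg.Done()
			out[i] = s(draft)
		}(i, s)
	}
	wg.Wait()
	return out
}

// accepted keeps only suggestions strictly above the threshold,
// matching the "confidence > 0.7" rule.
func accepted(all []suggestion, threshold float64) []suggestion {
	var keep []suggestion
	for _, s := range all {
		if s.Confidence > threshold {
			keep = append(keep, s)
		}
	}
	return keep
}

func main() {
	specs := []specialist{
		func(d string) suggestion { return suggestion{"subject", d + " (tighter subject)", 0.9} },
		func(d string) suggestion { return suggestion{"style", d + " (restyled)", 0.4} },
	}
	for _, s := range accepted(runSpecialists("draft", specs), 0.7) {
		fmt.Printf("%s: %.1f\n", s.Specialist, s.Confidence)
	}
}
```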

### UI Controls (missing from v2)

- [ ] Document type selector (AUTO / EMAIL / WIKI / POWERPOINT / MEMO)
- [ ] Communication style selector (concise / detailed / persuasive / neutral /
diplomatic / direct / casual / professional)
- [ ] Relationship level selector (formal / professional / casual / friendly)
- [ ] Custom instructions textarea
- [ ] Markdown rendering for output (replace readonly `<textarea>`)
- [ ] Editable output (allow manual tweaks after AI generation)
- [ ] Tab view: Draft vs Original

### Clipboard integration
## Pyramidize

- [ ] **HTML clipboard paste-back** — same as Silent Fix: convert Markdown output to HTML
and paste to source app with proper MIME types.
- [x] Full pipeline (detect -> foundation -> self-QA -> refine)
- [x] CLI mode (`-fix`, `-pyramidize`)
- [x] Evaluation framework (deterministic + LLM-as-judge)
- [x] All UI controls (doc type, style, relationship, custom instructions, canvas, trace log)
- [ ] **HTML clipboard paste-back** — convert Markdown to HTML for Outlook/Teams
- [ ] **Parallel specialist agents** — v1 had 4 independent specialists; currently simplified as self-eval. See `docs/pyramidize/adr-001-pipeline-architecture.md`.

---

## Priority Order
## Platform

1. ~~**Auto-paste to source app**~~ ✓ done
2. ~~**Minimize to tray on close**~~ ✓ done
3. ~~**Tray icon click brings window to front**~~ ✓ done
4. ~~**Version + update indicator in nav**~~ ✓ done
5. Pyramidize pipeline (core value proposition)
6. Pyramidize UI controls
7. Linux hotkey
8. HTML clipboard support
- [x] Minimize to tray on close
- [x] Tray icon click brings window to front
- [x] Version + update indicator in nav