feat: Pyramidize — full feature + CLI + eval pipeline + quality improvements by 0xMMA · Pull Request #17 · 0xMMA/KeyLint

0xMMA · 2026-03-09T23:45:50Z

What

Adds the Pyramidize feature — AI-powered document restructuring using the Pyramid Principle. Paste unstructured text, get a structured document with conclusion-first headers, bullet-point details, and a pipe-delimited subject line.

How it works

User pastes text → auto-detect doc type → AI restructures → self-QA scores quality
    → if score < threshold: AI refines → canvas editor for manual tweaks → send back
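The control flow above can be sketched in a few lines of Go. This is a minimal illustration, not the code in `internal/features/pyramidize/`: `Steps`, `Result`, `Run`, and `stubSteps` are hypothetical names, and the AI-backed stages are replaced by plain functions so the threshold branch can be exercised without a provider.

```go
package main

import "fmt"

// Result is a hypothetical container for one pipeline run.
type Result struct {
	DocType string
	Text    string
	Score   float64
	Refined bool
}

// Steps bundles the AI-backed stages as plain functions (assumed shape).
type Steps struct {
	Detect      func(input string) string
	Restructure func(input, docType string) (string, float64)
	Refine      func(text string) (string, float64)
}

// Run performs detect → restructure → optional refine.
// Refinement only happens when the self-QA score is below the threshold.
func Run(s Steps, input string, threshold float64) Result {
	docType := s.Detect(input)
	text, score := s.Restructure(input, docType)
	res := Result{DocType: docType, Text: text, Score: score}
	if score < threshold {
		res.Text, res.Score = s.Refine(text)
		res.Refined = true
	}
	return res
}

// stubSteps returns fake stages whose first-pass score is fixed.
func stubSteps(firstScore float64) Steps {
	return Steps{
		Detect:      func(string) string { return "email" },
		Restructure: func(in, _ string) (string, float64) { return "draft: " + in, firstScore },
		Refine:      func(t string) (string, float64) { return t + " (refined)", 0.95 },
	}
}

func main() {
	for _, first := range []float64{0.5, 0.9} {
		r := Run(stubSteps(first), "raw text", 0.8)
		fmt.Printf("first=%.2f refined=%v final=%.2f\n", first, r.Refined, r.Score)
	}
}
```

The canvas editor and send-back steps are UI concerns and are left out of the sketch.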

Four document types: email, wiki, memo, powerpoint — each with type-specific prompts and structure templates.

Two prompt variants live in parallel (selectable via --variant CLI flag or EVAL_VARIANT env var):

v1: detailed prompt + selfQA + conditional refinement pipeline
v2 (default): leaner self-contained prompt, no selfQA — half the size, equal or better quality
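A rough sketch of how variant selection might be resolved, assuming the CLI flag takes precedence over `EVAL_VARIANT` and that `0` means "use the latest" (the commit notes say variant 0 defaults to v2). `resolveVariant` and `latestVariant` are hypothetical names, and the precedence order is an assumption rather than something the PR states.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// latestVariant is the variant used when nothing is selected (currently v2).
const latestVariant = 2

// resolveVariant picks a prompt variant. flagValue == 0 means "not set".
// Assumption: an explicit flag wins over the environment variable.
func resolveVariant(flagValue int, envValue string) (int, error) {
	v := flagValue
	if v == 0 && envValue != "" {
		n, err := strconv.Atoi(envValue)
		if err != nil {
			return 0, fmt.Errorf("invalid EVAL_VARIANT %q: %w", envValue, err)
		}
		v = n
	}
	switch v {
	case 0:
		return latestVariant, nil
	case 1, 2:
		return v, nil
	default:
		return 0, fmt.Errorf("unknown prompt variant %d", v)
	}
}

func main() {
	v, err := resolveVariant(0, os.Getenv("EVAL_VARIANT"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using prompt variant", v)
}
```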

What's in the box

Backend (`internal/features/pyramidize/`)

Full pipeline: detect → foundation → optional refine
3 AI providers: OpenAI, Claude, Ollama — with provider/model overrides
Self-QA with 5 specialist lenses (subject, MECE, completeness, style, fidelity)
Platform-specific source app capture (Linux/Windows) for paste-back
App presets: remember doc type per source application

Frontend (`frontend/src/app/features/text-enhancement/`)

Canvas editor with original/restructured tab view + markdown preview
Global refine (instruction → full rewrite) and splice (selection → partial rewrite)
Trace log with snapshot/revert/peek
Provider/model/doc-type/style selectors
Quality score display with refinement warnings

CLI (`internal/cli/`)

./bin/KeyLint -fix "text" — silent grammar fix (file/stdin/inline)
./bin/KeyLint -pyramidize -f input.md — restructure from file
--json, --provider, --model, --variant flags
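For illustration, the three input sources (file, inline, stdin) could be resolved by a helper along these lines. `readInput` is a hypothetical name, and the precedence order (file, then inline text, then stdin) is an assumption; the actual helper in `internal/cli/` may differ.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// readInput returns the text to process.
// Assumed precedence: file path, then inline text, then stdin.
func readInput(path, inline string, stdin io.Reader) (string, error) {
	switch {
	case path != "":
		b, err := os.ReadFile(path)
		if err != nil {
			return "", fmt.Errorf("read %s: %w", path, err)
		}
		return string(b), nil
	case inline != "":
		return inline, nil
	case stdin != nil:
		b, err := io.ReadAll(stdin)
		if err != nil {
			return "", fmt.Errorf("read stdin: %w", err)
		}
		if len(b) == 0 {
			return "", errors.New("no input provided")
		}
		return string(b), nil
	}
	return "", errors.New("no input provided")
}

func main() {
	// Simulate piped input; a real CLI would pass os.Stdin here.
	text, err := readInput("", "", strings.NewReader("teh quick fox\n"))
	fmt.Printf("%q %v\n", text, err)
}
```

Passing the reader in as a parameter keeps the helper testable without touching the real `os.Stdin`.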

Eval framework (`//go:build eval`)

Deterministic checks: structure, info coverage, hallucination detection
LLM-as-judge: pyramid structure, clarity, completeness, tone preservation
13 anonymized German business email test samples
Results logged to test-data/eval-runs/<timestamp>/ with summary.json
./scripts/eval.sh --variant 1 vs ./scripts/eval.sh --variant 2 for comparison

Code quality (from ETC-focused review)

DefaultQualityThreshold constant — single source of truth
resolveAPIKey — decoupled keyring access from AI dispatch
Shared DOCUMENT_TYPE_OPTIONS — no more duplicate arrays
MarkdownPipe tests (20 cases)
Fixed: retry for all operations, clipboard path consistency, slice aliasing, FIDELITY_VIOLATION handler

Eval results (Claude Sonnet 4.6, 13 samples)

| Metric | v1 | v2 (default) |
|---|---|---|
| Avg deterministic | 0.82 | 0.84 |
| Avg judge overall | 0.89 | 0.90 |
| Structure (all samples) | 1.00 | 1.00 |

Full analysis with per-sample Sonnet vs Opus comparison: docs/pyramidize/eval-v2-prompt-analysis.md

Test plan

go test ./internal/... — all packages pass
cd frontend && npm test — 128/128 tests pass
go build -o bin/KeyLint . — builds cleanly
EVAL_PROVIDER=claude go test -tags eval ... — requires API key
Manual E2E: paste email → pyramidize → edit canvas → send back

🤖 Generated with Claude Code

…NG-12)

Go backend:
- New internal/features/pyramidize/ package: 18 files
- 2-call adaptive pipeline (detect + foundation+self-QA + optional refine)
- RPCs: Pyramidize, RefineGlobal, Splice, CancelOperation, SendBack, GetSourceApp, GetAppPresets, SetAppPreset, DeleteAppPreset, GetQualityThreshold, SetQualityThreshold
- XML-structured prompts for EMAIL, WIKI, MEMO, POWERPOINT doc types
- Platform-specific source app capture: xdotool (Linux), Win32 (Windows)
- 47 unit tests, all pass
- settings/model.go: add AppPresets + PyramidizeQualityThreshold to Settings struct
- main.go: register PyramidizeService; capture source app before clipboard grab

Angular frontend:
- text-enhancement.component.ts: complete rewrite as Pyramidize editor
- 3-layer canvas model (originalText / pyramidizedText / canvasText)
- Trace log with peek + revert, collapsible right panel
- Step indicator, cancellation, error recovery with retry
- Global instruction bar (Ctrl+Enter) + selection splice bubble
- Hover copy in edit mode (mouse-position overlay) and preview mode
- Copy as Markdown / Copy as Rich Text / Send Back actions
- Module-level state survives navigation
- text-enhancement.service.ts: rewritten to wrap all pyramidize RPCs
- markdown.pipe.ts: new standalone MarkdownPipe for preview rendering
- settings.component.ts: add App Defaults tab (presets + quality threshold)
- wails.service.ts: add all 11 pyramidize RPC methods
- wails-mock.ts: add pyramidize mocks
- 108 Vitest tests pass (0 failures)
- Binary builds cleanly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…dition wails3 task run was starting the binary before ng serve finished binding to port 9245. Reordering background/blocking executes gives ng serve the ~3s of Go build time to initialize before the binary connects to it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…pboard resilience

Shell sidebar:
- Collapsible sidebar with hover-expand popover (overlay, no layout shift)
- Collapsed logo shows "KL" (K white, L orange) matching brand colours
- Nav icons centred in collapsed strip; SVG pyramid icon colour-consistent with <i> icons on hover and active-route states
- Version row hidden when empty-collapsed (no dead click-strip)
- Collapse button taller (0.875rem padding, ≥40px tap target)
- Layout switched to position:absolute sidebar + margin-left on main to prevent scrollbar flicker on hover-expand

Pyramidize:
- Upgrade default models: claude-sonnet-4-6, gpt-5.2 (was haiku / gpt-4o-mini)
- Provider + model selectors in left panel (replaces provider badge)
- Quality threshold moved from Settings › App Defaults to Pyramidize panel
- Error messages clipped to 2 lines with copy icon; removed "Change Provider" button
- Trace log full overlay replaces peek sub-panel
- Canvas textarea/preview fill available height (flex chain fix)
- Cancel propagates through RefineGlobal / Splice via aiOpts struct

Clipboard / Linux:
- Read/Write try xclip → xsel → wl-paste/wl-copy via LookPath; no crash on missing tools, clean error message instead of raw exec error
- xdotool calls (focus, paste, source-app capture) are best-effort; missing binary logs a warning and returns nil

Tests:
- 80 Playwright tests across shell-menu*.spec.ts (centering, hover-expand, colour parity, layout dims, click targets, logo, active-route, scrollbar)
- 13 Playwright tests in pyramidize-layout.spec.ts (tabs, canvas height, provider selectors, trace overlay, sidebar collapse)
- 108 Vitest unit tests (0 failures)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


…or rename

- Logo KL→KeyLint: CSS max-width unfold (no @if swap, pure transition)
- Nav item 2px drop on hover-expand: span{line-height:1} prevents height growth
- Quality threshold: replace plain <input type=number> with <p-inputnumber>
- Rename "Canvas" tab/labels to "Editor" throughout text-enhancement
- Trace log: hover shows non-sticky preview; click pins it
- Add LOGO-ANIMATION.md requirement spec
- Add BRANCH-STATUS.md feature-branch status overview
- Update all E2E tests to match new logo DOM structure (logo-k/ey/l/int)
- Add shell-menu-deep5.spec.ts covering nav vertical stability (7 tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

13 German-language email samples with raw input and accepted output pairs for testing the Pyramidize feature. All names, companies, and internal URLs have been replaced with fictional substitutes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds design spec covering CLI interface (machine-testable pyramidize and fix commands) and evaluation framework (deterministic + LLM-as-judge scoring with persistent run logging). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

10-task plan covering CLI dispatch, input reading, -fix and -pyramidize commands, deterministic eval checks, LLM-as-judge scoring, build-tagged integration tests, eval shell scripts, and documentation updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


…ucination

Add fast, no-AI evaluation layer for pyramidize output quality:
- checkStructure: verifies subject line and markdown headers
- checkInfoCoverage: extracts key terms from input, checks >=70% appear in output
- checkNoHallucination: detects proper nouns added that weren't in the input
- extractKeyTerms: shared helper with German/English stop-word skip list
- RunDeterministicChecks: aggregates all checks into an EvalScorecard

Structural heading words (Kernergebnis, Hintergrund, etc.) are excluded from hallucination counting since pyramidize naturally introduces them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
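To make the coverage check concrete, here is a minimal sketch of a key-term coverage ratio with a 70% pass mark. It is not the eval package's implementation: `infoCoverage` is a hypothetical name, this `extractKeyTerms` is a simplified stand-in, and the tokenizer, the four-character minimum, and the tiny stop-word list are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords is a tiny illustrative German/English skip list (assumed contents).
var stopWords = map[string]bool{
	"this": true, "that": true, "with": true, "from": true,
	"dass": true, "eine": true, "wurde": true, "nicht": true, "sind": true,
}

// extractKeyTerms lowercases the text, splits on non-alphanumerics,
// and keeps unique words of at least four characters that are not stop words.
func extractKeyTerms(text string) []string {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	seen := map[string]bool{}
	var terms []string
	for _, w := range words {
		if len([]rune(w)) < 4 || stopWords[w] || seen[w] {
			continue
		}
		seen[w] = true
		terms = append(terms, w)
	}
	return terms
}

// infoCoverage returns the fraction of input key terms found in the output.
// An input with no key terms counts as fully covered.
func infoCoverage(input, output string) float64 {
	terms := extractKeyTerms(input)
	if len(terms) == 0 {
		return 1
	}
	out := strings.ToLower(output)
	hits := 0
	for _, t := range terms {
		if strings.Contains(out, t) {
			hits++
		}
	}
	return float64(hits) / float64(len(terms))
}

func main() {
	in := "Der Server in Hamburg ist seit Montag offline, Ticket 4711 wurde erstellt."
	out := "Server Hamburg offline seit Montag | Ticket 4711 erstellt"
	c := infoCoverage(in, out)
	fmt.Printf("coverage=%.2f pass=%v\n", c, c >= 0.70)
}
```

Substring matching is a deliberate simplification here; the later commits mention German compound decomposition, which this sketch does not attempt.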

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


parseTestData now handles header casing variations (Raw Input/input), typos (accpted), and unfenced raw text — all 13 samples parse correctly. Eval test and scripts load .env from project root for API keys. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Deterministic checks: accept bold headers, German compound decomposition (hyphen/slash/prefix-suffix), expanded business vocab exclusion list, percentage-based hallucination threshold. Avg deterministic: 0.40 → 0.83. Email prompt: tone preservation rules (no formality escalation, no person-switch, no editorial additions), 3-segment subject line cap. Self-eval: added fidelity specialist + FIDELITY_VIOLATION flag. Eval summary now logs effective provider/model (not just overrides). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Added explicit rule requiring mail history/quoted replies to be treated as input content, not disposable context. Fixes diagnose-update sample where Leo's backstory was being dropped entirely (judge completeness 0.72 → 0.92). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Email prompt rewritten with v1-derived rules: explicit structure template (bold headers + bullet points, not prose), standalone content-statement headers (ÜBERSCHRIFTEN-REGELN), analysis phase for input scanning, compact style rules. Examples updated to show bullet structure. Added eval setup and commands to README and expanded CLAUDE.md eval section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

All three prompts now include:
- Analysis phase (scan all input for relevant info before restructuring)
- Structure template (header + bullet points, not prose paragraphs)
- Style rules (compact, bullets for details, factual tone)
- Standalone content-statement header rules (ÜBERSCHRIFTEN-REGELN)

Examples updated to show bullet structure instead of prose paragraphs. Matches the same general principles applied to the email prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Most pipeline and UI items are now implemented. Added eval pipeline section. Only remaining unchecked items: parallel specialist agents (currently simplified as self-eval) and HTML clipboard paste-back. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move loose root files (PYRAMIDIZE.md, PYRAMIDIZE-UX.md, LOGO-ANIMATION.md, BRANCH-STATUS.md) into docs/pyramidize/ with clearer names. New docs: - requirements.md — extracted canonical requirements - adr-001-pipeline-architecture.md — architecture decision record (2-call self-QA vs v1's 6-call multi-agent, with v1 analysis) - quality-status.md — eval scores, open issues, improvement roadmap - research-nlp-langchain.md — NLP and LangChain research findings (sentence embeddings, langchaingo status, Eino, Wails CORS) Simplified TODO.md to cross-cutting feature-parity items only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

CLAUDE.md: keep main's concise format + .claude/rules references, add CLI commands, eval section, pyramidize docs pointer. .gitignore: combine both sides (eval-runs, superpowers, settings.local, test-results). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Extract DefaultQualityThreshold constant (single source of truth)
- Decouple callAISync from keyring via resolveAPIKey helper
- Fix retry() to work for refineGlobal/splice, not just pyramidize
- Route plain-text copy through WailsService.writeClipboard
- Add markdown.pipe.spec.ts (20 tests for hand-rolled parser)
- Extract shared DOCUMENT_TYPE_OPTIONS constant
- Fix DeleteAppPreset slice aliasing (fresh backing array)
- Add FIDELITY_VIOLATION case in refinement prompt switch
- Remove dead resetModuleState() from component spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
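The slice-aliasing item is easiest to see side by side. The sketch below assumes the bug had the common `append(s[:i], s[i+1:]...)` shape; the PR only says the fix uses a fresh backing array, so `deleteAliased`, `deleteFresh`, and the `AppPreset` fields are hypothetical stand-ins.

```go
package main

import "fmt"

// AppPreset maps a source application to a default doc type (assumed fields).
type AppPreset struct {
	SourceApp string
	DocType   string
}

// deleteAliased removes the entry in place. The returned slice shares the
// caller's backing array, so the caller's original slice is overwritten.
func deleteAliased(presets []AppPreset, app string) []AppPreset {
	for i, p := range presets {
		if p.SourceApp == app {
			return append(presets[:i], presets[i+1:]...)
		}
	}
	return presets
}

// deleteFresh copies the surviving entries into a newly allocated slice,
// leaving the caller's slice untouched.
func deleteFresh(presets []AppPreset, app string) []AppPreset {
	out := make([]AppPreset, 0, len(presets))
	for _, p := range presets {
		if p.SourceApp != app {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	a := []AppPreset{{"outlook", "email"}, {"confluence", "wiki"}, {"slides", "powerpoint"}}
	_ = deleteAliased(a, "outlook")
	fmt.Println("after aliased delete, original:", a) // original has been shifted

	b := []AppPreset{{"outlook", "email"}, {"confluence", "wiki"}, {"slides", "powerpoint"}}
	kept := deleteFresh(b, "outlook")
	fmt.Println("after fresh delete, original:", b, "result:", kept)
}
```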

Add prompt variant system — selectable via CLI (--variant), eval (EVAL_VARIANT env var), and request (promptVariant field). Variant 0 defaults to latest (currently v2). V2 is a leaner, self-contained email prompt from research sandbox. No selfQA block — quality evaluated externally by deterministic checks and LLM-as-judge. Half the prompt size, skips the QA/refine pipeline. Eval results (Sonnet 4.6, 13 samples): v1: det=0.82 judge=0.89 v2: det=0.84 judge=0.90 Full analysis with per-sample Sonnet vs Opus comparison, structural patterns, and improvement candidates in docs/pyramidize/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

0xMMA and others added 26 commits March 8, 2026 23:16

chore(docs): note double-v bug in collapsed sidebar version row

23548a0

feat(cli): add CLI dispatch skeleton with fix and pyramidize stubs

b10bc8b

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(cli): add input reading helper (file, stdin, inline)

0870b9e

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(cli): use bytes.Buffer for stderr in test to prevent nil panic

27737c1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(cli): implement -fix command with inline, file, and stdin input

213b971

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(cli): implement -pyramidize command with JSON output and provide…

cace43e

…r overrides Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(eval): add LLM-as-judge scoring with pyramid structure criteria

d0448cb

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(eval): add build-tagged integration tests with eval run logging

dab4ec5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat(eval): add automated and human review eval scripts

9fdaa66

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: add CLI commands and eval build tag to CLAUDE.md, update .gitig…

a99031b

…nore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revert "fix(prompt): preserve mail thread context in email restructur…

ac685d5

…ing" This reverts commit 27a5f90.

0xMMA changed the title ~~feat(shell,pyramidize): UX pass — sidebar polish, model upgrades, clipboard resilience~~ feat: Pyramidize — full feature + CLI + eval pipeline + quality improvements Mar 30, 2026

0xMMA and others added 3 commits March 30, 2026 22:45

0xMMA merged commit 531e110 into main Apr 1, 2026
1 check passed

0xMMA merged 29 commits into main from feat/pyramidize
