feat: Pyramidize — full feature + CLI + eval pipeline + quality improvements #17
Merged
…NG-12)

Go backend:
- New internal/features/pyramidize/ package: 18 files
- 2-call adaptive pipeline (detect + foundation+self-QA + optional refine)
- RPCs: Pyramidize, RefineGlobal, Splice, CancelOperation, SendBack, GetSourceApp, GetAppPresets, SetAppPreset, DeleteAppPreset, GetQualityThreshold, SetQualityThreshold
- XML-structured prompts for EMAIL, WIKI, MEMO, POWERPOINT doc types
- Platform-specific source app capture: xdotool (Linux), Win32 (Windows)
- 47 unit tests, all pass
- settings/model.go: add AppPresets + PyramidizeQualityThreshold to Settings struct
- main.go: register PyramidizeService; capture source app before clipboard grab

Angular frontend:
- text-enhancement.component.ts: complete rewrite as Pyramidize editor
- 3-layer canvas model (originalText / pyramidizedText / canvasText)
- Trace log with peek + revert, collapsible right panel
- Step indicator, cancellation, error recovery with retry
- Global instruction bar (Ctrl+Enter) + selection splice bubble
- Hover copy in edit mode (mouse-position overlay) and preview mode
- Copy as Markdown / Copy as Rich Text / Send Back actions
- Module-level state survives navigation
- text-enhancement.service.ts: rewritten to wrap all pyramidize RPCs
- markdown.pipe.ts: new standalone MarkdownPipe for preview rendering
- settings.component.ts: add App Defaults tab (presets + quality threshold)
- wails.service.ts: add all 11 pyramidize RPC methods
- wails-mock.ts: add pyramidize mocks
- 108 Vitest tests pass (0 failures)
- Binary builds cleanly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dition wails3 task run was starting the binary before ng serve finished binding to port 9245. Reordering background/blocking executes gives ng serve the ~3s of Go build time to initialize before the binary connects to it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pboard resilience

Shell sidebar:
- Collapsible sidebar with hover-expand popover (overlay, no layout shift)
- Collapsed logo shows "KL" (K white, L orange) matching brand colours
- Nav icons centred in collapsed strip; SVG pyramid icon colour-consistent with <i> icons on hover and active-route states
- Version row hidden when empty-collapsed (no dead click-strip)
- Collapse button taller (0.875rem padding, ≥40px tap target)
- Layout switched to position:absolute sidebar + margin-left on main to prevent scrollbar flicker on hover-expand

Pyramidize:
- Upgrade default models: claude-sonnet-4-6, gpt-5.2 (was haiku / gpt-4o-mini)
- Provider + model selectors in left panel (replaces provider badge)
- Quality threshold moved from Settings › App Defaults to Pyramidize panel
- Error messages clipped to 2 lines with copy icon; removed "Change Provider" button
- Trace log full overlay replaces peek sub-panel
- Canvas textarea/preview fill available height (flex chain fix)
- Cancel propagates through RefineGlobal / Splice via aiOpts struct

Clipboard / Linux:
- Read/Write try xclip → xsel → wl-paste/wl-copy via LookPath; no crash on missing tools, clean error message instead of raw exec error
- xdotool calls (focus, paste, source-app capture) are best-effort; missing binary logs a warning and returns nil

Tests:
- 80 Playwright tests across shell-menu*.spec.ts (centering, hover-expand, colour parity, layout dims, click targets, logo, active-route, scrollbar)
- 13 Playwright tests in pyramidize-layout.spec.ts (tabs, canvas height, provider selectors, trace overlay, sidebar collapse)
- 108 Vitest unit tests (0 failures)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
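The clipboard fallback described above (try each tool in order, fail with a readable message) can be sketched roughly as follows. This is an illustration, not the code from the PR: the names clipboardTool, readCandidates, pickTool, and errNoClipboardTool are hypothetical, and the command-line arguments are the commonly used ones for each tool rather than anything confirmed by the commit.

```go
package main

import (
	"errors"
	"os/exec"
)

// clipboardTool describes one candidate command for reading the clipboard.
// (Hypothetical type, introduced only for this sketch.)
type clipboardTool struct {
	name string
	args []string
}

// readCandidates lists the tools in the order named in the commit message.
var readCandidates = []clipboardTool{
	{"xclip", []string{"-selection", "clipboard", "-o"}},
	{"xsel", []string{"--clipboard", "--output"}},
	{"wl-paste", []string{"--no-newline"}},
}

// errNoClipboardTool is the clean error returned instead of a raw exec error.
var errNoClipboardTool = errors.New("no clipboard tool found: install xclip, xsel, or wl-clipboard")

// pickTool returns the first candidate whose binary lookPath can resolve.
// lookPath is injected so the selection logic can be exercised without the
// tools being installed; production code would pass exec.LookPath.
func pickTool(candidates []clipboardTool, lookPath func(string) (string, error)) (clipboardTool, error) {
	for _, c := range candidates {
		if _, err := lookPath(c.name); err == nil {
			return c, nil
		}
	}
	return clipboardTool{}, errNoClipboardTool
}

// readClipboard runs the first available tool and returns its output.
func readClipboard() (string, error) {
	tool, err := pickTool(readCandidates, exec.LookPath)
	if err != nil {
		return "", err
	}
	out, err := exec.Command(tool.name, tool.args...).Output()
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```

Injecting the lookup function keeps the ordering logic testable on machines that have none of the tools installed.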
…or rename

- Logo KL→KeyLint: CSS max-width unfold (no @if swap, pure transition)
- Nav item 2px drop on hover-expand: span{line-height:1} prevents height growth
- Quality threshold: replace plain <input type=number> with <p-inputnumber>
- Rename "Canvas" tab/labels to "Editor" throughout text-enhancement
- Trace log: hover shows non-sticky preview; click pins it
- Add LOGO-ANIMATION.md requirement spec
- Add BRANCH-STATUS.md feature-branch status overview
- Update all E2E tests to match new logo DOM structure (logo-k/ey/l/int)
- Add shell-menu-deep5.spec.ts covering nav vertical stability (7 tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
13 German-language email samples with raw input and accepted output pairs for testing the Pyramidize feature. All names, companies, and internal URLs have been replaced with fictional substitutes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds design spec covering CLI interface (machine-testable pyramidize and fix commands) and evaluation framework (deterministic + LLM-as-judge scoring with persistent run logging). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10-task plan covering CLI dispatch, input reading, -fix and -pyramidize commands, deterministic eval checks, LLM-as-judge scoring, build-tagged integration tests, eval shell scripts, and documentation updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r overrides Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ucination

Add fast, no-AI evaluation layer for pyramidize output quality:
- checkStructure: verifies subject line and markdown headers
- checkInfoCoverage: extracts key terms from input, checks >=70% appear in output
- checkNoHallucination: detects proper nouns added that weren't in the input
- extractKeyTerms: shared helper with German/English stop-word skip list
- RunDeterministicChecks: aggregates all checks into an EvalScorecard

Structural heading words (Kernergebnis, Hintergrund, etc.) are excluded from hallucination counting since pyramidize naturally introduces them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
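A minimal sketch of the coverage idea behind checkInfoCoverage, assuming a simple tokenizer and substring matching. The function bodies, the minimum term length of 4, and the stop-word entries are assumptions made for illustration; only the 70% threshold and the German/English skip list come from the commit message.

```go
package main

import (
	"strings"
	"unicode"
)

// stopWords is a tiny illustrative German/English skip list (hypothetical contents).
var stopWords = map[string]bool{
	"the": true, "and": true, "for": true, "with": true,
	"der": true, "die": true, "das": true, "und": true, "mit": true,
}

// extractKeyTerms lowercases the text, splits on anything that is not a
// letter or digit, and keeps unique tokens of at least 4 runes that are
// not stop words. The length cutoff is an assumption of this sketch.
func extractKeyTerms(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	seen := make(map[string]bool)
	var terms []string
	for _, f := range fields {
		if len([]rune(f)) < 4 || stopWords[f] || seen[f] {
			continue
		}
		seen[f] = true
		terms = append(terms, f)
	}
	return terms
}

// infoCoverage returns the fraction of input key terms found in the output.
// An input with no key terms counts as fully covered.
func infoCoverage(input, output string) float64 {
	terms := extractKeyTerms(input)
	if len(terms) == 0 {
		return 1.0
	}
	lowerOut := strings.ToLower(output)
	hit := 0
	for _, t := range terms {
		if strings.Contains(lowerOut, t) {
			hit++
		}
	}
	return float64(hit) / float64(len(terms))
}

// passesCoverage applies the 70% threshold named in the commit message.
func passesCoverage(input, output string) bool {
	return infoCoverage(input, output) >= 0.70
}
```

Substring matching is deliberately lenient here: it lets an input term such as "server" match inside a longer compound in the output, which matters for German text.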
…nore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parseTestData now handles header casing variations (Raw Input/input), typos (accpted), and unfenced raw text — all 13 samples parse correctly. Eval test and scripts load .env from project root for API keys. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
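One way to tolerate header casing and the "accpted" typo is to classify headers by a lowercase prefix. The sketch below assumes sample files mark sections with markdown-style "#" header lines, which the commit message does not confirm; classifyHeader is a hypothetical helper, not a function from the PR.

```go
package main

import "strings"

// classifyHeader reports which section a header line opens:
// "raw", "accepted", or "" when the line is not a recognised header.
// Assumption: headers are lines starting with one or more '#'.
func classifyHeader(line string) string {
	trimmed := strings.TrimSpace(line)
	if !strings.HasPrefix(trimmed, "#") {
		return ""
	}
	title := strings.ToLower(strings.TrimSpace(strings.TrimLeft(trimmed, "#")))
	switch {
	case strings.HasPrefix(title, "raw"):
		return "raw"
	case strings.HasPrefix(title, "acc"): // matches "accepted" and the typo "accpted"
		return "accepted"
	default:
		return ""
	}
}
```

Matching on a short prefix trades strictness for robustness, which is usually acceptable for a small, hand-curated sample set.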
Deterministic checks: accept bold headers, German compound decomposition (hyphen/slash/prefix-suffix), expanded business vocab exclusion list, percentage-based hallucination threshold. Avg deterministic: 0.40 → 0.83.

Email prompt: tone preservation rules (no formality escalation, no person-switch, no editorial additions), 3-segment subject line cap.

Self-eval: added fidelity specialist + FIDELITY_VIOLATION flag.

Eval summary now logs effective provider/model (not just overrides).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added explicit rule requiring mail history/quoted replies to be treated as input content, not disposable context. Fixes diagnose-update sample where Leo's backstory was being dropped entirely (judge completeness 0.72 → 0.92). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ing" This reverts commit 27a5f90.
Email prompt rewritten with v1-derived rules: explicit structure template (bold headers + bullet points, not prose), standalone content-statement headers (ÜBERSCHRIFTEN-REGELN), analysis phase for input scanning, compact style rules. Examples updated to show bullet structure. Added eval setup and commands to README and expanded CLAUDE.md eval section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All three prompts now include:
- Analysis phase (scan all input for relevant info before restructuring)
- Structure template (header + bullet points, not prose paragraphs)
- Style rules (compact, bullets for details, factual tone)
- Standalone content-statement header rules (ÜBERSCHRIFTEN-REGELN)

Examples updated to show bullet structure instead of prose paragraphs. Matches the same general principles applied to the email prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Most pipeline and UI items are now implemented. Added eval pipeline section. Only remaining unchecked items: parallel specialist agents (currently simplified as self-eval) and HTML clipboard paste-back. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move loose root files (PYRAMIDIZE.md, PYRAMIDIZE-UX.md, LOGO-ANIMATION.md, BRANCH-STATUS.md) into docs/pyramidize/ with clearer names.

New docs:
- requirements.md: extracted canonical requirements
- adr-001-pipeline-architecture.md: architecture decision record (2-call self-QA vs v1's 6-call multi-agent, with v1 analysis)
- quality-status.md: eval scores, open issues, improvement roadmap
- research-nlp-langchain.md: NLP and LangChain research findings (sentence embeddings, langchaingo status, Eino, Wails CORS)

Simplified TODO.md to cross-cutting feature-parity items only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CLAUDE.md: keep main's concise format + .claude/rules references, add CLI commands, eval section, pyramidize docs pointer. .gitignore: combine both sides (eval-runs, superpowers, settings.local, test-results). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract DefaultQualityThreshold constant (single source of truth)
- Decouple callAISync from keyring via resolveAPIKey helper
- Fix retry() to work for refineGlobal/splice, not just pyramidize
- Route plain-text copy through WailsService.writeClipboard
- Add markdown.pipe.spec.ts (20 tests for hand-rolled parser)
- Extract shared DOCUMENT_TYPE_OPTIONS constant
- Fix DeleteAppPreset slice aliasing (fresh backing array)
- Add FIDELITY_VIOLATION case in refinement prompt switch
- Remove dead resetModuleState() from component spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
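The slice-aliasing item is a common Go pitfall: filtering a slice in place reuses its backing array, so any other holder of the original slice sees its elements overwritten. A minimal sketch of the "fresh backing array" fix, using a hypothetical AppPreset type and deletePreset helper (the real types and signatures in the PR may differ):

```go
package main

// AppPreset is a hypothetical stand-in for the real preset type.
type AppPreset struct {
	SourceApp    string
	DocumentType string
}

// deletePreset returns a copy of presets without the entries for sourceApp.
// It allocates a fresh backing array, so the caller's slice is never mutated.
// The in-place idioms (filtering into presets[:0], or
// append(presets[:i], presets[i+1:]...)) would instead overwrite elements
// that other holders of the original slice can still observe.
func deletePreset(presets []AppPreset, sourceApp string) []AppPreset {
	out := make([]AppPreset, 0, len(presets))
	for _, p := range presets {
		if p.SourceApp != sourceApp {
			out = append(out, p)
		}
	}
	return out
}
```

The extra allocation is negligible for a settings-sized list and removes a whole class of surprising mutations.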
Add prompt variant system, selectable via CLI (--variant), eval (EVAL_VARIANT env var), and request (promptVariant field). Variant 0 defaults to latest (currently v2).

V2 is a leaner, self-contained email prompt from research sandbox. No selfQA block; quality is evaluated externally by deterministic checks and LLM-as-judge. Half the prompt size, skips the QA/refine pipeline.

Eval results (Sonnet 4.6, 13 samples):
- v1: det=0.82 judge=0.89
- v2: det=0.84 judge=0.90

Full analysis with per-sample Sonnet vs Opus comparison, structural patterns, and improvement candidates in docs/pyramidize/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
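The "variant 0 means latest" rule can be sketched as a small resolver. The names latestVariant, resolveVariant, and variantFromEnv are hypothetical, and treating an empty EVAL_VARIANT value as "use latest" is an assumption of this sketch rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strconv"
)

// latestVariant is what variant 0 resolves to; per the commit message this
// is currently v2.
const latestVariant = 2

// resolveVariant maps a requested variant number to a concrete one.
// 0 selects the latest variant; unknown numbers are rejected.
func resolveVariant(requested int) (int, error) {
	switch requested {
	case 0:
		return latestVariant, nil
	case 1, 2:
		return requested, nil
	default:
		return 0, fmt.Errorf("unknown prompt variant %d", requested)
	}
}

// variantFromEnv parses an EVAL_VARIANT-style string value.
// Assumption: an empty value means "use latest".
func variantFromEnv(value string) (int, error) {
	if value == "" {
		return resolveVariant(0)
	}
	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid variant %q: %w", value, err)
	}
	return resolveVariant(n)
}
```

Routing the CLI flag, the env var, and the request field through one resolver keeps the meaning of 0 in a single place when a newer variant is added.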
What
Adds the Pyramidize feature — AI-powered document restructuring using the Pyramid Principle. Paste unstructured text, get a structured document with conclusion-first headers, bullet-point details, and a pipe-delimited subject line.
How it works
Four document types: email, wiki, memo, powerpoint — each with type-specific prompts and structure templates.
Two prompt variants live in parallel (selectable via the `--variant` CLI flag or the `EVAL_VARIANT` env var):

What's in the box

Backend (`internal/features/pyramidize/`)

Frontend (`frontend/src/app/features/text-enhancement/`)

CLI (`internal/cli/`)
- `./bin/KeyLint -fix "text"`: silent grammar fix (file/stdin/inline)
- `./bin/KeyLint -pyramidize -f input.md`: restructure from file
- `--json`, `--provider`, `--model`, `--variant` flags

Eval framework (`//go:build eval`)
- `test-data/eval-runs/<timestamp>/` with summary.json
- `./scripts/eval.sh --variant 1` vs `./scripts/eval.sh --variant 2` for comparison

Code quality (from ETC-focused review)
- `DefaultQualityThreshold` constant: single source of truth
- `resolveAPIKey`: decoupled keyring access from AI dispatch
- `DOCUMENT_TYPE_OPTIONS`: no more duplicate arrays
- `MarkdownPipe` tests (20 cases)

Eval results (Claude Sonnet 4.6, 13 samples)

Full analysis with per-sample Sonnet vs Opus comparison: `docs/pyramidize/eval-v2-prompt-analysis.md`

Test plan
- `go test ./internal/...`: all packages pass
- `cd frontend && npm test`: 128/128 tests pass
- `go build -o bin/KeyLint .`: builds cleanly
- `EVAL_PROVIDER=claude go test -tags eval ...`: requires API key

🤖 Generated with Claude Code