feat: Pyramidize — full feature + CLI + eval pipeline + quality improvements#17

Merged: 0xMMA merged 29 commits into main from feat/pyramidize on Apr 1, 2026

@0xMMA 0xMMA commented Mar 9, 2026

What

Adds the Pyramidize feature — AI-powered document restructuring using the Pyramid Principle. Paste unstructured text, get a structured document with conclusion-first headers, bullet-point details, and a pipe-delimited subject line.

How it works

User pastes text → auto-detect doc type → AI restructures → self-QA scores quality
    → if score < threshold: AI refines → canvas editor for manual tweaks → send back
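
The flow above can be sketched in Go as a small orchestration function. This is a minimal illustration, not the code in internal/features/pyramidize/: the `Stages` struct, the `Run` function, and the 0.7 threshold used in `main` are all hypothetical names and values chosen for the example, and the AI calls are replaced by injected stubs so the sketch runs without a provider.

```go
package main

import "fmt"

// Result carries restructured text plus a self-QA score in [0, 1].
// (Hypothetical type; the real package may model this differently.)
type Result struct {
	Text  string
	Score float64
}

// Stages holds the three pipeline steps as injectable functions so the
// control flow can be exercised without a real AI provider.
type Stages struct {
	Detect      func(input string) string
	Restructure func(input, docType string) Result
	Refine      func(prev Result) Result
}

// Run detects the doc type, restructures the input, and refines only when
// the self-QA score falls below the threshold. The bool reports whether
// the refinement step ran.
func Run(input string, threshold float64, s Stages) (Result, bool) {
	docType := s.Detect(input)
	res := s.Restructure(input, docType)
	if res.Score >= threshold {
		return res, false
	}
	return s.Refine(res), true
}

func main() {
	stubs := Stages{
		Detect: func(string) string { return "email" },
		Restructure: func(in, docType string) Result {
			return Result{Text: "[" + docType + "] " + in, Score: 0.5}
		},
		Refine: func(p Result) Result {
			return Result{Text: p.Text + " (refined)", Score: 0.9}
		},
	}
	res, refined := Run("raw text", 0.7, stubs) // 0.7 is an assumed threshold
	fmt.Println(res.Text, res.Score, refined)
}
```

Passing the stages in as functions keeps the conditional-refinement logic testable on its own; the manual canvas editing and send-back steps are UI concerns and are left out of the sketch.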

Four document types: email, wiki, memo, powerpoint — each with type-specific prompts and structure templates.

Two prompt variants live in parallel (selectable via --variant CLI flag or EVAL_VARIANT env var):

  • v1: detailed prompt + selfQA + conditional refinement pipeline
  • v2 (default): leaner self-contained prompt, no selfQA — half the size, equal or better quality

What's in the box

Backend (internal/features/pyramidize/)

  • Full pipeline: detect → foundation → optional refine
  • 3 AI providers: OpenAI, Claude, Ollama — with provider/model overrides
  • Self-QA with 5 specialist lenses (subject, MECE, completeness, style, fidelity)
  • Platform-specific source app capture (Linux/Windows) for paste-back
  • App presets: remember doc type per source application

Frontend (frontend/src/app/features/text-enhancement/)

  • Canvas editor with original/restructured tab view + markdown preview
  • Global refine (instruction → full rewrite) and splice (selection → partial rewrite)
  • Trace log with snapshot/revert/peek
  • Provider/model/doc-type/style selectors
  • Quality score display with refinement warnings

CLI (internal/cli/)

  • ./bin/KeyLint -fix "text" — silent grammar fix (file/stdin/inline)
  • ./bin/KeyLint -pyramidize -f input.md — restructure from file
  • --json, --provider, --model, --variant flags
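
For readers who want a feel for how such a command line could be wired up, here is a minimal sketch using Go's standard `flag` package, which accepts both `-name` and `--name` spellings. The `Options` struct, `ParseArgs`, and the choice to treat `-fix` and `-pyramidize` as boolean mode switches with inline text as the first positional argument are assumptions for illustration; the PR does not show how internal/cli/ actually parses its arguments.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// Options is a hypothetical container for the parsed command line.
type Options struct {
	Fix        bool   // -fix: grammar-fix mode
	Pyramidize bool   // -pyramidize: restructure mode
	File       string // -f: input file; empty means inline text or stdin
	JSON       bool   // --json: machine-readable output
	Provider   string // --provider override
	Model      string // --model override
	Variant    int    // --variant; 0 is assumed to mean "latest"
	Inline     string // first positional argument, if any
}

// ParseArgs parses args (without the program name) into Options and
// requires exactly one of the two modes.
func ParseArgs(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("KeyLint", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output out of the sketch
	fs.BoolVar(&o.Fix, "fix", false, "silent grammar fix")
	fs.BoolVar(&o.Pyramidize, "pyramidize", false, "restructure input")
	fs.StringVar(&o.File, "f", "", "read input from file")
	fs.BoolVar(&o.JSON, "json", false, "emit JSON")
	fs.StringVar(&o.Provider, "provider", "", "AI provider override")
	fs.StringVar(&o.Model, "model", "", "model override")
	fs.IntVar(&o.Variant, "variant", 0, "prompt variant (0 = latest)")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.Fix == o.Pyramidize {
		return o, errors.New("specify exactly one of -fix or -pyramidize")
	}
	o.Inline = fs.Arg(0) // empty string when no positional argument is given
	return o, nil
}

func main() {
	opts, err := ParseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

One consequence of this design is that the `flag` package stops parsing at the first positional argument, so in this sketch any flags must precede the inline text.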

Eval framework (//go:build eval)

  • Deterministic checks: structure, info coverage, hallucination detection
  • LLM-as-judge: pyramid structure, clarity, completeness, tone preservation
  • 13 anonymized German business email test samples
  • Results logged to test-data/eval-runs/<timestamp>/ with summary.json
  • ./scripts/eval.sh --variant 1 vs ./scripts/eval.sh --variant 2 for comparison

Code quality (from ETC-focused review)

  • DefaultQualityThreshold constant — single source of truth
  • resolveAPIKey — decoupled keyring access from AI dispatch
  • Shared DOCUMENT_TYPE_OPTIONS — no more duplicate arrays
  • MarkdownPipe tests (20 cases)
  • Fixed: retry for all operations, clipboard path consistency, slice aliasing, FIDELITY_VIOLATION handler

Eval results (Claude Sonnet 4.6, 13 samples)

| Metric                  | v1   | v2 (default) |
|-------------------------|------|--------------|
| Avg deterministic       | 0.82 | 0.84         |
| Avg judge overall       | 0.89 | 0.90         |
| Structure (all samples) | 1.00 | 1.00         |

Full analysis with per-sample Sonnet vs Opus comparison: docs/pyramidize/eval-v2-prompt-analysis.md

Test plan

  • go test ./internal/... — all packages pass
  • cd frontend && npm test — 128/128 tests pass
  • go build -o bin/KeyLint . — builds cleanly
  • EVAL_PROVIDER=claude go test -tags eval ... — requires API key
  • Manual E2E: paste email → pyramidize → edit canvas → send back

🤖 Generated with Claude Code

0xMMA and others added 26 commits March 8, 2026 23:16
…NG-12)

Go backend:
- New internal/features/pyramidize/ package: 18 files
- 2-call adaptive pipeline (detect + foundation+self-QA + optional refine)
- RPCs: Pyramidize, RefineGlobal, Splice, CancelOperation, SendBack,
  GetSourceApp, GetAppPresets, SetAppPreset, DeleteAppPreset,
  GetQualityThreshold, SetQualityThreshold
- XML-structured prompts for EMAIL, WIKI, MEMO, POWERPOINT doc types
- Platform-specific source app capture: xdotool (Linux), Win32 (Windows)
- 47 unit tests — all pass
- settings/model.go: add AppPresets + PyramidizeQualityThreshold to Settings struct
- main.go: register PyramidizeService; capture source app before clipboard grab

Angular frontend:
- text-enhancement.component.ts: complete rewrite as Pyramidize editor
  - 3-layer canvas model (originalText / pyramidizedText / canvasText)
  - Trace log with peek + revert, collapsible right panel
  - Step indicator, cancellation, error recovery with retry
  - Global instruction bar (Ctrl+Enter) + selection splice bubble
  - Hover copy in edit mode (mouse-position overlay) and preview mode
  - Copy as Markdown / Copy as Rich Text / Send Back actions
  - Module-level state survives navigation
- text-enhancement.service.ts: rewritten to wrap all pyramidize RPCs
- markdown.pipe.ts: new standalone MarkdownPipe for preview rendering
- settings.component.ts: add App Defaults tab (presets + quality threshold)
- wails.service.ts: add all 11 pyramidize RPC methods
- wails-mock.ts: add pyramidize mocks
- 108 Vitest tests pass (0 failures)
- Binary builds cleanly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dition

wails3 task run was starting the binary before ng serve had finished binding
to port 9245. Swapping the order of the background and blocking executes gives
ng serve the roughly 3s of Go build time to initialize before the binary connects to it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pboard resilience

Shell sidebar:
- Collapsible sidebar with hover-expand popover (overlay, no layout shift)
- Collapsed logo shows "KL" (K white, L orange) matching brand colours
- Nav icons centred in collapsed strip; SVG pyramid icon colour-consistent
  with <i> icons on hover and active-route states
- Version row hidden when empty-collapsed (no dead click-strip)
- Collapse button taller (0.875rem padding, ≥40px tap target)
- Layout switched to position:absolute sidebar + margin-left on main to
  prevent scrollbar flicker on hover-expand

Pyramidize:
- Upgrade default models: claude-sonnet-4-6, gpt-5.2 (was haiku / gpt-4o-mini)
- Provider + model selectors in left panel (replaces provider badge)
- Quality threshold moved from Settings › App Defaults to Pyramidize panel
- Error messages clipped to 2 lines with copy icon; removed "Change Provider" button
- Trace log full overlay replaces peek sub-panel
- Canvas textarea/preview fill available height (flex chain fix)
- Cancel propagates through RefineGlobal / Splice via aiOpts struct

Clipboard / Linux:
- Read/Write try xclip → xsel → wl-paste/wl-copy via LookPath; no crash
  on missing tools — clean error message instead of raw exec error
- xdotool calls (focus, paste, source-app capture) are best-effort; missing
  binary logs a warning and returns nil
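
The tool fallback described above can be illustrated with `exec.LookPath`, which reports whether a binary is on the PATH before anything is executed. This is a sketch under assumptions: the `tool` type, `pickTool`, `ReadClipboard`, the error text, and the specific command-line arguments are chosen for the example and are not taken from the KeyLint source.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// tool is one clipboard backend: a binary name plus the arguments that
// make it print the clipboard contents. (Hypothetical type.)
type tool struct {
	Name string
	Args []string
}

// readTools lists candidates in the order they are tried.
var readTools = []tool{
	{"xclip", []string{"-selection", "clipboard", "-o"}},
	{"xsel", []string{"--clipboard", "--output"}},
	{"wl-paste", []string{"--no-newline"}},
}

// ErrNoClipboardTool replaces a raw exec error with a readable message.
var ErrNoClipboardTool = errors.New(
	"no clipboard tool found: install xclip, xsel, or wl-clipboard")

// pickTool returns the first candidate that lookPath can resolve.
// lookPath is injected so the selection logic can be tested without
// depending on what is installed on the machine.
func pickTool(tools []tool, lookPath func(string) (string, error)) (tool, string, error) {
	for _, candidate := range tools {
		if path, err := lookPath(candidate.Name); err == nil {
			return candidate, path, nil
		}
	}
	return tool{}, "", ErrNoClipboardTool
}

// ReadClipboard runs the first available tool and returns its output.
func ReadClipboard() (string, error) {
	chosen, path, err := pickTool(readTools, exec.LookPath)
	if err != nil {
		return "", err
	}
	out, err := exec.Command(path, chosen.Args...).Output()
	if err != nil {
		return "", fmt.Errorf("%s failed: %w", chosen.Name, err)
	}
	return string(out), nil
}

func main() {
	text, err := ReadClipboard()
	if err != nil {
		fmt.Println("clipboard unavailable:", err)
		return
	}
	fmt.Println(text)
}
```

A write path would follow the same pattern with xclip, xsel, and wl-copy, feeding the text through the command's standard input.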

Tests:
- 80 Playwright tests across shell-menu*.spec.ts (centering, hover-expand,
  colour parity, layout dims, click targets, logo, active-route, scrollbar)
- 13 Playwright tests in pyramidize-layout.spec.ts (tabs, canvas height,
  provider selectors, trace overlay, sidebar collapse)
- 108 Vitest unit tests (0 failures)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…or rename

- Logo KL→KeyLint: CSS max-width unfold (no @if swap, pure transition)
- Nav item 2px drop on hover-expand: span{line-height:1} prevents height growth
- Quality threshold: replace plain <input type=number> with <p-inputnumber>
- Rename "Canvas" tab/labels to "Editor" throughout text-enhancement
- Trace log: hover shows non-sticky preview; click pins it
- Add LOGO-ANIMATION.md requirement spec
- Add BRANCH-STATUS.md feature-branch status overview
- Update all E2E tests to match new logo DOM structure (logo-k/ey/l/int)
- Add shell-menu-deep5.spec.ts covering nav vertical stability (7 tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
13 German-language email samples with raw input and accepted output
pairs for testing the Pyramidize feature. All names, companies, and
internal URLs have been replaced with fictional substitutes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds design spec covering CLI interface (machine-testable pyramidize
and fix commands) and evaluation framework (deterministic + LLM-as-judge
scoring with persistent run logging).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10-task plan covering CLI dispatch, input reading, -fix and -pyramidize
commands, deterministic eval checks, LLM-as-judge scoring, build-tagged
integration tests, eval shell scripts, and documentation updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r overrides

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ucination

Add fast, no-AI evaluation layer for pyramidize output quality:
- checkStructure: verifies subject line and markdown headers
- checkInfoCoverage: extracts key terms from input, checks >=70% appear in output
- checkNoHallucination: detects proper nouns added that weren't in the input
- extractKeyTerms: shared helper with German/English stop-word skip list
- RunDeterministicChecks: aggregates all checks into an EvalScorecard

Structural heading words (Kernergebnis, Hintergrund, etc.) are excluded from
hallucination counting since pyramidize naturally introduces them.
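
As an illustration of the information-coverage idea, the sketch below extracts key terms from the input and measures what fraction of them reappear in the output. Only the 70% threshold comes from the commit message; the tokenization, the minimum term length, the tiny stop-word list, and the substring matching are assumptions, and the function names only loosely mirror the ones listed above.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopWords is a deliberately tiny German/English sample list; the real
// skip list is presumably much longer.
var stopWords = map[string]bool{
	"wurde": true, "nicht": true, "eine": true, "einen": true,
	"with": true, "that": true, "this": true, "from": true,
}

// coverageThreshold is the pass mark named in the commit message.
const coverageThreshold = 0.70

// extractKeyTerms lowercases the text, splits on anything that is not a
// letter or digit, and keeps unique terms of at least four runes that are
// not stop words.
func extractKeyTerms(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	seen := map[string]bool{}
	var terms []string
	for _, f := range fields {
		if len([]rune(f)) < 4 || stopWords[f] || seen[f] {
			continue
		}
		seen[f] = true
		terms = append(terms, f)
	}
	return terms
}

// infoCoverage returns the fraction of input key terms that occur as
// substrings of the lowercased output. An input with no key terms counts
// as fully covered.
func infoCoverage(input, output string) float64 {
	terms := extractKeyTerms(input)
	if len(terms) == 0 {
		return 1
	}
	lower := strings.ToLower(output)
	hits := 0
	for _, term := range terms {
		if strings.Contains(lower, term) {
			hits++
		}
	}
	return float64(hits) / float64(len(terms))
}

func main() {
	in := "Der Server wurde gestern neu gestartet"
	out := "Neustart: Server gestern gestartet"
	score := infoCoverage(in, out)
	fmt.Printf("coverage=%.2f pass=%v\n", score, score >= coverageThreshold)
}
```

Substring matching is a crude stand-in for handling German compounds, where a term from the input may survive only as part of a longer word in the output.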

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parseTestData now handles header casing variations (Raw Input/input),
typos (accpted), and unfenced raw text — all 13 samples parse correctly.
Eval test and scripts load .env from project root for API keys.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deterministic checks: accept bold headers, German compound decomposition
(hyphen/slash/prefix-suffix), expanded business vocab exclusion list,
percentage-based hallucination threshold. Avg deterministic: 0.40 → 0.83.

Email prompt: tone preservation rules (no formality escalation, no
person-switch, no editorial additions), 3-segment subject line cap.

Self-eval: added fidelity specialist + FIDELITY_VIOLATION flag.

Eval summary now logs effective provider/model (not just overrides).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added explicit rule requiring mail history/quoted replies to be treated
as input content, not disposable context. Fixes diagnose-update sample
where Leo's backstory was being dropped entirely (judge completeness
0.72 → 0.92).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Email prompt rewritten with v1-derived rules: explicit structure template
(bold headers + bullet points, not prose), standalone content-statement
headers (ÜBERSCHRIFTEN-REGELN), analysis phase for input scanning,
compact style rules. Examples updated to show bullet structure.

Added eval setup and commands to README and expanded CLAUDE.md eval section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All three prompts now include:
- Analysis phase (scan all input for relevant info before restructuring)
- Structure template (header + bullet points, not prose paragraphs)
- Style rules (compact, bullets for details, factual tone)
- Standalone content-statement header rules (ÜBERSCHRIFTEN-REGELN)

Examples updated to show bullet structure instead of prose paragraphs.
Matches the same general principles applied to the email prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Most pipeline and UI items are now implemented. Added eval pipeline
section. Only remaining unchecked items: parallel specialist agents
(currently simplified as self-eval) and HTML clipboard paste-back.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move loose root files (PYRAMIDIZE.md, PYRAMIDIZE-UX.md, LOGO-ANIMATION.md,
BRANCH-STATUS.md) into docs/pyramidize/ with clearer names.

New docs:
- requirements.md — extracted canonical requirements
- adr-001-pipeline-architecture.md — architecture decision record
  (2-call self-QA vs v1's 6-call multi-agent, with v1 analysis)
- quality-status.md — eval scores, open issues, improvement roadmap
- research-nlp-langchain.md — NLP and LangChain research findings
  (sentence embeddings, langchaingo status, Eino, Wails CORS)

Simplified TODO.md to cross-cutting feature-parity items only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@0xMMA 0xMMA changed the title feat(shell,pyramidize): UX pass — sidebar polish, model upgrades, clipboard resilience feat: Pyramidize — full feature + CLI + eval pipeline + quality improvements Mar 30, 2026
0xMMA and others added 3 commits March 30, 2026 22:45
CLAUDE.md: keep main's concise format + .claude/rules references,
add CLI commands, eval section, pyramidize docs pointer.
.gitignore: combine both sides (eval-runs, superpowers, settings.local, test-results).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract DefaultQualityThreshold constant (single source of truth)
- Decouple callAISync from keyring via resolveAPIKey helper
- Fix retry() to work for refineGlobal/splice, not just pyramidize
- Route plain-text copy through WailsService.writeClipboard
- Add markdown.pipe.spec.ts (20 tests for hand-rolled parser)
- Extract shared DOCUMENT_TYPE_OPTIONS constant
- Fix DeleteAppPreset slice aliasing (fresh backing array)
- Add FIDELITY_VIOLATION case in refinement prompt switch
- Remove dead resetModuleState() from component spec
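
The slice-aliasing item is a common Go pitfall worth spelling out. The sketch below contrasts the usual in-place delete idiom, which overwrites elements in the shared backing array, with a variant that copies the survivors into a fresh one. The `Preset` type, both function names, and the sample app names are hypothetical; the actual DeleteAppPreset code is not shown in this PR.

```go
package main

import "fmt"

// Preset maps a source application to its remembered doc type.
// (Hypothetical shape for illustration.)
type Preset struct {
	App     string
	DocType string
}

// deleteAliased uses the in-place idiom. append writes the tail over the
// removed element inside the original backing array, so any other slice
// that shares that array sees its contents change.
func deleteAliased(s []Preset, i int) []Preset {
	return append(s[:i], s[i+1:]...)
}

// deleteFresh copies the surviving elements into a new backing array and
// leaves the input slice untouched. It panics if i is out of range.
func deleteFresh(s []Preset, i int) []Preset {
	out := make([]Preset, 0, len(s)-1)
	out = append(out, s[:i]...)
	return append(out, s[i+1:]...)
}

func main() {
	orig := []Preset{
		{"outlook", "email"},
		{"confluence", "wiki"},
		{"teams", "memo"},
	}

	_ = deleteFresh(orig, 0)
	fmt.Println(orig[0].App) // still "outlook": the input was not modified

	_ = deleteAliased(orig, 0)
	fmt.Println(orig[0].App) // now "confluence": the input was overwritten
}
```

The fresh-array version costs one allocation per delete, which is negligible for a short preset list and removes a class of bugs where a previously returned slice silently changes.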

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add prompt variant system — selectable via CLI (--variant), eval
(EVAL_VARIANT env var), and request (promptVariant field). Variant 0
defaults to latest (currently v2).
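
A minimal sketch of how variant resolution could work is shown below. The rule that 0 resolves to the latest variant comes from the text above; the precedence (an explicit flag value wins over the EVAL_VARIANT environment variable), the `ResolveVariant` name, and the `LatestVariant` constant are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// LatestVariant is what variant 0 resolves to (currently v2 per the PR).
const LatestVariant = 2

// ResolveVariant picks the prompt variant. A non-zero flag value takes
// precedence; otherwise a non-empty environment value is parsed; a final
// value of 0 means "use the latest variant".
func ResolveVariant(flagValue int, envValue string) (int, error) {
	v := flagValue
	if v == 0 && envValue != "" {
		n, err := strconv.Atoi(envValue)
		if err != nil {
			return 0, fmt.Errorf("invalid EVAL_VARIANT %q: %w", envValue, err)
		}
		v = n
	}
	switch v {
	case 0:
		return LatestVariant, nil
	case 1, 2:
		return v, nil
	default:
		return 0, fmt.Errorf("unknown prompt variant %d", v)
	}
}

func main() {
	v, err := ResolveVariant(0, os.Getenv("EVAL_VARIANT"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Println("using prompt variant", v)
}
```

Keeping 0 as the "latest" sentinel means callers that never set a variant pick up a new default automatically when a v3 is added, while pinned runs stay reproducible.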

V2 is a leaner, self-contained email prompt from research sandbox.
No selfQA block — quality evaluated externally by deterministic checks
and LLM-as-judge. Half the prompt size, skips the QA/refine pipeline.

Eval results (Sonnet 4.6, 13 samples):
  v1: det=0.82 judge=0.89
  v2: det=0.84 judge=0.90

Full analysis with per-sample Sonnet vs Opus comparison, structural
patterns, and improvement candidates in docs/pyramidize/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@0xMMA 0xMMA merged commit 531e110 into main Apr 1, 2026
1 check passed