From 0e7a7cf5057a71e7e3d83bb31f79f66b4006ed96 Mon Sep 17 00:00:00 2001
From: AmElmo
Date: Tue, 24 Feb 2026 17:18:03 +0700
Subject: [PATCH 1/3] chore: add demo fixtures with 5 NovaMind AI projects at different stages

Add pre-populated demo fixtures for product demos, themed around a
fictional AI company (NovaMind AI) building developer tools. Includes:

- 001-ai-prompt-playground: Complete with mixed issue statuses
- 002-context-aware-autocomplete: Docs review stage (awaiting approval)
- 003-ai-test-generator: Mid-workflow (PM done, UX generating)
- 004-smart-log-analyzer: Early stage (PM questions awaiting answers)
- 005-model-performance-dashboard: Complete, all issues pending

Also adds scripts/load-demo.sh to load fixtures into outputs/ with
--clean and --reset options, and updates .gitignore to track fixtures.
---
 .gitignore | 2 +
 .../documents/acceptance_criteria.json | 105 ++++
 .../documents/design_brief.md | 78 +++
 .../001-ai-prompt-playground/documents/prd.md | 82 +++
 .../documents/screens.json | 483 ++++++++++++++++++
 .../documents/technical_specification.md | 101 ++++
 .../documents/technology_choices.json | 150 ++++++
 .../issues/ENG-001.md | 47 ++
 .../issues/ENG-002.md | 50 ++
 .../issues/ENG-003.md | 50 ++
 .../issues/ENG-004.md | 49 ++
 .../issues/ENG-005.md | 48 ++
 .../issues/ENG-006.md | 51 ++
 .../issues/issues.json | 86 ++++
 .../project_request.md | 16 +
 .../project_status.json | 86 ++++
 .../questions/engineer_questions.json | 25 +
 .../questions/pm_questions.json | 25 +
 .../questions/ux_questions.json | 25 +
 .../documents/acceptance_criteria.json | 99 ++++
 .../documents/design_brief.md | 73 +++
 .../documents/prd.md | 70 +++
 .../documents/screens.json | 326 ++++++++++++
 .../documents/technical_specification.md | 73 +++
 .../documents/technology_choices.json | 150 ++++++
 .../project_request.md | 16 +
 .../project_status.json | 83 +++
 .../questions/engineer_questions.json | 25 +
 .../questions/pm_questions.json | 25 +
 .../questions/ux_questions.json | 25 +
.../documents/acceptance_criteria.json | 99 ++++ .../003-ai-test-generator/documents/prd.md | 72 +++ .../003-ai-test-generator/project_request.md | 16 + .../003-ai-test-generator/project_status.json | 73 +++ .../questions/engineer_questions.json | 4 + .../questions/pm_questions.json | 25 + .../questions/ux_questions.json | 4 + .../004-smart-log-analyzer/project_request.md | 16 + .../project_status.json | 68 +++ .../questions/engineer_questions.json | 4 + .../questions/pm_questions.json | 21 + .../questions/ux_questions.json | 4 + .../documents/acceptance_criteria.json | 44 ++ .../documents/design_brief.md | 61 +++ .../documents/prd.md | 72 +++ .../documents/screens.json | 269 ++++++++++ .../documents/technical_specification.md | 89 ++++ .../documents/technology_choices.json | 150 ++++++ .../issues/ENG-001.md | 48 ++ .../issues/ENG-002.md | 49 ++ .../issues/ENG-003.md | 49 ++ .../issues/ENG-004.md | 49 ++ .../issues/ENG-005.md | 51 ++ .../issues/issues.json | 73 +++ .../project_request.md | 16 + .../project_status.json | 86 ++++ .../questions/engineer_questions.json | 25 + .../questions/pm_questions.json | 25 + .../questions/ux_questions.json | 25 + scripts/load-demo.sh | 126 +++++ 60 files changed, 4237 insertions(+) create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/acceptance_criteria.json create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/design_brief.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/prd.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/screens.json create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/technical_specification.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/technology_choices.json create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-001.md create mode 100644 
fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-002.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-003.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-004.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-005.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-006.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/issues.json create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/project_request.md create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/project_status.json create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/engineer_questions.json create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/pm_questions.json create mode 100644 fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/ux_questions.json create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/acceptance_criteria.json create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/design_brief.md create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/prd.md create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/screens.json create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/technical_specification.md create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/technology_choices.json create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/project_request.md create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/project_status.json create mode 100644 
fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/engineer_questions.json create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/pm_questions.json create mode 100644 fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/ux_questions.json create mode 100644 fixtures/demo/outputs/projects/003-ai-test-generator/documents/acceptance_criteria.json create mode 100644 fixtures/demo/outputs/projects/003-ai-test-generator/documents/prd.md create mode 100644 fixtures/demo/outputs/projects/003-ai-test-generator/project_request.md create mode 100644 fixtures/demo/outputs/projects/003-ai-test-generator/project_status.json create mode 100644 fixtures/demo/outputs/projects/003-ai-test-generator/questions/engineer_questions.json create mode 100644 fixtures/demo/outputs/projects/003-ai-test-generator/questions/pm_questions.json create mode 100644 fixtures/demo/outputs/projects/003-ai-test-generator/questions/ux_questions.json create mode 100644 fixtures/demo/outputs/projects/004-smart-log-analyzer/project_request.md create mode 100644 fixtures/demo/outputs/projects/004-smart-log-analyzer/project_status.json create mode 100644 fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/engineer_questions.json create mode 100644 fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/pm_questions.json create mode 100644 fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/ux_questions.json create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/acceptance_criteria.json create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/design_brief.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/prd.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/screens.json create mode 100644 
fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/technical_specification.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/technology_choices.json create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-001.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-002.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-003.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-004.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-005.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/issues.json create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/project_request.md create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/project_status.json create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/engineer_questions.json create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/pm_questions.json create mode 100644 fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/ux_questions.json create mode 100755 scripts/load-demo.sh diff --git a/.gitignore b/.gitignore index a2e618f..957bda6 100644 --- a/.gitignore +++ b/.gitignore @@ -4,6 +4,8 @@ node_modules/ # Output directories - all generated outputs should not be committed outputs/ specwright/outputs/ +# But keep demo fixtures tracked in git +!fixtures/demo/outputs/ # System files .DS_Store diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/acceptance_criteria.json b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/acceptance_criteria.json new file mode 100644 index 0000000..9f032e2 --- /dev/null +++ 
b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/acceptance_criteria.json @@ -0,0 +1,105 @@ +{ + "project_id": "001-ai-prompt-playground", + "project_name": "AI Prompt Playground", + "job_stories": [ + { + "job_story_id": "js_001", + "title": "Write and Test a Prompt", + "situation": "I have a prompt idea and want to see how different LLMs respond", + "motivation": "quickly test and compare outputs without switching between provider playgrounds", + "outcome": "I can evaluate which provider gives the best result for my use case", + "acceptance_criteria": [ + { + "id": "ac_001_01", + "given": "I am on the Prompt Playground page", + "when": "I type a prompt with {{variable}} placeholders in the editor", + "then": "Variable inputs appear below the editor for each detected placeholder" + }, + { + "id": "ac_001_02", + "given": "I have written a prompt and selected Claude and GPT-4", + "when": "I click 'Run'", + "then": "Both providers stream responses simultaneously in side-by-side panels" + }, + { + "id": "ac_001_03", + "given": "Responses are streaming", + "when": "I view the response panels", + "then": "Each panel shows a live token counter and elapsed time" + }, + { + "id": "ac_001_04", + "given": "A provider returns an error", + "when": "The error is received", + "then": "The panel shows the error message with a 'Retry' button while other panels continue" + } + ] + }, + { + "job_story_id": "js_002", + "title": "Compare Responses Across Providers", + "situation": "multiple providers have returned responses to the same prompt", + "motivation": "evaluate which response is best for quality, accuracy, and cost", + "outcome": "I can make an informed decision about which provider to use", + "acceptance_criteria": [ + { + "id": "ac_002_01", + "given": "Responses from 2+ providers are displayed", + "when": "I view the comparison layout", + "then": "Each response shows token count, latency, and estimated cost" + }, + { + "id": "ac_002_02", + "given": "I am 
viewing a response", + "when": "I click the thumbs up or thumbs down button", + "then": "The rating is saved and visible in the version history for this run" + }, + { + "id": "ac_002_03", + "given": "I want to read one response in detail", + "when": "I click 'Expand' on a response panel", + "then": "The panel takes full width with the others collapsed to tabs" + }, + { + "id": "ac_002_04", + "given": "Responses contain markdown", + "when": "I view the response", + "then": "Markdown is rendered with proper headings, code blocks, and lists" + } + ] + }, + { + "job_story_id": "js_003", + "title": "Version and Iterate on Prompts", + "situation": "I have been iterating on a prompt across multiple test runs", + "motivation": "see what changes I made and how they affected response quality", + "outcome": "I can learn what prompt patterns work best and avoid regressions", + "acceptance_criteria": [ + { + "id": "ac_003_01", + "given": "I have run a prompt test", + "when": "The responses complete", + "then": "A new version is auto-saved with timestamp, prompt text, and quality scores" + }, + { + "id": "ac_003_02", + "given": "I am in the version history panel", + "when": "I select two versions", + "then": "A diff view shows additions in green and deletions in red between the two prompts" + }, + { + "id": "ac_003_03", + "given": "I am viewing a previous version", + "when": "I click 'Restore'", + "then": "The prompt editor loads that version's text as the active prompt" + }, + { + "id": "ac_003_04", + "given": "I have multiple versions with quality scores", + "when": "I view the version list", + "then": "A sparkline chart shows quality score trend across versions" + } + ] + } + ] +} diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/design_brief.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/design_brief.md new file mode 100644 index 0000000..82d7296 --- /dev/null +++ 
b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/design_brief.md @@ -0,0 +1,78 @@ +# Design Brief: AI Prompt Playground + +## Design Goals + +1. **Editor-first** - The prompt editor is the hero; everything else supports it +2. **Comparison-friendly** - Make it effortless to see differences between providers and versions +3. **Fast iteration** - Minimize clicks between writing a prompt and seeing results + +## User Flows + +### Flow 1: Write and Run a Prompt + +``` +New Prompt → Editor appears + | +Type prompt with {{variables}} + - Variable inputs auto-appear below + - Provider checkboxes on the right + | +Click "Run" (or Cmd+Enter) + | +Responses stream side-by-side + - Token count + latency per provider + - Rate each response (thumbs up/down) +``` + +### Flow 2: Compare and Iterate + +``` +View responses from Run #1 + | +Edit prompt → Run again + | +Version history sidebar shows Run #1 and #2 + | +Select both → Diff view shows prompt changes + | +Quality trend sparkline shows improvement +``` + +### Flow 3: Browse and Use Templates + +``` +Click "Templates" in sidebar + | +Browse by category (Extraction, Classification, Generation, etc.) + | +Preview template with sample variables + | +Click "Use Template" → loads into editor + | +Customize and run +``` + +## Key Screens + +1. **Prompt Editor** - Split view: editor left, response panels right +2. **Version History** - Sidebar with version list, diff view on select +3. **Template Library** - Grid of template cards by category +4. 
**Settings** - Provider API keys, default model selection, preferences + +## Visual Guidelines + +- Clean, minimal interface — the content (prompts and responses) is the focus +- Monospace font in editor, proportional in responses (with code blocks monospace) +- Provider colors: Claude (orange), GPT-4 (green), Gemini (blue) +- Quality indicators: High (green), Medium (yellow), Low (red) +- Dark mode default with light mode option + +## Accessibility + +- Full keyboard navigation (Tab between panels, Cmd+Enter to run) +- Screen reader announces response completion and quality scores +- Sufficient contrast for diff highlighting (not just color-dependent) +- Focus management: after Run, focus moves to first response panel + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/prd.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/prd.md new file mode 100644 index 0000000..6f91541 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/prd.md @@ -0,0 +1,82 @@ +# Product Requirements Document: AI Prompt Playground + +## Overview + +An interactive workspace for prompt engineering that lets developers write, test, and refine AI prompts across multiple LLM providers. Compare responses side-by-side, track prompt iterations with version history, score response quality, and build a reusable template library. + +## Problem Statement + +Prompt engineering is iterative and messy. Developers bounce between provider playgrounds, lose track of what they tested, and have no way to compare outputs systematically. There is no single tool that combines writing, testing, versioning, and quality tracking in one place. + +## Goals + +1. **Multi-provider testing** - Run the same prompt against Claude, GPT-4, and Gemini in one click +2. **Version control** - Track prompt iterations with diffs so you can see what changed and why +3. 
**Quality measurement** - Automated and manual scoring to track improvement over time +4. **Reusable templates** - Build a library of proven prompt patterns for the team + +## User Stories + +### US-1: Write and Test a Prompt +**As a** developer, **I want to** write a prompt and test it against multiple LLMs simultaneously, **so that** I can compare outputs and pick the best provider for my use case. + +**Acceptance Criteria:** +- Rich text editor for prompt with variable placeholder support ({{variable}}) +- Select one or more providers to test against +- Run all selected providers in parallel +- Streaming responses displayed in real-time + +### US-2: Compare Responses Side-by-Side +**As a** developer, **I want to** see responses from different providers next to each other, **so that** I can evaluate quality, tone, and accuracy differences. + +**Acceptance Criteria:** +- Side-by-side columns for each provider response +- Token count and latency shown per response +- Thumbs up/down rating per response +- Expand any single response to full width for detailed reading + +### US-3: Version and Iterate on Prompts +**As a** developer, **I want to** save prompt versions and see diffs between iterations, **so that** I can track what changes improved or degraded quality. + +**Acceptance Criteria:** +- Auto-save creates a new version on each test run +- Version list shows timestamp, provider tested, quality score +- Diff view highlights additions/deletions between any two versions +- Can restore any previous version as the active prompt + +### US-4: Use and Create Templates +**As a** developer, **I want to** start from proven prompt templates, **so that** I don't reinvent common patterns. + +**Acceptance Criteria:** +- Browse templates by category (classification, extraction, generation, etc.) 
+- Preview template with example variables filled in +- One-click to load template into editor +- Save any prompt as a new template + +## Scope + +### In Scope +- Multi-provider prompt testing (Claude, GPT-4, Gemini) +- Streaming responses with real-time display +- Prompt version history with diff comparison +- Automated + manual quality scoring +- Template library with categories and search +- Export/import prompts as JSON + +### Out of Scope +- Team collaboration (v2) +- CI/CD integration for prompt regression testing (v2) +- Prompt chaining / multi-step workflows +- Fine-tuning integration + +## Success Metrics + +| Metric | Target | +|--------|--------| +| Prompts tested per session | avg 5+ | +| Version comparison usage | >60% of users | +| Template adoption rate | >40% start from template | +| Time to first test | <30 seconds | + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/screens.json b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/screens.json new file mode 100644 index 0000000..cf2c10c --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/screens.json @@ -0,0 +1,483 @@ +{ + "project_id": "001-ai-prompt-playground", + "project_name": "AI Prompt Playground", + "screens": [ + { + "id": "prompt-editor-main", + "name": "Prompt Editor with Responses", + "route": "/playground", + "description": "Main workspace with prompt editor on the left and multi-provider response panels on the right", + "wireframe": { + "type": "stack", + "direction": "vertical", + "children": [ + { + "type": "nav", + "text": "NovaMind Prompt Playground", + "children": [ + { "type": "tabs", "tabs": ["Editor", "Templates", "History"], "activeTab": "Editor" }, + { "type": "spacer", "flex": 1 }, + { "type": "button", "text": "Export", "variant": "ghost" }, + { "type": "avatar", "size": "sm", "text": "JB" } + ] + }, + { + "type": "row", + "gap": 
"none", + "children": [ + { + "type": "stack", + "direction": "vertical", + "padding": "md", + "gap": "sm", + "children": [ + { "type": "heading", "text": "Prompt", "size": "sm" }, + { + "type": "card", + "padding": "md", + "children": [ + { "type": "text", "text": "You are a senior code reviewer. Analyze the following {{language}} code and identify:\n1. Security vulnerabilities\n2. Performance issues\n3. Best practice violations\n\nCode:\n```\n{{code}}\n```\n\nProvide your analysis as a JSON array.", "size": "xs" } + ] + }, + { + "type": "card", + "padding": "sm", + "children": [ + { + "type": "stack", + "gap": "sm", + "children": [ + { "type": "label", "text": "Variables" }, + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "input", "placeholder": "language", "value": "TypeScript" }, + { "type": "input", "placeholder": "code", "value": "const data = eval(input)" } + ] + } + ] + } + ] + }, + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "button", "text": "Run (Cmd+Enter)", "variant": "primary" }, + { "type": "badge", "text": "Claude", "variant": "default" }, + { "type": "badge", "text": "GPT-4", "variant": "default" } + ] + } + ] + }, + { "type": "divider", "direction": "vertical" }, + { + "type": "stack", + "direction": "vertical", + "padding": "md", + "gap": "md", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "heading", "text": "Responses", "size": "sm" }, + { "type": "text", "text": "Run #4 — 2.3s ago", "size": "xs", "color": "muted" } + ] + }, + { + "type": "row", + "gap": "md", + "children": [ + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "badge", "text": "Claude", "variant": "default" }, + { "type": "text", "text": "1.8s | 342 tokens", "size": "xs", "color": "muted" } + ] + }, + { "type": "text", "text": "[{\"type\": \"security\", \"severity\": 
\"critical\", \"line\": 1, \"message\": \"eval() is dangerous...\"}]", "size": "xs" }, + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "button", "text": "Expand", "variant": "ghost" }, + { "type": "button", "text": "Copy", "variant": "ghost" } + ] + } + ] + } + ] + }, + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "badge", "text": "GPT-4", "variant": "default" }, + { "type": "text", "text": "2.1s | 410 tokens", "size": "xs", "color": "muted" } + ] + }, + { "type": "text", "text": "[{\"vulnerability\": \"Code Injection\", \"risk_level\": \"HIGH\", \"description\": \"The use of eval()...\"}]", "size": "xs" }, + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "button", "text": "Expand", "variant": "ghost" }, + { "type": "button", "text": "Copy", "variant": "ghost" } + ] + } + ] + } + ] + } + ] + } + ] + } + ] + } + ] + }, + "notes": "Editor should support prompt syntax highlighting for {{variables}}. Responses stream in real-time. 
Provider badges use brand colors.", + "components_to_reuse": [ + { "name": "Button", "path": "components/ui/Button.tsx" }, + { "name": "Card", "path": "components/ui/Card.tsx" }, + { "name": "Badge", "path": "components/ui/Badge.tsx" }, + { "name": "Input", "path": "components/ui/Input.tsx" } + ], + "components_to_create": [ + { "name": "PromptEditor", "path": "components/playground/PromptEditor.tsx", "description": "Rich text editor with variable placeholder detection and syntax highlighting" }, + { "name": "ResponsePanel", "path": "components/playground/ResponsePanel.tsx", "description": "Streaming response display with metadata (tokens, latency, cost)" }, + { "name": "VariableInputs", "path": "components/playground/VariableInputs.tsx", "description": "Auto-generated input fields for detected {{variable}} placeholders" }, + { "name": "ProviderSelector", "path": "components/playground/ProviderSelector.tsx", "description": "Checkboxes for selecting which LLM providers to test against" } + ] + }, + { + "id": "version-history", + "name": "Version History with Diff", + "route": "/playground/history", + "description": "Version list showing prompt iterations with diff comparison between selected versions", + "wireframe": { + "type": "stack", + "direction": "vertical", + "children": [ + { + "type": "nav", + "text": "NovaMind Prompt Playground", + "children": [ + { "type": "tabs", "tabs": ["Editor", "Templates", "History"], "activeTab": "History" }, + { "type": "spacer", "flex": 1 }, + { "type": "avatar", "size": "sm", "text": "JB" } + ] + }, + { + "type": "row", + "gap": "md", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "sm", + "children": [ + { "type": "heading", "text": "Versions", "size": "sm" }, + { + "type": "stack", + "gap": "xs", + "children": [ + { + "type": "card", + "padding": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "v4 (current)", "size": "sm" }, + { "type": "badge", 
"text": "8.5/10", "variant": "default" } + ] + }, + { "type": "text", "text": "Today, 3:42 PM", "size": "xs", "color": "muted" } + ] + }, + { + "type": "card", + "padding": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "v3", "size": "sm" }, + { "type": "badge", "text": "7.2/10", "variant": "warning" } + ] + }, + { "type": "text", "text": "Today, 3:15 PM", "size": "xs", "color": "muted" } + ] + }, + { + "type": "card", + "padding": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "v2", "size": "sm" }, + { "type": "badge", "text": "6.0/10", "variant": "warning" } + ] + }, + { "type": "text", "text": "Today, 2:50 PM", "size": "xs", "color": "muted" } + ] + }, + { + "type": "card", + "padding": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "v1", "size": "sm" }, + { "type": "badge", "text": "4.1/10", "variant": "destructive" } + ] + }, + { "type": "text", "text": "Today, 2:30 PM", "size": "xs", "color": "muted" } + ] + } + ] + } + ] + }, + { + "type": "stack", + "gap": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "heading", "text": "Diff: v3 → v4", "size": "sm" }, + { "type": "button", "text": "Restore v3", "variant": "outline" } + ] + }, + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "xs", + "children": [ + { "type": "text", "text": " You are a senior code reviewer.", "size": "xs" }, + { "type": "text", "text": "- Analyze the following code and identify issues.", "size": "xs", "color": "error" }, + { "type": "text", "text": "+ Analyze the following {{language}} code and identify:", "size": "xs", "color": "accent" }, + { "type": "text", "text": "+ 1. Security vulnerabilities", "size": "xs", "color": "accent" }, + { "type": "text", "text": "+ 2. 
Performance issues", "size": "xs", "color": "accent" }, + { "type": "text", "text": "+ 3. Best practice violations", "size": "xs", "color": "accent" }, + { "type": "text", "text": " ", "size": "xs" }, + { "type": "text", "text": "- Provide your analysis.", "size": "xs", "color": "error" }, + { "type": "text", "text": "+ Provide your analysis as a JSON array.", "size": "xs", "color": "accent" } + ] + } + ] + } + ] + } + ] + } + ] + }, + "notes": "Click two versions to compare. Quality score badge uses green/yellow/red based on score. Sparkline trend chart above version list.", + "components_to_reuse": [ + { "name": "Card", "path": "components/ui/Card.tsx" }, + { "name": "Badge", "path": "components/ui/Badge.tsx" }, + { "name": "Button", "path": "components/ui/Button.tsx" } + ], + "components_to_create": [ + { "name": "VersionList", "path": "components/playground/VersionList.tsx", "description": "Scrollable list of prompt versions with scores and timestamps" }, + { "name": "PromptDiff", "path": "components/playground/PromptDiff.tsx", "description": "Side-by-side or inline diff view for prompt text comparison" }, + { "name": "QualitySparkline", "path": "components/playground/QualitySparkline.tsx", "description": "Small trend chart showing quality score across versions" } + ] + }, + { + "id": "template-library", + "name": "Template Library", + "route": "/playground/templates", + "description": "Browsable gallery of prompt templates organized by category", + "wireframe": { + "type": "stack", + "direction": "vertical", + "children": [ + { + "type": "nav", + "text": "NovaMind Prompt Playground", + "children": [ + { "type": "tabs", "tabs": ["Editor", "Templates", "History"], "activeTab": "Templates" }, + { "type": "spacer", "flex": 1 }, + { "type": "avatar", "size": "sm", "text": "JB" } + ] + }, + { + "type": "stack", + "padding": "lg", + "gap": "lg", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "heading", "text": "Prompt 
Templates", "size": "lg" }, + { "type": "input", "placeholder": "Search templates...", "icon": "search" } + ] + }, + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "button", "text": "All", "variant": "secondary" }, + { "type": "button", "text": "Extraction", "variant": "ghost" }, + { "type": "button", "text": "Classification", "variant": "ghost" }, + { "type": "button", "text": "Generation", "variant": "ghost" }, + { "type": "button", "text": "Analysis", "variant": "ghost" }, + { "type": "button", "text": "Transformation", "variant": "ghost" } + ] + }, + { + "type": "row", + "gap": "md", + "children": [ + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "sm", + "children": [ + { "type": "badge", "text": "Featured", "variant": "default" }, + { "type": "heading", "text": "JSON Data Extractor", "size": "sm" }, + { "type": "text", "text": "Extract structured JSON data from unstructured text. Handles nested objects and arrays.", "size": "xs", "color": "muted" }, + { + "type": "row", + "gap": "xs", + "children": [ + { "type": "badge", "text": "Extraction" }, + { "type": "badge", "text": "2 variables" } + ] + }, + { "type": "button", "text": "Use Template", "variant": "primary", "fullWidth": true } + ] + } + ] + }, + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "sm", + "children": [ + { "type": "badge", "text": "Popular", "variant": "default" }, + { "type": "heading", "text": "Code Review Prompt", "size": "sm" }, + { "type": "text", "text": "Analyze code for security issues, bugs, performance problems, and style violations.", "size": "xs", "color": "muted" }, + { + "type": "row", + "gap": "xs", + "children": [ + { "type": "badge", "text": "Analysis" }, + { "type": "badge", "text": "3 variables" } + ] + }, + { "type": "button", "text": "Use Template", "variant": "primary", "fullWidth": true } + ] + } + ] + }, + { + "type": "card", + "padding": "md", + "children": [ + { + "type": 
"stack", + "gap": "sm", + "children": [ + { "type": "spacer", "size": "sm" }, + { "type": "heading", "text": "Text Classifier", "size": "sm" }, + { "type": "text", "text": "Classify text into predefined categories with confidence scores.", "size": "xs", "color": "muted" }, + { + "type": "row", + "gap": "xs", + "children": [ + { "type": "badge", "text": "Classification" }, + { "type": "badge", "text": "3 variables" } + ] + }, + { "type": "button", "text": "Use Template", "variant": "primary", "fullWidth": true } + ] + } + ] + } + ] + } + ] + } + ] + }, + "notes": "Templates should show a preview modal before loading. Featured templates appear first. User-created templates show in a 'My Templates' section.", + "components_to_reuse": [ + { "name": "Card", "path": "components/ui/Card.tsx" }, + { "name": "Badge", "path": "components/ui/Badge.tsx" }, + { "name": "Button", "path": "components/ui/Button.tsx" }, + { "name": "Input", "path": "components/ui/Input.tsx" } + ], + "components_to_create": [ + { "name": "TemplateCard", "path": "components/playground/TemplateCard.tsx", "description": "Card displaying template name, description, category, and variable count" }, + { "name": "TemplatePreview", "path": "components/playground/TemplatePreview.tsx", "description": "Modal showing full template text with sample variables filled in" }, + { "name": "CategoryFilter", "path": "components/playground/CategoryFilter.tsx", "description": "Horizontal filter bar for template categories" } + ] + } + ], + "user_flows": [ + { + "id": "flow_write_and_test", + "name": "Write and Test a Prompt", + "steps": [ + { "step": 1, "screen": "prompt-editor-main", "action": "Developer types a prompt with {{language}} and {{code}} variables" }, + { "step": 2, "screen": "prompt-editor-main", "action": "Variable inputs appear; developer fills in 'TypeScript' and sample code" }, + { "step": 3, "screen": "prompt-editor-main", "action": "Developer checks Claude and GPT-4, clicks 'Run'" }, + { "step": 4, 
"screen": "prompt-editor-main", "action": "Responses stream side-by-side with token counts and latency" }, + { "step": 5, "screen": "prompt-editor-main", "action": "Developer rates Claude's response thumbs-up, GPT-4 thumbs-down" } + ] + }, + { + "id": "flow_iterate_with_versions", + "name": "Iterate and Compare Versions", + "steps": [ + { "step": 1, "screen": "prompt-editor-main", "action": "Developer edits the prompt to add structured output instructions" }, + { "step": 2, "screen": "prompt-editor-main", "action": "Runs again — new version auto-saved" }, + { "step": 3, "screen": "version-history", "action": "Opens History tab, sees v1 through v4 with quality scores" }, + { "step": 4, "screen": "version-history", "action": "Selects v3 and v4 to see the diff" }, + { "step": 5, "screen": "version-history", "action": "Notices v3 had better structure, clicks 'Restore v3'" } + ] + } + ] +} diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/technical_specification.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/technical_specification.md new file mode 100644 index 0000000..38b01f7 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/technical_specification.md @@ -0,0 +1,101 @@ +# Technical Specification: AI Prompt Playground + +## Architecture Overview + +The playground uses a React SPA frontend with an Express API backend. Provider calls are made server-side to protect API keys. Responses stream via Server-Sent Events (SSE). 
+ +``` +┌──────────────┐ ┌──────────────┐ ┌──────────────┐ +│ React SPA │────▶│ Express API │────▶│ Claude API │ +│ (Vite) │◀SSE─│ (Node.js) │────▶│ OpenAI API │ +└──────────────┘ └──────┬───────┘────▶│ Gemini API │ + │ └──────────────┘ + ┌──────▼───────┐ + │ PostgreSQL │ + │ (versions) │ + └──────────────┘ +``` + +## Provider Abstraction + +Each provider implements the `LLMProvider` interface: + +```typescript +interface LLMProvider { + name: string; + streamCompletion(prompt: string, options: ProviderOptions): AsyncIterable; + countTokens(text: string): number; + estimateCost(inputTokens: number, outputTokens: number): number; +} +``` + +Providers are registered in a `ProviderRegistry` and called in parallel during a test run. + +## API Endpoints + +### POST /api/playground/run +Execute a prompt against selected providers. Returns SSE stream. + +**Request:** +```json +{ + "prompt": "You are a...", + "variables": { "language": "TypeScript", "code": "..." }, + "providers": ["claude", "gpt4"] +} +``` + +**SSE Events:** +- `provider:start` — Provider begins processing +- `provider:chunk` — Token chunk from provider +- `provider:complete` — Provider finished, includes metadata +- `provider:error` — Provider failed + +### GET /api/playground/versions/:promptId +List all versions of a prompt. + +### GET /api/playground/diff/:versionA/:versionB +Compute diff between two prompt versions. + +### POST /api/playground/rate +Submit quality rating for a response. + +### GET /api/playground/templates +List available templates with optional category filter. + +## Database Schema + +### Table: `prompts` +Stores prompt definitions with metadata. + +### Table: `prompt_versions` +Stores each version of a prompt with full text and computed quality score. + +### Table: `test_runs` +Stores results from each run including provider responses, tokens, latency. + +### Table: `templates` +Stores reusable prompt templates with categories and variable schemas. 
+ +## Quality Scoring + +Automated scoring checks: +- **Format compliance** — Does the response match requested format (JSON, list, etc.)? +- **Length appropriateness** — Is the response within expected length bounds? +- **Variable coverage** — Does the response reference all provided context? + +Manual scoring: thumbs up/down per response, aggregated as acceptance rate. + +Final score: weighted average of automated (60%) and manual (40%) scores, 0-10 scale. + +## Performance Targets + +| Operation | Target | +|-----------|--------| +| Stream first token | <500ms | +| Full response (avg) | <5s | +| Version diff computation | <100ms | +| Template search | <200ms | + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/technology_choices.json b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/technology_choices.json new file mode 100644 index 0000000..19199c8 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/documents/technology_choices.json @@ -0,0 +1,150 @@ +{ + "project_name": "AI Prompt Playground", + "technology_decisions": [ + { + "category": "LLM SDK Abstraction", + "description": "How to handle API calls to multiple LLM providers (Claude, GPT-4, Gemini)", + "decision_needed": true, + "options": [ + { + "name": "Custom Adapter Pattern", + "description": "Build a lightweight provider interface with adapter implementations per provider", + "version": "N/A", + "documentation_url": "", + "github_url": "", + "pros": ["Full control over streaming behavior", "No external dependency", "Easy to add new providers", "Custom error handling per provider"], + "cons": ["More code to maintain", "Need to handle SDK updates ourselves"], + "trade_offs": ["More control but more maintenance"], + "maturity": "N/A", + "community_size": "N/A", + "last_updated": "N/A", + "implementation_complexity": "Medium", + "estimated_time": "6 hours", + 
"recommended": true, + "recommendation_reason": "Full control over streaming and error handling, critical for a playground tool" + }, + { + "name": "Vercel AI SDK", + "description": "Vercel's unified AI SDK that abstracts multiple providers", + "version": "3.0", + "documentation_url": "https://sdk.vercel.ai/docs", + "github_url": "https://github.com/vercel/ai", + "pros": ["Multi-provider out of the box", "Good streaming support", "Active community"], + "cons": ["Opinionated API may not fit our needs", "Extra dependency", "Less control over provider-specific features"], + "trade_offs": ["Faster setup but less flexibility"], + "maturity": "Production-ready", + "community_size": "Large", + "last_updated": "2024-12", + "implementation_complexity": "Low", + "estimated_time": "3 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "Custom Adapter Pattern", + "user_choice_description": "Build a lightweight provider interface with adapter implementations per provider", + "user_choice_version": "N/A", + "user_reason": "For a playground tool, we need precise control over streaming, token counting, and error handling per provider", + "final_decision": "custom-adapter-pattern" + }, + { + "category": "Prompt Editor", + "description": "Rich text editor component for writing prompts with variable highlighting", + "decision_needed": true, + "options": [ + { + "name": "CodeMirror 6", + "description": "Modern, modular code editor with excellent extension support", + "version": "6.0", + "documentation_url": "https://codemirror.net/", + "github_url": "https://github.com/codemirror/dev", + "pros": ["Lightweight and modular", "Custom language support easy to add", "Great for non-code text too", "Good mobile support"], + "cons": ["Steeper learning curve for extensions", "Less visual polish than Monaco"], + "trade_offs": ["More flexible but requires more setup for features"], + "maturity": "Production-ready", + "community_size": "Large", + 
"last_updated": "2024-11", + "implementation_complexity": "Medium", + "estimated_time": "5 hours", + "recommended": true, + "recommendation_reason": "Lightweight, modular, and easy to add custom syntax for {{variable}} highlighting" + }, + { + "name": "Monaco Editor", + "description": "VS Code's editor component", + "version": "0.45.0", + "documentation_url": "https://microsoft.github.io/monaco-editor/", + "github_url": "https://github.com/microsoft/monaco-editor", + "pros": ["Feature-rich out of the box", "Familiar to developers", "Excellent autocomplete"], + "cons": ["Heavy bundle (~2MB)", "Overkill for prompt editing", "Hard to customize for non-code content"], + "trade_offs": ["More features but much heavier"], + "maturity": "Production-ready", + "community_size": "Very large", + "last_updated": "2024-10", + "implementation_complexity": "Medium", + "estimated_time": "4 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "CodeMirror 6", + "user_choice_description": "Modern, modular code editor with excellent extension support", + "user_choice_version": "6.0", + "user_reason": "Prompts are text, not code — CodeMirror's flexibility is better suited than Monaco's code-centric features", + "final_decision": "codemirror-6" + }, + { + "category": "Diff Library", + "description": "Library for computing and displaying text diffs between prompt versions", + "decision_needed": true, + "options": [ + { + "name": "diff (npm)", + "description": "Mature text diffing library with multiple algorithms", + "version": "5.2", + "documentation_url": "https://github.com/kpdecker/jsdiff", + "github_url": "https://github.com/kpdecker/jsdiff", + "pros": ["Tiny bundle (8KB)", "Multiple diff modes (chars, words, lines)", "No DOM dependency", "Well-tested"], + "cons": ["Diff computation only, no rendering", "Need to build UI ourselves"], + "trade_offs": ["Small and focused but requires custom renderer"], + "maturity": "Production-ready", + 
"community_size": "Large", + "last_updated": "2024-09", + "implementation_complexity": "Low", + "estimated_time": "3 hours", + "recommended": true, + "recommendation_reason": "Lightweight, flexible, and we want custom diff rendering anyway" + }, + { + "name": "react-diff-viewer-continued", + "description": "React component for displaying side-by-side or unified diffs", + "version": "3.3.0", + "documentation_url": "https://github.com/aeolun/react-diff-viewer-continued", + "github_url": "https://github.com/aeolun/react-diff-viewer-continued", + "pros": ["Ready-made React component", "Split and unified views", "Syntax highlighting"], + "cons": ["Designed for code, not prose", "Less customizable rendering", "Extra dependency"], + "trade_offs": ["Faster to implement but harder to customize for prompt text"], + "maturity": "Stable", + "community_size": "Medium", + "last_updated": "2024-08", + "implementation_complexity": "Low", + "estimated_time": "2 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "diff (npm)", + "user_choice_description": "Mature text diffing library with multiple algorithms", + "user_choice_version": "5.2", + "user_reason": "We want word-level diffs for prompt text, not line-level code diffs", + "final_decision": "jsdiff" + } + ], + "summary": { + "total_decisions": 3, + "decisions_made": 3, + "estimated_setup_time": "14 hours", + "estimated_learning_curve": "Low to Medium — CodeMirror extensions require some ramp-up", + "overall_complexity": "Medium" + } +} diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-001.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-001.md new file mode 100644 index 0000000..282d136 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-001.md @@ -0,0 +1,47 @@ +# Set up prompt editor with variable detection + +**Issue ID**: ENG-001 +**Status**: approved +**Category**: frontend +**Complexity Score**: 
5/10 +**Estimated Hours**: 5h +**Complexity Reasoning**: CodeMirror 6 setup is well-documented, but custom syntax highlighting for {{variable}} placeholders requires writing a custom language extension. + +## Description + +Integrate CodeMirror 6 as the prompt editor with custom syntax highlighting for {{variable}} placeholders. When variables are detected, auto-generate input fields below the editor for each unique variable. + +## Technical Details + +- CodeMirror 6 with custom language extension for prompt syntax +- Regex-based {{variable}} detection that updates on each keystroke +- Dynamic input field generation for each unique variable +- Variable substitution in the final prompt before sending to providers +- Support for default values: {{variable:default}} + +## Dependencies + +- CodeMirror 6 packages: @codemirror/state, @codemirror/view, @codemirror/lang-markdown + +## Acceptance Criteria + +- [ ] CodeMirror editor renders with proper line numbers and soft wrapping +- [ ] {{variable}} placeholders highlighted with distinct color +- [ ] Input fields auto-appear below editor for each detected variable +- [ ] Removing a variable from the prompt removes its input field +- [ ] Variable values substituted correctly in the compiled prompt + +## Test Strategy + +### Automated Tests + +- Unit test: variable regex extraction from prompt text +- Unit test: prompt compilation with variable substitution +- Component test: editor renders, variables detected, inputs generated + +### Manual Verification (Human-in-the-Loop) + +1. Type a prompt with {{name}} and {{topic}} variables +2. Verify two input fields appear below the editor +3. Fill in the inputs and verify the compiled prompt updates +4. 
Remove {{topic}} from the prompt and verify the input field disappears diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-002.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-002.md new file mode 100644 index 0000000..24008b4 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-002.md @@ -0,0 +1,50 @@ +# Implement multi-provider LLM adapter layer + +**Issue ID**: ENG-002 +**Status**: approved +**Category**: backend +**Complexity Score**: 6/10 +**Estimated Hours**: 6h +**Complexity Reasoning**: Each provider has a different SDK and streaming API. Parallel execution with proper error handling adds complexity. + +## Description + +Build a provider abstraction layer with adapters for Claude, GPT-4, and Gemini. Execute prompts against multiple providers in parallel and stream responses via SSE to the frontend. + +## Technical Details + +- LLMProvider interface: streamCompletion(), countTokens(), estimateCost() +- Adapter implementations: ClaudeAdapter, OpenAIAdapter, GeminiAdapter +- ProviderRegistry for managing available providers +- POST /api/playground/run endpoint with SSE response +- Parallel execution: Promise.allSettled for multi-provider runs +- Per-provider error handling (don't fail all if one provider errors) + +## Dependencies + +- @anthropic-ai/sdk, openai, @google/generative-ai npm packages +- API keys configured in environment variables + +## Acceptance Criteria + +- [ ] Provider interface defined with streaming, token counting, and cost estimation +- [ ] Claude adapter implements streaming with proper token counting +- [ ] OpenAI adapter implements streaming with proper token counting +- [ ] Gemini adapter implements streaming with proper token counting +- [ ] /api/playground/run streams SSE events from all selected providers +- [ ] If one provider fails, others continue streaming +- [ ] Cost estimate shown per response based on actual token usage + +## Test Strategy + 
+### Automated Tests + +- Unit test: each adapter with mocked SDK responses +- Integration test: SSE endpoint streams correct event types +- Unit test: token counting matches expected values + +### Manual Verification (Human-in-the-Loop) + +1. Run a prompt against Claude only — verify streaming works +2. Run against Claude + GPT-4 — verify parallel streaming +3. Use an invalid API key for one provider — verify the other still works diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-003.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-003.md new file mode 100644 index 0000000..9a7d5ee --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-003.md @@ -0,0 +1,50 @@ +# Build side-by-side response comparison + +**Issue ID**: ENG-003 +**Status**: in-review +**Category**: fullstack +**Complexity Score**: 5/10 +**Estimated Hours**: 4h +**Complexity Reasoning**: UI layout with streaming updates. Need to handle expand/collapse and responsive behavior. + +## Description + +Build the response comparison panel that shows streaming outputs from multiple providers side-by-side with metadata (token count, latency, cost) and rating controls. 
+ +## Technical Details + +- ResponsePanel component consuming SSE events +- Side-by-side layout for 2+ providers, single column on mobile +- Live token counter and elapsed time during streaming +- Expand/collapse: click expands one panel to full width, collapses others to tabs +- Thumbs up/down rating per response, stored with the test run +- Markdown rendering for response content + +## Dependencies + +- ENG-001 (prompt editor for input) +- ENG-002 (provider layer for streaming data) + +## Acceptance Criteria + +- [ ] Response panels display side-by-side for each provider +- [ ] Token count and latency update in real-time during streaming +- [ ] Expand button makes one panel full-width +- [ ] Thumbs up/down saves rating for the response +- [ ] Markdown content renders with headings, code blocks, and lists +- [ ] Error state shows error message with retry button + +## Test Strategy + +### Automated Tests + +- Component test: panels render from mock SSE events +- Component test: expand/collapse toggles layout +- Component test: rating buttons call API + +### Manual Verification (Human-in-the-Loop) + +1. Run a prompt against Claude and GPT-4 +2. Watch responses stream in side-by-side +3. Click expand on Claude's response — verify GPT-4 collapses to tab +4. Rate both responses and verify ratings persist diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-004.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-004.md new file mode 100644 index 0000000..0e81aac --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-004.md @@ -0,0 +1,49 @@ +# Add prompt version history with diffs + +**Issue ID**: ENG-004 +**Status**: pending +**Category**: fullstack +**Complexity Score**: 5/10 +**Estimated Hours**: 5h +**Complexity Reasoning**: Version storage is straightforward. Diff computation using jsdiff is well-documented. Main complexity is the diff rendering UI. 
+ +## Description + +Auto-save a new version of the prompt on each test run. Build a version history panel with timestamps and quality scores. Allow comparing any two versions with a word-level diff view. + +## Technical Details + +- prompt_versions table: id, prompt_id, text, quality_score, created_at +- Auto-save triggered after each successful run +- Version list component with timestamp and score badge +- Diff computation using jsdiff (word-level) +- Custom diff renderer: green for additions, red for deletions +- Restore button loads a previous version into the editor + +## Dependencies + +- ENG-001 (prompt editor to load restored versions) + +## Acceptance Criteria + +- [ ] New version saved automatically after each test run +- [ ] Version list shows all versions with timestamps and quality scores +- [ ] Selecting two versions shows word-level diff +- [ ] Diff highlights additions in green and deletions in red +- [ ] Restore button loads selected version into the editor +- [ ] Quality sparkline shows score trend across versions + +## Test Strategy + +### Automated Tests + +- Unit test: jsdiff word-level diff produces correct output +- Integration test: version CRUD operations +- Component test: version list renders, diff view shows changes + +### Manual Verification (Human-in-the-Loop) + +1. Run a prompt 3 times with small edits each time +2. Open version history — verify 3 versions listed +3. Select v1 and v3 — verify diff shows all changes +4. 
Click Restore on v2 — verify editor loads v2 text diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-005.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-005.md new file mode 100644 index 0000000..d3be9ae --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-005.md @@ -0,0 +1,48 @@ +# Implement response quality scoring + +**Issue ID**: ENG-005 +**Status**: pending +**Category**: fullstack +**Complexity Score**: 4/10 +**Estimated Hours**: 3h +**Complexity Reasoning**: Automated scoring heuristics are relatively simple. The sparkline chart adds minor complexity. + +## Description + +Implement automated quality scoring for responses based on format compliance, length appropriateness, and variable coverage. Combine with manual thumbs up/down ratings. Display quality trend as a sparkline across versions. + +## Technical Details + +- Scoring algorithm: format compliance (40%), length (30%), coverage (30%) +- Format check: does response match requested format (JSON, list, code block)? +- Length check: is response within expected bounds for the prompt type? +- Coverage check: does response reference all provided variables/context? 
+- Manual rating: thumbs up (+1), thumbs down (-1), neutral (0) +- Final score: weighted average of auto (60%) and manual (40%), 0-10 scale +- Sparkline component using SVG path for trend visualization + +## Dependencies + +- ENG-003 (response panel for rating buttons) + +## Acceptance Criteria + +- [ ] Auto-score computed for each response after completion +- [ ] Score displayed as X/10 badge on each response +- [ ] Thumbs up/down buttons update the combined score +- [ ] Sparkline chart shows quality trend in version history +- [ ] Scoring breakdown visible on hover/click + +## Test Strategy + +### Automated Tests + +- Unit test: scoring algorithm with known inputs and expected outputs +- Unit test: format detection regex for JSON, lists, code blocks +- Component test: sparkline renders from array of scores + +### Manual Verification (Human-in-the-Loop) + +1. Run a prompt that requests JSON output — verify format compliance score +2. Rate a response thumbs up — verify combined score increases +3. Open version history — verify sparkline shows score progression diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-006.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-006.md new file mode 100644 index 0000000..f2ea873 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/ENG-006.md @@ -0,0 +1,51 @@ +# Create prompt template library + +**Issue ID**: ENG-006 +**Status**: pending +**Category**: fullstack +**Complexity Score**: 4/10 +**Estimated Hours**: 3h +**Complexity Reasoning**: Standard CRUD with category filtering. Template preview modal is straightforward. + +## Description + +Build a template library where users can browse proven prompt templates by category, preview them with sample data, and load them into the editor with one click. Users can also save any prompt as a new template. 
+ +## Technical Details + +- templates table: id, name, description, category, text, variables_schema, created_at +- Seed with 10-15 built-in templates across categories +- Category filter: Extraction, Classification, Generation, Analysis, Transformation +- Template card component: name, description, category badge, variable count +- Preview modal: full template text with sample variables filled in +- "Use Template" loads into editor with variable inputs pre-populated +- "Save as Template" from editor creates a new user template + +## Dependencies + +- ENG-001 (prompt editor to load templates into) + +## Acceptance Criteria + +- [ ] Template gallery displays cards organized by category +- [ ] Category filter buttons filter the template list +- [ ] Search bar filters by template name and description +- [ ] Click template card opens preview modal +- [ ] "Use Template" loads template into editor +- [ ] "Save as Template" from editor creates a new template +- [ ] User-created templates appear in a "My Templates" section + +## Test Strategy + +### Automated Tests + +- Component test: category filter renders correct templates +- Component test: search filters by name and description +- Integration test: save and load template round-trip + +### Manual Verification (Human-in-the-Loop) + +1. Browse templates — verify categories filter correctly +2. Click "JSON Data Extractor" — verify preview shows sample output +3. Click "Use Template" — verify editor loads the template +4. 
Edit and save as new template — verify it appears in "My Templates" diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/issues.json b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/issues.json new file mode 100644 index 0000000..9a916cb --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/issues/issues.json @@ -0,0 +1,86 @@ +{ + "project_name": "AI Prompt Playground", + "project_id": "001-ai-prompt-playground", + "total_estimated_hours": 26, + "issues_list": [ + { + "issue_id": "ENG-001", + "title": "Set up prompt editor with variable detection", + "description": "CodeMirror editor with custom syntax highlighting for {{variable}} placeholders and auto-generated input fields", + "status": "approved", + "estimated_hours": 5, + "dependencies": [], + "test_strategy": { + "automated_tests": "Unit test: variable detection regex, input field generation", + "manual_verification": "Type prompt with {{variables}}, verify inputs appear and update preview" + }, + "file_path": "issues/ENG-001.md" + }, + { + "issue_id": "ENG-002", + "title": "Implement multi-provider LLM adapter layer", + "description": "Provider interface with Claude, GPT-4, and Gemini adapters. Parallel execution with SSE streaming", + "status": "approved", + "estimated_hours": 6, + "dependencies": [], + "test_strategy": { + "automated_tests": "Integration test: each adapter returns valid stream. 
Mock provider for unit tests", + "manual_verification": "Run prompt against all 3 providers, verify streaming responses appear" + }, + "file_path": "issues/ENG-002.md" + }, + { + "issue_id": "ENG-003", + "title": "Build side-by-side response comparison", + "description": "Response panels displaying streaming output with token count, latency, cost, and rating controls", + "status": "in-review", + "estimated_hours": 4, + "dependencies": ["ENG-001", "ENG-002"], + "test_strategy": { + "automated_tests": "Component test: panels render, expand/collapse works, ratings saved", + "manual_verification": "Run multi-provider test, rate responses, expand one panel to full width" + }, + "file_path": "issues/ENG-003.md" + }, + { + "issue_id": "ENG-004", + "title": "Add prompt version history with diffs", + "description": "Auto-save versions on each run. Version list with timestamps and quality scores. Diff view between any two versions", + "status": "pending", + "estimated_hours": 5, + "dependencies": ["ENG-001"], + "test_strategy": { + "automated_tests": "Unit test: diff computation, version ordering. Integration test: save and retrieve versions", + "manual_verification": "Run prompt 3 times, open history, compare v1 vs v3 diff" + }, + "file_path": "issues/ENG-004.md" + }, + { + "issue_id": "ENG-005", + "title": "Implement response quality scoring", + "description": "Automated scoring (format compliance, length, coverage) plus manual thumbs up/down. Quality trend sparkline", + "status": "pending", + "estimated_hours": 3, + "dependencies": ["ENG-003"], + "test_strategy": { + "automated_tests": "Unit test: scoring algorithm with known inputs. Component test: sparkline renders", + "manual_verification": "Run prompt, check auto-score appears, rate manually, verify trend chart" + }, + "file_path": "issues/ENG-005.md" + }, + { + "issue_id": "ENG-006", + "title": "Create prompt template library", + "description": "Template gallery with categories, search, preview, and one-click load. 
Save-as-template from editor", + "status": "pending", + "estimated_hours": 3, + "dependencies": ["ENG-001"], + "test_strategy": { + "automated_tests": "Component test: category filter, search, template load into editor", + "manual_verification": "Browse templates, preview one, load it, customize and save as new template" + }, + "file_path": "issues/ENG-006.md" + } + ], + "definition_of_done": "Each feature independently testable by a human, all acceptance criteria met" +} diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/project_request.md b/fixtures/demo/outputs/projects/001-ai-prompt-playground/project_request.md new file mode 100644 index 0000000..11cb061 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/project_request.md @@ -0,0 +1,16 @@ +# Project Request #001 + +## Project Name +AI Prompt Playground + +## Description +Build an interactive prompt engineering workspace where developers can write, test, and iterate on AI prompts across multiple LLM providers. Features include side-by-side response comparison, prompt version history with diffs, response quality scoring, and a template library for common patterns. + +## Dependencies +None + +## Testable Outcome +A developer writes a prompt, tests it against Claude and GPT-4 simultaneously, sees both responses side-by-side, saves the best version, and tracks how quality improves over iterations. 
+ +--- +*Created: 2026-02-20T09:00:00.000Z* diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/project_status.json b/fixtures/demo/outputs/projects/001-ai-prompt-playground/project_status.json new file mode 100644 index 0000000..6fb6251 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/project_status.json @@ -0,0 +1,86 @@ +{ + "version": "1.0.0", + "projectId": "001-ai-prompt-playground", + "currentAgent": "complete", + "currentPhase": "complete", + + "agents": { + "pm": { + "status": "complete", + "currentPhase": null, + "completedAt": "2026-02-20T10:30:00.000Z", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-20T09:00:00.000Z", "completedAt": "2026-02-20T09:04:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-20T09:04:00.000Z", "completedAt": "2026-02-20T09:30:00.000Z" }, + "prd-generate": { "status": "complete", "startedAt": "2026-02-20T09:30:00.000Z", "completedAt": "2026-02-20T10:00:00.000Z" }, + "prd-review": { "status": "complete", "startedAt": "2026-02-20T10:00:00.000Z", "completedAt": "2026-02-20T10:30:00.000Z" } + } + }, + "ux": { + "status": "complete", + "currentPhase": null, + "completedAt": "2026-02-20T13:00:00.000Z", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-20T10:30:00.000Z", "completedAt": "2026-02-20T10:34:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-20T10:34:00.000Z", "completedAt": "2026-02-20T11:10:00.000Z" }, + "design-brief-generate": { "status": "complete", "startedAt": "2026-02-20T11:10:00.000Z", "completedAt": "2026-02-20T12:00:00.000Z" }, + "design-brief-review": { "status": "complete", "startedAt": "2026-02-20T12:00:00.000Z", "completedAt": "2026-02-20T13:00:00.000Z" } + } + }, + "engineer": { + "status": "complete", + "currentPhase": null, + "completedAt": "2026-02-20T15:30:00.000Z", + "phases": { + "questions-generate": { "status": "complete", 
"startedAt": "2026-02-20T13:00:00.000Z", "completedAt": "2026-02-20T13:04:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-20T13:04:00.000Z", "completedAt": "2026-02-20T13:35:00.000Z" }, + "spec-generate": { "status": "complete", "startedAt": "2026-02-20T13:35:00.000Z", "completedAt": "2026-02-20T14:30:00.000Z" }, + "spec-review": { "status": "complete", "startedAt": "2026-02-20T14:30:00.000Z", "completedAt": "2026-02-20T15:30:00.000Z" } + } + } + }, + + "history": [ + { "phase": "pm-questions-generate", "startedAt": "2026-02-20T09:00:00.000Z", "completedAt": "2026-02-20T09:04:00.000Z", "status": "complete" }, + { "phase": "pm-questions-answer", "startedAt": "2026-02-20T09:04:00.000Z", "completedAt": "2026-02-20T09:30:00.000Z", "status": "complete" }, + { "phase": "pm-prd-generate", "startedAt": "2026-02-20T09:30:00.000Z", "completedAt": "2026-02-20T10:00:00.000Z", "status": "complete" }, + { "phase": "pm-prd-review", "startedAt": "2026-02-20T10:00:00.000Z", "completedAt": "2026-02-20T10:30:00.000Z", "status": "complete" }, + { "phase": "ux-questions-generate", "startedAt": "2026-02-20T10:30:00.000Z", "completedAt": "2026-02-20T10:34:00.000Z", "status": "complete" }, + { "phase": "ux-questions-answer", "startedAt": "2026-02-20T10:34:00.000Z", "completedAt": "2026-02-20T11:10:00.000Z", "status": "complete" }, + { "phase": "ux-design-brief-generate", "startedAt": "2026-02-20T11:10:00.000Z", "completedAt": "2026-02-20T12:00:00.000Z", "status": "complete" }, + { "phase": "ux-design-brief-review", "startedAt": "2026-02-20T12:00:00.000Z", "completedAt": "2026-02-20T13:00:00.000Z", "status": "complete" }, + { "phase": "engineer-questions-generate", "startedAt": "2026-02-20T13:00:00.000Z", "completedAt": "2026-02-20T13:04:00.000Z", "status": "complete" }, + { "phase": "engineer-questions-answer", "startedAt": "2026-02-20T13:04:00.000Z", "completedAt": "2026-02-20T13:35:00.000Z", "status": "complete" }, + { "phase": "engineer-spec-generate", 
"startedAt": "2026-02-20T13:35:00.000Z", "completedAt": "2026-02-20T14:30:00.000Z", "status": "complete" }, + { "phase": "engineer-spec-review", "startedAt": "2026-02-20T14:30:00.000Z", "completedAt": "2026-02-20T15:30:00.000Z", "status": "complete" } + ], + + "settings": { + "question_depth": "standard", + "document_length": "standard" + }, + + "icon": { + "type": "icon", + "value": "zap", + "color": "hsl(270 70% 55%)" + }, + + "costTracking": { + "tier": "standard", + "phases": [ + { "phase": "pm-questions-generate", "phaseName": "PM Questions", "inputTokens": 1300, "outputTokens": 860, "timestamp": "2026-02-20T09:04:00.000Z" }, + { "phase": "pm-prd-generate", "phaseName": "PRD Generation", "inputTokens": 3600, "outputTokens": 5100, "timestamp": "2026-02-20T10:00:00.000Z" }, + { "phase": "ux-questions-generate", "phaseName": "UX Questions", "inputTokens": 2200, "outputTokens": 720, "timestamp": "2026-02-20T10:34:00.000Z" }, + { "phase": "ux-design-brief-generate", "phaseName": "Design Brief", "inputTokens": 4400, "outputTokens": 6800, "timestamp": "2026-02-20T12:00:00.000Z" }, + { "phase": "engineer-questions-generate", "phaseName": "Engineer Questions", "inputTokens": 3200, "outputTokens": 960, "timestamp": "2026-02-20T13:04:00.000Z" }, + { "phase": "engineer-spec-generate", "phaseName": "Technical Spec", "inputTokens": 5900, "outputTokens": 8700, "timestamp": "2026-02-20T14:30:00.000Z" } + ], + "lastUpdated": "2026-02-20T15:30:00.000Z" + }, + + "approvedDocuments": ["prd", "acceptance-criteria", "design", "screens", "tech-spec", "technology-choices"], + + "createdAt": "2026-02-20T09:00:00.000Z", + "lastUpdatedAt": "2026-02-20T15:30:00.000Z" +} diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/engineer_questions.json b/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/engineer_questions.json new file mode 100644 index 0000000..e18bd74 --- /dev/null +++ 
b/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/engineer_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "Interactive prompt engineering workspace with multi-provider testing and version history.", + "questions": [ + { + "question": "How should we abstract the multi-provider LLM API calls?", + "options": ["Direct SDK calls with if/else per provider", "Adapter pattern with provider interface", "Use Vercel AI SDK which already abstracts providers"], + "answer": "Adapter pattern with provider interface — gives us full control and easy testing" + }, + { + "question": "Should responses be streamed or returned in bulk?", + "options": ["Bulk response only", "Streaming via SSE", "Streaming with option to wait for complete response"], + "answer": "Streaming via SSE for real-time display, with final complete response stored" + }, + { + "question": "How should version history and diffs be stored?", + "options": ["Full prompt text per version in database", "Git-like delta storage", "Full text with computed diffs on read"], + "answer": "Full text with computed diffs on read — simplest to implement, storage is cheap" + }, + { + "question": "How should we handle token counting across different providers?", + "options": ["Estimate with tiktoken for all", "Provider-specific tokenizers", "Provider-specific tokenizers with a fallback estimator"], + "answer": "Provider-specific tokenizers with a fallback estimator" + } + ] +} diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/pm_questions.json b/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/pm_questions.json new file mode 100644 index 0000000..fd41973 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/pm_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "Interactive prompt engineering workspace with multi-provider testing and version history.", + "questions": [ + { + "question": "What LLM providers should the playground 
support initially?", + "options": ["Claude only", "Claude + GPT-4", "Claude + GPT-4 + Gemini with easy extensibility"], + "answer": "Claude + GPT-4 + Gemini with easy extensibility for new providers" + }, + { + "question": "How should prompt versioning work?", + "options": ["Simple save/overwrite", "Linear version list", "Git-like version history with diffs between versions"], + "answer": "Git-like version history with diffs between versions" + }, + { + "question": "Should there be team collaboration features?", + "options": ["Single-user only", "Shared workspace with real-time editing", "Not in v1, but prompts exportable/importable as JSON"], + "answer": "Not in v1, but prompts should be exportable/importable as JSON" + }, + { + "question": "How should response quality be measured?", + "options": ["Manual rating only", "Automated scoring based on criteria", "Both: automated scoring plus optional human thumbs up/down"], + "answer": "Both: automated scoring based on format compliance, length, and optional human thumbs up/down" + } + ] +} diff --git a/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/ux_questions.json b/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/ux_questions.json new file mode 100644 index 0000000..fb65a94 --- /dev/null +++ b/fixtures/demo/outputs/projects/001-ai-prompt-playground/questions/ux_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "Interactive prompt engineering workspace with multi-provider testing and version history.", + "questions": [ + { + "question": "How should the main workspace be laid out?", + "options": ["Prompt on top, responses below", "Prompt left, responses right (split view)", "Tabbed interface with prompt and responses on separate tabs"], + "answer": "Prompt left, responses right (split view) — developers are used to editor+output layouts" + }, + { + "question": "How should multi-provider responses be displayed?", + "options": ["Tabbed (one at a time)", "Side-by-side columns", 
"Configurable: tabs or side-by-side"], + "answer": "Side-by-side columns for comparison, with option to expand one to full width" + }, + { + "question": "How should version comparison look?", + "options": ["Simple list with timestamps", "Diff view like GitHub PRs", "Timeline with visual diff highlighting"], + "answer": "Diff view like GitHub PRs with red/green highlighting" + }, + { + "question": "How should the template library be organized?", + "options": ["Flat list with search", "Categories with tags", "Categories with tags and a 'Featured' section"], + "answer": "Categories with tags and a 'Featured' section for most-used templates" + } + ] +} diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/acceptance_criteria.json b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/acceptance_criteria.json new file mode 100644 index 0000000..6ac5926 --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/acceptance_criteria.json @@ -0,0 +1,99 @@ +{ + "project_id": "002-context-aware-autocomplete", + "project_name": "Context-Aware Autocomplete", + "job_stories": [ + { + "job_story_id": "js_001", + "title": "Get Context-Aware Code Completion", + "situation": "I am writing code in a project with established patterns", + "motivation": "get completions that match my codebase style without manual editing", + "outcome": "I can accept suggestions that fit naturally into my code", + "acceptance_criteria": [ + { + "id": "ac_001_01", + "given": "I am typing in a TypeScript file", + "when": "I pause after typing a partial function name", + "then": "Ghost text appears within 100ms showing the top completion matching project naming conventions" + }, + { + "id": "ac_001_02", + "given": "Ghost text is displayed", + "when": "I press Tab", + "then": "The suggestion is accepted and inserted at the cursor position" + }, + { + "id": "ac_001_03", + "given": "I want to see alternative completions", + 
"when": "I press Alt+]", + "then": "A ranked list of up to 5 alternatives appears with confidence indicators" + }, + { + "id": "ac_001_04", + "given": "The project uses camelCase for functions and PascalCase for classes", + "when": "I start typing a new function or class", + "then": "Suggestions follow the correct casing convention for the symbol type" + } + ] + }, + { + "job_story_id": "js_002", + "title": "Auto-Import from Correct Modules", + "situation": "I am using a symbol that needs to be imported", + "motivation": "have the correct import added automatically without hunting for the right path", + "outcome": "the import statement is added following my project's import conventions", + "acceptance_criteria": [ + { + "id": "ac_002_01", + "given": "I accept a completion for a symbol not yet imported", + "when": "The completion is inserted", + "then": "The corresponding import statement is auto-added at the top of the file" + }, + { + "id": "ac_002_02", + "given": "A symbol is exported from both a local module and node_modules", + "when": "Auto-import is triggered", + "then": "The local module import is preferred over the external package" + }, + { + "id": "ac_002_03", + "given": "The project uses barrel file re-exports (index.ts)", + "when": "Auto-import is triggered", + "then": "The import uses the barrel file path, not the deep file path" + }, + { + "id": "ac_002_04", + "given": "The project uses path aliases (@/components)", + "when": "Auto-import is triggered", + "then": "The import uses the configured path alias, not relative paths" + } + ] + }, + { + "job_story_id": "js_003", + "title": "Learn and Apply Project Conventions", + "situation": "the codebase has been indexed and I have been coding for a while", + "motivation": "have suggestions that improve over time as the tool learns my patterns", + "outcome": "completions feel like they were written by someone who knows the codebase", + "acceptance_criteria": [ + { + "id": "ac_003_01", + "given": "I have 
accepted and rejected multiple suggestions", + "when": "New suggestions are generated", + "then": "Accepted patterns are ranked higher and rejected patterns are deprioritized" + }, + { + "id": "ac_003_02", + "given": "The project follows a service/repository architecture", + "when": "I create a new file in the services directory", + "then": "Suggestions include the standard service class structure used in other service files" + }, + { + "id": "ac_003_03", + "given": "A file is modified and saved", + "when": "The file watcher detects the change", + "then": "The index is incrementally updated within 2 seconds reflecting the new code" + } + ] + } + ] +} diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/design_brief.md b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/design_brief.md new file mode 100644 index 0000000..24757fa --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/design_brief.md @@ -0,0 +1,73 @@ +# Design Brief: Context-Aware Autocomplete + +## Design Goals + +1. **Invisible when working** - Suggestions appear naturally without disrupting flow +2. **Confidence at a glance** - Color-coded indicators show suggestion quality +3. **Zero configuration** - Works out of the box, learns from the codebase + +## User Flows + +### Flow 1: Accept Inline Completion + +``` +Developer types code + | +Ghost text appears (gray, inline) + | +Tab to accept / Escape to dismiss / Alt+] for alternatives + | +If accepted: + - Code inserted at cursor + - Import added if needed + - Suggestion logged for learning +``` + +### Flow 2: Initial Codebase Indexing + +``` +Open project for first time + | +Status bar shows "Indexing: 0/1,247 files..." 
+ | +Progress bar fills as files are parsed + | +"Indexing complete" notification + | +Status bar shows green dot (index ready) +``` + +### Flow 3: Configure Preferences + +``` +Settings > Autocomplete + | +Toggle: Ghost text, suggestion list, auto-import + | +Keybindings: Accept, dismiss, cycle alternatives + | +Language-specific settings per workspace +``` + +## Key Screens + +1. **Editor with Suggestions** - Ghost text inline, optional dropdown list +2. **Indexing Progress** - Status bar indicator with expandable detail panel +3. **Settings Panel** - Feature toggles, keybindings, language settings + +## Visual Guidelines + +- Ghost text: 50% opacity of the editor text color +- Confidence colors: Green (high), Yellow (medium), Gray (low) +- Status bar: Small dot indicator — green (ready), yellow (indexing), red (error) +- Suggestion list: Follows editor theme, max 5 items visible + +## Accessibility + +- All interactions keyboard-accessible +- Screen reader announces suggestion availability +- Ghost text distinguishable from real code (opacity + optional underline) +- High contrast mode uses distinct borders instead of opacity + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/prd.md b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/prd.md new file mode 100644 index 0000000..5e11712 --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/prd.md @@ -0,0 +1,70 @@ +# Product Requirements Document: Context-Aware Autocomplete + +## Overview + +An AI-powered code completion engine that understands the full codebase — not just the current file. It indexes project structure, learns naming conventions and architecture patterns, and offers completions that feel like they were written by a senior team member who knows the codebase inside out. 
+ +## Problem Statement + +Existing autocomplete tools operate with limited context, typically the current file and its direct imports. This leads to suggestions that are technically valid but don't match the project's conventions, import from the wrong modules, or miss established patterns. Developers waste time editing suggestions to fit their codebase style. + +## Goals + +1. **Codebase-aware** - Suggestions reflect the project's actual patterns, not generic completions +2. **Sub-100ms latency** - Ghost text appears instantly, ranked list within 300ms +3. **Privacy-first** - All indexing runs locally, code never leaves the developer's machine +4. **Convention learning** - Automatically detects and applies naming, structure, and import patterns + +## User Stories + +### US-1: Get Context-Aware Code Completion +**As a** developer, **I want to** receive code completions that understand my project's conventions, **so that** I spend less time editing suggestions to match existing code. + +**Acceptance Criteria:** +- Ghost text appears as I type with the top suggestion +- Suggestions use the project's naming conventions (camelCase, snake_case, etc.) +- Completions import from correct modules based on project structure +- Tab accepts, Escape dismisses, Alt+] cycles alternatives + +### US-2: Auto-Import from Correct Modules +**As a** developer, **I want to** get auto-import suggestions that know my project's module structure, **so that** imports are always from the right paths. + +**Acceptance Criteria:** +- When completing a symbol, the correct import statement is auto-added +- Prefers project-local modules over node_modules when both export the same name +- Follows the project's import style (relative vs absolute, barrel files vs direct) + +### US-3: Learn and Apply Project Conventions +**As a** developer, **I want** the tool to learn my project's patterns over time, **so that** suggestions improve as it understands the codebase better. 
+ +**Acceptance Criteria:** +- Detects naming conventions per folder/module +- Learns from accepted/rejected suggestions +- Applies architectural patterns (service layer, repository pattern, etc.) + +## Scope + +### In Scope +- Codebase indexing with AST + semantic embeddings +- Inline ghost text and ranked suggestion list +- TypeScript, Python, Go language support +- Local-only indexing (privacy-first) +- LSP integration for VS Code + +### Out of Scope +- JetBrains IDE support (v2) +- Multi-repo context (v2) +- Natural language to code completion +- Vim/Neovim support (community contribution) + +## Success Metrics + +| Metric | Target | +|--------|--------| +| Suggestion acceptance rate | >35% | +| Correct import rate | >90% | +| Convention match rate | >85% | +| Latency (ghost text) | <100ms p95 | + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/screens.json b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/screens.json new file mode 100644 index 0000000..02a31ac --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/screens.json @@ -0,0 +1,326 @@ +{ + "project_id": "002-context-aware-autocomplete", + "project_name": "Context-Aware Autocomplete", + "screens": [ + { + "id": "editor-with-suggestions", + "name": "Editor with Inline Suggestions", + "route": null, + "description": "Code editor showing ghost text completion and optional ranked suggestion dropdown", + "wireframe": { + "type": "stack", + "direction": "vertical", + "children": [ + { + "type": "nav", + "text": "VS Code — my-project", + "children": [ + { "type": "tabs", "tabs": ["userService.ts", "authMiddleware.ts", "index.ts"], "activeTab": "userService.ts" } + ] + }, + { + "type": "row", + "gap": "none", + "children": [ + { + "type": "stack", + "direction": "vertical", + "padding": "none", + "gap": "none", + "children": [ + { + "type": "card", + 
"padding": "none", + "children": [ + { + "type": "stack", + "gap": "none", + "children": [ + { + "type": "row", + "padding": "sm", + "children": [ + { "type": "text", "text": "14", "size": "xs", "color": "muted" }, + { "type": "text", "text": "export class UserService {", "size": "xs" } + ] + }, + { + "type": "row", + "padding": "sm", + "children": [ + { "type": "text", "text": "15", "size": "xs", "color": "muted" }, + { "type": "text", "text": " async get", "size": "xs" }, + { "type": "text", "text": "UserByEmail(email: string): Promise {", "size": "xs", "color": "muted" } + ] + }, + { + "type": "row", + "padding": "sm", + "children": [ + { "type": "text", "text": "16", "size": "xs", "color": "muted" }, + { "type": "text", "text": "", "size": "xs" } + ] + } + ] + } + ] + }, + { + "type": "card", + "padding": "sm", + "children": [ + { + "type": "stack", + "gap": "xs", + "children": [ + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "badge", "text": "1", "variant": "default" }, + { "type": "text", "text": "getUserByEmail(email: string)", "size": "xs" }, + { "type": "badge", "text": "high", "variant": "default" } + ] + }, + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "badge", "text": "2", "variant": "default" }, + { "type": "text", "text": "getUserById(id: string)", "size": "xs" }, + { "type": "badge", "text": "med", "variant": "warning" } + ] + }, + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "badge", "text": "3", "variant": "default" }, + { "type": "text", "text": "getUserProfile(userId: string)", "size": "xs" }, + { "type": "badge", "text": "low", "variant": "default" } + ] + } + ] + } + ] + } + ] + } + ] + }, + { + "type": "row", + "padding": "sm", + "justify": "between", + "children": [ + { "type": "text", "text": "TypeScript | UTF-8 | LF", "size": "xs", "color": "muted" }, + { + "type": "row", + "gap": "sm", + "children": [ + { "type": "badge", "text": "Index: ready", "variant": "default" }, + { "type": "text", 
"text": "1,247 files indexed", "size": "xs", "color": "muted" } + ] + } + ] + } + ] + }, + "notes": "Ghost text should be clearly distinguishable from typed code. Suggestion dropdown appears below cursor. Confidence badges use green/yellow/gray.", + "components_to_reuse": [], + "components_to_create": [ + { "name": "GhostText", "path": "components/autocomplete/GhostText.tsx", "description": "Inline completion overlay rendered in the editor" }, + { "name": "SuggestionDropdown", "path": "components/autocomplete/SuggestionDropdown.tsx", "description": "Ranked list of alternative completions with confidence indicators" } + ] + }, + { + "id": "indexing-status", + "name": "Indexing Status Panel", + "route": null, + "description": "Expandable status panel showing codebase indexing progress and details", + "wireframe": { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "md", + "children": [ + { + "type": "row", + "justify": "between", + "align": "center", + "children": [ + { "type": "heading", "text": "Codebase Index", "size": "sm" }, + { "type": "badge", "text": "Indexing...", "variant": "warning" } + ] + }, + { "type": "progress", "label": "847 / 1,247 files (68%)" }, + { + "type": "stack", + "gap": "xs", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "TypeScript files", "size": "xs" }, + { "type": "text", "text": "612 / 823", "size": "xs", "color": "muted" } + ] + }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "Python files", "size": "xs" }, + { "type": "text", "text": "189 / 312", "size": "xs", "color": "muted" } + ] + }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "Go files", "size": "xs" }, + { "type": "text", "text": "46 / 112", "size": "xs", "color": "muted" } + ] + } + ] + }, + { "type": "text", "text": "Estimated time remaining: ~45 seconds", "size": "xs", "color": "muted" } + ] + } 
+ ] + }, + "notes": "Panel expands from status bar click. Shows per-language breakdown. Closes automatically when indexing completes.", + "components_to_reuse": [ + { "name": "Card", "path": "components/ui/Card.tsx" }, + { "name": "Progress", "path": "components/ui/Progress.tsx" }, + { "name": "Badge", "path": "components/ui/Badge.tsx" } + ], + "components_to_create": [ + { "name": "IndexingPanel", "path": "components/autocomplete/IndexingPanel.tsx", "description": "Expandable panel with per-language indexing progress" } + ] + }, + { + "id": "settings-panel", + "name": "Autocomplete Settings", + "route": null, + "description": "Settings panel for configuring autocomplete behavior, keybindings, and language preferences", + "wireframe": { + "type": "card", + "padding": "lg", + "children": [ + { + "type": "stack", + "gap": "lg", + "children": [ + { "type": "heading", "text": "Autocomplete Settings", "size": "md" }, + { + "type": "stack", + "gap": "md", + "children": [ + { "type": "label", "text": "Features" }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "Inline ghost text", "size": "sm" }, + { "type": "toggle", "label": "", "checked": true } + ] + }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "Suggestion dropdown", "size": "sm" }, + { "type": "toggle", "label": "", "checked": true } + ] + }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "Auto-import on accept", "size": "sm" }, + { "type": "toggle", "label": "", "checked": true } + ] + }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "Convention learning", "size": "sm" }, + { "type": "toggle", "label": "", "checked": true } + ] + } + ] + }, + { "type": "divider" }, + { + "type": "stack", + "gap": "md", + "children": [ + { "type": "label", "text": "Keybindings" }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", 
"text": "Accept suggestion", "size": "sm" }, + { "type": "badge", "text": "Tab" } + ] + }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "Dismiss suggestion", "size": "sm" }, + { "type": "badge", "text": "Escape" } + ] + }, + { + "type": "row", + "justify": "between", + "children": [ + { "type": "text", "text": "Cycle alternatives", "size": "sm" }, + { "type": "badge", "text": "Alt+]" } + ] + } + ] + } + ] + } + ] + }, + "notes": "Settings persist per workspace. Keybindings should be clickable to rebind.", + "components_to_reuse": [ + { "name": "Card", "path": "components/ui/Card.tsx" }, + { "name": "Toggle", "path": "components/ui/Toggle.tsx" }, + { "name": "Badge", "path": "components/ui/Badge.tsx" } + ], + "components_to_create": [ + { "name": "KeybindingInput", "path": "components/settings/KeybindingInput.tsx", "description": "Click-to-record keybinding input" } + ] + } + ], + "user_flows": [ + { + "id": "flow_accept_completion", + "name": "Accept an Inline Completion", + "steps": [ + { "step": 1, "screen": "editor-with-suggestions", "action": "Developer types 'async get' inside a class" }, + { "step": 2, "screen": "editor-with-suggestions", "action": "Ghost text appears: 'getUserByEmail(email: string)'" }, + { "step": 3, "screen": "editor-with-suggestions", "action": "Developer presses Tab to accept the suggestion" }, + { "step": 4, "screen": "editor-with-suggestions", "action": "Code is inserted and import for User type is auto-added" } + ] + }, + { + "id": "flow_initial_indexing", + "name": "First-Time Project Indexing", + "steps": [ + { "step": 1, "screen": "editor-with-suggestions", "action": "Developer opens a project for the first time" }, + { "step": 2, "screen": "indexing-status", "action": "Status bar shows 'Indexing...' 
with file count progress" }, + { "step": 3, "screen": "indexing-status", "action": "Panel shows per-language breakdown of indexed files" }, + { "step": 4, "screen": "editor-with-suggestions", "action": "Status bar shows green 'Index: ready' and suggestions begin" } + ] + } + ] +} diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/technical_specification.md b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/technical_specification.md new file mode 100644 index 0000000..92ada0d --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/technical_specification.md @@ -0,0 +1,73 @@ +# Technical Specification: Context-Aware Autocomplete + +## Architecture Overview + +The autocomplete engine uses a hybrid indexing approach: AST parsing for structural understanding and embeddings for semantic similarity. It runs as a Language Server (LSP) process communicating with the editor via stdio. + +``` +┌──────────────┐ ┌──────────────────────┐ ┌──────────────┐ +│ Editor/IDE │────▶│ LSP Server (Node) │────▶│ Local Index │ +│ (VS Code) │◀────│ + Completion Engine │◀────│ (SQLite) │ +└──────────────┘ └──────────┬───────────┘ └──────────────┘ + │ + ┌──────────▼───────────┐ + │ Embedding Model │ + │ (ONNX Runtime) │ + └──────────────────────┘ +``` + +## Indexing Pipeline + +### Phase 1: AST Parsing +- Parse each file with tree-sitter for language-agnostic AST +- Extract symbols: functions, classes, types, variables, imports/exports +- Build symbol table with source locations and dependency graph + +### Phase 2: Semantic Embedding +- Chunk code into semantic units (function bodies, class definitions) +- Generate embeddings using a small local model (all-MiniLM-L6-v2) +- Store in SQLite with vector extension for similarity search + +### Phase 3: Convention Detection +- Analyze naming patterns per directory (camelCase, snake_case) +- Detect architectural patterns (service classes, factory functions) 
+- Build import preference graph (barrel files, path aliases) + +## Completion Algorithm + +1. Extract cursor context: current line, surrounding code, file imports +2. Query symbol table for structural matches (prefix matching) +3. Query embedding index for semantic matches (similar code contexts) +4. Rank results by: convention match + semantic relevance + recency + user history +5. Return top-5 suggestions with confidence scores + +## API / LSP Methods + +### `textDocument/completion` +Standard LSP completion with extended metadata for confidence scoring. + +### `custom/indexStatus` +Returns current indexing progress and health status. + +### `custom/acceptSuggestion` +Records suggestion acceptance for learning model. + +## Storage + +All data stored locally in `~/.autocomplete/{workspace-hash}/`: +- `symbols.db` — SQLite with AST symbol table +- `embeddings.db` — SQLite with vector embeddings +- `preferences.json` — User keybindings and feature toggles +- `learning.db` — Accept/reject history for personalization + +## Performance Targets + +| Operation | Target | Strategy | +|-----------|--------|----------| +| Ghost text display | <100ms | Pre-computed prefix cache | +| Suggestion list | <300ms | Parallel symbol + embedding query | +| Full reindex | <60s for 5K files | Incremental with parallelism | +| Incremental update | <2s per file | AST diff, re-embed changed chunks | + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/technology_choices.json b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/technology_choices.json new file mode 100644 index 0000000..9362289 --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/documents/technology_choices.json @@ -0,0 +1,150 @@ +{ + "project_name": "Context-Aware Autocomplete", + "technology_decisions": [ + { + "category": "Embedding Model", + "description": "Local model for 
generating code embeddings for semantic search", + "decision_needed": true, + "options": [ + { + "name": "all-MiniLM-L6-v2 (ONNX)", + "description": "Small, fast sentence transformer running locally via ONNX Runtime", + "version": "1.0", + "documentation_url": "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2", + "github_url": "https://github.com/microsoft/onnxruntime", + "pros": ["22MB model size", "Fast inference (<10ms per chunk)", "Good semantic quality for code", "Runs on CPU efficiently"], + "cons": ["384-dimensional embeddings may miss nuance", "Not code-specific"], + "trade_offs": ["Speed vs embedding quality", "Generic vs code-specialized"], + "maturity": "Production-ready", + "community_size": "Very large", + "last_updated": "2024-12", + "implementation_complexity": "Low", + "estimated_time": "3 hours", + "recommended": true, + "recommendation_reason": "Best balance of size, speed, and quality for local inference" + }, + { + "name": "CodeBERT (ONNX)", + "description": "Microsoft's code-specialized BERT model", + "version": "1.0", + "documentation_url": "https://huggingface.co/microsoft/codebert-base", + "github_url": "https://github.com/microsoft/CodeBERT", + "pros": ["Code-specific training", "Better at understanding code semantics"], + "cons": ["440MB model size", "Slower inference (~50ms per chunk)", "Higher memory usage"], + "trade_offs": ["Better quality but significantly larger and slower"], + "maturity": "Stable", + "community_size": "Medium", + "last_updated": "2024-06", + "implementation_complexity": "Medium", + "estimated_time": "5 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "all-MiniLM-L6-v2 (ONNX)", + "user_choice_description": "Small, fast sentence transformer running locally via ONNX Runtime", + "user_choice_version": "1.0", + "user_reason": "Speed is critical for autocomplete — 10ms inference means we can re-embed on every keystroke", + "final_decision": "all-minilm-l6-v2" + }, + { + 
"category": "Local Storage", + "description": "Database for storing AST symbols, embeddings, and learning data locally", + "decision_needed": true, + "options": [ + { + "name": "SQLite with sqlite-vss", + "description": "SQLite with vector similarity search extension for unified storage", + "version": "3.45", + "documentation_url": "https://github.com/asg017/sqlite-vss", + "github_url": "https://github.com/asg017/sqlite-vss", + "pros": ["Single file database", "Vector search built-in", "Zero configuration", "Cross-platform"], + "cons": ["Write contention under heavy indexing", "Vector search not as fast as dedicated stores"], + "trade_offs": ["Simplicity vs performance at scale"], + "maturity": "Stable", + "community_size": "Large", + "last_updated": "2024-11", + "implementation_complexity": "Low", + "estimated_time": "4 hours", + "recommended": true, + "recommendation_reason": "Single-file simplicity with built-in vector search is perfect for a local tool" + }, + { + "name": "LanceDB", + "description": "Embedded vector database designed for AI applications", + "version": "0.4", + "documentation_url": "https://lancedb.github.io/lancedb/", + "github_url": "https://github.com/lancedb/lancedb", + "pros": ["Purpose-built for vector search", "Faster similarity queries", "Good Node.js bindings"], + "cons": ["Additional dependency", "Less mature", "Separate from symbol table storage"], + "trade_offs": ["Better vector performance but adds complexity"], + "maturity": "Beta", + "community_size": "Medium", + "last_updated": "2024-10", + "implementation_complexity": "Medium", + "estimated_time": "6 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "SQLite with sqlite-vss", + "user_choice_description": "SQLite with vector similarity search extension for unified storage", + "user_choice_version": "3.45", + "user_reason": "Keep it simple — one database for everything, zero setup for users", + "final_decision": "sqlite-vss" + }, + { + 
"category": "Parser / AST", + "description": "Code parsing library for extracting symbols and structure across languages", + "decision_needed": true, + "options": [ + { + "name": "tree-sitter", + "description": "Incremental parsing library used by major editors for syntax analysis", + "version": "0.22", + "documentation_url": "https://tree-sitter.github.io/tree-sitter/", + "github_url": "https://github.com/tree-sitter/tree-sitter", + "pros": ["Supports 100+ languages", "Incremental parsing (fast re-parse on edit)", "Battle-tested in VS Code, Neovim", "Excellent error recovery"], + "cons": ["Native dependency (WASM or node-gyp)", "Grammar files can be large"], + "trade_offs": ["More languages and features but heavier dependency"], + "maturity": "Production-ready", + "community_size": "Very large", + "last_updated": "2024-12", + "implementation_complexity": "Medium", + "estimated_time": "6 hours", + "recommended": true, + "recommendation_reason": "Industry standard for multi-language parsing with incremental update support" + }, + { + "name": "TypeScript Compiler API", + "description": "TypeScript's built-in compiler API for parsing TS/JS files", + "version": "5.4", + "documentation_url": "https://www.typescriptlang.org/docs/handbook/compiler-api.html", + "github_url": "https://github.com/microsoft/TypeScript", + "pros": ["Perfect TypeScript understanding", "Type-aware analysis", "No extra dependency for TS projects"], + "cons": ["TypeScript/JavaScript only", "Slower for large files", "No incremental parsing"], + "trade_offs": ["Better TS analysis but limited to one language family"], + "maturity": "Production-ready", + "community_size": "Very large", + "last_updated": "2024-12", + "implementation_complexity": "Low", + "estimated_time": "4 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "tree-sitter", + "user_choice_description": "Incremental parsing library used by major editors for syntax analysis", + "user_choice_version": 
"0.22", + "user_reason": "Need multi-language support from day one, and incremental parsing is essential for speed", + "final_decision": "tree-sitter" + } + ], + "summary": { + "total_decisions": 3, + "decisions_made": 3, + "estimated_setup_time": "13 hours", + "estimated_learning_curve": "Medium - tree-sitter and sqlite-vss require some ramp-up", + "overall_complexity": "Medium-High" + } +} diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/project_request.md b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/project_request.md new file mode 100644 index 0000000..1b27e87 --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/project_request.md @@ -0,0 +1,16 @@ +# Project Request #002 + +## Project Name +Context-Aware Autocomplete + +## Description +Build an AI autocomplete engine that goes beyond single-file context. It indexes the entire codebase, learns project conventions (naming, patterns, architecture), and provides completions that fit naturally into the existing code. Supports TypeScript, Python, and Go with real-time suggestions as you type. + +## Dependencies +None + +## Testable Outcome +A developer opens a file, starts typing a function, and the autocomplete suggests code that matches the project's existing patterns, imports from the right modules, and follows established naming conventions. 
+ +--- +*Created: 2026-02-21T14:00:00.000Z* diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/project_status.json b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/project_status.json new file mode 100644 index 0000000..6d80af8 --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/project_status.json @@ -0,0 +1,83 @@ +{ + "version": "1.0.0", + "projectId": "002-context-aware-autocomplete", + "currentAgent": "pm", + "currentPhase": "pm-prd-review", + + "agents": { + "pm": { + "status": "user-reviewing", + "currentPhase": "prd-review", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-21T14:00:00.000Z", "completedAt": "2026-02-21T14:04:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-21T14:04:00.000Z", "completedAt": "2026-02-21T14:40:00.000Z" }, + "prd-generate": { "status": "complete", "startedAt": "2026-02-21T14:40:00.000Z", "completedAt": "2026-02-21T15:20:00.000Z" }, + "prd-review": { "status": "user-reviewing", "startedAt": "2026-02-21T15:20:00.000Z" } + } + }, + "ux": { + "status": "user-reviewing", + "currentPhase": "design-brief-review", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-21T15:20:00.000Z", "completedAt": "2026-02-21T15:24:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-21T15:24:00.000Z", "completedAt": "2026-02-21T16:00:00.000Z" }, + "design-brief-generate": { "status": "complete", "startedAt": "2026-02-21T16:00:00.000Z", "completedAt": "2026-02-21T16:45:00.000Z" }, + "design-brief-review": { "status": "user-reviewing", "startedAt": "2026-02-21T16:45:00.000Z" } + } + }, + "engineer": { + "status": "user-reviewing", + "currentPhase": "spec-review", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-21T16:45:00.000Z", "completedAt": "2026-02-21T16:49:00.000Z" }, + "questions-answer": { "status": "complete", 
"startedAt": "2026-02-21T16:49:00.000Z", "completedAt": "2026-02-21T17:20:00.000Z" }, + "spec-generate": { "status": "complete", "startedAt": "2026-02-21T17:20:00.000Z", "completedAt": "2026-02-21T18:10:00.000Z" }, + "spec-review": { "status": "user-reviewing", "startedAt": "2026-02-21T18:10:00.000Z" } + } + } + }, + + "history": [ + { "phase": "pm-questions-generate", "startedAt": "2026-02-21T14:00:00.000Z", "completedAt": "2026-02-21T14:04:00.000Z", "status": "complete" }, + { "phase": "pm-questions-answer", "startedAt": "2026-02-21T14:04:00.000Z", "completedAt": "2026-02-21T14:40:00.000Z", "status": "complete" }, + { "phase": "pm-prd-generate", "startedAt": "2026-02-21T14:40:00.000Z", "completedAt": "2026-02-21T15:20:00.000Z", "status": "complete" }, + { "phase": "pm-prd-review", "startedAt": "2026-02-21T15:20:00.000Z", "status": "user-reviewing" }, + { "phase": "ux-questions-generate", "startedAt": "2026-02-21T15:20:00.000Z", "completedAt": "2026-02-21T15:24:00.000Z", "status": "complete" }, + { "phase": "ux-questions-answer", "startedAt": "2026-02-21T15:24:00.000Z", "completedAt": "2026-02-21T16:00:00.000Z", "status": "complete" }, + { "phase": "ux-design-brief-generate", "startedAt": "2026-02-21T16:00:00.000Z", "completedAt": "2026-02-21T16:45:00.000Z", "status": "complete" }, + { "phase": "ux-design-brief-review", "startedAt": "2026-02-21T16:45:00.000Z", "status": "user-reviewing" }, + { "phase": "engineer-questions-generate", "startedAt": "2026-02-21T16:45:00.000Z", "completedAt": "2026-02-21T16:49:00.000Z", "status": "complete" }, + { "phase": "engineer-questions-answer", "startedAt": "2026-02-21T16:49:00.000Z", "completedAt": "2026-02-21T17:20:00.000Z", "status": "complete" }, + { "phase": "engineer-spec-generate", "startedAt": "2026-02-21T17:20:00.000Z", "completedAt": "2026-02-21T18:10:00.000Z", "status": "complete" }, + { "phase": "engineer-spec-review", "startedAt": "2026-02-21T18:10:00.000Z", "status": "user-reviewing" } + ], + + "settings": { + 
"question_depth": "standard", + "document_length": "comprehensive" + }, + + "icon": { + "type": "icon", + "value": "sparkles", + "color": "hsl(200 80% 50%)" + }, + + "costTracking": { + "tier": "standard", + "phases": [ + { "phase": "pm-questions-generate", "phaseName": "PM Questions", "inputTokens": 1620, "outputTokens": 950, "timestamp": "2026-02-21T14:04:00.000Z" }, + { "phase": "pm-prd-generate", "phaseName": "PRD Generation", "inputTokens": 4100, "outputTokens": 6200, "timestamp": "2026-02-21T15:20:00.000Z" }, + { "phase": "ux-questions-generate", "phaseName": "UX Questions", "inputTokens": 2400, "outputTokens": 780, "timestamp": "2026-02-21T15:24:00.000Z" }, + { "phase": "ux-design-brief-generate", "phaseName": "Design Brief", "inputTokens": 4800, "outputTokens": 7100, "timestamp": "2026-02-21T16:45:00.000Z" }, + { "phase": "engineer-questions-generate", "phaseName": "Engineer Questions", "inputTokens": 3400, "outputTokens": 1020, "timestamp": "2026-02-21T16:49:00.000Z" }, + { "phase": "engineer-spec-generate", "phaseName": "Technical Spec", "inputTokens": 6200, "outputTokens": 9300, "timestamp": "2026-02-21T18:10:00.000Z" } + ], + "lastUpdated": "2026-02-21T18:10:00.000Z" + }, + + "approvedDocuments": [], + + "createdAt": "2026-02-21T14:00:00.000Z", + "lastUpdatedAt": "2026-02-21T18:10:00.000Z" +} diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/engineer_questions.json b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/engineer_questions.json new file mode 100644 index 0000000..09c3b25 --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/engineer_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "AI autocomplete engine that indexes the codebase and provides context-aware completions.", + "questions": [ + { + "question": "What indexing strategy should we use for the codebase?", + "options": ["AST parsing + symbol table", "Embedding-based semantic search", 
"Hybrid: AST for structure, embeddings for semantics"], + "answer": "Hybrid: AST for structure, embeddings for semantics" + }, + { + "question": "How should we handle incremental updates when files change?", + "options": ["Re-index changed files only", "Re-index changed files + dependents", "Background re-index with eventual consistency"], + "answer": "Re-index changed files + dependents" + }, + { + "question": "Should we integrate via Language Server Protocol (LSP) or a custom protocol?", + "options": ["Standard LSP for broad editor support", "Custom protocol for richer features", "LSP base with custom extensions"], + "answer": "LSP base with custom extensions" + }, + { + "question": "How should we handle token counting and model context limits?", + "options": ["Fixed context window with truncation", "Smart context selection based on relevance", "Tiered: small model for fast completions, large model for complex ones"], + "answer": "Smart context selection based on relevance" + } + ] +} diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/pm_questions.json b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/pm_questions.json new file mode 100644 index 0000000..9ae2667 --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/pm_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "AI autocomplete engine that indexes the codebase and provides context-aware completions.", + "questions": [ + { + "question": "How deep should the context analysis go — current file, current directory, or entire codebase?", + "options": ["Current file + imports only", "Current directory tree", "Entire codebase with smart relevance ranking"], + "answer": "Entire codebase with smart relevance ranking" + }, + { + "question": "What is the acceptable latency for suggestion display?", + "options": ["Under 50ms (instant feel)", "Under 200ms (slight pause acceptable)", "Under 500ms (background analysis 
OK)"], + "answer": "Under 100ms for inline ghost text, up to 300ms for ranked suggestion list" + }, + { + "question": "Should the autocomplete learn from the developer's accept/reject patterns?", + "options": ["No, use codebase conventions only", "Yes, personalize per developer", "Yes, with team-level and individual preferences"], + "answer": "Yes, with team-level and individual preferences" + }, + { + "question": "How should we handle private/sensitive code in the indexing?", + "options": ["Index everything locally, never send to cloud", "Cloud-based with encryption", "Configurable: local-only or cloud-assisted"], + "answer": "Index everything locally, never send to cloud" + } + ] +} diff --git a/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/ux_questions.json b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/ux_questions.json new file mode 100644 index 0000000..4fa5168 --- /dev/null +++ b/fixtures/demo/outputs/projects/002-context-aware-autocomplete/questions/ux_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "AI autocomplete engine that indexes the codebase and provides context-aware completions.", + "questions": [ + { + "question": "How should suggestions be displayed to the developer?", + "options": ["Ghost text inline (like Copilot)", "Dropdown list with previews", "Both: ghost text for top suggestion, list for alternatives"], + "answer": "Both: ghost text for top suggestion, list for alternatives" + }, + { + "question": "Should there be a confidence indicator on suggestions?", + "options": ["No, just show the suggestion", "Yes, color-coded (green/yellow/gray)", "Yes, with a percentage score"], + "answer": "Yes, color-coded (green/yellow/gray)" + }, + { + "question": "How should the user dismiss or cycle through suggestions?", + "options": ["Escape to dismiss, Tab to accept", "Arrow keys to cycle, Enter to accept", "Configurable keybindings"], + "answer": "Configurable keybindings with sensible defaults 
(Tab accept, Escape dismiss, Alt+] cycle)" + }, + { + "question": "Should there be a visual indicator showing indexing status?", + "options": ["No, keep it invisible", "Status bar indicator", "Status bar + progress during initial indexing"], + "answer": "Status bar + progress during initial indexing" + } + ] +} diff --git a/fixtures/demo/outputs/projects/003-ai-test-generator/documents/acceptance_criteria.json b/fixtures/demo/outputs/projects/003-ai-test-generator/documents/acceptance_criteria.json new file mode 100644 index 0000000..b68bd32 --- /dev/null +++ b/fixtures/demo/outputs/projects/003-ai-test-generator/documents/acceptance_criteria.json @@ -0,0 +1,99 @@ +{ + "project_id": "003-ai-test-generator", + "project_name": "AI Test Generator", + "job_stories": [ + { + "job_story_id": "js_001", + "title": "Generate Tests from Source File", + "situation": "I have a source file with functions that need test coverage", + "motivation": "quickly add meaningful tests without writing them manually", + "outcome": "I have a runnable test suite covering happy paths, edge cases, and error handling", + "acceptance_criteria": [ + { + "id": "ac_001_01", + "given": "I am on the Test Generator page", + "when": "I select a TypeScript or Python source file", + "then": "The tool displays the file contents with detected functions highlighted" + }, + { + "id": "ac_001_02", + "given": "A source file is loaded", + "when": "I click 'Generate Tests'", + "then": "A progress indicator shows analysis steps: parsing, analyzing branches, generating tests" + }, + { + "id": "ac_001_03", + "given": "Test generation completes", + "when": "I view the results", + "then": "Each generated test has a descriptive name, clear arrange/act/assert structure, and a comment explaining what it validates" + }, + { + "id": "ac_001_04", + "given": "Tests are generated", + "when": "I click 'Copy to Clipboard' or 'Save as File'", + "then": "The test file is saved with the correct naming convention (e.g., *.test.ts, 
*_test.py)" + } + ] + }, + { + "job_story_id": "js_002", + "title": "Discover and Test Edge Cases", + "situation": "the AI has analyzed my source code", + "motivation": "find boundary conditions and error paths I might have missed", + "outcome": "I have tests for scenarios I would not have thought of manually", + "acceptance_criteria": [ + { + "id": "ac_002_01", + "given": "A function accepts numeric input", + "when": "The AI generates edge case tests", + "then": "Tests include zero, negative numbers, MAX_SAFE_INTEGER, NaN, and Infinity" + }, + { + "id": "ac_002_02", + "given": "A function accepts string input", + "when": "The AI generates edge case tests", + "then": "Tests include empty string, very long string, unicode characters, and special characters" + }, + { + "id": "ac_002_03", + "given": "A function has error handling (try/catch, if/throw)", + "when": "The AI generates tests", + "then": "At least one test verifies each error path is triggered with appropriate assertions" + }, + { + "id": "ac_002_04", + "given": "A function calls external dependencies", + "when": "The AI generates tests", + "then": "Mock setup is included with both success and failure scenarios for each dependency" + } + ] + }, + { + "job_story_id": "js_003", + "title": "Configure Framework and Preferences", + "situation": "I want to generate tests for a specific project setup", + "motivation": "ensure generated tests match my team's conventions and tooling", + "outcome": "I get tests that fit seamlessly into my existing test suite", + "acceptance_criteria": [ + { + "id": "ac_003_01", + "given": "I am on the settings page", + "when": "I select a test framework from the dropdown", + "then": "The options include Jest, Vitest, and Pytest with framework-specific import styles" + }, + { + "id": "ac_003_02", + "given": "I have configured Vitest as my framework", + "when": "Tests are generated", + "then": "The output uses Vitest imports (vi.fn, vi.mock) and describe/it/expect syntax" + }, + { + 
"id": "ac_003_03", + "given": "I toggle off 'Integration tests' in settings", + "when": "Tests are generated", + "then": "Only unit tests are produced with all external calls mocked" + } + ] + } + ] +} diff --git a/fixtures/demo/outputs/projects/003-ai-test-generator/documents/prd.md b/fixtures/demo/outputs/projects/003-ai-test-generator/documents/prd.md new file mode 100644 index 0000000..e567d14 --- /dev/null +++ b/fixtures/demo/outputs/projects/003-ai-test-generator/documents/prd.md @@ -0,0 +1,72 @@ +# Product Requirements Document: AI Test Generator + +## Overview + +An AI-powered test generation tool that analyzes source code and produces comprehensive, meaningful test suites. Unlike coverage-padding tools, it understands code intent, identifies real edge cases, and generates tests that catch actual bugs. Supports Jest, Vitest, and Pytest out of the box. + +## Problem Statement + +Writing tests is tedious and developers often skip edge cases. Existing test generators produce superficial tests that inflate coverage numbers without catching real issues. Teams need a tool that understands what the code is supposed to do and generates tests that would actually catch regressions. + +## Goals + +1. **Meaningful tests** - Generate tests that validate behavior, not just exercise code paths +2. **Edge case discovery** - Identify boundary conditions, null handling, and error scenarios automatically +3. **Framework-native** - Output tests that follow each framework's idioms and best practices +4. **Gap analysis** - Detect untested code paths in existing test suites and fill the gaps + +## User Stories + +### US-1: Generate Tests from Source File +**As a** developer, **I want to** point the tool at a source file and get a complete test suite, **so that** I can quickly add test coverage for new or untested code. 
+ +**Acceptance Criteria:** +- Select a source file via file picker or CLI path +- AI analyzes function signatures, types, and logic branches +- Generated tests include happy path, edge cases, and error handling +- Output file matches the project's test naming convention + +### US-2: Fill Gaps in Existing Tests +**As a** developer, **I want to** analyze my existing tests and generate missing coverage, **so that** I can improve test quality without duplicating work. + +**Acceptance Criteria:** +- Tool reads both source and existing test files +- Identifies untested functions and uncovered branches +- Generates only the missing tests, not duplicates +- Preserves existing test structure and patterns + +### US-3: Configure Test Generation Preferences +**As a** developer, **I want to** control what types of tests are generated, **so that** the output matches my team's testing philosophy. + +**Acceptance Criteria:** +- Choose test framework (Jest, Vitest, Pytest) +- Toggle test categories: unit, integration, edge cases +- Configure mock strategy for external dependencies +- Set assertion style preference (expect, assert, should) + +## Scope + +### In Scope +- Source code analysis for TypeScript, JavaScript, Python +- Test generation for Jest, Vitest, Pytest +- Gap analysis against existing test files +- Auto-mocking for common patterns (HTTP, DB, filesystem) +- Test file output following project conventions + +### Out of Scope +- E2E test generation (v2) +- Visual regression tests +- Performance/load test generation +- CI/CD integration + +## Success Metrics + +| Metric | Target | +|--------|--------| +| Tests that compile and run | >95% | +| Meaningful assertion rate | >80% (vs trivial assertions) | +| Edge cases discovered per file | avg 3+ | +| Developer edit rate | <20% of generated tests need changes | + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/003-ai-test-generator/project_request.md 
b/fixtures/demo/outputs/projects/003-ai-test-generator/project_request.md new file mode 100644 index 0000000..4114896 --- /dev/null +++ b/fixtures/demo/outputs/projects/003-ai-test-generator/project_request.md @@ -0,0 +1,16 @@ +# Project Request #003 + +## Project Name +AI Test Generator + +## Description +An AI-powered tool that reads source code and generates comprehensive test suites. It analyzes function signatures, identifies edge cases, understands integration points, and produces tests that validate real behavior — not just line coverage. Supports Jest, Vitest, and Pytest with framework-specific best practices. + +## Dependencies +None + +## Testable Outcome +A developer points the tool at a source file, it analyzes the code structure, and generates a test file with meaningful unit tests covering happy paths, edge cases, and error handling — ready to run with zero modifications. + +--- +*Created: 2026-02-22T10:00:00.000Z* diff --git a/fixtures/demo/outputs/projects/003-ai-test-generator/project_status.json b/fixtures/demo/outputs/projects/003-ai-test-generator/project_status.json new file mode 100644 index 0000000..a74dff1 --- /dev/null +++ b/fixtures/demo/outputs/projects/003-ai-test-generator/project_status.json @@ -0,0 +1,73 @@ +{ + "version": "1.0.0", + "projectId": "003-ai-test-generator", + "currentAgent": "ux", + "currentPhase": "ux-questions-generate", + + "agents": { + "pm": { + "status": "complete", + "currentPhase": null, + "completedAt": "2026-02-22T12:00:00.000Z", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-22T10:00:00.000Z", "completedAt": "2026-02-22T10:04:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-22T10:04:00.000Z", "completedAt": "2026-02-22T10:35:00.000Z" }, + "prd-generate": { "status": "complete", "startedAt": "2026-02-22T10:35:00.000Z", "completedAt": "2026-02-22T11:15:00.000Z" }, + "prd-review": { "status": "complete", "startedAt": "2026-02-22T11:15:00.000Z", 
"completedAt": "2026-02-22T12:00:00.000Z" } + } + }, + "ux": { + "status": "ai-working", + "currentPhase": "questions-generate", + "phases": { + "questions-generate": { "status": "ai-working", "startedAt": "2026-02-22T12:00:00.000Z" }, + "questions-answer": { "status": "not-started" }, + "design-brief-generate": { "status": "not-started" }, + "design-brief-review": { "status": "not-started" } + } + }, + "engineer": { + "status": "not-started", + "currentPhase": null, + "phases": { + "questions-generate": { "status": "not-started" }, + "questions-answer": { "status": "not-started" }, + "spec-generate": { "status": "not-started" }, + "spec-review": { "status": "not-started" } + } + } + }, + + "history": [ + { "phase": "pm-questions-generate", "startedAt": "2026-02-22T10:00:00.000Z", "completedAt": "2026-02-22T10:04:00.000Z", "status": "complete" }, + { "phase": "pm-questions-answer", "startedAt": "2026-02-22T10:04:00.000Z", "completedAt": "2026-02-22T10:35:00.000Z", "status": "complete" }, + { "phase": "pm-prd-generate", "startedAt": "2026-02-22T10:35:00.000Z", "completedAt": "2026-02-22T11:15:00.000Z", "status": "complete" }, + { "phase": "pm-prd-review", "startedAt": "2026-02-22T11:15:00.000Z", "completedAt": "2026-02-22T12:00:00.000Z", "status": "complete" }, + { "phase": "ux-questions-generate", "startedAt": "2026-02-22T12:00:00.000Z", "status": "ai-working" } + ], + + "settings": { + "question_depth": "thorough", + "document_length": "standard" + }, + + "icon": { + "type": "icon", + "value": "flask-conical", + "color": "hsl(145 65% 42%)" + }, + + "costTracking": { + "tier": "standard", + "phases": [ + { "phase": "pm-questions-generate", "phaseName": "PM Questions", "inputTokens": 1350, "outputTokens": 880, "timestamp": "2026-02-22T10:04:00.000Z" }, + { "phase": "pm-prd-generate", "phaseName": "PRD Generation", "inputTokens": 3800, "outputTokens": 5200, "timestamp": "2026-02-22T11:15:00.000Z" } + ], + "lastUpdated": "2026-02-22T12:00:00.000Z" + }, + + 
"approvedDocuments": ["prd", "acceptance-criteria"], + + "createdAt": "2026-02-22T10:00:00.000Z", + "lastUpdatedAt": "2026-02-22T12:00:00.000Z" +} diff --git a/fixtures/demo/outputs/projects/003-ai-test-generator/questions/engineer_questions.json b/fixtures/demo/outputs/projects/003-ai-test-generator/questions/engineer_questions.json new file mode 100644 index 0000000..3b3df62 --- /dev/null +++ b/fixtures/demo/outputs/projects/003-ai-test-generator/questions/engineer_questions.json @@ -0,0 +1,4 @@ +{ + "project_request": "AI test generator that analyzes code and generates comprehensive test suites.", + "questions": [] +} diff --git a/fixtures/demo/outputs/projects/003-ai-test-generator/questions/pm_questions.json b/fixtures/demo/outputs/projects/003-ai-test-generator/questions/pm_questions.json new file mode 100644 index 0000000..8c3585b --- /dev/null +++ b/fixtures/demo/outputs/projects/003-ai-test-generator/questions/pm_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "AI test generator that analyzes code and generates comprehensive test suites.", + "questions": [ + { + "question": "Which test frameworks should the generator support initially?", + "options": ["Jest only", "Jest + Vitest", "Jest + Vitest + Pytest"], + "answer": "Jest + Vitest + Pytest" + }, + { + "question": "What types of tests should the AI generate?", + "options": ["Unit tests only", "Unit + integration tests", "Unit + integration + edge case tests"], + "answer": "Unit + integration + edge case tests" + }, + { + "question": "How should the tool handle existing tests in a file?", + "options": ["Ignore existing tests, generate fresh", "Analyze gaps and generate missing tests only", "Both modes available"], + "answer": "Analyze gaps and generate missing tests only" + }, + { + "question": "Should the generated tests include mocking setup for external dependencies?", + "options": ["No, just test the function directly", "Yes, auto-mock common patterns (fetch, DB, fs)", "Yes, with configurable 
mock strategy per dependency"], + "answer": "Yes, auto-mock common patterns (fetch, DB, fs)" + } + ] +} diff --git a/fixtures/demo/outputs/projects/003-ai-test-generator/questions/ux_questions.json b/fixtures/demo/outputs/projects/003-ai-test-generator/questions/ux_questions.json new file mode 100644 index 0000000..3b3df62 --- /dev/null +++ b/fixtures/demo/outputs/projects/003-ai-test-generator/questions/ux_questions.json @@ -0,0 +1,4 @@ +{ + "project_request": "AI test generator that analyzes code and generates comprehensive test suites.", + "questions": [] +} diff --git a/fixtures/demo/outputs/projects/004-smart-log-analyzer/project_request.md b/fixtures/demo/outputs/projects/004-smart-log-analyzer/project_request.md new file mode 100644 index 0000000..3de3533 --- /dev/null +++ b/fixtures/demo/outputs/projects/004-smart-log-analyzer/project_request.md @@ -0,0 +1,16 @@ +# Project Request #004 + +## Project Name +Smart Log Analyzer + +## Description +An AI-powered log analysis tool that ingests application logs from any source, automatically detects anomalies and error patterns, clusters related events into incidents, identifies probable root causes, and suggests actionable fixes. Works with structured JSON logs, plain text logs, and common formats like Apache/Nginx access logs. + +## Dependencies +None + +## Testable Outcome +A developer uploads or streams application logs, the AI identifies three distinct error patterns, groups related log entries into clusters, and suggests a root cause with a fix for the most critical issue. 
+ +--- +*Created: 2026-02-23T16:00:00.000Z* diff --git a/fixtures/demo/outputs/projects/004-smart-log-analyzer/project_status.json b/fixtures/demo/outputs/projects/004-smart-log-analyzer/project_status.json new file mode 100644 index 0000000..ccb9228 --- /dev/null +++ b/fixtures/demo/outputs/projects/004-smart-log-analyzer/project_status.json @@ -0,0 +1,68 @@ +{ + "version": "1.0.0", + "projectId": "004-smart-log-analyzer", + "currentAgent": "pm", + "currentPhase": "pm-questions-answer", + + "agents": { + "pm": { + "status": "awaiting-user", + "currentPhase": "questions-answer", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-23T16:00:00.000Z", "completedAt": "2026-02-23T16:03:00.000Z" }, + "questions-answer": { "status": "awaiting-user", "startedAt": "2026-02-23T16:03:00.000Z" }, + "prd-generate": { "status": "not-started" }, + "prd-review": { "status": "not-started" } + } + }, + "ux": { + "status": "not-started", + "currentPhase": null, + "phases": { + "questions-generate": { "status": "not-started" }, + "questions-answer": { "status": "not-started" }, + "design-brief-generate": { "status": "not-started" }, + "design-brief-review": { "status": "not-started" } + } + }, + "engineer": { + "status": "not-started", + "currentPhase": null, + "phases": { + "questions-generate": { "status": "not-started" }, + "questions-answer": { "status": "not-started" }, + "spec-generate": { "status": "not-started" }, + "spec-review": { "status": "not-started" } + } + } + }, + + "history": [ + { "phase": "pm-questions-generate", "startedAt": "2026-02-23T16:00:00.000Z", "completedAt": "2026-02-23T16:03:00.000Z", "status": "complete" }, + { "phase": "pm-questions-answer", "startedAt": "2026-02-23T16:03:00.000Z", "status": "awaiting-user" } + ], + + "settings": { + "question_depth": "standard", + "document_length": "standard" + }, + + "icon": { + "type": "icon", + "value": "search", + "color": "hsl(30 85% 55%)" + }, + + "costTracking": { + "tier": 
"standard", + "phases": [ + { "phase": "pm-questions-generate", "phaseName": "PM Questions", "inputTokens": 1480, "outputTokens": 920, "timestamp": "2026-02-23T16:03:00.000Z" } + ], + "lastUpdated": "2026-02-23T16:03:00.000Z" + }, + + "approvedDocuments": [], + + "createdAt": "2026-02-23T16:00:00.000Z", + "lastUpdatedAt": "2026-02-23T16:03:00.000Z" +} diff --git a/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/engineer_questions.json b/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/engineer_questions.json new file mode 100644 index 0000000..a2251ed --- /dev/null +++ b/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/engineer_questions.json @@ -0,0 +1,4 @@ +{ + "project_request": "AI log analysis tool that detects anomalies, clusters errors, and suggests fixes.", + "questions": [] +} diff --git a/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/pm_questions.json b/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/pm_questions.json new file mode 100644 index 0000000..5fe38c5 --- /dev/null +++ b/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/pm_questions.json @@ -0,0 +1,21 @@ +{ + "project_request": "AI log analysis tool that detects anomalies, clusters errors, and suggests fixes.", + "questions": [ + { + "question": "What log formats should the analyzer support out of the box?", + "options": ["JSON structured logs only", "JSON + plain text", "JSON + plain text + common formats (Apache, Nginx, Syslog)"] + }, + { + "question": "How should the AI handle log volume — should it analyze all logs or sample?", + "options": ["Analyze all logs (up to 100MB)", "Smart sampling with full analysis on anomalies", "User-configurable: full or sampled"] + }, + { + "question": "Should the tool support real-time streaming analysis or batch-only?", + "options": ["Batch upload only (simpler)", "Real-time streaming via WebSocket", "Both batch and streaming"] + }, + { + "question": "What level of 
root cause analysis is expected?", + "options": ["Pattern detection only — group similar errors", "Pattern detection + probable root cause suggestion", "Full root cause analysis with suggested code fixes"] + } + ] +} diff --git a/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/ux_questions.json b/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/ux_questions.json new file mode 100644 index 0000000..a2251ed --- /dev/null +++ b/fixtures/demo/outputs/projects/004-smart-log-analyzer/questions/ux_questions.json @@ -0,0 +1,4 @@ +{ + "project_request": "AI log analysis tool that detects anomalies, clusters errors, and suggests fixes.", + "questions": [] +} diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/acceptance_criteria.json b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/acceptance_criteria.json new file mode 100644 index 0000000..01dc1b5 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/acceptance_criteria.json @@ -0,0 +1,44 @@ +{ + "project_id": "005-model-performance-dashboard", + "project_name": "Model Performance Dashboard", + "job_stories": [ + { + "job_story_id": "js_001", + "title": "View Real-Time Model Metrics", + "situation": "I have AI models running in production", + "motivation": "monitor their health and catch issues before users do", + "outcome": "I can see live performance data and respond proactively", + "acceptance_criteria": [ + { "id": "ac_001_01", "given": "I open the dashboard", "when": "The overview page loads", "then": "I see latency (p50/p95/p99), error rate, and request count for each model" }, + { "id": "ac_001_02", "given": "I am viewing a latency chart", "when": "I change the time range to '24h'", "then": "The chart updates to show the last 24 hours of data" }, + { "id": "ac_001_03", "given": "I want to compare two models", "when": "I select both models in the comparison view", "then": "Their latency 
and error charts overlay on the same axes" }, + { "id": "ac_001_04", "given": "New metric data arrives", "when": "The auto-refresh triggers (every 30s)", "then": "Charts update smoothly without full page reload" } + ] + }, + { + "job_story_id": "js_002", + "title": "Configure Performance Alerts", + "situation": "I want to be notified when a model's latency exceeds acceptable levels", + "motivation": "respond to issues before they affect end users", + "outcome": "I get timely Slack/email notifications when thresholds are breached", + "acceptance_criteria": [ + { "id": "ac_002_01", "given": "I am on the Alerts page", "when": "I click 'New Alert'", "then": "A form appears with metric selector, threshold input, comparison operator, and channel picker" }, + { "id": "ac_002_02", "given": "I have set a p95 latency threshold of 2000ms", "when": "I preview the alert", "then": "A horizontal line at 2000ms appears on the current latency chart" }, + { "id": "ac_002_03", "given": "An alert is configured", "when": "The metric breaches the threshold", "then": "A notification is sent to the configured Slack channel within 60 seconds" }, + { "id": "ac_002_04", "given": "An alert has been triggered", "when": "I view alert history", "then": "I see the trigger timestamp, metric value, and threshold that was breached" } + ] + }, + { + "job_story_id": "js_003", + "title": "Analyze Cost and Budget", + "situation": "I manage the AI infrastructure budget", + "motivation": "understand spending patterns and prevent cost overruns", + "outcome": "I can track costs by model and get alerted before exceeding budget", + "acceptance_criteria": [ + { "id": "ac_003_01", "given": "I open the Cost tab", "when": "The page loads", "then": "I see a breakdown of spending by model with daily and monthly totals" }, + { "id": "ac_003_02", "given": "I am viewing cost data", "when": "I select a specific model", "then": "I see cost-per-request, total tokens, and spend trend for that model" }, + { "id": 
"ac_003_03", "given": "I have set a monthly budget of $5,000", "when": "Projected spend exceeds the budget", "then": "A budget alert is sent and the dashboard shows a warning banner" } + ] + } + ] +} diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/design_brief.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/design_brief.md new file mode 100644 index 0000000..a0b35e1 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/design_brief.md @@ -0,0 +1,61 @@ +# Design Brief: Model Performance Dashboard + +## Design Goals + +1. **Glanceable** - Key health indicators visible without scrolling +2. **Data-dense** - Pack information without feeling cluttered +3. **Alert-oriented** - Problems surface immediately through color and position + +## User Flows + +### Flow 1: Morning Health Check + +``` +Open dashboard → Overview tab + | +Scan status cards: green = healthy, yellow = warning, red = alert + | +Click a yellow card to drill into that model's metrics + | +View latency chart, identify spike at 3 AM + | +Check alert history — was notified via Slack +``` + +### Flow 2: Set Up Alert + +``` +Click "Alerts" tab → "New Alert" + | +Select: p95 latency > 2000ms for "claude-3-opus" + | +Preview shows threshold line on live chart + | +Select channel: #ml-alerts Slack channel + | +Save → alert is active immediately +``` + +## Key Screens + +1. **Overview** - Grid of model health cards with sparklines +2. **Model Detail** - Full charts for latency, errors, cost +3. **Alert Config** - Form with live chart preview +4. 
**Cost Analysis** - Spend breakdown with budget tracking + +## Visual Guidelines + +- Chart colors: consistent per model across all views +- Status: Green (healthy), Yellow (warning), Red (critical) +- Data labels: show exact values on hover +- Responsive: charts stack vertically on mobile +- Dark mode optimized (engineers often use dark mode) + +## Accessibility + +- Color-blind safe palette for status indicators (use shapes + color) +- Chart data available as tables for screen readers +- Keyboard navigation between models and time ranges + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/prd.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/prd.md new file mode 100644 index 0000000..720c4d1 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/prd.md @@ -0,0 +1,72 @@ +# Product Requirements Document: Model Performance Dashboard + +## Overview + +A real-time monitoring dashboard for AI models in production. Engineers can track latency percentiles, token costs, error rates, and quality scores across all deployed models. Configurable alerts notify via Slack or email when metrics breach thresholds. + +## Problem Statement + +AI teams deploy models but lack visibility into production performance. They discover issues from user complaints, not metrics. Cost overruns go unnoticed until the monthly bill. There is no centralized place to compare model performance, track degradation, and set proactive alerts. + +## Goals + +1. **Real-time visibility** - See latency, cost, and errors updating live +2. **Proactive alerting** - Get notified before users notice degradation +3. **Cost control** - Track spend per model and endpoint with budget alerts +4. 
**Model comparison** - Compare performance across models and versions + +## User Stories + +### US-1: View Real-Time Model Metrics +**As an** ML engineer, **I want to** see live latency and error rates for my deployed models, **so that** I can detect issues immediately. + +**Acceptance Criteria:** +- Dashboard shows p50, p95, p99 latency charts per model +- Error rate chart with breakdown by error type +- Time range selector: 1h, 6h, 24h, 7d, 30d +- Auto-refresh every 30 seconds + +### US-2: Configure and Manage Alerts +**As an** ML engineer, **I want to** set threshold-based alerts on key metrics, **so that** I'm notified before performance degrades to user-impacting levels. + +**Acceptance Criteria:** +- Create alert: select metric, threshold, comparison (above/below), channel +- Live preview shows threshold line on the current chart +- Alert history shows recent triggers with timestamps +- Mute/unmute alerts without deleting them + +### US-3: Analyze Costs and Budget +**As an** engineering manager, **I want to** track AI spending per model and endpoint, **so that** I can manage budget and optimize costs. 
+ +**Acceptance Criteria:** +- Cost breakdown by model, endpoint, and time period +- Daily and monthly spend charts with trend lines +- Budget alert: notify when projected monthly cost exceeds limit +- Cost-per-request comparison across models + +## Scope + +### In Scope +- Real-time metrics dashboard (latency, errors, cost, quality) +- TimescaleDB storage with 90-day retention +- Alert configuration with Slack and email delivery +- Multi-model comparison charts +- Mobile-responsive layout + +### Out of Scope +- Model deployment/management +- A/B testing infrastructure +- Log aggregation (use existing tools) +- Custom metric definition (v2) + +## Success Metrics + +| Metric | Target | +|--------|--------| +| Time to detect issue | <5 minutes (vs hours today) | +| Alert accuracy | >90% actionable (low false positives) | +| Dashboard load time | <2 seconds | +| Daily active users | 80%+ of ML team | + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/screens.json b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/screens.json new file mode 100644 index 0000000..b0c5a77 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/screens.json @@ -0,0 +1,269 @@ +{ + "project_id": "005-model-performance-dashboard", + "project_name": "Model Performance Dashboard", + "screens": [ + { + "id": "metrics-overview", + "name": "Dashboard Overview", + "route": "/dashboard", + "description": "Overview page showing health status cards for each model with key metrics and sparklines", + "wireframe": { + "type": "stack", + "direction": "vertical", + "children": [ + { + "type": "nav", + "text": "NovaMind Model Dashboard", + "children": [ + { "type": "tabs", "tabs": ["Overview", "Latency", "Cost", "Alerts"], "activeTab": "Overview" }, + { "type": "spacer", "flex": 1 }, + { "type": "select", "placeholder": "Last 24h", "options": 
["Last 1h", "Last 6h", "Last 24h", "Last 7d", "Last 30d"] }, + { "type": "avatar", "size": "sm", "text": "JB" } + ] + }, + { + "type": "stack", + "padding": "lg", + "gap": "lg", + "children": [ + { + "type": "row", + "gap": "md", + "children": [ + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "xs", + "children": [ + { "type": "text", "text": "Total Requests", "size": "xs", "color": "muted" }, + { "type": "heading", "text": "142,847", "size": "lg" }, + { "type": "text", "text": "+12.4% from yesterday", "size": "xs", "color": "accent" } + ] + } + ] + }, + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "xs", + "children": [ + { "type": "text", "text": "Avg Latency (p95)", "size": "xs", "color": "muted" }, + { "type": "heading", "text": "1,240ms", "size": "lg" }, + { "type": "text", "text": "-8.2% from yesterday", "size": "xs", "color": "accent" } + ] + } + ] + }, + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "xs", + "children": [ + { "type": "text", "text": "Error Rate", "size": "xs", "color": "muted" }, + { "type": "heading", "text": "0.34%", "size": "lg" }, + { "type": "text", "text": "Within threshold", "size": "xs", "color": "accent" } + ] + } + ] + }, + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "xs", + "children": [ + { "type": "text", "text": "Today's Cost", "size": "xs", "color": "muted" }, + { "type": "heading", "text": "$127.40", "size": "lg" }, + { "type": "text", "text": "$3,822 projected monthly", "size": "xs", "color": "muted" } + ] + } + ] + } + ] + }, + { + "type": "row", + "gap": "md", + "children": [ + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "heading", "text": "claude-3-opus", "size": "sm" }, + { "type": "badge", "text": "Healthy", "variant": 
"default" } + ] + }, + { "type": "text", "text": "p95: 980ms | Errors: 0.1% | $84/day", "size": "xs", "color": "muted" } + ] + } + ] + }, + { + "type": "card", + "padding": "md", + "children": [ + { + "type": "stack", + "gap": "sm", + "children": [ + { + "type": "row", + "justify": "between", + "children": [ + { "type": "heading", "text": "gpt-4-turbo", "size": "sm" }, + { "type": "badge", "text": "Warning", "variant": "warning" } + ] + }, + { "type": "text", "text": "p95: 1,840ms | Errors: 0.8% | $43/day", "size": "xs", "color": "muted" } + ] + } + ] + } + ] + } + ] + } + ] + }, + "notes": "Status cards use green/yellow/red borders. Sparklines embedded in model cards. Click a model card to see full detail view.", + "components_to_reuse": [ + { "name": "Card", "path": "components/ui/Card.tsx" }, + { "name": "Badge", "path": "components/ui/Badge.tsx" } + ], + "components_to_create": [ + { "name": "MetricCard", "path": "components/dashboard/MetricCard.tsx", "description": "Summary card with metric value, label, and trend indicator" }, + { "name": "ModelHealthCard", "path": "components/dashboard/ModelHealthCard.tsx", "description": "Model status card with sparkline and key metrics" }, + { "name": "Sparkline", "path": "components/dashboard/Sparkline.tsx", "description": "Tiny inline SVG chart for trend visualization" } + ] + }, + { + "id": "alert-configuration", + "name": "Alert Configuration", + "route": "/dashboard/alerts", + "description": "Alert management page with create form, active alerts list, and trigger history", + "wireframe": { + "type": "stack", + "direction": "vertical", + "children": [ + { + "type": "nav", + "text": "NovaMind Model Dashboard", + "children": [ + { "type": "tabs", "tabs": ["Overview", "Latency", "Cost", "Alerts"], "activeTab": "Alerts" }, + { "type": "spacer", "flex": 1 }, + { "type": "avatar", "size": "sm", "text": "JB" } + ] + }, + { + "type": "stack", + "padding": "lg", + "gap": "lg", + "children": [ + { + "type": "row", + "justify": 
"between", + "children": [ + { "type": "heading", "text": "Alert Rules", "size": "lg" }, + { "type": "button", "text": "+ New Alert", "variant": "primary" } + ] + }, + { + "type": "card", + "padding": "none", + "children": [ + { + "type": "table", + "headers": ["Model", "Metric", "Condition", "Channel", "Status", "Actions"], + "rows": [ + { + "cells": [ + { "type": "text", "text": "claude-3-opus", "size": "sm" }, + { "type": "text", "text": "p95 latency", "size": "sm" }, + { "type": "badge", "text": "> 2,000ms" }, + { "type": "badge", "text": "#ml-alerts" }, + { "type": "badge", "text": "Active", "variant": "default" }, + { "type": "button", "text": "Edit", "variant": "ghost" } + ] + }, + { + "cells": [ + { "type": "text", "text": "gpt-4-turbo", "size": "sm" }, + { "type": "text", "text": "Error rate", "size": "sm" }, + { "type": "badge", "text": "> 1%" }, + { "type": "badge", "text": "email" }, + { "type": "badge", "text": "Triggered", "variant": "destructive" }, + { "type": "button", "text": "Edit", "variant": "ghost" } + ] + }, + { + "cells": [ + { "type": "text", "text": "All models", "size": "sm" }, + { "type": "text", "text": "Monthly cost", "size": "sm" }, + { "type": "badge", "text": "> $5,000" }, + { "type": "badge", "text": "#ml-alerts" }, + { "type": "badge", "text": "Active", "variant": "default" }, + { "type": "button", "text": "Edit", "variant": "ghost" } + ] + } + ] + } + ] + } + ] + } + ] + }, + "notes": "Triggered alerts shown in red. Muted alerts grayed out. 
Click 'New Alert' opens modal with live chart preview.", + "components_to_reuse": [ + { "name": "Table", "path": "components/ui/Table.tsx" }, + { "name": "Button", "path": "components/ui/Button.tsx" }, + { "name": "Badge", "path": "components/ui/Badge.tsx" } + ], + "components_to_create": [ + { "name": "AlertRuleForm", "path": "components/dashboard/AlertRuleForm.tsx", "description": "Form with metric selector, threshold, comparison, and channel picker" }, + { "name": "ThresholdPreview", "path": "components/dashboard/ThresholdPreview.tsx", "description": "Chart overlay showing threshold line on live metric data" } + ] + } + ], + "user_flows": [ + { + "id": "flow_morning_check", + "name": "Morning Health Check", + "steps": [ + { "step": 1, "screen": "metrics-overview", "action": "Engineer opens dashboard, scans status cards" }, + { "step": 2, "screen": "metrics-overview", "action": "Sees gpt-4-turbo has 'Warning' status" }, + { "step": 3, "screen": "metrics-overview", "action": "Clicks model card to view detailed latency chart" }, + { "step": 4, "screen": "metrics-overview", "action": "Identifies latency spike at 3 AM, correlates with deployment" } + ] + }, + { + "id": "flow_create_alert", + "name": "Create a Latency Alert", + "steps": [ + { "step": 1, "screen": "alert-configuration", "action": "Clicks '+ New Alert' button" }, + { "step": 2, "screen": "alert-configuration", "action": "Selects claude-3-opus, p95 latency > 2000ms" }, + { "step": 3, "screen": "alert-configuration", "action": "Preview shows threshold line on current chart" }, + { "step": 4, "screen": "alert-configuration", "action": "Selects #ml-alerts Slack channel, saves alert" } + ] + } + ] +} diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/technical_specification.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/technical_specification.md new file mode 100644 index 0000000..61ad388 --- /dev/null +++ 
b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/technical_specification.md @@ -0,0 +1,89 @@ +# Technical Specification: Model Performance Dashboard + +## Architecture Overview + +Metrics flow from applications via HTTP to an ingestion API, through Redis for burst buffering, into TimescaleDB for storage. The dashboard reads from pre-computed continuous aggregates for fast queries. + +``` +┌──────────────┐ ┌──────────────┐ ┌─────────────┐ +│ Application │────▶│ Ingestion │────▶│ Redis │ +│ (SDK/HTTP) │ │ API (Express)│ │ (Queue) │ +└──────────────┘ └──────────────┘ └──────┬──────┘ + │ + ┌──────────────┐ ┌──────▼──────┐ + │ React SPA │◀────│ TimescaleDB │ + │ (Dashboard) │ │ (Metrics) │ + └──────────────┘ └──────┬──────┘ + │ + ┌──────────────┐ ┌──────▼──────┐ + │ Slack/Email │◀────│ Alert │ + │ (Channels) │ │ Evaluator │ + └──────────────┘ └─────────────┘ +``` + +## Ingestion API + +### POST /api/metrics/ingest +Accepts batched metric data points from application SDKs. + +**Request:** +```json +{ + "model": "claude-3-opus", + "endpoint": "/api/chat", + "metrics": { + "latency_ms": 1240, + "input_tokens": 450, + "output_tokens": 120, + "status": "success", + "cost_usd": 0.0089 + }, + "timestamp": "2026-02-24T10:30:00.000Z" +} +``` + +## Storage Schema + +### Hypertable: `model_metrics` +TimescaleDB hypertable partitioned by time. + +Columns: timestamp, model, endpoint, latency_ms, input_tokens, output_tokens, status, cost_usd, error_type + +### Continuous Aggregates +- `metrics_1min` — 1-minute rollups with p50, p95, p99, avg, count +- `metrics_1hour` — 1-hour rollups for longer time ranges +- `metrics_1day` — Daily rollups for monthly views + +## Alert System + +Alert rules stored in PostgreSQL. Evaluation runs on two tracks: +- **Critical alerts**: evaluated on each incoming data point via Redis pub/sub +- **Standard alerts**: evaluated every 60 seconds via polling worker + +Alert delivery via webhook to Slack API and SendGrid for email. 
+ +## Dashboard API + +### GET /api/dashboard/overview +Returns summary metrics for all models (current values + trends). + +### GET /api/dashboard/metrics/:model +Returns time-series data for a specific model with configurable time range and granularity. + +### CRUD /api/dashboard/alerts +Create, read, update, delete alert rules. + +### GET /api/dashboard/alerts/history +Returns recent alert triggers with details. + +## Performance Targets + +| Query | Target | Strategy | +|-------|--------|----------| +| Overview load | <1s | Continuous aggregates + caching | +| Model detail (24h) | <2s | 1-min aggregates, ~1440 points | +| Model detail (30d) | <2s | Hourly aggregates, ~720 points | +| Alert evaluation | <5s | Redis pub/sub for critical path | + +--- +*Document generated as part of SpecWright specification* diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/technology_choices.json b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/technology_choices.json new file mode 100644 index 0000000..ea4149b --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/documents/technology_choices.json @@ -0,0 +1,150 @@ +{ + "project_name": "Model Performance Dashboard", + "technology_decisions": [ + { + "category": "Time-Series Database", + "description": "Database for storing and querying model performance metrics over time", + "decision_needed": true, + "options": [ + { + "name": "TimescaleDB", + "description": "PostgreSQL extension for time-series data with continuous aggregates", + "version": "2.14", + "documentation_url": "https://docs.timescale.com/", + "github_url": "https://github.com/timescale/timescaledb", + "pros": ["Built on PostgreSQL (familiar)", "Continuous aggregates for fast queries", "SQL-native", "Compression for storage efficiency"], + "cons": ["Heavier than lightweight alternatives", "Requires PostgreSQL hosting"], + "trade_offs": ["More powerful but heavier 
infrastructure"], + "maturity": "Production-ready", + "community_size": "Large", + "last_updated": "2024-12", + "implementation_complexity": "Low", + "estimated_time": "4 hours", + "recommended": true, + "recommendation_reason": "We already use PostgreSQL — TimescaleDB adds time-series capabilities with minimal overhead" + }, + { + "name": "InfluxDB", + "description": "Purpose-built time-series database with Flux query language", + "version": "3.0", + "documentation_url": "https://docs.influxdata.com/", + "github_url": "https://github.com/influxdata/influxdb", + "pros": ["Purpose-built for metrics", "Built-in downsampling", "Good visualization tools"], + "cons": ["New query language (Flux)", "Separate database to manage", "Different backup/monitoring"], + "trade_offs": ["Better for pure metrics but adds operational complexity"], + "maturity": "Production-ready", + "community_size": "Large", + "last_updated": "2024-11", + "implementation_complexity": "Medium", + "estimated_time": "8 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "TimescaleDB", + "user_choice_description": "PostgreSQL extension for time-series data", + "user_choice_version": "2.14", + "user_reason": "Minimal new infrastructure — extends our existing PostgreSQL", + "final_decision": "timescaledb" + }, + { + "category": "Charting Library", + "description": "Frontend library for rendering real-time metric charts", + "decision_needed": true, + "options": [ + { + "name": "Recharts", + "description": "React-native charting library built on D3", + "version": "2.12", + "documentation_url": "https://recharts.org/", + "github_url": "https://github.com/recharts/recharts", + "pros": ["React-native components", "Good default styling", "Responsive out of the box", "Active community"], + "cons": ["Performance degrades with >10K data points", "Limited customization for complex charts"], + "trade_offs": ["Easy to use but limited for advanced visualizations"], + "maturity": 
"Production-ready", + "community_size": "Very large", + "last_updated": "2024-10", + "implementation_complexity": "Low", + "estimated_time": "3 hours", + "recommended": true, + "recommendation_reason": "Best balance of ease-of-use and capability for dashboard charts" + }, + { + "name": "Apache ECharts", + "description": "Powerful visualization library with extensive chart types", + "version": "5.5", + "documentation_url": "https://echarts.apache.org/", + "github_url": "https://github.com/apache/echarts", + "pros": ["Handles large datasets efficiently", "Extensive chart types", "Built-in animations"], + "cons": ["Not React-native (wrapper needed)", "Steeper learning curve", "Large bundle"], + "trade_offs": ["More powerful but less React-friendly"], + "maturity": "Production-ready", + "community_size": "Very large", + "last_updated": "2024-11", + "implementation_complexity": "Medium", + "estimated_time": "6 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "Recharts", + "user_choice_description": "React-native charting library built on D3", + "user_choice_version": "2.12", + "user_reason": "React-native, good defaults, our data points per view stay under 2K", + "final_decision": "recharts" + }, + { + "category": "Alert Delivery", + "description": "Service for sending alert notifications via Slack and email", + "decision_needed": true, + "options": [ + { + "name": "Direct API Integration", + "description": "Direct calls to Slack API and SendGrid for email delivery", + "version": "N/A", + "documentation_url": "", + "github_url": "", + "pros": ["No extra dependency", "Full control over message formatting", "Simple for 2 channels"], + "cons": ["Need to handle retries ourselves", "More code for each new channel"], + "trade_offs": ["Simple now but harder to scale to many channels"], + "maturity": "N/A", + "community_size": "N/A", + "last_updated": "N/A", + "implementation_complexity": "Low", + "estimated_time": "4 hours", + 
"recommended": true, + "recommendation_reason": "We only need Slack and email — a full notification service is overkill" + }, + { + "name": "Novu", + "description": "Open-source notification infrastructure with multi-channel support", + "version": "0.24", + "documentation_url": "https://docs.novu.co/", + "github_url": "https://github.com/novuhq/novu", + "pros": ["Multi-channel out of the box", "Template management", "Delivery tracking"], + "cons": ["Extra service to deploy", "Overkill for 2 channels", "Learning curve"], + "trade_offs": ["More capable but unnecessary complexity for our needs"], + "maturity": "Stable", + "community_size": "Medium", + "last_updated": "2024-11", + "implementation_complexity": "Medium", + "estimated_time": "6 hours", + "recommended": false, + "recommendation_reason": "" + } + ], + "user_choice": "Direct API Integration", + "user_choice_description": "Direct calls to Slack API and SendGrid", + "user_choice_version": "N/A", + "user_reason": "Keep it simple — Slack webhook + SendGrid is all we need", + "final_decision": "direct-api-integration" + } + ], + "summary": { + "total_decisions": 3, + "decisions_made": 3, + "estimated_setup_time": "11 hours", + "estimated_learning_curve": "Low — all chosen technologies are familiar or well-documented", + "overall_complexity": "Medium" + } +} diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-001.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-001.md new file mode 100644 index 0000000..eb1b53e --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-001.md @@ -0,0 +1,48 @@ +# Build real-time metrics ingestion pipeline + +**Issue ID**: ENG-001 +**Status**: pending +**Category**: backend +**Complexity Score**: 7/10 +**Estimated Hours**: 6h +**Complexity Reasoning**: TimescaleDB setup with continuous aggregates, Redis buffering, and high-throughput ingestion requires careful design for data 
integrity. + +## Description + +Build the metrics ingestion pipeline: HTTP endpoint accepting batched metrics from application SDKs, Redis queue for burst buffering, and TimescaleDB hypertable with continuous aggregates for fast dashboard queries. + +## Technical Details + +- POST /api/metrics/ingest endpoint accepting batched metric payloads +- Redis list as buffer queue for burst handling +- Worker process consuming from Redis and batch-inserting into TimescaleDB +- TimescaleDB hypertable partitioned by time (1-day chunks) +- Continuous aggregates: 1-minute, 1-hour, 1-day rollups +- 90-day retention policy with automatic chunk dropping + +## Dependencies + +- TimescaleDB extension enabled on PostgreSQL +- Redis instance for queue buffering + +## Acceptance Criteria + +- [ ] Ingestion endpoint accepts metric payloads and returns 202 Accepted +- [ ] Metrics buffered in Redis and flushed to TimescaleDB within 5 seconds +- [ ] Continuous aggregates compute p50, p95, p99, avg, count, sum +- [ ] Handles 1,000 metrics/second without data loss +- [ ] Data older than 90 days automatically removed + +## Test Strategy + +### Automated Tests + +- Unit test: payload validation and normalization +- Integration test: ingest → Redis → TimescaleDB round-trip +- Load test: 1K requests/sec sustained for 60 seconds + +### Manual Verification (Human-in-the-Loop) + +1. Send metric via curl with valid payload — verify 202 response +2. Query TimescaleDB — verify metric appears within 5 seconds +3. 
Wait 1 minute — verify 1-minute aggregate computed correctly diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-002.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-002.md new file mode 100644 index 0000000..b909745 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-002.md @@ -0,0 +1,49 @@ +# Create dashboard overview with metric cards and charts + +**Issue ID**: ENG-002 +**Status**: pending +**Category**: frontend +**Complexity Score**: 5/10 +**Estimated Hours**: 5h +**Complexity Reasoning**: Standard React components with Recharts. Main complexity is real-time updates and responsive layout. + +## Description + +Build the dashboard overview page with summary metric cards (requests, latency, errors, cost), model health cards with sparklines, time range selector, and auto-refresh every 30 seconds. + +## Technical Details + +- Overview API endpoint returning aggregated metrics for all models +- MetricCard component: value, label, trend arrow, percentage change +- ModelHealthCard: name, status badge, key metrics, sparkline chart +- Time range selector: 1h, 6h, 24h, 7d, 30d +- Auto-refresh via setInterval, smooth chart transitions on data update +- Recharts LineChart for sparklines, AreaChart for detail views + +## Dependencies + +- ENG-001 (metrics data available in TimescaleDB) + +## Acceptance Criteria + +- [ ] Overview page shows 4 summary metric cards +- [ ] Model health cards display for each active model +- [ ] Time range selector updates all charts and metrics +- [ ] Auto-refresh updates data every 30 seconds without flicker +- [ ] Charts are responsive and stack on mobile +- [ ] Click on model card navigates to detailed view + +## Test Strategy + +### Automated Tests + +- Component test: metric cards render with mock data +- Component test: time range selector triggers data refetch +- Component test: sparkline renders from array of values + +### Manual 
Verification (Human-in-the-Loop) + +1. Open dashboard — verify all metric cards show current values +2. Change time range to 7d — verify charts update +3. Wait 30 seconds — verify data refreshes +4. Resize browser to mobile width — verify responsive layout diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-003.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-003.md new file mode 100644 index 0000000..7949030 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-003.md @@ -0,0 +1,49 @@ +# Implement alert configuration and threshold management + +**Issue ID**: ENG-003 +**Status**: pending +**Category**: fullstack +**Complexity Score**: 5/10 +**Estimated Hours**: 4h +**Complexity Reasoning**: Standard CRUD with a form UI. The threshold preview overlay on live charts adds moderate complexity. + +## Description + +Build the alert management UI: create/edit/delete alert rules with metric selection, threshold value, comparison operator, and notification channel. Include a live preview showing the threshold line overlaid on the current metric chart. 
+ +## Technical Details + +- Alert rules table in PostgreSQL: id, model, metric, operator, threshold, channel, enabled, created_at +- CRUD API endpoints for alert rules +- AlertRuleForm component with metric dropdown, threshold input, channel picker +- ThresholdPreview: horizontal line overlay on Recharts chart +- Alert list page with status badges (Active, Triggered, Muted) +- Mute/unmute toggle without deleting the rule + +## Dependencies + +- ENG-001 (metrics data for threshold preview) + +## Acceptance Criteria + +- [ ] "New Alert" form has metric selector, threshold, operator, and channel fields +- [ ] Threshold preview shows horizontal line on the current chart +- [ ] Saving creates alert rule in database +- [ ] Alert list shows all rules with status +- [ ] Edit updates and Delete removes alert rules +- [ ] Mute toggle disables evaluation without deleting + +## Test Strategy + +### Automated Tests + +- Integration test: CRUD operations on alert rules +- Component test: form validation, threshold preview rendering +- Component test: mute toggle updates state + +### Manual Verification (Human-in-the-Loop) + +1. Create a new alert for p95 latency > 2000ms on Claude +2. Verify threshold line appears on chart preview +3. Save and verify it appears in the alert list +4. Mute the alert and verify status changes diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-004.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-004.md new file mode 100644 index 0000000..876113f --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-004.md @@ -0,0 +1,49 @@ +# Add cost analysis and budget tracking views + +**Issue ID**: ENG-004 +**Status**: pending +**Category**: fullstack +**Complexity Score**: 5/10 +**Estimated Hours**: 4h +**Complexity Reasoning**: Cost aggregation queries are straightforward with continuous aggregates. Chart components reuse patterns from ENG-002. 
+ +## Description + +Build the Cost tab with spend breakdown by model, daily and monthly charts with trend lines, budget alert configuration, and cost-per-request comparison across models. + +## Technical Details + +- Cost aggregation queries from TimescaleDB continuous aggregates +- Cost breakdown table: model, daily cost, monthly cost, cost-per-request +- Recharts AreaChart for daily spend with trend line overlay +- Budget configuration: monthly limit with projected spend calculation +- Budget alert: triggers when projected monthly spend exceeds limit +- Model comparison: bar chart of cost-per-request across models + +## Dependencies + +- ENG-001 (metrics data with cost_usd field) +- ENG-002 (dashboard layout and chart patterns) + +## Acceptance Criteria + +- [ ] Cost tab shows breakdown by model with daily and monthly totals +- [ ] Daily spend chart shows last 30 days with trend line +- [ ] Monthly projected spend calculated from current daily rate +- [ ] Budget alert configurable with monthly limit +- [ ] Cost-per-request comparison bar chart shows all models +- [ ] Click on model row shows detailed cost breakdown + +## Test Strategy + +### Automated Tests + +- Integration test: cost aggregation returns correct totals +- Unit test: projected spend calculation from daily data +- Component test: charts render, budget warning appears at threshold + +### Manual Verification (Human-in-the-Loop) + +1. Open Cost tab — verify breakdown matches ingested data +2. Set budget to $100 — verify warning appears if projected exceeds it +3. 
Click on a model — verify detailed cost view loads diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-005.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-005.md new file mode 100644 index 0000000..be9b3ab --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/ENG-005.md @@ -0,0 +1,51 @@ +# Set up Slack and email alert delivery + +**Issue ID**: ENG-005 +**Status**: pending +**Category**: backend +**Complexity Score**: 4/10 +**Estimated Hours**: 3h +**Complexity Reasoning**: Slack webhook and SendGrid APIs are well-documented. Main complexity is the hybrid evaluation approach. + +## Description + +Implement the alert evaluation engine with hybrid streaming/polling evaluation, and delivery via Slack webhooks and SendGrid email. Critical alerts evaluated on each incoming data point, standard alerts on 60-second polling interval. + +## Technical Details + +- Alert evaluator worker process +- Critical path: Redis pub/sub subscriber, evaluates on each metric event +- Standard path: 60-second polling, queries latest aggregate vs threshold +- Slack delivery: webhook POST with formatted message block +- Email delivery: SendGrid API with HTML template +- Alert history logging: trigger timestamp, metric value, threshold, delivery status +- Cooldown period: 5 minutes between re-triggers of same alert + +## Dependencies + +- ENG-003 (alert rules to evaluate) + +## Acceptance Criteria + +- [ ] Critical alerts evaluated within 5 seconds of metric ingestion +- [ ] Standard alerts evaluated every 60 seconds +- [ ] Slack message includes model name, metric, current value, and threshold +- [ ] Email includes same info with a link to the dashboard +- [ ] Alert history records each trigger with delivery status +- [ ] 5-minute cooldown prevents alert spam + +## Test Strategy + +### Automated Tests + +- Unit test: threshold comparison logic for all operators (>, <, >=, <=) +- Integration 
test: mock Slack/SendGrid, verify delivery called on breach +- Unit test: cooldown logic prevents re-trigger within window + +### Manual Verification (Human-in-the-Loop) + +1. Create alert with low threshold to guarantee trigger +2. Ingest metrics that breach the threshold +3. Verify Slack message received within 60 seconds +4. Verify email received with dashboard link +5. Verify alert appears in trigger history diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/issues.json b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/issues.json new file mode 100644 index 0000000..821b657 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/issues/issues.json @@ -0,0 +1,73 @@ +{ + "project_name": "Model Performance Dashboard", + "project_id": "005-model-performance-dashboard", + "total_estimated_hours": 22, + "issues_list": [ + { + "issue_id": "ENG-001", + "title": "Build real-time metrics ingestion pipeline", + "description": "HTTP ingestion endpoint with Redis buffering and TimescaleDB storage with continuous aggregates", + "status": "pending", + "estimated_hours": 6, + "dependencies": [], + "test_strategy": { + "automated_tests": "Integration test: ingest metrics, verify stored in TimescaleDB. 
Load test: 1K requests/sec", + "manual_verification": "Send metrics via curl, query TimescaleDB to verify storage and aggregation" + }, + "file_path": "issues/ENG-001.md" + }, + { + "issue_id": "ENG-002", + "title": "Create dashboard overview with metric cards and charts", + "description": "Overview page with summary cards, model health cards with sparklines, and time range selector", + "status": "pending", + "estimated_hours": 5, + "dependencies": ["ENG-001"], + "test_strategy": { + "automated_tests": "Component test: metric cards render, time range updates charts, auto-refresh triggers", + "manual_verification": "Open dashboard, verify live data displays, change time range, check auto-refresh" + }, + "file_path": "issues/ENG-002.md" + }, + { + "issue_id": "ENG-003", + "title": "Implement alert configuration and threshold management", + "description": "CRUD for alert rules with metric/threshold/channel selection and live chart preview of threshold line", + "status": "pending", + "estimated_hours": 4, + "dependencies": ["ENG-001"], + "test_strategy": { + "automated_tests": "Integration test: create, read, update, delete alert rules. Component test: threshold preview renders", + "manual_verification": "Create alert rule, verify threshold line appears on chart, edit and delete rule" + }, + "file_path": "issues/ENG-003.md" + }, + { + "issue_id": "ENG-004", + "title": "Add cost analysis and budget tracking views", + "description": "Cost tab with spend breakdown by model, daily/monthly charts, budget alerts, and cost-per-request comparison", + "status": "pending", + "estimated_hours": 4, + "dependencies": ["ENG-001", "ENG-002"], + "test_strategy": { + "automated_tests": "Integration test: cost aggregation queries return correct totals. 
Component test: charts render", + "manual_verification": "View cost breakdown, verify totals match, set budget alert, check projected spend" + }, + "file_path": "issues/ENG-004.md" + }, + { + "issue_id": "ENG-005", + "title": "Set up Slack and email alert delivery", + "description": "Alert evaluation worker with hybrid streaming/polling, Slack webhook delivery, and SendGrid email", + "status": "pending", + "estimated_hours": 3, + "dependencies": ["ENG-003"], + "test_strategy": { + "automated_tests": "Unit test: alert evaluation logic. Integration test: mock Slack/SendGrid delivery", + "manual_verification": "Trigger alert threshold breach, verify Slack message and email received within 60s" + }, + "file_path": "issues/ENG-005.md" + } + ], + "definition_of_done": "Each feature independently testable by a human, all acceptance criteria met" +} diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/project_request.md b/fixtures/demo/outputs/projects/005-model-performance-dashboard/project_request.md new file mode 100644 index 0000000..b71861b --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/project_request.md @@ -0,0 +1,16 @@ +# Project Request #005 + +## Project Name +Model Performance Dashboard + +## Description +A real-time monitoring dashboard for AI models in production. Tracks key metrics including response latency (p50/p95/p99), token usage and costs, error rates by model and endpoint, quality scores from user feedback, and model drift detection. Supports alerts via Slack and email when metrics cross configurable thresholds. + +## Dependencies +None + +## Testable Outcome +An engineer opens the dashboard, sees real-time latency and cost charts for their deployed models, sets an alert threshold for p95 latency, and receives a Slack notification when it's breached. 
+ +--- +*Created: 2026-02-19T11:00:00.000Z* diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/project_status.json b/fixtures/demo/outputs/projects/005-model-performance-dashboard/project_status.json new file mode 100644 index 0000000..13b1313 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/project_status.json @@ -0,0 +1,86 @@ +{ + "version": "1.0.0", + "projectId": "005-model-performance-dashboard", + "currentAgent": "complete", + "currentPhase": "complete", + + "agents": { + "pm": { + "status": "complete", + "currentPhase": null, + "completedAt": "2026-02-19T12:30:00.000Z", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-19T11:00:00.000Z", "completedAt": "2026-02-19T11:04:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-19T11:04:00.000Z", "completedAt": "2026-02-19T11:30:00.000Z" }, + "prd-generate": { "status": "complete", "startedAt": "2026-02-19T11:30:00.000Z", "completedAt": "2026-02-19T12:00:00.000Z" }, + "prd-review": { "status": "complete", "startedAt": "2026-02-19T12:00:00.000Z", "completedAt": "2026-02-19T12:30:00.000Z" } + } + }, + "ux": { + "status": "complete", + "currentPhase": null, + "completedAt": "2026-02-19T14:00:00.000Z", + "phases": { + "questions-generate": { "status": "complete", "startedAt": "2026-02-19T12:30:00.000Z", "completedAt": "2026-02-19T12:34:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-19T12:34:00.000Z", "completedAt": "2026-02-19T13:00:00.000Z" }, + "design-brief-generate": { "status": "complete", "startedAt": "2026-02-19T13:00:00.000Z", "completedAt": "2026-02-19T13:30:00.000Z" }, + "design-brief-review": { "status": "complete", "startedAt": "2026-02-19T13:30:00.000Z", "completedAt": "2026-02-19T14:00:00.000Z" } + } + }, + "engineer": { + "status": "complete", + "currentPhase": null, + "completedAt": "2026-02-19T15:30:00.000Z", + "phases": { + "questions-generate": { 
"status": "complete", "startedAt": "2026-02-19T14:00:00.000Z", "completedAt": "2026-02-19T14:04:00.000Z" }, + "questions-answer": { "status": "complete", "startedAt": "2026-02-19T14:04:00.000Z", "completedAt": "2026-02-19T14:30:00.000Z" }, + "spec-generate": { "status": "complete", "startedAt": "2026-02-19T14:30:00.000Z", "completedAt": "2026-02-19T15:00:00.000Z" }, + "spec-review": { "status": "complete", "startedAt": "2026-02-19T15:00:00.000Z", "completedAt": "2026-02-19T15:30:00.000Z" } + } + } + }, + + "history": [ + { "phase": "pm-questions-generate", "startedAt": "2026-02-19T11:00:00.000Z", "completedAt": "2026-02-19T11:04:00.000Z", "status": "complete" }, + { "phase": "pm-questions-answer", "startedAt": "2026-02-19T11:04:00.000Z", "completedAt": "2026-02-19T11:30:00.000Z", "status": "complete" }, + { "phase": "pm-prd-generate", "startedAt": "2026-02-19T11:30:00.000Z", "completedAt": "2026-02-19T12:00:00.000Z", "status": "complete" }, + { "phase": "pm-prd-review", "startedAt": "2026-02-19T12:00:00.000Z", "completedAt": "2026-02-19T12:30:00.000Z", "status": "complete" }, + { "phase": "ux-questions-generate", "startedAt": "2026-02-19T12:30:00.000Z", "completedAt": "2026-02-19T12:34:00.000Z", "status": "complete" }, + { "phase": "ux-questions-answer", "startedAt": "2026-02-19T12:34:00.000Z", "completedAt": "2026-02-19T13:00:00.000Z", "status": "complete" }, + { "phase": "ux-design-brief-generate", "startedAt": "2026-02-19T13:00:00.000Z", "completedAt": "2026-02-19T13:30:00.000Z", "status": "complete" }, + { "phase": "ux-design-brief-review", "startedAt": "2026-02-19T13:30:00.000Z", "completedAt": "2026-02-19T14:00:00.000Z", "status": "complete" }, + { "phase": "engineer-questions-generate", "startedAt": "2026-02-19T14:00:00.000Z", "completedAt": "2026-02-19T14:04:00.000Z", "status": "complete" }, + { "phase": "engineer-questions-answer", "startedAt": "2026-02-19T14:04:00.000Z", "completedAt": "2026-02-19T14:30:00.000Z", "status": "complete" }, + { "phase": 
"engineer-spec-generate", "startedAt": "2026-02-19T14:30:00.000Z", "completedAt": "2026-02-19T15:00:00.000Z", "status": "complete" }, + { "phase": "engineer-spec-review", "startedAt": "2026-02-19T15:00:00.000Z", "completedAt": "2026-02-19T15:30:00.000Z", "status": "complete" } + ], + + "settings": { + "question_depth": "standard", + "document_length": "brief" + }, + + "icon": { + "type": "icon", + "value": "bar-chart-3", + "color": "hsl(350 70% 55%)" + }, + + "costTracking": { + "tier": "standard", + "phases": [ + { "phase": "pm-questions-generate", "phaseName": "PM Questions", "inputTokens": 1150, "outputTokens": 780, "timestamp": "2026-02-19T11:04:00.000Z" }, + { "phase": "pm-prd-generate", "phaseName": "PRD Generation", "inputTokens": 3100, "outputTokens": 4200, "timestamp": "2026-02-19T12:00:00.000Z" }, + { "phase": "ux-questions-generate", "phaseName": "UX Questions", "inputTokens": 1900, "outputTokens": 620, "timestamp": "2026-02-19T12:34:00.000Z" }, + { "phase": "ux-design-brief-generate", "phaseName": "Design Brief", "inputTokens": 3800, "outputTokens": 5400, "timestamp": "2026-02-19T13:30:00.000Z" }, + { "phase": "engineer-questions-generate", "phaseName": "Engineer Questions", "inputTokens": 2800, "outputTokens": 840, "timestamp": "2026-02-19T14:04:00.000Z" }, + { "phase": "engineer-spec-generate", "phaseName": "Technical Spec", "inputTokens": 5200, "outputTokens": 7600, "timestamp": "2026-02-19T15:00:00.000Z" } + ], + "lastUpdated": "2026-02-19T15:30:00.000Z" + }, + + "approvedDocuments": ["prd", "acceptance-criteria", "design", "screens", "tech-spec", "technology-choices"], + + "createdAt": "2026-02-19T11:00:00.000Z", + "lastUpdatedAt": "2026-02-19T15:30:00.000Z" +} diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/engineer_questions.json b/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/engineer_questions.json new file mode 100644 index 0000000..e3d5ec7 --- /dev/null +++ 
b/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/engineer_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "Real-time monitoring dashboard for AI model performance with latency, cost, and alerting.", + "questions": [ + { + "question": "What time-series database should we use for metrics storage?", + "options": ["TimescaleDB (PostgreSQL extension)", "InfluxDB", "ClickHouse"], + "answer": "TimescaleDB — we already use PostgreSQL and it supports time-series natively" + }, + { + "question": "How should metrics be ingested in real-time?", + "options": ["HTTP POST from application", "Message queue (Redis/Kafka)", "Both: HTTP for low volume, queue for high volume"], + "answer": "HTTP POST with batching client-side, backed by a Redis queue for burst handling" + }, + { + "question": "How should aggregation work for longer time ranges?", + "options": ["Compute on read from raw data", "Pre-computed rollups via cron", "Continuous aggregates (TimescaleDB feature)"], + "answer": "Continuous aggregates — TimescaleDB handles this natively with minimal config" + }, + { + "question": "How should alerts be evaluated?", + "options": ["Polling on interval", "Streaming evaluation on each data point", "Hybrid: streaming for critical, polling for others"], + "answer": "Hybrid: streaming evaluation for critical alerts, 1-minute polling for non-critical" + } + ] +} diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/pm_questions.json b/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/pm_questions.json new file mode 100644 index 0000000..98bc395 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/pm_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "Real-time monitoring dashboard for AI model performance with latency, cost, and alerting.", + "questions": [ + { + "question": "What metrics are most critical to track?", + "options": ["Latency only", "Latency 
+ cost", "Latency + cost + error rate + quality scores"], + "answer": "Latency (p50/p95/p99) + cost per request + error rate + user quality feedback" + }, + { + "question": "What alert channels should be supported?", + "options": ["Email only", "Slack only", "Slack + email + webhook"], + "answer": "Slack + email with webhook for custom integrations" + }, + { + "question": "How long should metric data be retained?", + "options": ["7 days", "30 days", "90 days with downsampling"], + "answer": "90 days with minute-level granularity for 7 days, hourly after that" + }, + { + "question": "Should the dashboard support comparing metrics across models?", + "options": ["Single model view only", "Side-by-side comparison of 2 models", "Flexible multi-model comparison"], + "answer": "Flexible multi-model comparison with overlay charts" + } + ] +} diff --git a/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/ux_questions.json b/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/ux_questions.json new file mode 100644 index 0000000..4f99ea6 --- /dev/null +++ b/fixtures/demo/outputs/projects/005-model-performance-dashboard/questions/ux_questions.json @@ -0,0 +1,25 @@ +{ + "project_request": "Real-time monitoring dashboard for AI model performance with latency, cost, and alerting.", + "questions": [ + { + "question": "What should the default dashboard layout look like?", + "options": ["Single page with all metrics", "Tab-based: Overview, Latency, Cost, Alerts", "Customizable widget grid"], + "answer": "Tab-based with Overview showing key metrics and tabs for deep-dives" + }, + { + "question": "What chart types should be used for time-series data?", + "options": ["Line charts only", "Line + area charts", "Line + area + heatmaps for error distribution"], + "answer": "Line + area charts with heatmaps for error distribution" + }, + { + "question": "How should alert configuration be presented?", + "options": ["Simple form: metric + threshold 
+ channel", "Visual rule builder", "Form with preview of alert condition"], + "answer": "Simple form with a live preview showing where the threshold sits on the current chart" + }, + { + "question": "Should the dashboard be responsive for mobile?", + "options": ["Desktop only", "Responsive but simplified on mobile", "Full mobile support with dedicated mobile layout"], + "answer": "Responsive but simplified on mobile — charts stack vertically, alerts still actionable" + } + ] +} diff --git a/scripts/load-demo.sh b/scripts/load-demo.sh new file mode 100755 index 0000000..9d85373 --- /dev/null +++ b/scripts/load-demo.sh @@ -0,0 +1,126 @@ +#!/usr/bin/env bash +# +# load-demo.sh — Load demo fixtures into SpecWright outputs/ +# +# Usage: +# ./scripts/load-demo.sh # Load fixtures (backs up existing outputs/ first) +# ./scripts/load-demo.sh --clean # Remove existing outputs/ before loading (no backup) +# ./scripts/load-demo.sh --reset # Restore the backup created by a previous load +# + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" +FIXTURES_DIR="$ROOT_DIR/fixtures/demo/outputs" +OUTPUTS_DIR="$ROOT_DIR/outputs" +BACKUP_DIR="$ROOT_DIR/outputs.bak" + +# Colors +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +info() { echo -e "${BLUE}ℹ${NC} $*"; } +ok() { echo -e "${GREEN}✓${NC} $*"; } +warn() { echo -e "${YELLOW}⚠${NC} $*"; } +err() { echo -e "${RED}✗${NC} $*" >&2; } + +# ─── Reset mode: restore backup ────────────────────────────────────────────── +if [[ "${1:-}" == "--reset" ]]; then + if [[ ! -d "$BACKUP_DIR" ]]; then + err "No backup found at outputs.bak/. Nothing to restore." + exit 1 + fi + rm -rf "$OUTPUTS_DIR" + mv "$BACKUP_DIR" "$OUTPUTS_DIR" + ok "Restored outputs/ from backup." + exit 0 +fi + +# ─── Verify fixtures exist ─────────────────────────────────────────────────── +if [[ ! 
-d "$FIXTURES_DIR/projects" ]]; then + err "Fixtures not found at fixtures/demo/outputs/projects/" + err "Make sure fixtures/demo/ is present in your specwright checkout." + exit 1 +fi + +# ─── Handle existing outputs/ ──────────────────────────────────────────────── +if [[ -d "$OUTPUTS_DIR/projects" ]]; then + if [[ "${1:-}" == "--clean" ]]; then + warn "Removing existing outputs/ (--clean mode)" + rm -rf "$OUTPUTS_DIR" + else + if [[ -d "$BACKUP_DIR" ]]; then + warn "Previous backup exists at outputs.bak/ — overwriting it." + rm -rf "$BACKUP_DIR" + fi + info "Backing up existing outputs/ → outputs.bak/" + cp -a "$OUTPUTS_DIR" "$BACKUP_DIR" + rm -rf "$OUTPUTS_DIR/projects" + fi +fi + +# ─── Copy fixtures ─────────────────────────────────────────────────────────── +mkdir -p "$OUTPUTS_DIR" +cp -a "$FIXTURES_DIR/projects" "$OUTPUTS_DIR/projects" + +# ─── Freshen timestamps ───────────────────────────────────────────────────── +# Update createdAt/lastUpdatedAt in project_status.json to be relative to now +# so the projects feel "recent" in the UI. 
+NOW=$(date -u +"%Y-%m-%dT%H:%M:%S.000Z") + +update_timestamps() { + local status_file="$1" + if command -v python3 &>/dev/null; then + python3 -c " +import json, sys +from datetime import datetime, timedelta, timezone + +with open(sys.argv[1], 'r') as f: + data = json.load(f) + +now = datetime.now(timezone.utc) + +# Parse original created/updated to figure out offset +orig_created = datetime.fromisoformat(data['createdAt'].replace('Z', '+00:00')) +orig_updated = datetime.fromisoformat(data['lastUpdatedAt'].replace('Z', '+00:00')) +duration = orig_updated - orig_created + +# Set new times: created = now - duration, updated = now +new_created = now - duration +data['createdAt'] = new_created.strftime('%Y-%m-%dT%H:%M:%S.000Z') +data['lastUpdatedAt'] = now.strftime('%Y-%m-%dT%H:%M:%S.000Z') + +with open(sys.argv[1], 'w') as f: + json.dump(data, f, indent=2) + f.write('\n') +" "$status_file" + fi +} + +for status_file in "$OUTPUTS_DIR"/projects/*/project_status.json; do + if [[ -f "$status_file" ]]; then + update_timestamps "$status_file" + fi +done + +# ─── Summary ───────────────────────────────────────────────────────────────── +PROJECT_COUNT=$(ls -d "$OUTPUTS_DIR/projects"/*/ 2>/dev/null | wc -l | tr -d ' ') + +echo "" +ok "Loaded ${PROJECT_COUNT} demo projects into outputs/projects/" +echo "" + +for dir in "$OUTPUTS_DIR"/projects/*/; do + project_name=$(basename "$dir") + echo -e " ${BLUE}•${NC} $project_name" +done + +echo "" +info "Start the server: npm run dev:server" +info "Start the UI: npm run dev:ui" +info "Restore original: ./scripts/load-demo.sh --reset" +echo "" From 210f6e71b36993e2f34fa277e5431941cd9f13f7 Mon Sep 17 00:00:00 2001 From: AmElmo Date: Tue, 24 Feb 2026 17:46:58 +0700 Subject: [PATCH 2/3] docs: add demo fixtures section to CLAUDE.md --- CLAUDE.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/CLAUDE.md b/CLAUDE.md index ff93973..ea6d2dd 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -24,6 +24,20 @@ It has a CLI 
(Node.js/Express) and a Web
UI (React/Vite) that share a file-based - `npm run format` / `npm run format:check` — Prettier - No test framework is configured. Do not create test files. +## Demo Fixtures + +Pre-populated projects for demos live in `fixtures/demo/`. Load them with: + +```bash +./scripts/load-demo.sh # Copy fixtures into outputs/ (backs up existing) +./scripts/load-demo.sh --clean # Wipe outputs/ and load fresh +./scripts/load-demo.sh --reset # Restore original outputs/ from backup +``` + +5 NovaMind AI projects at different workflow stages: complete with issues, docs reviewing, mid-workflow, early (PM questions), and complete with all issues pending. + +When editing fixtures, keep files in `fixtures/demo/outputs/` (tracked in git via `.gitignore` negation). The `outputs/` directory itself remains gitignored. + ## Local CLI Testing (Unpublished Changes) **MANDATORY: After ANY code edit session, ALWAYS run `npm run build && npm link` before finishing.** This ensures `specwright-dev` reflects the latest changes. Verify with `specwright-dev --version`. Never skip this step. From 63f308257f357022a5e6c1c809b51bd55e5004af Mon Sep 17 00:00:00 2001 From: AmElmo Date: Tue, 24 Feb 2026 18:35:17 +0700 Subject: [PATCH 3/3] chore: update package-lock.json --- package-lock.json | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/package-lock.json b/package-lock.json index 8e333a1..642d24b 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "specwright", - "version": "3.4.1", + "version": "3.6.2", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "specwright", - "version": "3.4.1", + "version": "3.6.2", "license": "MIT", "dependencies": { "@linear/sdk": "^68.1.0",