nomadkaraoke
diff --git a/‎.cursor/plans/agentic-correction-system-5ddf541b.plan.md‎
Lines changed: 216 additions & 0 deletions b/‎.cursor/plans/agentic-correction-system-5ddf541b.plan.md‎
Lines changed: 216 additions & 0 deletions
diff --git a/‎.cursor/rules/specify-rules.mdc‎
Lines changed: 25 additions & 0 deletions b/‎.cursor/rules/specify-rules.mdc‎
Lines changed: 25 additions & 0 deletions
diff --git a/‎.gitignore‎
Lines changed: 3 additions & 0 deletions b/‎.gitignore‎
Lines changed: 3 additions & 0 deletions
diff --git a/‎.specify/memory/constitution.md‎
Lines changed: 92 additions & 35 deletions b/‎.specify/memory/constitution.md‎
Lines changed: 92 additions & 35 deletions
@@ -0,0 +1,216 @@
+<!-- 5ddf541b-8e90-4e85-9152-c52f39be9149 f7f98f98-6fab-4b10-9382-4948916b84e2 -->
+# Agentic Correction UI Improvements
+
+## Overview
+
+Transform the correction UI to be mobile-first with visual duration indicators, inline correction actions, and category-based metrics specifically designed for agentic AI workflows.
+
+## Core Changes
+
+### 1. Visual Duration Indicators for Words
+
+Transform the Corrected Transcription view to show word durations at a glance.
+
+**File: `lyrics_transcriber/frontend/src/components/TranscriptionView.tsx`**
+
+- Add toggle mode: "Text View" (current) vs "Duration View" (new)
+- In Duration View, render each line as a timeline bar similar to TimelineEditor
+- Each word rendered as a colored bar with width proportional to duration
+- Color coding:
+  - Normal words: light gray
+  - Corrected words (agentic): green with original word shown above in small gray text
+  - Uncorrected gaps: orange/red
+  - Anchors: blue
+- Show time ruler above each line
+- Flag abnormally long words (>2 seconds) with warning indicator
+- Mobile-optimized: Scrollable horizontally if needed, bars tall enough for touch
+
+**Implementation approach:**
+
+- Create new component `DurationTimelineView.tsx` based on TimelineEditor logic
+- Reuse `timeToPosition` calculation from TimelineEditor
+- Group words by segment/line
+- Show original word above corrected word: `<Box sx={{fontSize: '0.6rem', color: 'text.secondary'}}>{originalWord}</Box>`
+
+### 2. Inline Correction Actions
+
+Add touch-friendly action buttons directly on corrected words.
+
+**File: `lyrics_transcriber/frontend/src/components/shared/components/Word.tsx`**
+
+Current implementation only shows tooltip. Enhance to:
+
+- When a word has a correction, render with action buttons
+- Position buttons in a small action bar that appears inline (not on hover, always visible on mobile)
+- Actions:
+  - Undo icon (revert to original)
+  - Edit icon (open edit modal)
+  - Checkmark icon (accept/approve)
+- On mobile: Buttons always visible, adequate size (44px touch target)
+- On desktop: Can show on hover for cleaner look
+- Style: Subtle, icon-only buttons in a compact horizontal strip
+- Use Material-UI IconButton with small size
+
+**New component: `CorrectedWordWithActions.tsx`**
+
+```tsx
+interface CorrectedWordWithActionsProps {
+  word: string
+  originalWord: string
+  correction: CorrectionInfo
+  onRevert: () => void
+  onEdit: () => void
+  onAccept: () => void
+  isMobile: boolean
+}
+```
+
+### 3. Transform Correction Handlers to Category Metrics
+
+Replace handler toggles with agentic-specific category breakdown.
+
+**File: `lyrics_transcriber/frontend/src/components/Header.tsx`**
+
+When agentic mode detected (check if AgenticCorrector exists in handlers):
+
+- Replace handler toggles with category breakdown
+- Show gap categories from `GapCategory` enum:
+  - SOUND_ALIKE (5)
+  - PUNCTUATION_ONLY (2)
+  - BACKGROUND_VOCALS (1)
+  - etc.
+- Sort by count descending
+- Make clickable to filter/highlight those corrections in view
+- Add quick filter chips: "Low Confidence" (<60%), "High Confidence" (>80%)
+- Show average confidence score for all agentic corrections
+
+**Implementation:**
+
+- Add function to aggregate corrections by `gap_category` field from reason string
+- Parse reason field: extract text between `[` and `]` for category
+- Create new component `AgenticCorrectionMetrics.tsx`
+
+### 4. Enhanced Correction Detail View
+
+Replace cramped tooltip with rich, touch-friendly correction card.
+
+**New component: `CorrectionDetailCard.tsx`**
+
+Triggered by clicking on a corrected word (not hover):
+
+- Modal or slide-up panel on mobile (bottom sheet style)
+- Popover on desktop
+- Content:
+  - Large display of original → corrected
+  - Category badge with icon
+  - Confidence meter (progress bar)
+  - Full reasoning text (multi-line, readable)
+  - Reference context snippet (if available)
+  - Action buttons (large, clear labels):
+    - "Revert to Original"
+    - "Edit Correction"
+    - "Mark as Correct"
+    - "Report Issue" (future: submit to feedback API)
+- Swipe to dismiss on mobile
+- Escape key to close on desktop
+
+### 5. Update Data Types
+
+**File: `lyrics_transcriber/frontend/src/types.ts`**
+
+Add:
+
+```typescript
+export interface CorrectionAction {
+  type: 'revert' | 'edit' | 'accept' | 'reject'
+  correctionId: string
+  wordId: string
+}
+
+export interface GapCategoryMetric {
+  category: string
+  count: number
+  avgConfidence: number
+}
+```
+
+### 6. State Management for Correction Actions
+
+**File: `lyrics_transcriber/frontend/src/components/LyricsAnalyzer.tsx`**
+
+Add handlers:
+
+- `handleRevertCorrection(wordId: string)`: Restore original word
+- `handleEditCorrection(wordId: string)`: Open edit modal with original word
+- `handleAcceptCorrection(wordId: string)`: Mark as approved (future: track in annotation system)
+
+Implement revert:
+
+- Find correction by word_id or corrected_word_id
+- Find segment containing corrected word
+- Replace corrected word with original word from correction.original_word
+- Update data state
+- Add to undo history
+
+### 7. Mobile Responsiveness
+
+**Files: Multiple component files**
+
+Ensure all new components:
+
+- Use Material-UI breakpoints for responsive layout
+- Touch targets minimum 44x44px
+- No hover-only interactions
+- Swipe gestures where appropriate (detail cards)
+- Bottom sheet modals on mobile instead of center modals
+- Adequate spacing for fat-finger taps
+- Test on mobile viewport (375px width minimum)
+
+## Implementation Order
+
+1. Duration visualization (most impactful for catching long words)
+2. Category metrics panel (replaces confusing handler toggles)
+3. Inline action buttons (enables quick revert/edit)
+4. Detail card modal (replaces cramped tooltip)
+5. Action handlers and state management (makes buttons functional)
+6. Mobile polish and testing
+
+## Files to Modify
+
+- `lyrics_transcriber/frontend/src/components/TranscriptionView.tsx` - Add duration view toggle
+- Create `lyrics_transcriber/frontend/src/components/DurationTimelineView.tsx` - New visualization
+- Create `lyrics_transcriber/frontend/src/components/CorrectedWordWithActions.tsx` - Inline actions
+- `lyrics_transcriber/frontend/src/components/shared/components/Word.tsx` - Integrate actions
+- Create `lyrics_transcriber/frontend/src/components/CorrectionDetailCard.tsx` - Rich detail view
+- Create `lyrics_transcriber/frontend/src/components/AgenticCorrectionMetrics.tsx` - Category breakdown
+- `lyrics_transcriber/frontend/src/components/Header.tsx` - Switch to category metrics when agentic
+- `lyrics_transcriber/frontend/src/components/LyricsAnalyzer.tsx` - Add action handlers
+- `lyrics_transcriber/frontend/src/types.ts` - Add new type definitions
+
+## Key Design Decisions
+
+- Mobile-first: All interactions work without hover
+- Always-visible duration bars catch timing issues immediately
+- Original word shown above corrected word for quick comparison
+- Category-based metrics more useful than handler toggles for agentic workflow
+- Inline actions minimize taps for common tasks (revert, edit)
+- Rich detail card for when user needs full context
+- Future-proof: Action handlers can integrate with annotation/feedback API later
+
+### To-dos
+
+- [ ] Create gap classification schemas and update CorrectionProposal model
+- [ ] Build classification prompt template with few-shot examples from gaps_review.yaml
+- [ ] Implement category-specific handler classes for each gap type
+- [ ] Update AgenticCorrector to use two-step classification workflow
+- [ ] Update LyricsCorrector to pass metadata and handle FLAG actions
+- [ ] Define CorrectionAnnotation schema and related types
+- [ ] Implement FeedbackStore with JSONL storage
+- [ ] Add annotation API endpoints to review server
+- [ ] Create CorrectionAnnotationModal component
+- [ ] Integrate annotation collection into edit workflow
+- [ ] Create annotation analysis script
+- [ ] Build few-shot example generator from annotations
+- [ ] Update classifier to load dynamic few-shot examples
+- [ ] Write comprehensive tests for all new components
+- [ ] Document the human feedback loop and improvement process
@@ -0,0 +1,25 @@
+# lyrics_transcriber_local Development Guidelines
+
+Auto-generated from all feature plans. Last updated: 2025-09-28
+
+## Active Technologies
+- Python 3.10-3.13 (existing codebase compatibility) + FastAPI (existing review server), LangChain/LangGraph (new agentic framework), LangFuse (observability), Ollama (local models), OpenAI/Anthropic/Google APIs (cloud models) (001-agentic-ai-corrector)
+
+## Project Structure
+```
+backend/
+frontend/
+tests/
+```
+
+## Commands
+cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .
+
+## Code Style
+Python 3.10-3.13 (existing codebase compatibility): Follow standard conventions
+
+## Recent Changes
+- 001-agentic-ai-corrector: Added Python 3.10-3.13 (existing codebase compatibility) + FastAPI (existing review server), LangChain/LangGraph (new agentic framework), LangFuse (observability), Ollama (local models), OpenAI/Anthropic/Google APIs (cloud models)
+
+<!-- MANUAL ADDITIONS START -->
+<!-- MANUAL ADDITIONS END -->
@@ -1,3 +1,6 @@
+# Functional
+cache 
+
 # Mac
 .DS_Store
 
 
@@ -1,50 +1,107 @@
-# [PROJECT_NAME] Constitution
-<!-- Example: Spec Constitution, TaskFlow Constitution, etc. -->
+<!--
+Sync Impact Report:
+Version change: template → 1.0.0 (initial constitution creation)
+Added principles:
+- I. Test-Driven Development (NON-NEGOTIABLE)
+- II. Code Quality & Maintainability 
+- III. User Experience Consistency
+- IV. Performance & Reliability
+- V. Observability & Monitoring
+Added sections:
+- Performance Standards
+- Development Workflow
+Templates requiring updates:
+- ✅ Updated .specify/templates/plan-template.md (Constitution Check section with detailed gates)
+- ✅ Updated constitution version reference in plan template
+- ✅ Verified spec-template.md and tasks-template.md are consistent
+Follow-up TODOs: None
+-->
+
+# Lyrics Transcriber Constitution
 
 ## Core Principles
 
-### [PRINCIPLE_1_NAME]
-<!-- Example: I. Library-First -->
-[PRINCIPLE_1_DESCRIPTION]
-<!-- Example: Every feature starts as a standalone library; Libraries must be self-contained, independently testable, documented; Clear purpose required - no organizational-only libraries -->
+### I. Test-Driven Development (NON-NEGOTIABLE)
+Every feature MUST follow strict TDD methodology: write failing tests first, then implement minimal code to make tests pass, then refactor for quality. All tests MUST be written before any implementation code. Contract tests are required for all API endpoints, integration tests for all user workflows, and unit tests for all complex business logic. Code coverage MUST maintain minimum 90% line coverage for new code, with no decrease in overall project coverage allowed.
+
+**Rationale**: TDD ensures predictable behavior, reduces bugs, enables safe refactoring, and serves as living documentation. The complex audio/video processing pipeline requires rigorous testing to prevent regressions.
+
+### II. Code Quality & Maintainability
+All code MUST be self-documenting through clear naming, comprehensive docstrings for public APIs, and adherence to established patterns. Type hints are mandatory for all function signatures and complex data structures. Code MUST pass linting (flake8/black), static type checking (mypy), and security scanning. No code duplication above 15 lines without explicit architectural justification. All public functions MUST include comprehensive docstrings with examples.
+
+**Rationale**: High-quality, maintainable code reduces technical debt, enables team collaboration, and ensures the complex multimedia processing pipeline remains debuggable and extensible.
+
+### III. User Experience Consistency
+All user interfaces (CLI, web UI, API responses) MUST provide consistent interaction patterns, error messaging, and feedback mechanisms. CLI commands MUST follow standard Unix conventions with consistent flag naming and help text. Web UI MUST maintain responsive design, accessibility standards (WCAG 2.1 AA), and consistent visual patterns. All error messages MUST be actionable with clear next steps for users.
+
+**Rationale**: Consistent UX reduces user cognitive load, improves adoption, and reduces support burden. The tool serves both technical and non-technical users requiring intuitive interfaces.
 
-### [PRINCIPLE_2_NAME]
-<!-- Example: II. CLI Interface -->
-[PRINCIPLE_2_DESCRIPTION]
-<!-- Example: Every library exposes functionality via CLI; Text in/out protocol: stdin/args → stdout, errors → stderr; Support JSON + human-readable formats -->
+### IV. Performance & Reliability
+All audio/video processing operations MUST complete within defined performance budgets (see Performance Standards below). Memory usage MUST remain bounded with proper cleanup of large media objects. All external API calls MUST implement proper retry logic with exponential backoff and circuit breaker patterns. System MUST gracefully handle and recover from failures without data loss.
 
-### [PRINCIPLE_3_NAME]
-<!-- Example: III. Test-First (NON-NEGOTIABLE) -->
-[PRINCIPLE_3_DESCRIPTION]
-<!-- Example: TDD mandatory: Tests written → User approved → Tests fail → Then implement; Red-Green-Refactor cycle strictly enforced -->
+**Rationale**: Media processing is resource-intensive and time-critical. Users expect reliable, efficient processing of their audio files without system crashes or excessive wait times.
 
-### [PRINCIPLE_4_NAME]
-<!-- Example: IV. Integration Testing -->
-[PRINCIPLE_4_DESCRIPTION]
-<!-- Example: Focus areas requiring integration tests: New library contract tests, Contract changes, Inter-service communication, Shared schemas -->
+### V. Observability & Monitoring
+All operations MUST emit structured logs with consistent formatting and appropriate log levels. Performance metrics MUST be collected for all critical paths (transcription time, correction accuracy, API response times). All external service interactions MUST be instrumented with tracing. System health checks MUST be implemented for all services and dependencies.
 
-### [PRINCIPLE_5_NAME]
-<!-- Example: V. Observability, VI. Versioning & Breaking Changes, VII. Simplicity -->
-[PRINCIPLE_5_DESCRIPTION]
-<!-- Example: Text I/O ensures debuggability; Structured logging required; Or: MAJOR.MINOR.BUILD format; Or: Start simple, YAGNI principles -->
+**Rationale**: Complex AI/ML pipelines require comprehensive observability to diagnose issues, optimize performance, and ensure system reliability in production environments.
 
-## [SECTION_2_NAME]
-<!-- Example: Additional Constraints, Security Requirements, Performance Standards, etc. -->
+## Performance Standards
 
-[SECTION_2_CONTENT]
-<!-- Example: Technology stack requirements, compliance standards, deployment policies, etc. -->
+**Processing Time Limits**:
+- Audio transcription: <30 seconds per minute of audio (excluding external API wait time)  
+- Lyrics correction: <10 seconds per song
+- Video generation: <2x real-time (e.g., 4 minutes for 2-minute song)
+- Web UI response: <200ms for interactive operations, <2 seconds for processing operations
 
-## [SECTION_3_NAME]
-<!-- Example: Development Workflow, Review Process, Quality Gates, etc. -->
+**Resource Constraints**:
+- Memory usage: <4GB peak for processing single audio files up to 10 minutes
+- Disk usage: Temporary files MUST be cleaned up within 24 hours
+- CPU usage: MUST support concurrent processing of up to 3 songs simultaneously
 
-[SECTION_3_CONTENT]
-<!-- Example: Code review requirements, testing gates, deployment approval process, etc. -->
+**Reliability Requirements**:
+- External API failures MUST NOT crash the application
+- Processing MUST resume from checkpoint after interruption for operations >30 seconds
+- Data corruption detection and recovery MUST be implemented for all cache operations
+
+## Development Workflow
+
+**Pre-Development Gates**:
+- All features MUST have approved specification before development begins
+- Technical design MUST be reviewed and approved for features touching core processing pipeline
+- Breaking changes MUST have migration plan and backward compatibility period
+
+**Code Review Requirements**:
+- All code MUST be reviewed by at least one other developer
+- Performance-critical changes MUST include performance test results
+- Security-sensitive changes MUST include security review
+- UI changes MUST include accessibility review and cross-browser testing
+
+**Quality Gates**:
+- All tests MUST pass before merge
+- Code coverage MUST NOT decrease from current levels
+- Static analysis MUST pass without warnings for new code
+- Performance benchmarks MUST NOT regress by >5% without justification
 
 ## Governance
-<!-- Example: Constitution supersedes all other practices; Amendments require documentation, approval, migration plan -->
 
-[GOVERNANCE_RULES]
-<!-- Example: All PRs/reviews must verify compliance; Complexity must be justified; Use [GUIDANCE_FILE] for runtime development guidance -->
+**Amendment Process**:
+This constitution supersedes all other development practices and coding standards. Amendments require:
+1. Written proposal with justification and impact analysis
+2. Review by project maintainers
+3. Migration plan for existing code if applicable
+4. Update of all dependent templates and documentation
+
+**Compliance Review**:
+- All pull requests MUST verify compliance with constitutional principles
+- Monthly review of adherence to performance standards and quality metrics
+- Quarterly review of constitution effectiveness and potential amendments
+
+**Exception Process**:
+Temporary exceptions to principles may be granted for critical fixes or urgent features, but MUST:
+1. Be explicitly documented with expiration date
+2. Include plan for bringing code into compliance
+3. Be approved by project maintainer
+4. Be tracked until resolved
 
-**Version**: [CONSTITUTION_VERSION] | **Ratified**: [RATIFICATION_DATE] | **Last Amended**: [LAST_AMENDED_DATE]
-<!-- Example: Version: 2.1.1 | Ratified: 2025-06-13 | Last Amended: 2025-07-16 -->
+**Version**: 1.0.0 | **Ratified**: 2025-09-29 | **Last Amended**: 2025-09-29
-Original file line number
+Diff line change
@@ @@ -1,3 +1,6 @@ @@
 +# Functional
 +cache
++
 # Mac
 .DS_Store