Skip to content
This repository was archived by the owner on Jan 19, 2026. It is now read-only.

Commit e54d9c0

Browse files
authored
Merge pull request #19 from nomadkaraoke/001-agentic-ai-corrector
WIP Agentic Corrector
2 parents 15192fd + 3252241 commit e54d9c0

148 files changed

Lines changed: 25973 additions & 2037 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
Lines changed: 216 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,216 @@
1+
<!-- 5ddf541b-8e90-4e85-9152-c52f39be9149 f7f98f98-6fab-4b10-9382-4948916b84e2 -->
2+
# Agentic Correction UI Improvements
3+
4+
## Overview
5+
6+
Transform the correction UI to be mobile-first with visual duration indicators, inline correction actions, and category-based metrics specifically designed for agentic AI workflows.
7+
8+
## Core Changes
9+
10+
### 1. Visual Duration Indicators for Words
11+
12+
Transform the Corrected Transcription view to show word durations at a glance.
13+
14+
**File: `lyrics_transcriber/frontend/src/components/TranscriptionView.tsx`**
15+
16+
- Add toggle mode: "Text View" (current) vs "Duration View" (new)
17+
- In Duration View, render each line as a timeline bar similar to TimelineEditor
18+
- Each word rendered as a colored bar with width proportional to duration
19+
- Color coding:
20+
- Normal words: light gray
21+
- Corrected words (agentic): green with original word shown above in small gray text
22+
- Uncorrected gaps: orange/red
23+
- Anchors: blue
24+
- Show time ruler above each line
25+
- Flag abnormally long words (>2 seconds) with warning indicator
26+
- Mobile-optimized: Scrollable horizontally if needed, bars tall enough for touch
27+
28+
**Implementation approach:**
29+
30+
- Create new component `DurationTimelineView.tsx` based on TimelineEditor logic
31+
- Reuse `timeToPosition` calculation from TimelineEditor
32+
- Group words by segment/line
33+
- Show original word above corrected word: `<Box sx={{fontSize: '0.6rem', color: 'text.secondary'}}>{originalWord}</Box>`
34+
35+
### 2. Inline Correction Actions
36+
37+
Add touch-friendly action buttons directly on corrected words.
38+
39+
**File: `lyrics_transcriber/frontend/src/components/shared/components/Word.tsx`**
40+
41+
Current implementation only shows tooltip. Enhance to:
42+
43+
- When a word has a correction, render with action buttons
44+
- Position buttons in a small action bar that appears inline (not on hover, always visible on mobile)
45+
- Actions:
46+
- Undo icon (revert to original)
47+
- Edit icon (open edit modal)
48+
- Checkmark icon (accept/approve)
49+
- On mobile: Buttons always visible, adequate size (44px touch target)
50+
- On desktop: Can show on hover for cleaner look
51+
- Style: Subtle, icon-only buttons in a compact horizontal strip
52+
- Use Material-UI IconButton with small size
53+
54+
**New component: `CorrectedWordWithActions.tsx`**
55+
56+
```tsx
57+
interface CorrectedWordWithActionsProps {
58+
word: string
59+
originalWord: string
60+
correction: CorrectionInfo
61+
onRevert: () => void
62+
onEdit: () => void
63+
onAccept: () => void
64+
isMobile: boolean
65+
}
66+
```
67+
68+
### 3. Transform Correction Handlers to Category Metrics
69+
70+
Replace handler toggles with agentic-specific category breakdown.
71+
72+
**File: `lyrics_transcriber/frontend/src/components/Header.tsx`**
73+
74+
When agentic mode detected (check if AgenticCorrector exists in handlers):
75+
76+
- Replace handler toggles with category breakdown
77+
- Show gap categories from `GapCategory` enum:
78+
- SOUND_ALIKE (5)
79+
- PUNCTUATION_ONLY (2)
80+
- BACKGROUND_VOCALS (1)
81+
- etc.
82+
- Sort by count descending
83+
- Make clickable to filter/highlight those corrections in view
84+
- Add quick filter chips: "Low Confidence" (<60%), "High Confidence" (>80%)
85+
- Show average confidence score for all agentic corrections
86+
87+
**Implementation:**
88+
89+
- Add function to aggregate corrections by `gap_category` field from reason string
90+
- Parse reason field: extract text between `[` and `]` for category
91+
- Create new component `AgenticCorrectionMetrics.tsx`
92+
93+
### 4. Enhanced Correction Detail View
94+
95+
Replace cramped tooltip with rich, touch-friendly correction card.
96+
97+
**New component: `CorrectionDetailCard.tsx`**
98+
99+
Triggered by clicking on a corrected word (not hover):
100+
101+
- Modal or slide-up panel on mobile (bottom sheet style)
102+
- Popover on desktop
103+
- Content:
104+
- Large display of original → corrected
105+
- Category badge with icon
106+
- Confidence meter (progress bar)
107+
- Full reasoning text (multi-line, readable)
108+
- Reference context snippet (if available)
109+
- Action buttons (large, clear labels):
110+
- "Revert to Original"
111+
- "Edit Correction"
112+
- "Mark as Correct"
113+
- "Report Issue" (future: submit to feedback API)
114+
- Swipe to dismiss on mobile
115+
- Escape key to close on desktop
116+
117+
### 5. Update Data Types
118+
119+
**File: `lyrics_transcriber/frontend/src/types.ts`**
120+
121+
Add:
122+
123+
```typescript
124+
export interface CorrectionAction {
125+
type: 'revert' | 'edit' | 'accept' | 'reject'
126+
correctionId: string
127+
wordId: string
128+
}
129+
130+
export interface GapCategoryMetric {
131+
category: string
132+
count: number
133+
avgConfidence: number
134+
}
135+
```
136+
137+
### 6. State Management for Correction Actions
138+
139+
**File: `lyrics_transcriber/frontend/src/components/LyricsAnalyzer.tsx`**
140+
141+
Add handlers:
142+
143+
- `handleRevertCorrection(wordId: string)`: Restore original word
144+
- `handleEditCorrection(wordId: string)`: Open edit modal with original word
145+
- `handleAcceptCorrection(wordId: string)`: Mark as approved (future: track in annotation system)
146+
147+
Implement revert:
148+
149+
- Find correction by word_id or corrected_word_id
150+
- Find segment containing corrected word
151+
- Replace corrected word with original word from correction.original_word
152+
- Update data state
153+
- Add to undo history
154+
155+
### 7. Mobile Responsiveness
156+
157+
**Files: Multiple component files**
158+
159+
Ensure all new components:
160+
161+
- Use Material-UI breakpoints for responsive layout
162+
- Touch targets minimum 44x44px
163+
- No hover-only interactions
164+
- Swipe gestures where appropriate (detail cards)
165+
- Bottom sheet modals on mobile instead of center modals
166+
- Adequate spacing for fat-finger taps
167+
- Test on mobile viewport (375px width minimum)
168+
169+
## Implementation Order
170+
171+
1. Duration visualization (most impactful for catching long words)
172+
2. Category metrics panel (replaces confusing handler toggles)
173+
3. Inline action buttons (enables quick revert/edit)
174+
4. Detail card modal (replaces cramped tooltip)
175+
5. Action handlers and state management (makes buttons functional)
176+
6. Mobile polish and testing
177+
178+
## Files to Modify
179+
180+
- `lyrics_transcriber/frontend/src/components/TranscriptionView.tsx` - Add duration view toggle
181+
- Create `lyrics_transcriber/frontend/src/components/DurationTimelineView.tsx` - New visualization
182+
- Create `lyrics_transcriber/frontend/src/components/CorrectedWordWithActions.tsx` - Inline actions
183+
- `lyrics_transcriber/frontend/src/components/shared/components/Word.tsx` - Integrate actions
184+
- Create `lyrics_transcriber/frontend/src/components/CorrectionDetailCard.tsx` - Rich detail view
185+
- Create `lyrics_transcriber/frontend/src/components/AgenticCorrectionMetrics.tsx` - Category breakdown
186+
- `lyrics_transcriber/frontend/src/components/Header.tsx` - Switch to category metrics when agentic
187+
- `lyrics_transcriber/frontend/src/components/LyricsAnalyzer.tsx` - Add action handlers
188+
- `lyrics_transcriber/frontend/src/types.ts` - Add new type definitions
189+
190+
## Key Design Decisions
191+
192+
- Mobile-first: All interactions work without hover
193+
- Always-visible duration bars catch timing issues immediately
194+
- Original word shown above corrected word for quick comparison
195+
- Category-based metrics more useful than handler toggles for agentic workflow
196+
- Inline actions minimize taps for common tasks (revert, edit)
197+
- Rich detail card for when user needs full context
198+
- Future-proof: Action handlers can integrate with annotation/feedback API later
199+
200+
### To-dos
201+
202+
- [ ] Create gap classification schemas and update CorrectionProposal model
203+
- [ ] Build classification prompt template with few-shot examples from gaps_review.yaml
204+
- [ ] Implement category-specific handler classes for each gap type
205+
- [ ] Update AgenticCorrector to use two-step classification workflow
206+
- [ ] Update LyricsCorrector to pass metadata and handle FLAG actions
207+
- [ ] Define CorrectionAnnotation schema and related types
208+
- [ ] Implement FeedbackStore with JSONL storage
209+
- [ ] Add annotation API endpoints to review server
210+
- [ ] Create CorrectionAnnotationModal component
211+
- [ ] Integrate annotation collection into edit workflow
212+
- [ ] Create annotation analysis script
213+
- [ ] Build few-shot example generator from annotations
214+
- [ ] Update classifier to load dynamic few-shot examples
215+
- [ ] Write comprehensive tests for all new components
216+
- [ ] Document the human feedback loop and improvement process

.cursor/rules/specify-rules.mdc

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
# lyrics_transcriber_local Development Guidelines
2+
3+
Auto-generated from all feature plans. Last updated: 2025-09-28
4+
5+
## Active Technologies
6+
- Python 3.10-3.13 (existing codebase compatibility) + FastAPI (existing review server), LangChain/LangGraph (new agentic framework), LangFuse (observability), Ollama (local models), OpenAI/Anthropic/Google APIs (cloud models) (001-agentic-ai-corrector)
7+
8+
## Project Structure
9+
```
10+
backend/
11+
frontend/
12+
tests/
13+
```
14+
15+
## Commands
16+
cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] pytest [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLOGIES] ruff check .
17+
18+
## Code Style
19+
Python 3.10-3.13 (existing codebase compatibility): Follow standard conventions
20+
21+
## Recent Changes
22+
- 001-agentic-ai-corrector: Added Python 3.10-3.13 (existing codebase compatibility) + FastAPI (existing review server), LangChain/LangGraph (new agentic framework), LangFuse (observability), Ollama (local models), OpenAI/Anthropic/Google APIs (cloud models)
23+
24+
<!-- MANUAL ADDITIONS START -->
25+
<!-- MANUAL ADDITIONS END -->

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,6 @@
1+
# Functional
2+
cache
3+
14
# Mac
25
.DS_Store
36

.specify/memory/constitution.md

Lines changed: 92 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -1,50 +1,107 @@
1-
# [PROJECT_NAME] Constitution
2-
<!-- Example: Spec Constitution, TaskFlow Constitution, etc. -->
1+
<!--
2+
Sync Impact Report:
3+
Version change: template → 1.0.0 (initial constitution creation)
4+
Added principles:
5+
- I. Test-Driven Development (NON-NEGOTIABLE)
6+
- II. Code Quality & Maintainability
7+
- III. User Experience Consistency
8+
- IV. Performance & Reliability
9+
- V. Observability & Monitoring
10+
Added sections:
11+
- Performance Standards
12+
- Development Workflow
13+
Templates requiring updates:
14+
- ✅ Updated .specify/templates/plan-template.md (Constitution Check section with detailed gates)
15+
- ✅ Updated constitution version reference in plan template
16+
- ✅ Verified spec-template.md and tasks-template.md are consistent
17+
Follow-up TODOs: None
18+
-->
19+
20+
# Lyrics Transcriber Constitution
321

422
## Core Principles
523

6-
### [PRINCIPLE_1_NAME]
7-
<!-- Example: I. Library-First -->
8-
[PRINCIPLE_1_DESCRIPTION]
9-
<!-- Example: Every feature starts as a standalone library; Libraries must be self-contained, independently testable, documented; Clear purpose required - no organizational-only libraries -->
24+
### I. Test-Driven Development (NON-NEGOTIABLE)
25+
Every feature MUST follow strict TDD methodology: write failing tests first, then implement minimal code to make tests pass, then refactor for quality. All tests MUST be written before any implementation code. Contract tests are required for all API endpoints, integration tests for all user workflows, and unit tests for all complex business logic. Code coverage MUST maintain minimum 90% line coverage for new code, with no decrease in overall project coverage allowed.
26+
27+
**Rationale**: TDD ensures predictable behavior, reduces bugs, enables safe refactoring, and serves as living documentation. The complex audio/video processing pipeline requires rigorous testing to prevent regressions.
28+
29+
### II. Code Quality & Maintainability
30+
All code MUST be self-documenting through clear naming, comprehensive docstrings for public APIs, and adherence to established patterns. Type hints are mandatory for all function signatures and complex data structures. Code MUST pass linting (flake8/black), static type checking (mypy), and security scanning. No code duplication above 15 lines without explicit architectural justification. All public functions MUST include comprehensive docstrings with examples.
31+
32+
**Rationale**: High-quality, maintainable code reduces technical debt, enables team collaboration, and ensures the complex multimedia processing pipeline remains debuggable and extensible.
33+
34+
### III. User Experience Consistency
35+
All user interfaces (CLI, web UI, API responses) MUST provide consistent interaction patterns, error messaging, and feedback mechanisms. CLI commands MUST follow standard Unix conventions with consistent flag naming and help text. Web UI MUST maintain responsive design, accessibility standards (WCAG 2.1 AA), and consistent visual patterns. All error messages MUST be actionable with clear next steps for users.
36+
37+
**Rationale**: Consistent UX reduces user cognitive load, improves adoption, and reduces support burden. The tool serves both technical and non-technical users requiring intuitive interfaces.
1038

11-
### [PRINCIPLE_2_NAME]
12-
<!-- Example: II. CLI Interface -->
13-
[PRINCIPLE_2_DESCRIPTION]
14-
<!-- Example: Every library exposes functionality via CLI; Text in/out protocol: stdin/args → stdout, errors → stderr; Support JSON + human-readable formats -->
39+
### IV. Performance & Reliability
40+
All audio/video processing operations MUST complete within defined performance budgets (see Performance Standards below). Memory usage MUST remain bounded with proper cleanup of large media objects. All external API calls MUST implement proper retry logic with exponential backoff and circuit breaker patterns. System MUST gracefully handle and recover from failures without data loss.
1541

16-
### [PRINCIPLE_3_NAME]
17-
<!-- Example: III. Test-First (NON-NEGOTIABLE) -->
18-
[PRINCIPLE_3_DESCRIPTION]
19-
<!-- Example: TDD mandatory: Tests written → User approved → Tests fail → Then implement; Red-Green-Refactor cycle strictly enforced -->
42+
**Rationale**: Media processing is resource-intensive and time-critical. Users expect reliable, efficient processing of their audio files without system crashes or excessive wait times.
2043

21-
### [PRINCIPLE_4_NAME]
22-
<!-- Example: IV. Integration Testing -->
23-
[PRINCIPLE_4_DESCRIPTION]
24-
<!-- Example: Focus areas requiring integration tests: New library contract tests, Contract changes, Inter-service communication, Shared schemas -->
44+
### V. Observability & Monitoring
45+
All operations MUST emit structured logs with consistent formatting and appropriate log levels. Performance metrics MUST be collected for all critical paths (transcription time, correction accuracy, API response times). All external service interactions MUST be instrumented with tracing. System health checks MUST be implemented for all services and dependencies.
2546

26-
### [PRINCIPLE_5_NAME]
27-
<!-- Example: V. Observability, VI. Versioning & Breaking Changes, VII. Simplicity -->
28-
[PRINCIPLE_5_DESCRIPTION]
29-
<!-- Example: Text I/O ensures debuggability; Structured logging required; Or: MAJOR.MINOR.BUILD format; Or: Start simple, YAGNI principles -->
47+
**Rationale**: Complex AI/ML pipelines require comprehensive observability to diagnose issues, optimize performance, and ensure system reliability in production environments.
3048

31-
## [SECTION_2_NAME]
32-
<!-- Example: Additional Constraints, Security Requirements, Performance Standards, etc. -->
49+
## Performance Standards
3350

34-
[SECTION_2_CONTENT]
35-
<!-- Example: Technology stack requirements, compliance standards, deployment policies, etc. -->
51+
**Processing Time Limits**:
52+
- Audio transcription: <30 seconds per minute of audio (excluding external API wait time)
53+
- Lyrics correction: <10 seconds per song
54+
- Video generation: <2x real-time (e.g., 4 minutes for 2-minute song)
55+
- Web UI response: <200ms for interactive operations, <2 seconds for processing operations
3656

37-
## [SECTION_3_NAME]
38-
<!-- Example: Development Workflow, Review Process, Quality Gates, etc. -->
57+
**Resource Constraints**:
58+
- Memory usage: <4GB peak for processing single audio files up to 10 minutes
59+
- Disk usage: Temporary files MUST be cleaned up within 24 hours
60+
- CPU usage: MUST support concurrent processing of up to 3 songs simultaneously
3961

40-
[SECTION_3_CONTENT]
41-
<!-- Example: Code review requirements, testing gates, deployment approval process, etc. -->
62+
**Reliability Requirements**:
63+
- External API failures MUST NOT crash the application
64+
- Processing MUST resume from checkpoint after interruption for operations >30 seconds
65+
- Data corruption detection and recovery MUST be implemented for all cache operations
66+
67+
## Development Workflow
68+
69+
**Pre-Development Gates**:
70+
- All features MUST have approved specification before development begins
71+
- Technical design MUST be reviewed and approved for features touching core processing pipeline
72+
- Breaking changes MUST have migration plan and backward compatibility period
73+
74+
**Code Review Requirements**:
75+
- All code MUST be reviewed by at least one other developer
76+
- Performance-critical changes MUST include performance test results
77+
- Security-sensitive changes MUST include security review
78+
- UI changes MUST include accessibility review and cross-browser testing
79+
80+
**Quality Gates**:
81+
- All tests MUST pass before merge
82+
- Code coverage MUST NOT decrease from current levels
83+
- Static analysis MUST pass without warnings for new code
84+
- Performance benchmarks MUST NOT regress by >5% without justification
4285

4386
## Governance
44-
<!-- Example: Constitution supersedes all other practices; Amendments require documentation, approval, migration plan -->
4587

46-
[GOVERNANCE_RULES]
47-
<!-- Example: All PRs/reviews must verify compliance; Complexity must be justified; Use [GUIDANCE_FILE] for runtime development guidance -->
88+
**Amendment Process**:
89+
This constitution supersedes all other development practices and coding standards. Amendments require:
90+
1. Written proposal with justification and impact analysis
91+
2. Review by project maintainers
92+
3. Migration plan for existing code if applicable
93+
4. Update of all dependent templates and documentation
94+
95+
**Compliance Review**:
96+
- All pull requests MUST verify compliance with constitutional principles
97+
- Monthly review of adherence to performance standards and quality metrics
98+
- Quarterly review of constitution effectiveness and potential amendments
99+
100+
**Exception Process**:
101+
Temporary exceptions to principles may be granted for critical fixes or urgent features, but MUST:
102+
1. Be explicitly documented with expiration date
103+
2. Include plan for bringing code into compliance
104+
3. Be approved by project maintainer
105+
4. Be tracked until resolved
48106

49-
**Version**: [CONSTITUTION_VERSION] | **Ratified**: [RATIFICATION_DATE] | **Last Amended**: [LAST_AMENDED_DATE]
50-
<!-- Example: Version: 2.1.1 | Ratified: 2025-06-13 | Last Amended: 2025-07-16 -->
107+
**Version**: 1.0.0 | **Ratified**: 2025-09-29 | **Last Amended**: 2025-09-29

0 commit comments

Comments
 (0)