This repository was archived by the owner on Jan 19, 2026. It is now read-only.
Merged
27 commits
fabc211
Added spec, plan, tasks etc. for agentic AI corrector
beveradb Sep 29, 2025
533560e
feat(agentic): setup scaffolding and dependencies per technical-guidance
beveradb Sep 29, 2025
6816908
test(agentic): add failing integration tests T013–T016; mark contract…
beveradb Sep 29, 2025
0fbe6a0
feat(agentic): provider base and LiteLLM bridge; schemas, agent, rout…
beveradb Sep 29, 2025
1e57f6e
feat(api): add FastAPI v1 agentic endpoints (scaffold) and in-memory …
beveradb Sep 29, 2025
1b4626e
feat(agentic): env-flagged routing marker in corrector; fallback 503 …
beveradb Sep 29, 2025
4c015af
feat(cli): add --use-agentic-ai and --ai-model flags; 503 fallback re…
beveradb Sep 29, 2025
61422c1
feat(store): add SQLite-backed feedback/session store; wire into API
beveradb Sep 29, 2025
8cc0ab5
test(integration): convert failing placeholders to minimal scenario c…
beveradb Sep 29, 2025
029e7a3
test(integration): start FastAPI review server in background for API …
beveradb Sep 29, 2025
43cc72c
feat(metrics): add in-memory metrics aggregator and expose via /api/v…
beveradb Sep 29, 2025
d308cb0
feat(reliability): add retries/backoff and circuit breaker to provide…
beveradb Sep 29, 2025
b2be09a
feat(providers): add OpenAI, Anthropic, Google, and Ollama provider w…
beveradb Sep 29, 2025
2c8d719
feat(observability): add optional LangFuse hooks to correction endpoint
beveradb Sep 29, 2025
b68326d
feat(agentic): optional agentic proposal pass before rule-based handlers
beveradb Sep 29, 2025
0fa0341
feat(workflows): add minimal consensus workflow scaffold; mark T030 c…
beveradb Sep 29, 2025
344e5ae
feat(feedback): scaffold feedback workflow, collector, and aggregator
beveradb Sep 29, 2025
62db967
feat(frontend): add AIFeedbackModal, ModelSelector, and MetricsDashbo…
beveradb Sep 29, 2025
a17f28e
docs: add Agentic AI section to README; validate provider env; mark d…
beveradb Sep 29, 2025
727a15a
feat(api): add friendly JSON error handlers; add output compatibility…
beveradb Sep 30, 2025
be4e14e
Merge remote-tracking branch 'origin/main' into 001-agentic-ai-corrector
beveradb Oct 15, 2025
aec4f61
Added LRCLIB as default lyrics provider
beveradb Oct 21, 2025
dfb1616
Major refactor of agentic correction to remove LiteLLM and just use L…
beveradb Oct 21, 2025
1bb5b3c
Improved langfuse integration to use sessions
beveradb Oct 21, 2025
34defdf
Began new approach with agentic classifier, initial progress made to …
beveradb Oct 27, 2025
3be4c11
Updated frontend
beveradb Dec 8, 2025
3252241
Merge remote-tracking branch 'origin/main' into 001-agentic-ai-corrector
beveradb Dec 8, 2025
216 changes: 216 additions & 0 deletions .cursor/plans/agentic-correction-system-5ddf541b.plan.md
@@ -0,0 +1,216 @@
<!-- 5ddf541b-8e90-4e85-9152-c52f39be9149 f7f98f98-6fab-4b10-9382-4948916b84e2 -->
# Agentic Correction UI Improvements

## Overview

Transform the correction UI to be mobile-first with visual duration indicators, inline correction actions, and category-based metrics specifically designed for agentic AI workflows.

## Core Changes

### 1. Visual Duration Indicators for Words

Transform the Corrected Transcription view to show word durations at a glance.

**File: `lyrics_transcriber/frontend/src/components/TranscriptionView.tsx`**

- Add toggle mode: "Text View" (current) vs "Duration View" (new)
- In Duration View, render each line as a timeline bar similar to TimelineEditor
- Each word rendered as a colored bar with width proportional to duration
- Color coding:
  - Normal words: light gray
  - Corrected words (agentic): green with original word shown above in small gray text
  - Uncorrected gaps: orange/red
  - Anchors: blue
- Show time ruler above each line
- Flag abnormally long words (>2 seconds) with warning indicator
- Mobile-optimized: Scrollable horizontally if needed, bars tall enough for touch

**Implementation approach:**

- Create new component `DurationTimelineView.tsx` based on TimelineEditor logic
- Reuse `timeToPosition` calculation from TimelineEditor
- Group words by segment/line
- Show original word above corrected word: `<Box sx={{fontSize: '0.6rem', color: 'text.secondary'}}>{originalWord}</Box>`
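
The bar layout described above can be sketched as a pure function, independent of React. This is a minimal sketch under assumptions: the `TimedWord` shape, the `timeToPercent` helper, and `layoutLine` are hypothetical names for illustration (the real `timeToPosition` in TimelineEditor may have a different signature), and positions are expressed as percentages of the line's time span.

```typescript
// Hypothetical word shape; the actual type in types.ts may differ.
interface TimedWord {
  id: string
  text: string
  start_time: number // seconds
  end_time: number // seconds
}

interface WordBar {
  id: string
  leftPercent: number
  widthPercent: number
  isAbnormallyLong: boolean
}

// Threshold taken from the plan: words longer than 2 seconds get a warning.
const LONG_WORD_THRESHOLD_SECONDS = 2

// Map a timestamp to a percentage offset within [lineStart, lineEnd].
function timeToPercent(time: number, lineStart: number, lineEnd: number): number {
  const span = lineEnd - lineStart
  if (span <= 0) return 0
  const clamped = Math.min(Math.max(time, lineStart), lineEnd)
  return ((clamped - lineStart) / span) * 100
}

// Compute one bar per word, with width proportional to the word's duration.
function layoutLine(words: TimedWord[]): WordBar[] {
  if (words.length === 0) return []
  const lineStart = Math.min(...words.map(w => w.start_time))
  const lineEnd = Math.max(...words.map(w => w.end_time))
  return words.map(w => {
    const left = timeToPercent(w.start_time, lineStart, lineEnd)
    const right = timeToPercent(w.end_time, lineStart, lineEnd)
    return {
      id: w.id,
      leftPercent: left,
      widthPercent: right - left,
      isAbnormallyLong: w.end_time - w.start_time > LONG_WORD_THRESHOLD_SECONDS,
    }
  })
}
```

Keeping the layout math separate from rendering would let `DurationTimelineView.tsx` stay a thin wrapper and makes the 2-second rule easy to unit test.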

### 2. Inline Correction Actions

Add touch-friendly action buttons directly on corrected words.

**File: `lyrics_transcriber/frontend/src/components/shared/components/Word.tsx`**

The current implementation only shows a tooltip. Enhance it to:

- When a word has a correction, render with action buttons
- Position buttons in a small action bar that appears inline (not on hover, always visible on mobile)
- Actions:
  - Undo icon (revert to original)
  - Edit icon (open edit modal)
  - Checkmark icon (accept/approve)
- On mobile: Buttons always visible, adequate size (44px touch target)
- On desktop: Can show on hover for cleaner look
- Style: Subtle, icon-only buttons in a compact horizontal strip
- Use Material-UI IconButton with small size

**New component: `CorrectedWordWithActions.tsx`**

```tsx
interface CorrectedWordWithActionsProps {
  word: string
  originalWord: string
  correction: CorrectionInfo
  onRevert: () => void
  onEdit: () => void
  onAccept: () => void
  isMobile: boolean
}
```

### 3. Transform Correction Handlers to Category Metrics

Replace handler toggles with agentic-specific category breakdown.

**File: `lyrics_transcriber/frontend/src/components/Header.tsx`**

When agentic mode is detected (check whether `AgenticCorrector` exists in handlers):

- Replace handler toggles with category breakdown
- Show gap categories from `GapCategory` enum:
  - SOUND_ALIKE (5)
  - PUNCTUATION_ONLY (2)
  - BACKGROUND_VOCALS (1)
  - etc.
- Sort by count descending
- Make clickable to filter/highlight those corrections in view
- Add quick filter chips: "Low Confidence" (<60%), "High Confidence" (>80%)
- Show average confidence score for all agentic corrections

**Implementation:**

- Add function to aggregate corrections by `gap_category` field from reason string
- Parse reason field: extract text between `[` and `]` for category
- Create new component `AgenticCorrectionMetrics.tsx`
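
The aggregation step above could look roughly like the following. This is a sketch under stated assumptions: the `AgenticCorrection` shape is hypothetical, the reason string is assumed to look like `"[CATEGORY] explanation"`, confidence is assumed to be a 0 to 1 fraction, and `UNCATEGORIZED` is an invented fallback label for reasons with no bracketed category.

```typescript
// Hypothetical minimal shape; only the fields used for metrics are listed.
interface AgenticCorrection {
  reason: string // assumed format: "[CATEGORY] free-text explanation"
  confidence: number // assumed range 0..1
}

// Mirrors the GapCategoryMetric type proposed for types.ts.
interface GapCategoryMetric {
  category: string
  count: number
  avgConfidence: number
}

// Extract the text between the first "[" and "]" in the reason string.
function parseCategory(reason: string): string {
  const match = reason.match(/\[([^\]]+)\]/)
  return match?.[1]?.trim() ?? 'UNCATEGORIZED'
}

// Group corrections by category, then sort by count descending.
function aggregateByCategory(corrections: AgenticCorrection[]): GapCategoryMetric[] {
  const totals = new Map<string, { count: number; confidenceSum: number }>()
  for (const c of corrections) {
    const category = parseCategory(c.reason)
    const entry = totals.get(category) ?? { count: 0, confidenceSum: 0 }
    entry.count += 1
    entry.confidenceSum += c.confidence
    totals.set(category, entry)
  }
  return Array.from(totals.entries())
    .map(([category, t]) => ({
      category,
      count: t.count,
      avgConfidence: t.confidenceSum / t.count,
    }))
    .sort((a, b) => b.count - a.count)
}

// Predicates for the quick filter chips (<60% and >80%).
function isLowConfidence(c: AgenticCorrection): boolean {
  return c.confidence < 0.6
}

function isHighConfidence(c: AgenticCorrection): boolean {
  return c.confidence > 0.8
}
```

Parsing the category out of free text is fragile; if the backend can expose `gap_category` as a dedicated field, reading that field directly would be the safer choice.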

### 4. Enhanced Correction Detail View

Replace cramped tooltip with rich, touch-friendly correction card.

**New component: `CorrectionDetailCard.tsx`**

Triggered by clicking on a corrected word (not hover):

- Modal or slide-up panel on mobile (bottom sheet style)
- Popover on desktop
- Content:
  - Large display of original → corrected
  - Category badge with icon
  - Confidence meter (progress bar)
  - Full reasoning text (multi-line, readable)
  - Reference context snippet (if available)
  - Action buttons (large, clear labels):
    - "Revert to Original"
    - "Edit Correction"
    - "Mark as Correct"
    - "Report Issue" (future: submit to feedback API)
- Swipe to dismiss on mobile
- Escape key to close on desktop

### 5. Update Data Types

**File: `lyrics_transcriber/frontend/src/types.ts`**

Add:

```typescript
export interface CorrectionAction {
  type: 'revert' | 'edit' | 'accept' | 'reject'
  correctionId: string
  wordId: string
}

export interface GapCategoryMetric {
  category: string
  count: number
  avgConfidence: number
}
```

### 6. State Management for Correction Actions

**File: `lyrics_transcriber/frontend/src/components/LyricsAnalyzer.tsx`**

Add handlers:

- `handleRevertCorrection(wordId: string)`: Restore original word
- `handleEditCorrection(wordId: string)`: Open edit modal with original word
- `handleAcceptCorrection(wordId: string)`: Mark as approved (future: track in annotation system)

Implement revert:

- Find correction by word_id or corrected_word_id
- Find segment containing corrected word
- Replace corrected word with original word from correction.original_word
- Update data state
- Add to undo history
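
The revert steps above can be sketched as an immutable state update. All type names here (`LyricWord`, `LyricSegment`, `WordCorrection`, `CorrectionState`, `UndoableState`) are hypothetical stand-ins for whatever `types.ts` and `LyricsAnalyzer.tsx` actually use; only the field names `word_id`, `corrected_word_id`, and `original_word` come from the plan. Dropping the correction record after the revert is also an assumption, not something the plan specifies.

```typescript
// Hypothetical shapes for illustration only.
interface LyricWord {
  id: string
  text: string
}

interface LyricSegment {
  id: string
  words: LyricWord[]
}

interface WordCorrection {
  word_id: string
  corrected_word_id: string | null
  original_word: string
  corrected_word: string
}

interface CorrectionState {
  segments: LyricSegment[]
  corrections: WordCorrection[]
}

interface UndoableState {
  past: CorrectionState[]
  present: CorrectionState
}

// Restore the original word; returns the same object if nothing matches.
function revertCorrection(state: CorrectionState, wordId: string): CorrectionState {
  // Find the correction by word_id or corrected_word_id.
  const correction = state.corrections.find(
    c => c.word_id === wordId || c.corrected_word_id === wordId
  )
  if (!correction) return state
  const targetId = correction.corrected_word_id ?? correction.word_id
  // Replace the corrected word only in the segment that contains it.
  const segments = state.segments.map(segment => {
    if (!segment.words.some(w => w.id === targetId)) return segment
    return {
      ...segment,
      words: segment.words.map(w =>
        w.id === targetId ? { ...w, text: correction.original_word } : w
      ),
    }
  })
  return {
    segments,
    corrections: state.corrections.filter(c => c !== correction),
  }
}

// Apply the revert and push the previous state onto the undo history.
function revertWithHistory(history: UndoableState, wordId: string): UndoableState {
  const next = revertCorrection(history.present, wordId)
  if (next === history.present) return history
  return { past: [...history.past, history.present], present: next }
}
```

Because the update never mutates its input, the previous state can be stored in the history as-is, and undo becomes a matter of popping the last entry from `past`.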

### 7. Mobile Responsiveness

**Files: Multiple component files**

Ensure all new components:

- Use Material-UI breakpoints for responsive layout
- Touch targets minimum 44x44px
- No hover-only interactions
- Swipe gestures where appropriate (detail cards)
- Bottom sheet modals on mobile instead of center modals
- Adequate spacing for fat-finger taps
- Test on mobile viewport (375px width minimum)

## Implementation Order

1. Duration visualization (most impactful for catching long words)
2. Category metrics panel (replaces confusing handler toggles)
3. Inline action buttons (enables quick revert/edit)
4. Detail card modal (replaces cramped tooltip)
5. Action handlers and state management (makes buttons functional)
6. Mobile polish and testing

## Files to Modify

- `lyrics_transcriber/frontend/src/components/TranscriptionView.tsx` - Add duration view toggle
- Create `lyrics_transcriber/frontend/src/components/DurationTimelineView.tsx` - New visualization
- Create `lyrics_transcriber/frontend/src/components/CorrectedWordWithActions.tsx` - Inline actions
- `lyrics_transcriber/frontend/src/components/shared/components/Word.tsx` - Integrate actions
- Create `lyrics_transcriber/frontend/src/components/CorrectionDetailCard.tsx` - Rich detail view
- Create `lyrics_transcriber/frontend/src/components/AgenticCorrectionMetrics.tsx` - Category breakdown
- `lyrics_transcriber/frontend/src/components/Header.tsx` - Switch to category metrics when agentic
- `lyrics_transcriber/frontend/src/components/LyricsAnalyzer.tsx` - Add action handlers
- `lyrics_transcriber/frontend/src/types.ts` - Add new type definitions

## Key Design Decisions

- Mobile-first: All interactions work without hover
- Always-visible duration bars catch timing issues immediately
- Original word shown above corrected word for quick comparison
- Category-based metrics more useful than handler toggles for agentic workflow
- Inline actions minimize taps for common tasks (revert, edit)
- Rich detail card for when user needs full context
- Future-proof: Action handlers can integrate with annotation/feedback API later

### To-dos

- [ ] Create gap classification schemas and update CorrectionProposal model
- [ ] Build classification prompt template with few-shot examples from gaps_review.yaml
- [ ] Implement category-specific handler classes for each gap type
- [ ] Update AgenticCorrector to use two-step classification workflow
- [ ] Update LyricsCorrector to pass metadata and handle FLAG actions
- [ ] Define CorrectionAnnotation schema and related types
- [ ] Implement FeedbackStore with JSONL storage
- [ ] Add annotation API endpoints to review server
- [ ] Create CorrectionAnnotationModal component
- [ ] Integrate annotation collection into edit workflow
- [ ] Create annotation analysis script
- [ ] Build few-shot example generator from annotations
- [ ] Update classifier to load dynamic few-shot examples
- [ ] Write comprehensive tests for all new components
- [ ] Document the human feedback loop and improvement process
25 changes: 25 additions & 0 deletions .cursor/rules/specify-rules.mdc
@@ -0,0 +1,25 @@
# lyrics_transcriber_local Development Guidelines

Auto-generated from all feature plans. Last updated: 2025-09-28

## Active Technologies
- Python 3.10-3.13 (existing codebase compatibility) + FastAPI (existing review server), LangChain/LangGraph (new agentic framework), LangFuse (observability), Ollama (local models), OpenAI/Anthropic/Google APIs (cloud models) (001-agentic-ai-corrector)

## Project Structure
```
backend/
frontend/
tests/
```

## Commands
cd src; pytest; ruff check .

## Code Style
Python 3.10-3.13 (existing codebase compatibility): Follow standard conventions

## Recent Changes
- 001-agentic-ai-corrector: Added Python 3.10-3.13 (existing codebase compatibility) + FastAPI (existing review server), LangChain/LangGraph (new agentic framework), LangFuse (observability), Ollama (local models), OpenAI/Anthropic/Google APIs (cloud models)

<!-- MANUAL ADDITIONS START -->
<!-- MANUAL ADDITIONS END -->
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# Functional
cache

# Mac
.DS_Store

127 changes: 92 additions & 35 deletions .specify/memory/constitution.md
@@ -1,50 +1,107 @@
# [PROJECT_NAME] Constitution
<!-- Example: Spec Constitution, TaskFlow Constitution, etc. -->
<!--
Sync Impact Report:
Version change: template → 1.0.0 (initial constitution creation)
Added principles:
- I. Test-Driven Development (NON-NEGOTIABLE)
- II. Code Quality & Maintainability
- III. User Experience Consistency
- IV. Performance & Reliability
- V. Observability & Monitoring
Added sections:
- Performance Standards
- Development Workflow
Templates requiring updates:
- ✅ Updated .specify/templates/plan-template.md (Constitution Check section with detailed gates)
- ✅ Updated constitution version reference in plan template
- ✅ Verified spec-template.md and tasks-template.md are consistent
Follow-up TODOs: None
-->

# Lyrics Transcriber Constitution

## Core Principles

### [PRINCIPLE_1_NAME]
<!-- Example: I. Library-First -->
[PRINCIPLE_1_DESCRIPTION]
<!-- Example: Every feature starts as a standalone library; Libraries must be self-contained, independently testable, documented; Clear purpose required - no organizational-only libraries -->
### I. Test-Driven Development (NON-NEGOTIABLE)
Every feature MUST follow strict TDD methodology: write failing tests first, then implement minimal code to make tests pass, then refactor for quality. All tests MUST be written before any implementation code. Contract tests are required for all API endpoints, integration tests for all user workflows, and unit tests for all complex business logic. Code coverage MUST maintain minimum 90% line coverage for new code, with no decrease in overall project coverage allowed.

**Rationale**: TDD ensures predictable behavior, reduces bugs, enables safe refactoring, and serves as living documentation. The complex audio/video processing pipeline requires rigorous testing to prevent regressions.

### II. Code Quality & Maintainability
All code MUST be self-documenting through clear naming, comprehensive docstrings for public APIs, and adherence to established patterns. Type hints are mandatory for all function signatures and complex data structures. Code MUST pass linting (flake8/black), static type checking (mypy), and security scanning. No code duplication above 15 lines without explicit architectural justification. All public functions MUST include comprehensive docstrings with examples.

**Rationale**: High-quality, maintainable code reduces technical debt, enables team collaboration, and ensures the complex multimedia processing pipeline remains debuggable and extensible.

### III. User Experience Consistency
All user interfaces (CLI, web UI, API responses) MUST provide consistent interaction patterns, error messaging, and feedback mechanisms. CLI commands MUST follow standard Unix conventions with consistent flag naming and help text. Web UI MUST maintain responsive design, accessibility standards (WCAG 2.1 AA), and consistent visual patterns. All error messages MUST be actionable with clear next steps for users.

**Rationale**: Consistent UX reduces user cognitive load, improves adoption, and reduces support burden. The tool serves both technical and non-technical users requiring intuitive interfaces.

### [PRINCIPLE_2_NAME]
<!-- Example: II. CLI Interface -->
[PRINCIPLE_2_DESCRIPTION]
<!-- Example: Every library exposes functionality via CLI; Text in/out protocol: stdin/args → stdout, errors → stderr; Support JSON + human-readable formats -->
### IV. Performance & Reliability
All audio/video processing operations MUST complete within defined performance budgets (see Performance Standards below). Memory usage MUST remain bounded with proper cleanup of large media objects. All external API calls MUST implement proper retry logic with exponential backoff and circuit breaker patterns. System MUST gracefully handle and recover from failures without data loss.

### [PRINCIPLE_3_NAME]
<!-- Example: III. Test-First (NON-NEGOTIABLE) -->
[PRINCIPLE_3_DESCRIPTION]
<!-- Example: TDD mandatory: Tests written → User approved → Tests fail → Then implement; Red-Green-Refactor cycle strictly enforced -->
**Rationale**: Media processing is resource-intensive and time-critical. Users expect reliable, efficient processing of their audio files without system crashes or excessive wait times.
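
As an illustration of the retry and circuit breaker requirement in Principle IV, the two mechanisms can be sketched as follows. This is not the project's implementation (the backend is Python); the function name, class name, default delays, and thresholds are all assumptions chosen for the example, and the clock is passed in explicitly so the logic stays deterministic.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
// The 500 ms base and 30 s cap are illustrative defaults, not project values.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt))
}

// Minimal circuit breaker: opens after N consecutive failures and allows
// a trial call again once the cooldown has elapsed.
class CircuitBreaker {
  private failures = 0
  private openedAt: number | null = null
  private readonly failureThreshold: number
  private readonly cooldownMs: number

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold
    this.cooldownMs = cooldownMs
  }

  // True when the circuit is closed, or open but past its cooldown.
  canAttempt(nowMs: number): boolean {
    if (this.openedAt === null) return true
    return nowMs - this.openedAt >= this.cooldownMs
  }

  // A success closes the circuit and resets the failure count.
  recordSuccess(): void {
    this.failures = 0
    this.openedAt = null
  }

  // A failure at or beyond the threshold (re)opens the circuit.
  recordFailure(nowMs: number): void {
    this.failures += 1
    if (this.failures >= this.failureThreshold) this.openedAt = nowMs
  }
}
```

A caller would check `canAttempt` before each external API call, wait `backoffDelayMs(attempt)` between retries, and report the outcome through `recordSuccess` or `recordFailure`.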

### [PRINCIPLE_4_NAME]
<!-- Example: IV. Integration Testing -->
[PRINCIPLE_4_DESCRIPTION]
<!-- Example: Focus areas requiring integration tests: New library contract tests, Contract changes, Inter-service communication, Shared schemas -->
### V. Observability & Monitoring
All operations MUST emit structured logs with consistent formatting and appropriate log levels. Performance metrics MUST be collected for all critical paths (transcription time, correction accuracy, API response times). All external service interactions MUST be instrumented with tracing. System health checks MUST be implemented for all services and dependencies.

### [PRINCIPLE_5_NAME]
<!-- Example: V. Observability, VI. Versioning & Breaking Changes, VII. Simplicity -->
[PRINCIPLE_5_DESCRIPTION]
<!-- Example: Text I/O ensures debuggability; Structured logging required; Or: MAJOR.MINOR.BUILD format; Or: Start simple, YAGNI principles -->
**Rationale**: Complex AI/ML pipelines require comprehensive observability to diagnose issues, optimize performance, and ensure system reliability in production environments.

## [SECTION_2_NAME]
<!-- Example: Additional Constraints, Security Requirements, Performance Standards, etc. -->
## Performance Standards

[SECTION_2_CONTENT]
<!-- Example: Technology stack requirements, compliance standards, deployment policies, etc. -->
**Processing Time Limits**:
- Audio transcription: <30 seconds per minute of audio (excluding external API wait time)
- Lyrics correction: <10 seconds per song
- Video generation: <2x real-time (e.g., 4 minutes for 2-minute song)
- Web UI response: <200ms for interactive operations, <2 seconds for processing operations

## [SECTION_3_NAME]
<!-- Example: Development Workflow, Review Process, Quality Gates, etc. -->
**Resource Constraints**:
- Memory usage: <4GB peak for processing single audio files up to 10 minutes
- Disk usage: Temporary files MUST be cleaned up within 24 hours
- CPU usage: MUST support concurrent processing of up to 3 songs simultaneously

[SECTION_3_CONTENT]
<!-- Example: Code review requirements, testing gates, deployment approval process, etc. -->
**Reliability Requirements**:
- External API failures MUST NOT crash the application
- Processing MUST resume from checkpoint after interruption for operations >30 seconds
- Data corruption detection and recovery MUST be implemented for all cache operations

## Development Workflow

**Pre-Development Gates**:
- All features MUST have approved specification before development begins
- Technical design MUST be reviewed and approved for features touching core processing pipeline
- Breaking changes MUST have migration plan and backward compatibility period

**Code Review Requirements**:
- All code MUST be reviewed by at least one other developer
- Performance-critical changes MUST include performance test results
- Security-sensitive changes MUST include security review
- UI changes MUST include accessibility review and cross-browser testing

**Quality Gates**:
- All tests MUST pass before merge
- Code coverage MUST NOT decrease from current levels
- Static analysis MUST pass without warnings for new code
- Performance benchmarks MUST NOT regress by >5% without justification

## Governance
<!-- Example: Constitution supersedes all other practices; Amendments require documentation, approval, migration plan -->

[GOVERNANCE_RULES]
<!-- Example: All PRs/reviews must verify compliance; Complexity must be justified; Use [GUIDANCE_FILE] for runtime development guidance -->
**Amendment Process**:
This constitution supersedes all other development practices and coding standards. Amendments require:
1. Written proposal with justification and impact analysis
2. Review by project maintainers
3. Migration plan for existing code if applicable
4. Update of all dependent templates and documentation

**Compliance Review**:
- All pull requests MUST verify compliance with constitutional principles
- Monthly review of adherence to performance standards and quality metrics
- Quarterly review of constitution effectiveness and potential amendments

**Exception Process**:
Temporary exceptions to principles may be granted for critical fixes or urgent features, but MUST:
1. Be explicitly documented with expiration date
2. Include plan for bringing code into compliance
3. Be approved by project maintainer
4. Be tracked until resolved

**Version**: [CONSTITUTION_VERSION] | **Ratified**: [RATIFICATION_DATE] | **Last Amended**: [LAST_AMENDED_DATE]
<!-- Example: Version: 2.1.1 | Ratified: 2025-06-13 | Last Amended: 2025-07-16 -->
**Version**: 1.0.0 | **Ratified**: 2025-09-29 | **Last Amended**: 2025-09-29