A web-based manga translation QA tool that analyzes translated manga text for consistency errors, tone mismatches, untranslated text, and character voice inconsistencies using LLMs and embeddings.
Tech Stack:
- Frontend: React (Vite) → Deployed on Vercel
- Backend: FastAPI (Python) → Deployed on Render
- Database: Supabase (PostgreSQL + pgvector)
- AI: OpenRouter API (LLM + embeddings)
Architecture:
Vercel (React) → Render (FastAPI) → Supabase (Postgres + pgvector)
→ OpenRouter (LLM + embeddings)
Goal: Repo structure, dev environment, database schema, basic API skeleton.
- Initialize git repo
- Create monorepo structure: `/frontend` (React) + `/backend` (FastAPI)
- Add `.gitignore` for Python + Node
- Add `README.md` with project description
- Set up Python virtual environment
- Install core dependencies: `fastapi`, `uvicorn`, `sqlalchemy`, `asyncpg`, `alembic`, `httpx`, `pydantic`
- Create FastAPI app with health check endpoint
- Set up project config / environment variables (`.env`)
- Set up Alembic for database migrations
- Create Supabase project
- Enable pgvector extension
- Design and create schema:
  - `projects` — manga series/project metadata
  - `chapters` — chapter metadata per project
  - `dialogue_lines` — individual lines (speaker, text, type, page, panel)
  - `embeddings` — vector embeddings linked to dialogue lines
  - `analysis_jobs` — job queue tracking (status, timestamps)
  - `qa_results` — QA findings (checker type, severity, details)
- Run initial migration
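The schema list above can be sketched as raw DDL, roughly what the first Alembic migration would execute via `op.execute`. This is a sketch only: column choices beyond the ones named in the list, and the embedding dimension (1536), are assumptions that must match the embedding model actually chosen.

```python
# DDL sketch for the two most structure-sensitive tables. The embedding
# dimension (1536) is an ASSUMPTION -- it must match the chosen model.
# Other tables (projects, analysis_jobs, qa_results) follow the same pattern.
DIALOGUE_SCHEMA_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE dialogue_lines (
    id          BIGSERIAL PRIMARY KEY,
    chapter_id  BIGINT NOT NULL REFERENCES chapters(id) ON DELETE CASCADE,
    page        INT NOT NULL,
    panel       INT NOT NULL,
    speaker     TEXT,
    text        TEXT NOT NULL,
    type        TEXT NOT NULL
                CHECK (type IN ('dialogue', 'sfx', 'narration', 'sign'))
);

CREATE TABLE embeddings (
    id        BIGSERIAL PRIMARY KEY,
    line_id   BIGINT NOT NULL REFERENCES dialogue_lines(id) ON DELETE CASCADE,
    embedding vector(1536) NOT NULL
);
"""
```

The `ON DELETE CASCADE` references make the `DELETE /api/projects/{id}` requirement ("delete a project and its data") a single statement once `chapters` cascades from `projects`.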
- Initialize React app with Vite + TypeScript
- Install dependencies: `react-router-dom`, `axios`, `tailwindcss`
- Set up routing skeleton (Dashboard, Project, Upload, Report pages)
- Set up API client utility
Goal: Users can create projects, upload structured manga JSON, and persist it to the database.
- `POST /api/projects` — create a project (title, description, language pair)
- `GET /api/projects` — list all projects
- `GET /api/projects/{id}` — get project details
- `DELETE /api/projects/{id}` — delete a project and its data
- `POST /api/projects/{id}/chapters` — upload chapter data (JSON)
- `GET /api/projects/{id}/chapters` — list chapters for a project
- Define Pydantic models for upload schema:

  ```json
  {
    "chapter_number": 1,
    "title": "The Beginning",
    "pages": [
      {
        "page_number": 1,
        "panels": [
          {
            "panel_id": 1,
            "speaker": "Takeshi",
            "text": "Let's go!",
            "type": "dialogue | sfx | narration | sign"
          }
        ]
      }
    ]
  }
  ```

- Validate and reject malformed uploads with clear error messages
- Store parsed data into `chapters` + `dialogue_lines` tables
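The upload schema maps naturally onto nested Pydantic models. A minimal sketch (model names like `ChapterIn` are illustrative, not fixed by this plan):

```python
from typing import List, Literal, Optional
from pydantic import BaseModel

# Minimal Pydantic models mirroring the upload JSON. Model names are
# illustrative. A ValidationError on malformed input becomes a clear
# 422 response automatically when used as a FastAPI request body.
class PanelIn(BaseModel):
    panel_id: int
    speaker: Optional[str] = None  # SFX/signs may carry no speaker
    text: str
    type: Literal["dialogue", "sfx", "narration", "sign"]

class PageIn(BaseModel):
    page_number: int
    panels: List[PanelIn]

class ChapterIn(BaseModel):
    chapter_number: int
    title: str
    pages: List[PageIn]

sample = {
    "chapter_number": 1,
    "title": "The Beginning",
    "pages": [{"page_number": 1, "panels": [
        {"panel_id": 1, "speaker": "Takeshi",
         "text": "Let's go!", "type": "dialogue"}
    ]}],
}
chapter = ChapterIn(**sample)
```

Using `Literal` for `type` means an unknown line type is rejected at upload time rather than surfacing later inside a checker.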
- Dashboard page: list projects, create new project button
- Create project form (title, description)
- Project view: list chapters, upload button
- Upload page: JSON file drop zone + preview before submit
- Display upload success/error feedback
Goal: Generate and store vector embeddings for all dialogue lines via OpenRouter API.
- Set up OpenRouter API client (httpx)
- Identify a free embedding model on OpenRouter
- Implement embedding generation function (text → vector)
- Handle rate limiting and retries
- On chapter upload, trigger embedding generation for all dialogue lines
- Store embeddings in `embeddings` table (pgvector column)
- Build similarity search query (cosine distance via pgvector)
- Test: upload a chapter → verify embeddings stored → run a similarity query
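pgvector exposes cosine distance as the `<=>` operator, so the similarity query is plain SQL. Below is a sketch of that query (table/column names assumed from the schema) plus a pure-Python equivalent of `<=>` that is useful for unit tests that don't touch the database:

```python
import math

# SQL sketch of the pgvector similarity search. :query_vec is the
# embedding of the probe line; names assume the schema sketched earlier.
SIMILAR_LINES_SQL = """
SELECT d.id, d.text,
       e.embedding <=> CAST(:query_vec AS vector) AS distance
FROM embeddings e
JOIN dialogue_lines d ON d.id = e.line_id
ORDER BY distance
LIMIT 10;
"""

def cosine_distance(a: list, b: list) -> float:
    """Pure-Python equivalent of pgvector's <=> (1 - cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

Ordering by the raw `<=>` distance is enough for top-k retrieval; an `ivfflat` or `hnsw` index on the embedding column can be added later if scans get slow.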
Goal: Implement all four QA checkers that analyze the translated text.
- Detect Japanese characters (hiragana, katakana, kanji) in translated text via Unicode ranges
- Flag dialogue lines containing untranslated text
- Assign severity: `critical` if in dialogue, `warning` if in SFX/signs
- Store results in `qa_results`
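The untranslated-text checker is the simplest of the four and fits in one regex over the relevant Unicode blocks. A sketch (the returned dict shape is an assumption about the `qa_results` row format):

```python
import re
from typing import Optional

# Unicode blocks: hiragana (U+3040-309F), katakana (U+30A0-30FF),
# CJK unified ideographs (U+4E00-9FFF). The ideograph block is shared
# with Chinese, so strictly this flags any CJK ideograph -- acceptable
# for an EN-target QA pass.
JAPANESE_RE = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")

def check_untranslated(text: str, line_type: str) -> Optional[dict]:
    """Return a qa_results-style finding if Japanese text remains, else None."""
    match = JAPANESE_RE.search(text)
    if not match:
        return None
    return {
        "checker": "untranslated",
        "severity": "critical" if line_type == "dialogue" else "warning",
        "detail": f"untranslated text starting at {match.group(0)!r}",
    }
```

Half-width katakana (U+FF66-FF9D) could be added to the character class if it shows up in source files.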
- Extract key terms: character names, place names, technique names per project
- Use pgvector similarity search to find near-duplicate terms across chapters
- Flag inconsistencies (e.g., "Ryuji" in ch.1 vs "Ryuuji" in ch.5)
- Send flagged pairs to LLM for confirmation: "Are these the same entity translated inconsistently?"
- Store confirmed inconsistencies in `qa_results`
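Before spending embedding lookups and LLM calls, a cheap string-level pre-filter catches spelling-variant names like "Ryuji" vs "Ryuuji". This is an addition to the plan's embedding-based approach, not a replacement; the 0.8 threshold is a guess to tune:

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicate_terms(terms: list, threshold: float = 0.8) -> list:
    """Pair up terms whose spellings are suspiciously similar but not identical.

    String-level pre-filter only: returned pairs would still go through
    embedding similarity and LLM confirmation. Threshold is a placeholder.
    """
    pairs = []
    for a, b in combinations(sorted(set(terms)), 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            pairs.append((a, b))
    return pairs
```

The quadratic pairing is fine at the scale of a per-project term list (dozens of names, not thousands of lines).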
- Group all dialogue lines by speaker
- Compute centroid embedding per character (average of all their lines)
- Find outlier lines that deviate significantly from the character's centroid
- Send outliers to LLM with character context: "Does this line match this character's established voice?"
- Store flagged voice breaks in `qa_results`
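The centroid-and-outlier steps above can be sketched without any database in the loop. The vectors in the test are toys; the 0.3 distance threshold is a placeholder to calibrate against real embeddings:

```python
import math

def centroid(vectors: list) -> list:
    """Element-wise mean of a character's line embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine_dist(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def voice_outliers(vectors: list, threshold: float = 0.3) -> list:
    """Indices of lines deviating from the character's centroid.

    These indices are what would be sent to the LLM for confirmation.
    Threshold is an ASSUMPTION to tune on real data.
    """
    c = centroid(vectors)
    return [i for i, v in enumerate(vectors) if cosine_dist(v, c) > threshold]
```

In production the element-wise mean can also be computed in SQL with pgvector's `avg()` aggregate rather than in Python.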
- Group dialogue by scene/page context
- Send scene text to LLM with prompt: "Analyze the tone of this manga dialogue. Flag any lines where the tone feels inconsistent with the scene context."
- Parse LLM response into structured findings
- Store tone mismatches in `qa_results`
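"Parse LLM response into structured findings" deserves a defensive implementation, since models often wrap JSON in markdown fences or return prose. A sketch; the `line`/`severity`/`reason` field names are assumptions about what the prompt asks for:

```python
import json
import re
from typing import Optional

def parse_tone_findings(raw: str) -> Optional[list]:
    """Parse an LLM reply into structured tone findings for qa_results.

    Strips an optional ```json fence, then keeps only findings carrying
    the fields we store (field names are assumptions tied to the prompt).
    Returns None on unusable replies so the caller can retry or log.
    """
    text = raw.strip()
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fence:
        text = fence.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    required = {"line", "severity", "reason"}
    return [f for f in data if isinstance(f, dict) and required <= f.keys()]
```

Returning `None` rather than raising keeps one bad LLM reply from failing the whole analysis job.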
Goal: Background job system so analysis doesn't block the API.
- Create `POST /api/projects/{id}/analyze` — triggers full QA analysis
- Create job record in `analysis_jobs` table (status: `pending`)
- Use FastAPI `BackgroundTasks` (or a simple worker loop) to process jobs
- Job runner picks up pending jobs, runs all 4 checkers sequentially
- Update job status: `pending` → `running` → `completed`/`failed`
- Store timestamps: `created_at`, `started_at`, `completed_at`
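The status transitions and timestamps above can be sketched as a runner function. This is an in-memory stand-in: `job` plays the role of an `analysis_jobs` row, and the checker call signature is an assumption:

```python
from datetime import datetime, timezone

def run_job(job: dict, checkers: list) -> dict:
    """Drive one job through pending -> running -> completed/failed.

    In-memory stand-in for the DB-backed runner; the real version would
    UPDATE the analysis_jobs row at each transition. Each checker is a
    callable that writes its findings to qa_results (signature assumed).
    """
    job["status"] = "running"
    job["started_at"] = datetime.now(timezone.utc)
    try:
        for checker in checkers:  # the 4 checkers, run sequentially
            checker(job["project_id"])
        job["status"] = "completed"
    except Exception as exc:
        job["status"] = "failed"
        job["error"] = str(exc)
    finally:
        job["completed_at"] = datetime.now(timezone.utc)
    return job
```

Catching at the job level means one failing checker marks the job `failed` with a recorded error instead of leaving it stuck in `running`.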
- `GET /api/projects/{id}/jobs` — list analysis jobs
- `GET /api/projects/{id}/jobs/{job_id}` — job status + progress
- Frontend: poll for job status until completion
Goal: Display analysis results in a useful, filterable report UI.
- `GET /api/projects/{id}/report` — full QA report for latest analysis
- `GET /api/projects/{id}/report?checker=consistency&severity=critical` — filtered
- Response structure:

  ```json
  {
    "project_id": "...",
    "job_id": "...",
    "summary": { "total_issues": 42, "critical": 5, "warning": 20, "info": 17 },
    "by_checker": {
      "untranslated": { "count": 3, "issues": [...] },
      "consistency": { "count": 15, "issues": [...] },
      "voice": { "count": 12, "issues": [...] },
      "tone": { "count": 12, "issues": [...] }
    }
  }
  ```
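The `summary`/`by_checker` shape falls out of one pass over the `qa_results` rows. A sketch, assuming each row carries at least `checker` and `severity` (matching the schema's columns):

```python
from collections import Counter

def build_report_summary(qa_rows: list) -> dict:
    """Aggregate qa_results rows into the report's summary/by_checker shape.

    Each row is assumed to carry at least `checker` and `severity`.
    """
    severity = Counter(r["severity"] for r in qa_rows)
    by_checker = {}
    for row in qa_rows:
        bucket = by_checker.setdefault(row["checker"], {"count": 0, "issues": []})
        bucket["count"] += 1
        bucket["issues"].append(row)
    return {
        "summary": {
            "total_issues": len(qa_rows),
            "critical": severity["critical"],
            "warning": severity["warning"],
            "info": severity["info"],
        },
        "by_checker": by_checker,
    }
```

The `?checker=`/`?severity=` filters can simply narrow `qa_rows` (in SQL `WHERE` clauses) before this aggregation runs.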
- Report dashboard with summary cards (total issues, by severity, by checker)
- Filter controls: by checker type, severity, chapter
- Issue list: expandable cards showing the flagged text, context, suggestion, severity
- Link each issue back to chapter/page/panel for easy reference
- Show "Analysis Running..." state with progress indicator
- Auto-refresh when job completes
- Show job history (past analyses)
Goal: Ensure reliability, create test data, polish the UX.
- Create 2-3 fake manga series JSON files with intentional QA errors:
- Inconsistent character names
- Voice breaks (formal character speaking casually)
- Untranslated Japanese text left in
- Tone mismatches in scenes
- Use these as demo data and for testing
- Unit tests for each checker
- Integration tests for upload → analyze → report pipeline
- API endpoint tests
- Responsive design (mobile-friendly)
- Loading states and error handling
- Empty states (no projects, no reports yet)
- Toast notifications for actions (upload success, analysis started)
Goal: Deploy the full stack to free-tier services.
- Verify production schema and pgvector extension
- Set up Row Level Security if needed
- Confirm connection pooling settings
- Create `Dockerfile` or `render.yaml` for FastAPI
- Set environment variables on Render (Supabase URL, OpenRouter key)
- Deploy and verify health check endpoint
- Test API endpoints against production DB
- Set up Vercel project linked to GitHub repo
- Configure environment variable for API base URL
- Deploy and verify frontend loads
- Test full flow: create project → upload → analyze → view report
- Smoke test the entire pipeline end-to-end
- Add CORS configuration (Vercel domain → Render API)
- Add basic rate limiting on API
- Write deployment notes in README
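The CORS bullet above is a few lines with FastAPI's built-in middleware. The Vercel domain below is a placeholder, not the real deployment URL:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow only the deployed frontend plus the local dev server.
# "your-app.vercel.app" is a PLACEHOLDER for the real Vercel domain;
# reading it from an environment variable keeps it out of the repo.
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:5173",        # Vite dev server
        "https://your-app.vercel.app",  # placeholder production domain
    ],
    allow_methods=["GET", "POST", "DELETE"],
    allow_headers=["*"],
)
```

Avoid `allow_origins=["*"]` in production; a wildcard also silently disables credentialed requests if authentication is added later.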
- OCR pipeline to extract text from manga images directly
- Source text (Japanese) comparison for translation accuracy checks
- User authentication and multi-user support
- Re-translation suggestions powered by LLM
- Batch analysis across entire volumes
- Export reports as PDF
- Webhook notifications when analysis completes