This repository contains the research paper and supporting materials for AutoScan, an automated system that leverages computer vision and machine learning to streamline manga (Japanese comic book) translation and editing workflows. The system processes Japanese manga pages, detects speech bubbles and text, translates content into English, and seamlessly integrates translations while preserving the original artistic style and font characteristics through intelligent bubble classification.
As Machine Learning becomes ever more ubiquitous, it offers promise in automating many previously time-consuming processes, including making literature more accessible. This work focuses on the challenges facing the manga community regarding translation and widespread dissemination of thousands of available comics. Not only is translating and editing manga time-consuming, but translated copies are often unofficial and may be subject to individuality in translation as opposed to an official and standard translation.
AutoScan addresses these challenges by introducing an end-to-end automated pipeline that:
- Detects Speech Bubbles: Uses YOLOv8 to classify 7 types of speech bubbles for font consistency
- Extracts Text: Employs Manga OCR for accurate Japanese character recognition
- Translates Content: Utilizes DeepL for high-quality Japanese-to-English translation
- Preserves Context: Maintains varying fonts to preserve visual meaning and sentiment
This research demonstrates improvements in translation efficiency while maintaining the visual context that lends additional meaning to the original story.
The AutoScan pipeline consists of four main steps:
- Speech Bubble Detection: Locates text within frames using YOLOv8 by Ultralytics
- Text Extraction: Extracts text from frames using Manga OCR
- Translation: Translates extracted text using DeepL translator
- Image Editing: Edits frames to include new translations with appropriate fonts
AutoScan-Research-Paper/
├── README.md # This file
├── paper/
│ ├── AutoScan_Paper.pdf # Complete research paper
│ ├── abstract.md # Extended abstract
│ └── references.bib # Bibliography
├── figures/
│ ├── system_architecture.png
│ ├── detection_pipeline.png
│ └── results_comparison.png
├── examples/
│ ├── input/ # Sample manga pages (input)
│ └── output/ # Translated pages (output)
├── methodology/
│ └── detailed_methods.md # Detailed methodology
└── LICENSE
- Uses YOLOv8 object detection model by Ultralytics
- Classifies 7 types of speech bubbles to maintain font consistency:
- Regular speech bubbles
- Shout bubbles
- Narrative bubbles
- Happy bubbles
- Evil bubbles
- Thought bubbles
- Scared bubbles
- Achieved 0.724 mAP at IoU 0.50
- Utilizes Manga OCR (based on Microsoft's TrOCR)
- Trained specifically for manga text recognition
- Handles Japanese text including Furigana (pronunciation guides)
- Works well with text on top of images
- Uses DeepL translator for accurate Japanese-to-English translation
- Currently translates bubble-by-bubble
- Future versions will incorporate contextual translation
- Employs Adobe's generative fill tool for image editing
- Removes Japanese text and extends backgrounds naturally
- Future versions will automate using Google's Vertex AI
AutoScan demonstrates:
- 0.724 mAP (mean Average Precision) at IoU 0.50 for speech bubble detection
- 7 bubble types classified for font consistency maintenance
- Manga OCR performance superior to Tesseract, EasyOCR, and Google Cloud Vision
- Font preservation through bubble type classification enables context maintenance
- Automated translation pipeline from detection through editing
The system was trained and tested on:
- 1,062 manga page images annotated using Roboflow
- Transfer learning applied to adapt YOLOv8 for manga-specific detection
- 7 speech bubble classes for font variation tracking
The system employs a four-step processing pipeline:
- Speech Bubble Detection: YOLOv8 model locates and classifies speech bubbles, saving (x, y) coordinates for each bounding box
- Text Extraction: Manga OCR extracts Japanese text from cropped bubble regions
- Translation: DeepL translates the extracted text to English
- Image Editing: Adobe's generative fill removes Japanese text and extends backgrounds, then translated text is overlaid with appropriate fonts based on bubble type
For detailed methodology, see methodology/detailed_methods.md.
Below are sample comparisons showing original Japanese manga pages and their AutoScan-translated English versions:
| Original (Japanese) | AutoScan Output (English) |
|---|---|
![]() |
![]() |
More examples available in the examples/ directory
This research makes the following contributions:
- Literature Summary: Summarizes leading contributions to automatic manga translation
- AutoScan Pipeline: Presents an automatic manga translator based on compilation of leading contributions
- Font Consistency: Implements font consistency using multiple bubble classes (7 types)
- Generative Fill: Explores generative fill technology for improving the cleaning process before translation
- Future Directions: Identifies current challenges and future directions for automatic manga translation
This research paper was developed by a team from Embry-Riddle Aeronautical University:
- Kino Leonhart - Department of Electrical Engineering and Computer Science
- Lynn Vonderhaar - Department of Electrical Engineering and Computer Science
- Juan Couder - Department of Electrical Engineering and Computer Science
- Charles Walker - Department of Electrical Engineering and Computer Science
- Omar Ochoa - Department of Electrical Engineering and Computer Science
My Contributions: I reviewed and edited significant portions of the paper. I research similar works and assisted in writing the Challenges and Future Directions section.
- Speech bubble ordering for context-aware translation of complete scenes
- Character tracking to build character personas for better translation context
- Emotion analysis combined with character tracking for improved translations
- Automated generative fill using Google's Vertex AI
- LLM integration for character persona creation and fan fiction generation
- Improved handling of Furigana-heavy text
- Text mask tracking for precise cleaning
If you use this research in your work, please cite:
@inproceedings{leonhart2024autoscan,
title={AutoScan: Leveraging Computer Vision to Improve Comic Book Translations and Editing},
author={Leonhart, Kino and Vonderhaar, Lynn and Couder, Juan and Walker, Charles and Ochoa, Omar},
booktitle={IEEE Conference Proceedings},
year={2024},
organization={Embry-Riddle Aeronautical University}
}This project is licensed under the MIT License - see the LICENSE file for details.
We thank our peer reviewers and collaborators for their valuable feedback and contributions to this research..
For questions, suggestions, or collaboration opportunities, please see authors' contact information.
This research aims to support and enhance the manga translation community, not replace human translators. The system is designed as a tool to improve efficiency while preserving the artistic and cultural integrity of the original works.

