Readme
AI Manga Translator
An advanced, automated pipeline for translating manga and comics. This project integrates state-of-the-art Computer Vision and NLP models to perform detection, recognition, translation, and typesetting in a single seamless process.
🔗 Live Demo
Experience the full power of this model with a user-friendly interface on our official website: 👉 AI Anime IO - AI Manga Translator
🧠 Model Architecture
This tool operates as a multi-stage pipeline designed to handle the complexities of comic layouts and mixed-language text:
1. 🔍 Speech Bubble Detection (YOLO)
We utilize a fine-tuned YOLO object detection model specifically trained on comic pages. * Function: Accurately locates speech bubbles and text blocks, generating precise bounding boxes. * Advantage: robust against complex backgrounds and varied bubble shapes.
2. 📖 Intelligent OCR Routing
Instead of using a “one-size-fits-all” OCR, the system automatically routes text regions to the most specialized engine based on the source language:
* Japanese (ja): Processed by MangaOCR. This is specialized for vertical text, handwritten fonts, and “furigana,” providing superior accuracy for Japanese manga.
* Chinese (zh) & Korean (ko): Processed by PaddleOCR. Known for its industry-leading recognition rates for CJK characters.
* Latin Languages (en, fr, de): Processed by EasyOCR.
3. 🎨 Text Inpainting (LaMa)
To prepare the canvas for translation, we remove the original text using LaMa (Resolution-robust Large Mask Inpainting). * Process: Generates a soft-feathered mask based on the OCR detection. * Result: The text is erased, and the background is hallucinated/filled in to match the surrounding texture, leaving a clean speech bubble.
4. 🧠 Context-Aware Translation (LLM)
We leverage Large Language Models (LLMs) like GPT-4o and DeepSeek for translation. * Prompt Engineering: The models are prompted to act as professional manga translators. * Style: Focuses on colloquialisms, natural flow, and concise phrasing suitable for limited bubble space.
5. ✍️ Smart Typesetting
The final step involves rendering the translated text back into the image. * Auto-Sizing: Calculates the optimal font size and line wrapping to fit the original bounding box. * Fonts: Uses Noto Sans CJK to ensure all characters (Chinese, Japanese, Korean) are rendered correctly without artifacts. * Style: Adapts text color (Black/White) automatically based on the bubble’s background brightness.
🌍 Supported Languages
Source Languages (Input)
The model is optimized to read:
* Japanese (ja) - Best Quality (MangaOCR)
* Chinese (zh)
* Korean (ko)
* English (en)
* French (fr)
* German (de)
Target Languages (Output)
The model can translate into:
* Simplified Chinese (zh-CN)
* Traditional Chinese (zh-TW)
* English (en)
* Japanese (ja)
* Korean (ko)
* French (fr)
* German (de)
* Spanish (es)
* Italian (it)
📄 License
This project utilizes several open-source components (MangaOCR, PaddleOCR, SimpleLaMa). Please refer to their respective repositories for licensing details.