zf-kbot/manga-translator

An advanced, automated pipeline for translating manga and comics. This project integrates state-of-the-art Computer Vision and NLP models to perform detection, recognition, translation, and typesetting in a single seamless process.

Public
79 runs

AI Manga Translator

An advanced, automated pipeline for translating manga and comics. This project integrates state-of-the-art Computer Vision and NLP models to perform detection, recognition, translation, and typesetting in a single seamless process.

🔗 Live Demo

Experience the full power of this model with a user-friendly interface on our official website: 👉 AI Anime IO - AI Manga Translator


🧠 Model Architecture

This tool operates as a multi-stage pipeline designed to handle the complexities of comic layouts and mixed-language text:

1. 🔍 Speech Bubble Detection (YOLO)

We utilize a fine-tuned YOLO object detection model specifically trained on comic pages. * Function: Accurately locates speech bubbles and text blocks, generating precise bounding boxes. * Advantage: robust against complex backgrounds and varied bubble shapes.

2. 📖 Intelligent OCR Routing

Instead of using a “one-size-fits-all” OCR, the system automatically routes text regions to the most specialized engine based on the source language: * Japanese (ja): Processed by MangaOCR. This is specialized for vertical text, handwritten fonts, and “furigana,” providing superior accuracy for Japanese manga. * Chinese (zh) & Korean (ko): Processed by PaddleOCR. Known for its industry-leading recognition rates for CJK characters. * Latin Languages (en, fr, de): Processed by EasyOCR.

3. 🎨 Text Inpainting (LaMa)

To prepare the canvas for translation, we remove the original text using LaMa (Resolution-robust Large Mask Inpainting). * Process: Generates a soft-feathered mask based on the OCR detection. * Result: The text is erased, and the background is hallucinated/filled in to match the surrounding texture, leaving a clean speech bubble.

4. 🧠 Context-Aware Translation (LLM)

We leverage Large Language Models (LLMs) like GPT-4o and DeepSeek for translation. * Prompt Engineering: The models are prompted to act as professional manga translators. * Style: Focuses on colloquialisms, natural flow, and concise phrasing suitable for limited bubble space.

5. ✍️ Smart Typesetting

The final step involves rendering the translated text back into the image. * Auto-Sizing: Calculates the optimal font size and line wrapping to fit the original bounding box. * Fonts: Uses Noto Sans CJK to ensure all characters (Chinese, Japanese, Korean) are rendered correctly without artifacts. * Style: Adapts text color (Black/White) automatically based on the bubble’s background brightness.


🌍 Supported Languages

Source Languages (Input)

The model is optimized to read: * Japanese (ja) - Best Quality (MangaOCR) * Chinese (zh) * Korean (ko) * English (en) * French (fr) * German (de)

Target Languages (Output)

The model can translate into: * Simplified Chinese (zh-CN) * Traditional Chinese (zh-TW) * English (en) * Japanese (ja) * Korean (ko) * French (fr) * German (de) * Spanish (es) * Italian (it)


📄 License

This project utilizes several open-source components (MangaOCR, PaddleOCR, SimpleLaMa). Please refer to their respective repositories for licensing details.


AI Anime IO

Model created
Model updated