
AI Manga Translator

An advanced, automated pipeline for translating manga and comics. This project integrates state-of-the-art Computer Vision and NLP models to perform detection, recognition, translation, and typesetting in a single seamless process.

🔗 Live Demo

Experience the full power of this model with a user-friendly interface on our official website: 👉 AI Anime IO - AI Manga Translator


🧠 Model Architecture

This tool operates as a multi-stage pipeline designed to handle the complexities of comic layouts and mixed-language text:

1. 🔍 Speech Bubble Detection (YOLO)

We utilize a fine-tuned YOLO object detection model trained specifically on comic pages.

* Function: Accurately locates speech bubbles and text blocks, generating precise bounding boxes.
* Advantage: Robust against complex backgrounds and varied bubble shapes.
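
A minimal sketch of this step, assuming an ultralytics YOLO checkpoint fine-tuned on comic pages; the weights path below is a placeholder, not the model's published detector.

```python
from ultralytics import YOLO

# Hypothetical path to a fine-tuned comic-page checkpoint (not published here).
detector = YOLO("weights/bubble-detector.pt")

def detect_bubbles(image_path: str, conf: float = 0.4):
    """Return (x1, y1, x2, y2) pixel boxes for detected speech bubbles and text blocks."""
    boxes = []
    for result in detector(image_path, conf=conf):
        for xyxy in result.boxes.xyxy.tolist():
            boxes.append(tuple(int(v) for v in xyxy))
    return boxes
```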

2. 📖 Intelligent OCR Routing

Instead of using a “one-size-fits-all” OCR, the system automatically routes text regions to the most specialized engine based on the source language:

* Japanese (ja): Processed by MangaOCR. This is specialized for vertical text, handwritten fonts, and “furigana,” providing superior accuracy for Japanese manga.
* Chinese (zh) & Korean (ko): Processed by PaddleOCR, known for its industry-leading recognition rates for CJK characters.
* Latin languages (en, fr, de): Processed by EasyOCR.
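
A minimal sketch of the routing logic, assuming PIL image crops taken from the detected boxes. Constructor arguments and result formats vary between library versions, so treat the calls below as illustrative rather than definitive.

```python
import numpy as np
import easyocr
from manga_ocr import MangaOcr
from paddleocr import PaddleOCR

manga_ocr = MangaOcr()                              # Japanese: vertical text, furigana
paddle_zh = PaddleOCR(lang="ch")                    # Chinese
paddle_ko = PaddleOCR(lang="korean")                # Korean
latin_reader = easyocr.Reader(["en", "fr", "de"])   # Latin scripts

def recognize(region, src_lang: str) -> str:
    """Route a cropped speech-bubble image (PIL.Image) to the appropriate OCR engine."""
    if src_lang == "ja":
        return manga_ocr(region)                    # MangaOcr accepts a PIL image
    if src_lang in ("zh", "ko"):
        engine = paddle_zh if src_lang == "zh" else paddle_ko
        result = engine.ocr(np.asarray(region))
        lines = result[0] if result and result[0] else []
        return " ".join(item[1][0] for item in lines)
    # en / fr / de fall through to EasyOCR
    return " ".join(latin_reader.readtext(np.asarray(region), detail=0))
```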

3. 🎨 Text Inpainting (LaMa)

To prepare the canvas for translation, we remove the original text using LaMa (Resolution-robust Large Mask Inpainting).

* Process: Generates a soft-feathered mask based on the OCR detection.
* Result: The text is erased, and the background is filled in to match the surrounding texture, leaving a clean speech bubble.
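
A minimal sketch of the erase step, assuming the simple-lama-inpainting package named in the License section; the rectangle-plus-Gaussian-blur mask is an illustrative stand-in for whatever feathering the pipeline actually applies.

```python
from PIL import Image, ImageDraw, ImageFilter
from simple_lama_inpainting import SimpleLama

lama = SimpleLama()

def erase_text(page: Image.Image, boxes) -> Image.Image:
    """Mask the detected text regions and let LaMa fill in the background."""
    mask = Image.new("L", page.size, 0)
    draw = ImageDraw.Draw(mask)
    for x1, y1, x2, y2 in boxes:
        draw.rectangle((x1, y1, x2, y2), fill=255)
    # Soft-feather the mask edges so the inpainted fill blends into the bubble.
    mask = mask.filter(ImageFilter.GaussianBlur(radius=4))
    return lama(page.convert("RGB"), mask)
```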

4. 🧠 Context-Aware Translation (LLM)

We leverage Large Language Models (LLMs) such as GPT-4o and DeepSeek for translation.

* Prompt Engineering: The models are prompted to act as professional manga translators.
* Style: Focuses on colloquialisms, natural flow, and concise phrasing suitable for limited bubble space.
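
The exact prompt used by the pipeline is not published; the sketch below, using the OpenAI Python client with GPT-4o, only mirrors the goals stated above (translator persona, colloquial and concise output).

```python
from openai import OpenAI

client = OpenAI()

def translate(text: str, src_lang: str, tgt_lang: str) -> str:
    """Translate one bubble of dialogue, keeping it short enough to fit the bubble."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a professional manga translator. Translate the dialogue "
                    f"from {src_lang} to {tgt_lang}. Keep it colloquial, natural, and "
                    f"concise enough to fit inside a speech bubble. "
                    f"Return only the translation."
                ),
            },
            {"role": "user", "content": text},
        ],
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()
```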

5. ✍️ Smart Typesetting

The final step involves rendering the translated text back into the image.

* Auto-Sizing: Calculates the optimal font size and line wrapping to fit the original bounding box.
* Fonts: Uses Noto Sans CJK to ensure all characters (Chinese, Japanese, Korean) are rendered correctly without artifacts.
* Style: Adapts text color (black/white) automatically based on the bubble’s background brightness.
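
A minimal Pillow sketch of the auto-sizing and color-adaptation logic; the font path, size range, wrapping heuristic, and brightness threshold are illustrative assumptions, not the model's actual parameters.

```python
import textwrap
from PIL import Image, ImageDraw, ImageFont, ImageStat

FONT_PATH = "fonts/NotoSansCJK-Regular.ttc"   # hypothetical path to Noto Sans CJK

def typeset(page: Image.Image, text: str, box, max_size: int = 48) -> None:
    """Render translated text into the cleaned bubble defined by box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    box_w, box_h = x2 - x1, y2 - y1
    draw = ImageDraw.Draw(page)

    # Auto-sizing: shrink the font until the wrapped text fits the bounding box.
    for size in range(max_size, 9, -2):
        font = ImageFont.truetype(FONT_PATH, size)
        chars_per_line = max(1, box_w // size)    # rough character budget per line
        wrapped = textwrap.fill(text, width=chars_per_line)
        left, top, right, bottom = draw.multiline_textbbox((x1, y1), wrapped, font=font)
        if right - left <= box_w and bottom - top <= box_h:
            break

    # Style: black text on light bubbles, white text on dark ones.
    brightness = ImageStat.Stat(page.crop(box).convert("L")).mean[0]
    color = "black" if brightness > 128 else "white"
    draw.multiline_text((x1, y1), wrapped, font=font, fill=color, align="center")
```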


🌍 Supported Languages

Source Languages (Input)

The model is optimized to read:

* Japanese (ja) - best quality (MangaOCR)
* Chinese (zh)
* Korean (ko)
* English (en)
* French (fr)
* German (de)

Target Languages (Output)

The model can translate into:

* Simplified Chinese (zh-CN)
* Traditional Chinese (zh-TW)
* English (en)
* Japanese (ja)
* Korean (ko)
* French (fr)
* German (de)
* Spanish (es)
* Italian (it)


📄 License

This project utilizes several open-source components (MangaOCR, PaddleOCR, SimpleLaMa). Please refer to their respective repositories for licensing details.

