Readme
This ONNX object detection model, exported from Darknet YOLO, detects and classifies speech bubbles (globes) in manga images. It distinguishes five types of speech bubbles:
normal: Standard speech bubbles
scream: Exclamation or shouting bubbles
touched: Bubbles with motion lines or emphasis
think: Thought bubbles
sentence: Narrative text boxes
The model takes an input image and returns, for each detected speech bubble, a bounding box and a confidence score, with box coordinates scaled to the original image dimensions. Inference runs on the CPU through ONNX Runtime, and post-processing applies Non-Maximum Suppression (NMS) to discard overlapping detections.
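The NMS step mentioned above can be illustrated with a minimal pure-Python sketch. It is not taken from this project's code: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the default IoU threshold of 0.45 are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels, top-left and bottom-right corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest score first.

    The 0.45 default is a hypothetical value, not the project's setting.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

With two heavily overlapping boxes and one distant box, the sketch keeps the higher-scoring box of the overlapping pair plus the distant one. A lower `iou_threshold` suppresses more aggressively, which may merge adjacent bubbles on dense pages.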
Key Features:
Optimized for manga-style artwork
Classification into five speech bubble types
Configurable confidence and IoU thresholds
CPU-based inference (no GPU required)
Fast processing suitable for real-time applications
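As a rough illustration of the confidence threshold and of scaling boxes back to the original image, the following sketch filters raw detections and rescales them. Everything here is an assumption rather than the project's actual code: the `Detection` type, the `filter_and_rescale` helper, the class index order, the 416x416 input size used in the usage example, and the use of a plain resize (no letterboxing) before inference.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed class order; the real index-to-name mapping depends on the exported model.
CLASS_NAMES = ["normal", "scream", "touched", "think", "sentence"]


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float
    class_id: int

    @property
    def label(self) -> str:
        return CLASS_NAMES[self.class_id]


def filter_and_rescale(
    detections: List[Detection],
    conf_threshold: float,
    input_size: Tuple[int, int],
    original_size: Tuple[int, int],
) -> List[Detection]:
    """Drop low-confidence detections and map boxes to original image pixels.

    Sizes are (width, height). Assumes the image was stretched to input_size
    without padding, so each axis has a single scale factor.
    """
    in_w, in_h = input_size
    orig_w, orig_h = original_size
    sx, sy = orig_w / in_w, orig_h / in_h
    kept = []
    for d in detections:
        if d.score < conf_threshold:
            continue
        x1, y1, x2, y2 = d.box
        kept.append(Detection((x1 * sx, y1 * sy, x2 * sx, y2 * sy), d.score, d.class_id))
    return kept


# Usage with hypothetical values: a 416x416 model input and an 832x1248 page.
raw = [Detection((10, 20, 110, 120), 0.9, 0), Detection((0, 0, 5, 5), 0.1, 3)]
for det in filter_and_rescale(raw, 0.5, (416, 416), (832, 1248)):
    print(det.label, det.box, det.score)
```

Raising the confidence threshold trades recall for precision, which may matter for faint or partially drawn bubbles.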
Use Cases:
Manga translation and localization
Comic book analysis
Automated text extraction from Japanese comics
Content moderation for manga images