Video Text Remover
Remove hardcoded text overlays from videos using AI detection and inpainting.
Overview
This model automatically detects and removes hardcoded subtitles, captions, and watermarks from videos using YOLO object detection combined with context-aware inpainting. It aims to preserve video quality in the areas around the removed text, which makes it suited to content localization, re-editing, and accessibility improvements.
Features
- AI-Powered Detection: YOLOv8 model trained specifically for text overlay detection
- Multiple Removal Methods: 6 inpainting algorithms optimized for different use cases
- Resolution Control: Process at lower resolutions for speed, output at original quality
- Temporal Optimization: Skip-frame detection for faster processing
- GPU Accelerated: Automatic CUDA/TensorRT support for 3-10x faster processing
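
The skip-frame idea behind Temporal Optimization can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `detect_with_interval`, the `detect` callback, and the choice to treat an interval of 0 or 1 as "detect on every frame" are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def detect_with_interval(
    frames: Iterable[object],
    detect: Callable[[object], List[Box]],
    detection_interval: int = 5,
) -> List[List[Box]]:
    """Return boxes for every frame while calling `detect` only every N frames."""
    # Assumption: an interval of 0 or 1 means "run detection on every frame".
    step = max(1, detection_interval)
    boxes_per_frame: List[List[Box]] = []
    last_boxes: List[Box] = []
    for index, frame in enumerate(frames):
        if index % step == 0:
            last_boxes = detect(frame)  # the expensive detector call
        # Frames between detections reuse the most recent boxes.
        boxes_per_frame.append(list(last_boxes))
    return boxes_per_frame
```

Reusing boxes works because hardcoded text tends to stay in place for many consecutive frames. A larger interval trades responsiveness to text that appears or disappears for fewer detector calls.
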
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `video` | file | required | Video file (MP4, AVI, MOV, WebM) |
| `method` | string | `hybrid` | Removal algorithm (see below) |
| `conf_threshold` | float | 0.25 | Detection confidence (0.0-1.0). Lower = more detections |
| `iou_threshold` | float | 0.45 | NMS threshold for duplicate removal |
| `margin` | int | 5 | Extra pixels around detected text (0-20) |
| `resolution` | string | `720p` | Processing resolution: `original`, `1080p`, `720p`, `480p`, `360p` |
| `detection_interval` | int | 5 | Run detection every N frames (0-100). Higher = faster |
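
The documented defaults and ranges can be checked on the client side before submitting a job. The sketch below is hypothetical: `RemovalParams`, `validate`, and `expand_box` are names invented for illustration, and the 0.0-1.0 range for `iou_threshold` is an assumption (the table does not state one). `expand_box` shows one plausible reading of `margin`: grow each detected box by that many pixels and clamp it to the frame.

```python
from dataclasses import dataclass
from typing import Tuple

METHODS = {"hybrid", "inpaint", "inpaint_ns", "blur", "black", "background"}
RESOLUTIONS = {"original", "1080p", "720p", "480p", "360p"}


@dataclass
class RemovalParams:
    """Hypothetical container mirroring the documented defaults."""
    method: str = "hybrid"
    conf_threshold: float = 0.25
    iou_threshold: float = 0.45
    margin: int = 5
    resolution: str = "720p"
    detection_interval: int = 5

    def validate(self) -> None:
        """Raise ValueError if any value falls outside its documented range."""
        if self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method!r}")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {self.resolution!r}")
        if not 0.0 <= self.conf_threshold <= 1.0:
            raise ValueError("conf_threshold must be in 0.0-1.0")
        # Assumption: IoU is a ratio, so the threshold is limited to 0.0-1.0.
        if not 0.0 <= self.iou_threshold <= 1.0:
            raise ValueError("iou_threshold must be in 0.0-1.0")
        if not 0 <= self.margin <= 20:
            raise ValueError("margin must be in 0-20")
        if not 0 <= self.detection_interval <= 100:
            raise ValueError("detection_interval must be in 0-100")


def expand_box(
    box: Tuple[int, int, int, int], margin: int, width: int, height: int
) -> Tuple[int, int, int, int]:
    """Grow an (x1, y1, x2, y2) box by `margin` pixels, clamped to the frame."""
    x1, y1, x2, y2 = box
    return (
        max(0, x1 - margin),
        max(0, y1 - margin),
        min(width, x2 + margin),
        min(height, y2 + margin),
    )
```
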
Removal Methods
| Method | Description | Best For |
|---|---|---|
| `hybrid` ⭐ | Context-aware TELEA with expanded region | Complex backgrounds (recommended) |
| `inpaint` | Fast TELEA algorithm | Simple backgrounds, speed |
| `inpaint_ns` | Navier-Stokes fluid dynamics | Smooth gradients |
| `blur` | Gaussian blur (51x51) | Quick previews |
| `black` | Fill with black pixels | Dark backgrounds |
| `background` | Fill with surrounding color average | Solid color backgrounds |
Use Cases
- Content Localization: Remove original subtitles to add new translations
- Video Editing: Clean footage for re-editing or remixing
- Accessibility: Replace hardcoded subtitles with proper closed captions
- Archival: Create clean master copies of video content
Model Architecture
- Detection: YOLOv8s ONNX (~9M parameters, 27MB)
- Inpainting: OpenCV TELEA/Navier-Stokes algorithms
- Encoding: FFmpeg H.264 with configurable quality
- Runtime: ONNX Runtime with CUDA/TensorRT/CPU auto-detection
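
The runtime auto-detection could work roughly as below: prefer TensorRT, then CUDA, then CPU, keeping only what the installed ONNX Runtime build reports as available. The helper `pick_providers` and the exact preference order are assumptions made for illustration. The provider strings themselves are the standard ONNX Runtime execution provider names.

```python
from typing import List, Sequence

# Assumed preference order: fastest backend first, CPU as the final fallback.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [name for name in PREFERRED if name in available]
    # Fall back to CPU if the reported list contains none of the above.
    return chosen or ["CPUExecutionProvider"]
```

With ONNX Runtime installed, `onnxruntime.get_available_providers()` supplies the `available` list, and the result can be passed as the `providers` argument of `onnxruntime.InferenceSession`.
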
Limitations
- Very small text (<10px) may not be detected reliably
- Semi-transparent overlays are difficult to detect
- Complex backgrounds may show inpainting artifacts
- Audio is not preserved in current version
- 4K+ videos are downscaled for detection, and the output is restored to the original resolution
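
Since audio is dropped, one possible workaround outside the tool is to copy the audio track from the original file onto the processed video with FFmpeg. The sketch below only builds the command. The function name and file arguments are hypothetical, and re-encoding audio to AAC is an assumed choice made for container compatibility.

```python
from typing import List


def build_audio_remux_command(processed: str, original: str, output: str) -> List[str]:
    """Build an FFmpeg command that pairs processed video with original audio."""
    return [
        "ffmpeg", "-y",
        "-i", processed,      # input 0: text-free video (no audio)
        "-i", original,       # input 1: source file with the audio track
        "-map", "0:v:0",      # take video from the processed file
        "-map", "1:a:0?",     # take audio from the original, if it has any
        "-c:v", "copy",       # do not re-encode the video
        "-c:a", "aac",        # assumed: re-encode audio for compatibility
        "-shortest",          # stop at the end of the shorter stream
        output,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `ffmpeg` is on the `PATH`.
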
Ethical Considerations
- Only process content you have rights to modify
- Do not remove copyright notices or watermarks from protected content
- Do not remove creator credits or mandatory attributions
- Disclose when videos have been modified
License
MIT License - see LICENSE for details.
Author
Developed by Helder Lima