hjunior29/video-text-remover

Clean videos by automatically removing text overlays

Run time and cost

This model runs on CPU hardware. We don't yet have enough runs of this model to provide performance information.

Video Text Remover

Remove hardcoded text overlays from videos using AI detection and inpainting.

Overview

This model automatically detects and removes hardcoded subtitles, captions, and watermarks from videos using YOLO object detection combined with context-aware inpainting. It aims to preserve video quality while removing text, which makes it suited to content localization, re-editing, and accessibility improvements.

Features

  • AI-Powered Detection: YOLOv8 model trained specifically for text overlay detection
  • Multiple Removal Methods: 6 inpainting algorithms optimized for different use cases
  • Resolution Control: Process at lower resolutions for speed, output at original quality
  • Temporal Optimization: Skip-frame detection for faster processing
  • GPU Accelerated: Automatic CUDA/TensorRT support when a GPU is available, for 3-10x faster processing than on CPU
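
The skip-frame idea behind Temporal Optimization can be sketched as follows. This is a minimal illustration, not the model's actual code: the helper `boxes_per_frame`, the `detect` callback, and the rule that an interval of 0 means "detect on every frame" are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def boxes_per_frame(
    frames: Sequence[object],
    detect: Callable[[object], List[Box]],
    interval: int = 5,
) -> List[List[Box]]:
    """Run `detect` on every `interval`-th frame and reuse its boxes in between."""
    step = max(1, interval)  # assumption: 0 or 1 means "detect on every frame"
    result: List[List[Box]] = []
    last: List[Box] = []
    for index, frame in enumerate(frames):
        if index % step == 0:
            last = detect(frame)  # the expensive call happens only here
        result.append(list(last))  # frames in between reuse the latest boxes
    return result


# Hypothetical detector that records which frames it was asked about.
calls = []


def fake_detect(frame):
    calls.append(frame)
    return [(0, 0, 10, 10)]


out = boxes_per_frame(list(range(12)), fake_detect, interval=5)
print(calls)  # frames on which the detector actually ran
```

Reusing boxes works well for subtitles that stay on screen for many frames. The trade-off is that text appearing between two detection frames is missed until the next one.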

Input Parameters

Parameter           Type    Default   Description
video               file    required  Video file (MP4, AVI, MOV, WebM)
method              string  hybrid    Removal algorithm (see below)
conf_threshold      float   0.25      Detection confidence (0.0-1.0). Lower = more detections
iou_threshold       float   0.45      NMS threshold for duplicate removal
margin              int     5         Extra pixels around detected text (0-20)
resolution          string  720p      Processing resolution: original, 1080p, 720p, 480p, 360p
detection_interval  int     5         Run detection every N frames (0-100). Higher = faster
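
As a rough sketch of how these parameters might be passed from Python, the example below builds an input dictionary with the documented defaults and checks the documented ranges before calling the Replicate client. The helpers `build_input` and `run_remote` are hypothetical names for this example. The model reference string is also an assumption: depending on the client version, a `:version` suffix may be required.

```python
from typing import Any, Dict

METHODS = {"hybrid", "inpaint", "inpaint_ns", "blur", "black", "background"}
RESOLUTIONS = {"original", "1080p", "720p", "480p", "360p"}


def build_input(video: Any, **overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults and check documented ranges."""
    params: Dict[str, Any] = {
        "video": video,
        "method": "hybrid",
        "conf_threshold": 0.25,
        "iou_threshold": 0.45,
        "margin": 5,
        "resolution": "720p",
        "detection_interval": 5,
    }
    unknown = set(overrides) - set(params)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params.update(overrides)
    if params["method"] not in METHODS:
        raise ValueError(f"method must be one of {sorted(METHODS)}")
    if params["resolution"] not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if not 0.0 <= params["conf_threshold"] <= 1.0:
        raise ValueError("conf_threshold must be in 0.0-1.0")
    if not 0 <= params["margin"] <= 20:
        raise ValueError("margin must be in 0-20")
    if not 0 <= params["detection_interval"] <= 100:
        raise ValueError("detection_interval must be in 0-100")
    return params


def run_remote(video_path: str, model_ref: str = "hjunior29/video-text-remover", **overrides: Any):
    """Send the video to Replicate. Needs the `replicate` package and an API token."""
    import replicate  # third-party; imported lazily so build_input works without it

    with open(video_path, "rb") as handle:
        return replicate.run(model_ref, input=build_input(handle, **overrides))


preview = build_input("clip.mp4", method="blur", resolution="480p", detection_interval=10)
```

A call such as `run_remote("clip.mp4", method="hybrid", margin=8)` would then submit the job. Validating locally first avoids a round trip for obviously invalid values.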

Removal Methods

Method      Description                               Best For
hybrid      Context-aware TELEA with expanded region  Complex backgrounds (recommended)
inpaint     Fast TELEA algorithm                      Simple backgrounds, speed
inpaint_ns  Navier-Stokes fluid dynamics              Smooth gradients
blur        Gaussian blur (51x51)                     Quick previews
black       Fill with black pixels                    Dark backgrounds
background  Fill with surrounding color average       Solid color backgrounds
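
To make the two simplest methods concrete, here is a small pure-Python sketch of `black` and `background` on a grayscale frame stored as nested lists, including the margin expansion applied before filling. It is an illustration under assumptions, not the model's implementation: the function names and the `ring` width are hypothetical, and a real pipeline would operate on image arrays. The TELEA and Navier-Stokes methods correspond to OpenCV's `cv2.inpaint` with the `cv2.INPAINT_TELEA` and `cv2.INPAINT_NS` flags and are not reproduced here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive
Frame = List[List[int]]          # grayscale rows, a stand-in for an image array


def expand_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow the box by `margin` pixels on every side, clamped to the frame."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))


def fill_black(frame: Frame, box: Box) -> None:
    """`black`: overwrite the region with zeros."""
    x1, y1, x2, y2 = box
    for y in range(y1, y2):
        for x in range(x1, x2):
            frame[y][x] = 0


def fill_background(frame: Frame, box: Box, ring: int = 2) -> None:
    """`background`: fill the region with the mean of a ring of pixels around it."""
    height, width = len(frame), len(frame[0])
    x1, y1, x2, y2 = box
    ox1, oy1, ox2, oy2 = expand_box(box, ring, width, height)
    border = [frame[y][x]
              for y in range(oy1, oy2)
              for x in range(ox1, ox2)
              if not (x1 <= x < x2 and y1 <= y < y2)]
    mean = round(sum(border) / len(border)) if border else 0
    for y in range(y1, y2):
        for x in range(x1, x2):
            frame[y][x] = mean


# 8x8 frame with a uniform background (100) and a bright 2x2 "text" patch (255).
frame = [[100] * 8 for _ in range(8)]
for y in range(3, 5):
    for x in range(3, 5):
        frame[y][x] = 255

region = expand_box((3, 3, 5, 5), margin=1, width=8, height=8)
fill_background(frame, region)
```

On a uniform background the averaged fill is indistinguishable from the surroundings, which is why the table recommends it for solid colors. On textured backgrounds a flat fill stands out, which is the case the inpainting methods are meant for.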

Use Cases

  • Content Localization: Remove original subtitles to add new translations
  • Video Editing: Clean footage for re-editing or remixing
  • Accessibility: Replace hardcoded subtitles with proper closed captions
  • Archival: Create clean master copies of video content

Model Architecture

  • Detection: YOLOv8s ONNX (~9M parameters, 27MB)
  • Inpainting: OpenCV TELEA/Navier-Stokes algorithms
  • Encoding: FFmpeg H.264 with configurable quality
  • Runtime: ONNX Runtime with CUDA/TensorRT/CPU auto-detection
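
The runtime auto-detection could look roughly like the sketch below, which prefers TensorRT, then CUDA, then CPU, based on what ONNX Runtime reports as available. The helper names `pick_providers` and `create_session` and the preference order are assumptions for illustration. The provider strings and the `get_available_providers` and `InferenceSession` calls are standard ONNX Runtime API.

```python
from typing import List, Sequence

# Assumed preference order: fastest backend first, CPU as the fallback.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [name for name in PREFERRED if name in available]
    return chosen or ["CPUExecutionProvider"]


def create_session(model_path: str):
    """Open an ONNX model with the best available providers."""
    import onnxruntime as ort  # third-party; imported lazily

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


cpu_only = pick_providers(["CPUExecutionProvider"])
with_gpu = pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
```

Passing the whole ordered list lets ONNX Runtime fall back to the next provider if one fails to initialize.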

Limitations

  • Very small text (<10px) may not be detected reliably
  • Semi-transparent overlays are difficult to detect
  • Complex backgrounds may show inpainting artifacts
  • Audio is not preserved in the current version
  • 4K+ videos are downscaled for detection, then restored
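
Since the output has no audio track, one possible workaround is to copy the audio from the original file back onto the processed video with FFmpeg. The sketch below only builds the command. The helper name `remux_audio_command` and the file names are hypothetical, and it assumes that the processed video keeps the original duration and frame rate and that the output container accepts the original audio codec.

```python
from typing import List


def remux_audio_command(processed: str, original: str, output: str) -> List[str]:
    """Build an ffmpeg command that pairs processed video with the original audio."""
    return [
        "ffmpeg", "-y",
        "-i", processed,    # input 0: cleaned video, no audio
        "-i", original,     # input 1: source file with the audio track
        "-map", "0:v:0",    # take video from the processed file
        "-map", "1:a?",     # take audio from the original, if it has any
        "-c:v", "copy",     # do not re-encode the video
        "-c:a", "copy",     # copy audio as-is (re-encode if the container rejects it)
        "-shortest",        # stop at the shorter of the two streams
        output,
    ]


command = remux_audio_command("clean.mp4", "source.mp4", "clean_with_audio.mp4")
print(" ".join(command))
```

The command can be executed with `subprocess.run(command, check=True)` on a machine where `ffmpeg` is installed.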

Ethical Considerations

  • Only process content you have rights to modify
  • Do not remove copyright notices or watermarks from protected content
  • Do not remove creator credits or mandatory attributions
  • Disclose when videos have been modified

License

MIT License - see LICENSE for details.

Author

Developed by Helder Lima
