Robust Video Matting (RVM)
English | 中文
Official repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory. RVM can perform matting in real-time on any videos without additional inputs. It achieves 4K 76FPS and HD 104FPS on an Nvidia GTX 1080 Ti GPU. The project was developed at ByteDance Inc.
Showreel
Watch the showreel video (YouTube, Bilibili) to see the model’s performance.
Speed
Speed is measured with inference_speed_test.py
for reference.
GPU | dType | HD (1920x1080) | 4K (3840x2160) |
---|---|---|---|
RTX 3090 | FP16 | 172 FPS | 154 FPS |
RTX 2060 Super | FP16 | 134 FPS | 108 FPS |
GTX 1080 Ti | FP32 | 104 FPS | 74 FPS |
- Note 1: HD uses
downsample_ratio=0.25
, 4K usesdownsample_ratio=0.125
. All tests use batch size 1 and frame chunk 1. - Note 2: GPUs before Turing architecture does not support FP16 inference, so GTX 1080 Ti uses FP32.
- Note 3: We only measure tensor throughput. The provided video conversion script in this repo is expected to be much slower, because it does not utilize hardware video encoding/decoding and does not have the tensor transfer done on parallel threads. If you are interested in implementing hardware video encoding/decoding in Python, please refer to PyNvCodec.