Robust Video Matting (RVM)
Official repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory. RVM can perform matting in real time on any video without additional inputs. It achieves 76 FPS at 4K and 104 FPS at HD on an Nvidia GTX 1080 Ti GPU. The project was developed at ByteDance Inc.
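To illustrate what processing frames "with temporal memory" means, here is a minimal toy sketch. It is not RVM's network: `toy_recurrent_step`, `matte_video`, and the `memory` blending weight are hypothetical names invented for this example, and the "model" is just a per-pixel blend. The only point is the calling pattern, where a state is threaded from one frame to the next instead of each frame being handled in isolation.

```python
from typing import List, Optional, Tuple

# Toy stand-in: one frame is a flat list of per-pixel scores in [0, 1].
Frame = List[float]


def toy_recurrent_step(
    frame: Frame, state: Optional[Frame], memory: float = 0.5
) -> Tuple[Frame, Frame]:
    """Hypothetical stand-in for one recurrent matting step (not RVM's network).

    Blends the current per-pixel estimate with the state carried over from
    earlier frames, then returns the alpha estimate and the updated state.
    """
    if state is None:
        # First frame: there is no temporal memory yet.
        alpha = list(frame)
    else:
        alpha = [memory * s + (1.0 - memory) * x for s, x in zip(state, frame)]
    return alpha, alpha


def matte_video(frames: List[Frame], memory: float = 0.5) -> List[Frame]:
    """Process frames in order, threading the recurrent state through the loop."""
    state: Optional[Frame] = None
    alphas = []
    for frame in frames:
        alpha, state = toy_recurrent_step(frame, state, memory)
        alphas.append(alpha)
    return alphas


if __name__ == "__main__":
    # A single pixel that flickers in the second frame.
    video = [[1.0], [0.0], [1.0]]
    print(matte_video(video))              # [[1.0], [0.5], [0.75]]
    print(matte_video(video, memory=0.0))  # [[1.0], [0.0], [1.0]]
```

With `memory=0.0` every frame is treated independently and the flicker passes straight through; with a non-zero weight the carried state damps it. A real recurrent network learns what to keep in its state rather than using a fixed blend.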
Speed is measured with inference_speed_test.py for reference.
| GPU | dType | HD | 4K |
| --- | --- | --- | --- |
| RTX 2060 Super | | | |
| GTX 1080 Ti | FP32 | 104 FPS | 76 FPS |
- Note 1: HD uses downsample_ratio=0.25, 4K uses downsample_ratio=0.125. All tests use batch size 1 and frame chunk 1.
- Note 2: GPUs before the Turing architecture do not support FP16 inference, so the GTX 1080 Ti uses FP32.
- Note 3: We only measure tensor throughput. The provided video conversion script in this repo is expected to be much slower, because it does not use hardware video encoding/decoding and does not run tensor transfers on parallel threads. If you are interested in implementing hardware video encoding/decoding in Python, please refer to PyNvCodec.
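As a rough illustration of Notes 1 and 3, the sketch below works out the resolution implied by each downsample_ratio and times a loop one frame at a time. It is not the contents of inference_speed_test.py. It assumes HD means 1920x1080 and 4K means 3840x2160, assumes the ratio scales each side of the frame with plain rounding, and uses the hypothetical helpers `internal_resolution` and `measure_fps` with a CPU stand-in workload in place of the model.

```python
import time
from typing import Callable, Tuple


def internal_resolution(
    width: int, height: int, downsample_ratio: float
) -> Tuple[int, int]:
    """Resolution after scaling each side by downsample_ratio (assumes rounding)."""
    return round(width * downsample_ratio), round(height * downsample_ratio)


def measure_fps(step: Callable[[], None], n_frames: int = 100, warmup: int = 10) -> float:
    """Call step() once per frame and return frames per second.

    Warm-up calls are excluded from the timing. Only the step itself is
    timed, which mirrors measuring throughput without video decode/encode.
    """
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_frames):
        step()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very coarse timer.
    return n_frames / max(elapsed, 1e-9)


if __name__ == "__main__":
    print(internal_resolution(1920, 1080, 0.25))   # (480, 270)
    print(internal_resolution(3840, 2160, 0.125))  # (480, 270)

    # Stand-in workload; a real test would run one model forward pass here.
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.1f} FPS (stand-in workload)")
```

Under these assumptions both settings reduce the frame to the same 480x270 working size. When timing real GPU inference with PyTorch, the device should be synchronized (for example with `torch.cuda.synchronize()`) before each clock reading, since CUDA kernels launch asynchronously.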