jd7h / xmem-propainter-inpainting

Easy video inpainting using only a mask of the first video frame


Cog pipeline for XMem and ProPainter

This is a generative AI pipeline that combines two models:

  • XMem, a model for video object segmentation
  • ProPainter, a model for video inpainting

This pipeline can be used for easy video inpainting: XMem turns a source video and an annotated first frame into a video mask, and ProPainter takes the source video and the video mask and inpaints everything under the mask.

How to use it

Here’s how you can use this pipeline to do video inpainting on a source video, for example kitten_short.mp4.

1. Extract the first frame of your video

XMem needs an annotated first video frame to create a video mask for ProPainter. To make this annotated frame, you can start by extracting the frames from your source video with ffmpeg:

mkdir frames
ffmpeg -i kitten_short.mp4 frames/%04d.jpg
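
If you prefer to do this from Python, a minimal sketch with OpenCV (assuming the opencv-python package is installed) grabs just the first frame and writes it to the same path used in the next step:

import os
import cv2

# Open the source video and read only its first frame.
cap = cv2.VideoCapture("kitten_short.mp4")
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("could not read the first frame of kitten_short.mp4")

# Save it where the next step expects to find it.
os.makedirs("frames", exist_ok=True)
cv2.imwrite("frames/0001.jpg", frame)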

2. Create a mask of the first frame

You can then use an image segmentation model, such as Segment Anything, to turn the first frame, frames/0001.jpg, into a mask of the object you want to remove. Save this mask as an image, for example first_frame_mask.png.
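
As a rough illustration, a Segment Anything sketch could look like the following. It assumes the official segment-anything package and a downloaded ViT-H checkpoint; the point prompt coordinates are placeholders, so pick a point that lies on the object you want removed, and check the XMem/ProPainter repositories for the exact mask format they expect.

import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the first frame as RGB.
image = cv2.cvtColor(cv2.imread("frames/0001.jpg"), cv2.COLOR_BGR2RGB)

# Load SAM and set the image.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
# sam.to("cuda")  # optional, if a GPU is available
predictor = SamPredictor(sam)
predictor.set_image(image)

# Prompt SAM with a single foreground point on the object to be inpainted.
# The coordinates below are placeholders.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)

# Keep the highest-scoring mask and save it as a black-and-white PNG.
best = masks[np.argmax(scores)]
cv2.imwrite("first_frame_mask.png", (best * 255).astype(np.uint8))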

3. Feed the source video and the mask into the pipeline

You can now feed the video kitten_short.mp4 and the mask first_frame_mask.png into the model pipeline on this page: upload your source video under ‘video’ and the first-frame mask under ‘mask’.

XMem will generate a video mask from these inputs, and ProPainter will take XMem’s output and use it for video inpainting.
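
If you would rather call the pipeline from code than through the web form, a minimal sketch with the Replicate Python client might look like this. The version hash is a placeholder, and the input names ‘video’ and ‘mask’ are the same fields described above.

import replicate

# Run the pipeline with the source video and the first-frame mask.
output = replicate.run(
    "jd7h/xmem-propainter-inpainting:<version-hash>",  # placeholder version
    input={
        "video": open("kitten_short.mp4", "rb"),
        "mask": open("first_frame_mask.png", "rb"),
    },
)
print(output)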

Licenses

The Cog files in this repository are released under the MIT license. For the licenses of the underlying models, please see their respective repositories on GitHub.