KEEP: Kalman-Inspired Video Face Super-Resolution 🚀
This Replicate model lets you run KEEP (Kalman-Inspired FEaturE Propagation), an advanced algorithm designed to beautifully restore and enhance faces in videos. It’s based on the research paper “Kalman-Inspired FEaturE Propagation for Video Face Super-Resolution” (Feng et al., ECCV 2024).
This implementation is packaged with Cog, making it easy to run and integrate. It intelligently enhances faces, frame by frame, ensuring temporal consistency for smooth and natural-looking results.
Original Project Page: github.com/jnjaby/KEEP
About the KEEP Model
KEEP tackles the challenge of video face super-resolution by drawing inspiration from Kalman filtering. This allows the model to effectively propagate facial features and details across video frames. The result is a robust restoration process that can handle various head poses, expressions, and lighting conditions, leading to significantly improved visual quality in your videos.
This Cog implementation focuses on providing a straightforward way to apply KEEP to your videos, with options to also enhance backgrounds and further upscale faces using RealESRGAN.
Key Features ✨
- Advanced Face Restoration: Leverages the KEEP model for high-quality face super-resolution.
- Temporally Consistent Results: Ensures smooth transitions and feature stability across video frames.
- Flexible Face Detection: Choose from multiple detection models (RetinaFace, YOLOv5 variants) to best suit your footage.
- Optional Background Enhancement: Enhance the entire video background using RealESRGAN.
- Optional Face Upsampling: Further upscale the restored faces for even greater detail.
- Handles Aligned & Unaligned Video: Works with both pre-aligned/cropped face videos (512x512) and regular footage requiring detection.
- Control Over Processing: Options to process only the center face, draw bounding boxes, and adjust upscaling factors.
How It Works (The Gist) 💡
When you provide a video, this KEEP model goes through several steps:
- Frame Extraction: The input video is broken down into individual frames.
- Face Detection & Alignment (if needed): If your video isn’t pre-aligned (has_aligned=false), the pipeline uses a face detection model (such as retinaface_resnet50) from Facelib to find faces and their landmarks in each frame. These landmarks are then smoothed over time to ensure stable alignment.
- Face Cropping: Faces are cropped and warped to a 512x512 resolution based on the detected (or provided) landmarks.
- KEEP Restoration: The core KEEP network processes these cropped faces, enhancing their details and quality. To limit memory use, it processes the video in segments whose length is set by max_length.
- Pasting Back: The restored 512x512 faces are pasted back into their original positions in the video frames.
- Optional Enhancements:
  - If bg_enhancement is on, the background of each frame is enhanced by RealESRGAN.
  - If face_upsample is on, the restored face itself is further upscaled by RealESRGAN.
- Video Re-assembly: The processed frames are combined back into a video file.
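To make the landmark smoothing in the Face Detection & Alignment step more concrete, here is a minimal sketch of temporal smoothing using a centered moving average. This is an illustration only: the function name smooth_landmarks and the radius parameter are hypothetical, and the filter actually used by the KEEP pipeline may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth_landmarks(frames: List[List[Point]], radius: int = 2) -> List[List[Point]]:
    """Smooth per-frame landmarks with a centered moving average over time.

    `frames[t][k]` is the (x, y) position of landmark k in frame t.
    The window shrinks at the start and end of the sequence.
    NOTE: a stand-in filter for illustration, not necessarily KEEP's own.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(frames)
    smoothed: List[List[Point]] = []
    for t in range(n):
        # Frames inside the temporal window around frame t.
        window = frames[max(0, t - radius): min(n, t + radius + 1)]
        count = len(window)
        smoothed.append([
            (
                sum(frame[k][0] for frame in window) / count,
                sum(frame[k][1] for frame in window) / count,
            )
            for k in range(len(frames[t]))
        ])
    return smoothed
```

A larger radius gives steadier crops but reacts more slowly to fast head motion, so the value is a trade-off rather than a fixed rule.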
The predict.py script orchestrates this entire pipeline, from loading models to processing frames and saving the output.
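The segment-wise processing mentioned in the KEEP Restoration step can be pictured as splitting the frame sequence into chunks of at most max_length frames. The sketch below assumes simple non-overlapping segments; the helper name split_into_segments is hypothetical, and the real script may segment frames differently.

```python
from typing import List, Tuple


def split_into_segments(num_frames: int, max_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive.

    Each segment holds at most `max_length` frames.
    ASSUMPTION: segments do not overlap; KEEP's own logic may differ.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    return [
        (start, min(start + max_length, num_frames))
        for start in range(0, num_frames, max_length)
    ]


# Example: a 50-frame clip processed 20 frames at a time.
segments = split_into_segments(50, 20)
print(segments)  # [(0, 20), (20, 40), (40, 50)]
```

Smaller segments lower peak memory use, at the cost of more passes through the network.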
Potential Use Cases 🚀
- Restore old or low-quality family videos: Bring clarity to cherished memories.
- Enhance footage from webcams or online meetings: Improve the visual quality of interviews or presentations.
- Upscale faces in documentaries or archival footage: Make historical content more accessible.
- Improve user-generated video content: Give a polished look to vlogs or social media clips.
- Pre-processing for other AI tasks: Enhance faces before applying further analysis or effects.
Things to Keep in Mind ⚠️
- Input Quality Matters: While KEEP is robust, the better the input video quality, the better the restoration results will generally be.
- Processing Time: Video processing can be time-consuming, especially for longer videos or when background enhancement is enabled.
- Detection Model Choice: The detection_model parameter can impact results, especially for challenging footage. retinaface_resnet50 is a good default, but others might perform better in specific scenarios.
- Multiple Faces: The only_center_face parameter controls behavior when multiple faces are present. If disabled, it typically processes the largest detected face per frame based on the underlying helper library logic.
- has_aligned: Use this option carefully. If your video isn’t truly pre-aligned 512x512 face crops, results will be poor.
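The options discussed above can be collected into a single input dictionary before calling the model. The following is a sketch under stated assumptions: the parameter names has_aligned, detection_model, only_center_face, bg_enhancement, face_upsample, and max_length come from this README, while the "video" key, the build_keep_input helper, and its frame_size argument are hypothetical. Check the model's input schema for the actual names and defaults.

```python
from typing import Any, Dict, Optional, Tuple


def build_keep_input(
    video: str,
    *,
    has_aligned: bool = False,
    frame_size: Optional[Tuple[int, int]] = None,  # hypothetical local check, not sent to the model
    detection_model: str = "retinaface_resnet50",
    only_center_face: bool = False,
    bg_enhancement: bool = False,
    face_upsample: bool = False,
    max_length: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble an input dict for the model.

    ASSUMPTION: the "video" key name is a guess; the other keys are
    the parameter names mentioned in this README.
    """
    # has_aligned only makes sense for 512x512 pre-cropped face videos.
    if has_aligned and frame_size is not None and frame_size != (512, 512):
        raise ValueError("has_aligned=True expects 512x512 face crops")
    if max_length is not None and max_length < 1:
        raise ValueError("max_length must be at least 1")

    payload: Dict[str, Any] = {
        "video": video,
        "has_aligned": has_aligned,
        "detection_model": detection_model,
        "only_center_face": only_center_face,
        "bg_enhancement": bg_enhancement,
        "face_upsample": face_upsample,
    }
    # Leave max_length out so the model's own default applies.
    if max_length is not None:
        payload["max_length"] = max_length
    return payload


payload = build_keep_input("clip.mp4", bg_enhancement=True)
```

Rejecting a mismatched frame size up front is a cheap way to catch the has_aligned mistake described above before spending time on a full run.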
License & Disclaimer 📜
The KEEP model and its original associated code are licensed under the S-Lab License 1.0. Components like RealESRGAN and Facelib models have their own open-source licenses; please refer to their respective project pages for details.
The code in the zsxkib/cog-KEEP GitHub repository, which packages this model with Cog, is released under the MIT License.
Please use this model responsibly and in accordance with any terms of use from the original model creators.
Citation 📚
If you use KEEP in your research, please consider citing the original paper:
@InProceedings{feng2024keep,
      title     = {Kalman-Inspired FEaturE Propagation for Video Face Super-Resolution},
      author    = {Feng, Ruicheng and Li, Chongyi and Loy, Chen Change},
      booktitle = {European Conference on Computer Vision (ECCV)},
      year      = {2024}
}
Cog implementation managed by zsxkib.
⭐ Star the Cog repo on GitHub: zsxkib/cog-KEEP
👋 Follow me on Twitter/X
