# SeedVR2 Cog (3B & 7B) 🎥✨
## Overview
SeedVR2 Cog packages ByteDance-Seed’s one-step diffusion transformer for both videos and stills. This build hot-swaps between the 3B and 7B checkpoints on a single GPU, adds CDN-friendly weight caching, keeps source audio when returning MP4s, and now exposes an optional wavelet-based colour correction (apply_color_fix) for users who want the same hue preservation as the official Gradio demo.
Give the original research team some love:

- Project page — SeedVR2
- Hugging Face release — ByteDance-Seed/SeedVR2
- Demo space — SeedVR2 on Hugging Face Spaces
- Paper — SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
## What's included

- **Dual checkpoints:** 3B loads by default; set `model_variant="7b"` to bring in the larger model when you have the VRAM.
- **Optional colour fix:** Flip `apply_color_fix=true` to blend the model's high-frequency detail with the original colour field.
- **Audio passthrough:** MP4 outputs inherit the source audio stream when `ffmpeg` is available (Replicate's image already includes it).
- **Deterministic caching:** All large assets download once via Replicate's CDN with `pget`, so versioned builds stay reproducible.
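The colour fix keeps the source's low-frequency colour field and the model's high-frequency detail. A minimal one-dimensional sketch of that idea, where a crude box blur stands in for the wavelet/Gaussian decomposition the real implementation uses (`low_pass` and `color_fix` are illustrative helpers, not this wrapper's API):

```python
def low_pass(signal, radius=2):
    """Crude box blur standing in for the wavelet low-frequency band."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def color_fix(output, source, radius=2):
    """Keep the model's high frequencies, the source's low frequencies."""
    out_low = low_pass(output, radius)
    src_low = low_pass(source, radius)
    return [o - ol + sl for o, ol, sl in zip(output, out_low, src_low)]
```

On real frames the same swap happens per channel on 2-D images; the effect is that global hue and brightness follow the input while fine detail comes from the restoration.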
## Inputs

| Name | Type | Description | Default |
|---|---|---|---|
| `media` | file / URL | Video (`.mp4`, `.mov`) or image (`.png`, `.jpg`, `.webp`). | – |
| `model_variant` | string | `"3b"` or `"7b"`. 7B provides higher fidelity if your GPU can keep it resident. | `3b` |
| `sample_steps` | int | Diffusion steps (1 = one-pass mode as in the paper). | `1` |
| `cfg_scale` | float | Guidance strength; >1 sharpens, <1 softens. | `1.0` |
| `apply_color_fix` | bool | Wavelet colour reconstruction that aligns hues with the input. | `false` |
| `sp_size` | int | Leave at 1 for single-GPU runs; higher values only adjust padding. | `1` |
| `fps` | int | Output frame rate for videos. | `24` |
| `seed` | int? | Optional deterministic seed. | random |
| `output_format` | string | Image outputs: `"png"`, `"webp"`, `"jpg"`. | `webp` |
| `output_quality` | int | JPEG/WebP quality when using lossy formats. | `90` |
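For reference, the table maps onto a request payload like the one below. `build_inputs` is a hypothetical client-side helper that mirrors the documented constraints; it is not part of the wrapper, which does its own validation:

```python
ALLOWED_VARIANTS = {"3b", "7b"}
ALLOWED_FORMATS = {"png", "webp", "jpg"}

def build_inputs(media, model_variant="3b", sample_steps=1, cfg_scale=1.0,
                 apply_color_fix=False, sp_size=1, fps=24, seed=None,
                 output_format="webp", output_quality=90):
    """Assemble a request dict matching the inputs table, with light validation."""
    if model_variant not in ALLOWED_VARIANTS:
        raise ValueError(f"model_variant must be one of {sorted(ALLOWED_VARIANTS)}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0 <= output_quality <= 100:
        raise ValueError("output_quality must be in [0, 100]")
    inputs = {
        "media": media,
        "model_variant": model_variant,
        "sample_steps": sample_steps,
        "cfg_scale": cfg_scale,
        "apply_color_fix": apply_color_fix,
        "sp_size": sp_size,
        "fps": fps,
        "output_format": output_format,
        "output_quality": output_quality,
    }
    if seed is not None:
        inputs["seed"] = seed  # omit entirely for a random seed
    return inputs
```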
## Tips

- **GPU sizing:** 3B fits comfortably on 80 GB cards (A100/H100 80G). The auto dual-load feature preloads both checkpoints only when VRAM ≥ 120 GB (e.g., H200); otherwise it stages them between GPU and CPU memory.
- **Colour fix:** Leave it off for the legacy look; turn it on to keep input hues on skin tones and skies, especially when the model is aggressively sharpening.
- **Long clips:** SeedVR2 was trained on clips of up to 121 frames. We automatically pad/truncate beyond that so you don't have to pre-chunk.
- **Audio:** MP4 outputs preserve the original soundtrack via an audio copy step, so you keep sync without re-encoding.
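The audio copy step amounts to a stream-copy mux in ffmpeg: video from the restored file, audio from the source, no re-encode. A sketch of the invocation (the wrapper's exact flags may differ; `audio_copy_cmd` is an illustrative helper):

```python
def audio_copy_cmd(restored_video, source_video, out_path):
    """Build an ffmpeg command that muxes the source audio into the
    restored video without re-encoding either stream."""
    return [
        "ffmpeg", "-y",
        "-i", restored_video,   # input 0: restored frames
        "-i", source_video,     # input 1: original clip with audio
        "-map", "0:v:0",        # take video from the restored file
        "-map", "1:a:0?",       # take audio from the source, if it has any
        "-c", "copy",           # stream copy: no re-encode, sync preserved
        "-shortest",
        out_path,
    ]
```

Pass the result to `subprocess.run(...)`; because both streams are copied, the mux takes seconds even for long clips.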
## Limitations

- Heavy motion blur or extreme low light can still stump the model.
- Over-sharpening can occur on already clean footage; turn `cfg_scale` down or keep colour fix off if it feels too crunchy.
- This build is tuned for single-GPU inference; multi-GPU sequence parallelism isn't enabled.
## Credits & License

- Research, training & weights: Jianyi Wang, Shanchuan Lin, Zhijie Lin, Yuxi Ren, Meng Wei, Zongsheng Yue, Shangchen Zhou, Hao Chen, Yang Zhao, Ceyuan Yang, Xuefeng Xiao, Chen Change Loy, Lu Jiang.
- Upstream GitHub: [ByteDance-Seed/SeedVR](https://github.com/ByteDance-Seed/SeedVR) (Apache 2.0).
- This Cog wrapper: MIT licensed — [github.com/zsxkib/cog-ByteDance-Seed-SeedVR2-3B](https://github.com/zsxkib/cog-ByteDance-Seed-SeedVR2-3B). I just made it behave nicely on Replicate.

⭐ Star the repo on GitHub!
🐦 Follow me on X/Twitter: @zsakib_
💻 More projects: [github.com/zsxkib](https://github.com/zsxkib)