zsxkib/seedvr2

🔥 SeedVR2: one-step video & image restoration with 3B/7B hot‑swap and optional color fix 🎬✨

Public
3.8K runs

Run time and cost

This model costs approximately $0.0072 to run on Replicate, or 138 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia H100 GPU hardware. Predictions typically complete within 5 seconds. The predict time for this model varies significantly based on the inputs.

Readme

SeedVR2 Cog (3B & 7B) 🎥✨

Overview

SeedVR2 Cog packages ByteDance-Seed’s one-step diffusion transformer for both videos and stills. This build hot-swaps between the 3B and 7B checkpoints on a single GPU, adds CDN-friendly weight caching, and keeps source audio when returning MP4s. It also exposes an optional wavelet-based colour correction (apply_color_fix) for users who want the same hue preservation as the official Gradio demo.

Give the original research team some love:

  • Project page — SeedVR2

What’s included

  • Dual checkpoints: 3B loads by default; set model_variant="7b" to bring in the larger model when you have the VRAM.

  • Optional colour fix: Flip apply_color_fix=true to blend the model’s high-frequency detail with the original colour field.

  • Audio passthrough: MP4 outputs inherit the source audio stream when ffmpeg is available (Replicate’s image already includes it).

  • Deterministic caching: All large assets download once via Replicate’s CDN with pget, so versioned builds stay reproducible.
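The optional colour fix follows the usual wavelet reconstruction idea: keep the restored frame’s high-frequency detail but replace its low-frequency band (which carries the colour cast) with the input’s. Here is a minimal NumPy sketch of that idea — a Gaussian low-pass is used as a stand-in for the multi-level wavelet decomposition, and the `sigma` value is an illustrative assumption, not the predictor’s actual setting:

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def low_pass(img: np.ndarray, sigma: float = 5.0) -> np.ndarray:
    # Gaussian blur per channel as a stand-in for the wavelet low band.
    return gaussian_filter(img, sigma=(sigma, sigma, 0))


def wavelet_color_fix(restored: np.ndarray, source: np.ndarray) -> np.ndarray:
    """Keep the model's high-frequency detail, restore the input's colour field."""
    high = restored - low_pass(restored)  # detail the model produced
    low = low_pass(source)                # original low-frequency colour
    return np.clip(high + low, 0.0, 1.0)
```

On flat regions (skies, skin) the output inherits the source colour almost exactly, while edges and texture come from the restored frame — which is why the fix helps most when the model sharpens aggressively.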

Inputs

  • media (file / URL): Video (.mp4, .mov) or image (.png, .jpg, .webp). Required.

  • model_variant (string): "3b" or "7b"; 7B provides higher fidelity if your GPU can keep it resident. Default: "3b".

  • sample_steps (int): Diffusion steps (1 = one-pass mode as in the paper). Default: 1.

  • cfg_scale (float): Guidance strength; >1 sharpens, <1 softens. Default: 1.0.

  • apply_color_fix (bool): Wavelet colour reconstruction that aligns hues with the input. Default: false.

  • sp_size (int): Leave at 1 for single-GPU runs; higher values only adjust padding. Default: 1.

  • fps (int): Output frame rate for videos. Default: 24.

  • seed (int, optional): Deterministic seed. Default: random.

  • output_format (string): Image outputs: "png", "webp", or "jpg". Default: "webp".

  • output_quality (int): JPEG/WebP quality when using lossy formats. Default: 90.
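If you call the model from your own code, it can help to validate inputs client-side before the request leaves your machine. The helper below is hypothetical (the predictor does its own validation server-side); it simply mirrors the constraints in the table above:

```python
VALID_VARIANTS = {"3b", "7b"}
VALID_FORMATS = {"png", "webp", "jpg"}


def build_inputs(media, model_variant="3b", sample_steps=1, cfg_scale=1.0,
                 apply_color_fix=False, sp_size=1, fps=24, seed=None,
                 output_format="webp", output_quality=90):
    """Assemble an input payload matching the table above, failing fast on bad values."""
    if model_variant not in VALID_VARIANTS:
        raise ValueError(f"model_variant must be one of {sorted(VALID_VARIANTS)}")
    if output_format not in VALID_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(VALID_FORMATS)}")
    if not 0 <= output_quality <= 100:
        raise ValueError("output_quality must be in [0, 100]")
    inputs = {
        "media": media,
        "model_variant": model_variant,
        "sample_steps": sample_steps,
        "cfg_scale": cfg_scale,
        "apply_color_fix": apply_color_fix,
        "sp_size": sp_size,
        "fps": fps,
        "output_format": output_format,
        "output_quality": output_quality,
    }
    if seed is not None:  # omit the key entirely for a random seed
        inputs["seed"] = seed
    return inputs
```

The resulting dict can be passed straight to your HTTP or client-library call of choice.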

Tips

  • GPU sizing: 3B fits comfortably on 80 GB cards (A100/H100 80G). The auto dual-load feature preloads both checkpoints only when VRAM ≥120 GB (e.g., H200). Otherwise it stages them between GPU and CPU memory.

  • Colour fix: Leave it off for the legacy look; turn it on to keep input hues on skin tones and skies—especially when the model is aggressively sharpening.

  • Long clips: SeedVR2 was trained up to 121 frames. We automatically pad/truncate beyond that so you don’t have to pre-chunk.

  • Audio: MP4 outputs preserve the original soundtrack via an audio copy step, so you keep sync without re-encoding.
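The audio passthrough described above amounts to an ffmpeg stream-copy mux: video from the restored file, audio from the original, neither re-encoded. A sketch of the kind of command involved — the predictor’s exact flags may differ:

```python
def mux_audio_command(restored_video: str, original_media: str, out_path: str) -> list[str]:
    """ffmpeg invocation: restored video + original audio, both stream-copied."""
    return [
        "ffmpeg", "-y",
        "-i", restored_video,   # input 0: restored frames
        "-i", original_media,   # input 1: original clip (audio source)
        "-map", "0:v:0",        # video stream from the restored file
        "-map", "1:a:0?",       # audio from the original; '?' makes it optional
        "-c", "copy",           # no re-encode: keeps sync and quality
        "-shortest",
        out_path,
    ]
```

The trailing `?` on the audio map means silent inputs still succeed, and `-c copy` is what keeps the step near-instant.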

Limitations

  • Heavy motion blur or extreme low light can still stump the model.

  • Over-sharpening can occur on already clean footage—turn cfg_scale down or keep colour fix off if it feels too crunchy.

  • This build is tuned for single-GPU inference; multi-GPU sequence parallelism isn’t enabled.

Credits & License

All credit for the model goes to the ByteDance-Seed research team — I just made it behave nicely on Replicate.


⭐ Star the repo on GitHub!
🐦 Follow me on X/Twitter: @zsakib_
💻 More projects: github.com/zsxkib