chenxwh / diffsynth-exvideo

Extended video synthesis model that generates 128 frames

  • Public
  • 203 runs
  • A100 (80GB)
  • GitHub
  • Paper
  • License

Input

  • Input prompt (string)
    Default: "bonfire, on the stone"

  • Negative prompt (string): things you do not want to see in the output
    Default: "错误的眼睛,糟糕的人脸,毁容,糟糕的艺术,变形,多余的肢体,模糊的颜色,模糊,重复,病态,残缺," (in English: "wrong eyes, bad faces, disfigured, bad art, deformed, extra limbs, blurry colors, blurry, duplicated, morbid, mutilated,")

  • Number of output frames (integer, maximum: 128)
    Default: 128

  • Number of denoising steps for image and video generation (integer, minimum: 1, maximum: 500)
    Default: 25

  • Number of denoising steps for upscaling the video (integer, minimum: 1, maximum: 500)
    Default: 25

  • Random seed (integer). Leave blank to randomize the seed
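As a reference, here is a minimal sketch of invoking these inputs through the Replicate Python client. The page does not show the actual input key names, so the keys below (prompt, negative_prompt, num_frames, num_inference_steps, upscale_inference_steps) are assumptions inferred from the field descriptions; check the model's API tab for the real schema and version string.

```python
import replicate  # pip install replicate; requires REPLICATE_API_TOKEN to be set

# Input keys are assumptions based on the field descriptions above.
output = replicate.run(
    "chenxwh/diffsynth-exvideo",  # a version suffix (":<hash>") may be required
    input={
        "prompt": "bonfire, on the stone",
        "num_frames": 128,              # maximum: 128
        "num_inference_steps": 25,      # denoising steps for generation
        "upscale_inference_steps": 25,  # denoising steps for upscaling (hypothetical key)
        # "seed" omitted so the model picks a random seed
    },
)
print(output)  # URLs for the image, video, and upscale_video outputs
```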

Output

  • image
  • video
  • upscale_video

Run time and cost

This model costs approximately $1.11 to run on Replicate, or about 0.9 runs per $1, but this varies depending on your inputs. It is also open source, and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 14 minutes.
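For a local run, the sketch below shows the usual pattern for Replicate-published models: start the container from the image listed on the model's Docker tab (the digest here is a placeholder), then POST to the Cog server's /predictions endpoint. The input key name is the same assumption as in the API example above.

```python
import requests

# First start the container (copy the exact image digest from the model page):
#   docker run -d -p 5000:5000 --gpus=all r8.im/chenxwh/diffsynth-exvideo@sha256:<digest>
# Cog model containers serve predictions over HTTP on port 5000.
resp = requests.post(
    "http://localhost:5000/predictions",
    json={"input": {"prompt": "bonfire, on the stone"}},  # key name is an assumption
    timeout=60 * 30,  # predictions can take ~14 minutes on an A100
)
resp.raise_for_status()
print(resp.json()["output"])
```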

Readme

DiffSynth Studio

Introduction

DiffSynth Studio is a diffusion engine. We have restructured architectures, including the Text Encoder, UNet, and VAE, maintaining compatibility with models from the open-source community while enhancing computational performance. We provide many interesting features. Enjoy the magic of diffusion models!

This demo supports ExVideo.

Long Video Synthesis

We trained an extended video synthesis model that can generate 128 frames.

https://github.com/modelscope/DiffSynth-Studio/assets/35051019/d97f6aa9-8064-4b5b-9d49-ed6001bb9acc
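If you prefer to run ExVideo directly through DiffSynth-Studio rather than this demo, generation looks roughly like the sketch below. It is modeled on the ExVideo example in the DiffSynth-Studio repository; the model file paths are placeholders, and class and argument names may differ between releases, so treat it as an outline rather than the definitive API.

```python
import torch
from PIL import Image
from diffsynth import ModelManager, SVDVideoPipeline, save_video

# Load Stable Video Diffusion plus the ExVideo extension weights
# (paths are placeholders; download locations are in the repo's README).
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda")
model_manager.load_models([
    "models/stable_video_diffusion/svd_xt.safetensors",
    "models/stable_video_diffusion/model.fp16.safetensors",  # ExVideo-SVD-128f
])
pipe = SVDVideoPipeline.from_model_manager(model_manager)

# ExVideo extends SVD's 25-frame limit to 128 frames.
video = pipe(
    input_image=Image.open("input.png").resize((512, 512)),
    num_frames=128,
    num_inference_steps=25,
    height=512,
    width=512,
)
save_video(video, "video.mp4", fps=30)
```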