chenxwh / ltx-video

DiT-based video generation model for generating high-quality videos in real-time

  • Public
  • 3.1K runs
  • L40S
  • GitHub
  • Weights
  • License

Input

string

Input prompt

Default: "The waves crash against the jagged rocks of the shoreline, sending spray high into the air. The rocks are a dark gray color, with sharp edges and deep crevices. The water is a clear blue-green, with white foam where the waves break against the rocks. The sky is a light gray, with a few white clouds dotting the horizon."

string

Negative prompt for undesired features

Default: "worst quality, inconsistent motion, blurry, jittery, distorted"

file

Optional, input image for image-to-video generation

integer
(maximum: 1280)

Width of the output video frames. Optional if an input image is provided

Default: 704

integer
(maximum: 720)

Height of the output video frames. Optional if an input image is provided

Default: 480

integer
(maximum: 257)

Number of frames to generate in the output video

Default: 121

integer

Frame rate for the output video

Default: 25

integer
(minimum: 1)

Number of denoising steps

Default: 40

number
(minimum: 1, maximum: 20)

Scale for classifier-free guidance

Default: 3

integer

Random seed. Leave blank to randomize the seed
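The inputs above can be assembled programmatically before calling the model through Replicate's Python client. This is a minimal sketch: the field names below (`prompt`, `negative_prompt`, `width`, `num_frames`, and so on) are assumptions inferred from the parameter descriptions on this page, not confirmed schema names, so check the model's API schema before relying on them.

```python
# Sketch of preparing an input payload for this model.
# NOTE: the dict keys are assumed names based on the parameter
# descriptions above; verify them against the model's API schema.
import os


def build_input(prompt,
                negative_prompt="worst quality, inconsistent motion, "
                                "blurry, jittery, distorted",
                width=704, height=480, num_frames=121, fps=25,
                num_inference_steps=40, guidance_scale=3, seed=None):
    """Assemble an input dict, enforcing the documented maximums."""
    assert width <= 1280, "width maximum is 1280"
    assert height <= 720, "height maximum is 720"
    assert num_frames <= 257, "num_frames maximum is 257"
    # At the defaults, 121 frames at 25 fps is about 4.8 s of video.
    payload = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,
        "height": height,
        "num_frames": num_frames,
        "fps": fps,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    if seed is not None:  # leave unset to randomize
        payload["seed"] = seed
    return payload


if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate
    output = replicate.run("chenxwh/ltx-video",
                           input=build_input("A calm sea at dawn"))
    print(output)
```

The API call itself is guarded behind an environment check, so the helper can be tested offline; for image-to-video runs you would additionally pass the optional input-image file.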

Output

Run time and cost

This model costs approximately $0.012 to run on Replicate, or 83 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 13 seconds, though predict time varies significantly with the inputs.
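The quoted price follows from the hardware rate and the typical runtime. As a rough check, assuming Replicate's published L40S rate of about $0.000975 per second (an assumption; confirm on the pricing page):

```python
# Back-of-the-envelope check of the quoted per-run cost.
# The L40S per-second rate is an assumed figure; check Replicate's
# pricing page for the current value.
rate_per_sec = 0.000975          # assumed L40S $/second
typical_runtime_s = 13           # "typically complete within 13 seconds"

cost = rate_per_sec * typical_runtime_s
runs_per_dollar = 1 / cost
# Lands near the quoted ~$0.012 per run / ~83 runs per $1.
print(f"${cost:.4f} per run, ~{runs_per_dollar:.0f} runs per $1")
```

Shorter clips, fewer frames, or fewer denoising steps reduce runtime and therefore cost.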

Readme

LTX-Video

Introduction

LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real time: it produces 24 FPS video at 768x512 resolution faster than the clip takes to watch. The model is trained on a large-scale dataset of diverse videos and can generate high-resolution videos with realistic and diverse content.

More to come…

Acknowledgement

We are grateful to the following projects, which informed the implementation of LTX-Video:

* DiT and PixArt-alpha: vision transformers for image generation.