lightricks/ltx-video | Run with an API on Replicate

lightricks / ltx-video

LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched.

Public
150.9K runs
L40S
Commercial use

Input

prompt

string

A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage.

Text prompt for the video. This model needs long, descriptive prompts; if the prompt is too short, output quality suffers.

Default: "best quality, 4k, HDR, a tracking shot of a beautiful scene"

negative_prompt

string

low quality, worst quality, deformed, distorted, watermark

Things you do not want to see in your video

Default: "low quality, worst quality, deformed, distorted"

image

file

Optional input image to use as the starting frame

image_noise_scale

number

(minimum: 0, maximum: 1)

Lower numbers stick more closely to the input image

Default: 0.15

target_size

integer

Target size for the output video

Default: 640

aspect_ratio

string

Aspect ratio of the output video. Ignored if an image is provided.

Default: "3:2"

cfg

number

(minimum: 1, maximum: 20)

How strongly the video follows the prompt

Default: 3

steps

integer

(minimum: 1, maximum: 50)

Number of sampling steps

Default: 30

length

integer

Length of the output video in frames

Default: 97

model

string

Model version to use

Default: "0.9.1"

seed

integer

Set a seed for reproducibility. Random by default.
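
The numeric ranges listed above can be checked on the client before a prediction is submitted. The following is a minimal sketch, not part of the model or of Replicate's client: the names `RANGES` and `check_input_ranges` are hypothetical, and only the three inputs with a documented minimum and maximum are covered.

```python
# Documented (minimum, maximum) pairs taken from the input list above.
RANGES = {
    "image_noise_scale": (0, 1),
    "cfg": (1, 20),
    "steps": (1, 50),
}


def check_input_ranges(inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means every checked value is in range."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in inputs:
            continue  # omitted inputs fall back to the model's defaults
        value = inputs[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


print(check_input_ranges({"cfg": 3, "steps": 30, "image_noise_scale": 0.15}))
print(check_input_ranges({"cfg": 25, "steps": 0}))
```

The first call reports no problems; the second reports both `cfg` and `steps` as out of range.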

Run this model in Node.js with one line of code:

npx create-replicate --model=lightricks/ltx-video

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lightricks/ltx-video using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lightricks/ltx-video:8c47da666861d081eeb4d1261853087de23923a268a69b63febdf5dc1dee08e4",
  {
    input: {
      cfg: 3,
      model: "0.9.1",
      steps: 30,
      length: 97,
      prompt: "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage.",
      target_size: 640,
      aspect_ratio: "16:9",
      negative_prompt: "low quality, worst quality, deformed, distorted, watermark",
      image_noise_scale: 0.15
    }
  }
);

// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"

// To write the file to disk:
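// The model returns an MP4 video; the promise-based writeFile accepts the output stream.
await fs.promises.writeFile("output.mp4", output[0]);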

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run lightricks/ltx-video using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lightricks/ltx-video:8c47da666861d081eeb4d1261853087de23923a268a69b63febdf5dc1dee08e4",
    input={
        "cfg": 3,
        "model": "0.9.1",
        "steps": 30,
        "length": 97,
        "prompt": "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage.",
        "target_size": 640,
        "aspect_ratio": "16:9",
        "negative_prompt": "low quality, worst quality, deformed, distorted, watermark",
        "image_noise_scale": 0.15
    }
)

# To access the file URL:
print(output[0].url())
#=> "http://example.com"

# To write the file to disk:
with open("output.mp4", "wb") as file:
    file.write(output[0].read())

To learn more, take a look at the guide on getting started with Python.
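
The example above is text-to-video only. For image-to-video, the input list documents an optional `image` starting frame and notes that `aspect_ratio` is ignored when an image is provided. The sketch below builds an input dictionary accordingly. The helper name `build_input` is hypothetical, and it assumes the Python client accepts either a URL string or an open binary file handle for file inputs.

```python
def build_input(prompt, image=None, aspect_ratio="3:2", seed=None):
    """Build an input dict for lightricks/ltx-video (illustrative helper)."""
    inputs = {"prompt": prompt}
    if image is not None:
        # Assumption: a URL string or an open binary file handle is accepted here.
        inputs["image"] = image
    else:
        # aspect_ratio is only sent for text-to-video, since it is ignored with an image.
        inputs["aspect_ratio"] = aspect_ratio
    if seed is not None:
        inputs["seed"] = seed  # fixed seed for reproducibility
    return inputs


print(build_input("A tracking shot of waves at sunset", seed=42))
print(build_input("The woman slowly smiles", image="https://example.com/frame.png"))
```

The resulting dictionary can be passed as the `input` argument of `replicate.run` in place of the literal dictionary shown earlier.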

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run lightricks/ltx-video using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lightricks/ltx-video:8c47da666861d081eeb4d1261853087de23923a268a69b63febdf5dc1dee08e4",
    "input": {
      "cfg": 3,
      "model": "0.9.1",
      "steps": 30,
      "length": 97,
      "prompt": "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair\'s face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage.",
      "target_size": 640,
      "aspect_ratio": "16:9",
      "negative_prompt": "low quality, worst quality, deformed, distorted, watermark",
      "image_noise_scale": 0.15
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
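
For readers who prefer not to install a client library, the same request can be sketched with Python's standard library. The endpoint, headers, and version string are copied from the curl command above; the function name `build_prediction_request` is hypothetical, and the network call itself is left commented out so the sketch runs without a token.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = "lightricks/ltx-video:8c47da666861d081eeb4d1261853087de23923a268a69b63febdf5dc1dee08e4"


def build_prediction_request(inputs: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"version": VERSION, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # same header as in the curl example
        },
    )


request = build_prediction_request(
    {"prompt": "best quality, 4k, HDR, a tracking shot of a beautiful scene"},
    token=os.environ.get("REPLICATE_API_TOKEN", ""),
)
print(request.get_method(), request.full_url)

# To actually send the request (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```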

Output

{
  "completed_at": "2024-11-29T15:01:27.086103Z",
  "created_at": "2024-11-29T15:01:15.274000Z",
  "data_removed": false,
  "error": null,
  "id": "2g50w9dzh9rj20ckf1rbdbqdfm",
  "input": {
    "cfg": 3,
    "steps": 30,
    "length": 97,
    "prompt": "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage.",
    "target_size": 640,
    "aspect_ratio": "16:9",
    "negative_prompt": "low quality, worst quality, deformed, distorted, watermark"
  },
  "logs": "Random seed set to: 2624027272\nChecking inputs\n====================================\nRunning workflow\n[ComfyUI] got prompt\nExecuting node 85, title: Width and height from aspect ratio 🪴, class type: Width and height from aspect ratio 🪴\nExecuting node 84, title: EmptyLTXVLatentVideo, class type: EmptyLTXVLatentVideo\nExecuting node 71, title: LTXVScheduler, class type: LTXVScheduler\nExecuting node 72, title: SamplerCustom, class type: SamplerCustom\n[ComfyUI]\n[ComfyUI] 0%|          | 0/30 [00:00<?, ?it/s]\n[ComfyUI] 3%|▎         | 1/30 [00:00<00:04,  6.37it/s]\n[ComfyUI] 7%|▋         | 2/30 [00:00<00:07,  3.95it/s]\n[ComfyUI] 10%|█         | 3/30 [00:00<00:07,  3.52it/s]\n[ComfyUI] 13%|█▎        | 4/30 [00:01<00:07,  3.34it/s]\n[ComfyUI] 17%|█▋        | 5/30 [00:01<00:07,  3.25it/s]\n[ComfyUI] 20%|██        | 6/30 [00:01<00:07,  3.19it/s]\n[ComfyUI] 23%|██▎       | 7/30 [00:02<00:07,  3.16it/s]\n[ComfyUI] 27%|██▋       | 8/30 [00:02<00:07,  3.14it/s]\n[ComfyUI] 30%|███       | 9/30 [00:02<00:06,  3.12it/s]\n[ComfyUI] 33%|███▎      | 10/30 [00:03<00:06,  3.12it/s]\n[ComfyUI] 37%|███▋      | 11/30 [00:03<00:06,  3.11it/s]\n[ComfyUI] 40%|████      | 12/30 [00:03<00:05,  3.10it/s]\n[ComfyUI] 43%|████▎     | 13/30 [00:04<00:05,  3.10it/s]\n[ComfyUI] 47%|████▋     | 14/30 [00:04<00:05,  3.10it/s]\n[ComfyUI] 50%|█████     | 15/30 [00:04<00:04,  3.10it/s]\n[ComfyUI] 53%|█████▎    | 16/30 [00:05<00:04,  3.10it/s]\n[ComfyUI] 57%|█████▋    | 17/30 [00:05<00:04,  3.09it/s]\n[ComfyUI] 60%|██████    | 18/30 [00:05<00:03,  3.09it/s]\n[ComfyUI] 63%|██████▎   | 19/30 [00:05<00:03,  3.09it/s]\n[ComfyUI] 67%|██████▋   | 20/30 [00:06<00:03,  3.09it/s]\n[ComfyUI] 70%|███████   | 21/30 [00:06<00:02,  3.09it/s]\n[ComfyUI] 73%|███████▎  | 22/30 [00:06<00:02,  3.09it/s]\n[ComfyUI] 77%|███████▋  | 23/30 [00:07<00:02,  3.09it/s]\n[ComfyUI] 80%|████████  | 24/30 [00:07<00:01,  3.09it/s]\n[ComfyUI] 83%|████████▎ | 25/30 [00:07<00:01,  3.09it/s]\n[ComfyUI] 87%|████████▋ | 26/30 [00:08<00:01,  3.09it/s]\n[ComfyUI] 90%|█████████ | 27/30 [00:08<00:00,  3.09it/s]\n[ComfyUI] 93%|█████████▎| 28/30 [00:08<00:00,  3.09it/s]\n[ComfyUI] 97%|█████████▋| 29/30 [00:09<00:00,  3.09it/s]\n[ComfyUI] 100%|██████████| 30/30 [00:09<00:00,  3.09it/s]\nExecuting node 8, title: VAE Decode, class type: VAEDecode\nExecuting node 79, title: Video Combine 🎥🅥🅗🅢, class type: VHS_VideoCombine\n[ComfyUI] 100%|██████████| 30/30 [00:09<00:00,  3.15it/s]\n[ComfyUI] Prompt executed in 11.64 seconds\noutputs:  {'79': {'gifs': [{'filename': 'R8_LTX_00001.mp4', 'subfolder': '', 'type': 'output', 'format': 'video/h264-mp4', 'frame_rate': 25.0}]}}\n====================================\nR8_LTX_00001.png\nR8_LTX_00001.mp4",
  "metrics": {
    "predict_time": 11.803572214999999,
    "total_time": 11.812103
  },
  "output": [
    "https://replicate.delivery/yhqm/WjvukHay2258P9UFGYkGiMKr7exu9eeSs27782SR7fBcUiXPB/R8_LTX_00001.mp4"
  ],
  "started_at": "2024-11-29T15:01:15.282530Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/qoxq-xwt7eftpueuzod263mpuyywetfgjvpuodiv2xac2oc6wn6blbqyq",
    "get": "https://api.replicate.com/v1/predictions/2g50w9dzh9rj20ckf1rbdbqdfm",
    "cancel": "https://api.replicate.com/v1/predictions/2g50w9dzh9rj20ckf1rbdbqdfm/cancel"
  },
  "version": "5ddec822499d46d11a93a92ef87e26adefda6608279d9d35c454e50e5e298d92"
}
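
The fields of a prediction object like the one above can be reduced to the few values most callers need. The following sketch uses only field names that appear in the example response; the helper names `parse_timestamp` and `summarize` are hypothetical, and the dictionary is trimmed to the fields the sketch reads.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # Python versions before 3.11 do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(prediction: dict) -> dict:
    """Extract status, wall-clock run time, and the first output URL."""
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    output = prediction.get("output") or []
    return {
        "status": prediction["status"],
        "run_seconds": (completed - started).total_seconds(),
        "video_url": output[0] if output else None,
    }


# Values copied from the example response above.
prediction = {
    "completed_at": "2024-11-29T15:01:27.086103Z",
    "started_at": "2024-11-29T15:01:15.282530Z",
    "status": "succeeded",
    "output": [
        "https://replicate.delivery/yhqm/WjvukHay2258P9UFGYkGiMKr7exu9eeSs27782SR7fBcUiXPB/R8_LTX_00001.mp4"
    ],
}
print(summarize(prediction))
```

The computed run time, about 11.8 seconds, agrees with the `predict_time` metric in the response.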

Generated in

11.8 seconds


This output was created using a different version of the model, lightricks/ltx-video:5ddec822.

Run time and cost

This model costs approximately $0.042 to run on Replicate, or 23 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
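
The "23 runs per $1" figure follows from the per-run cost by integer division. A minimal sketch of the arithmetic, assuming the approximate $0.042 figure quoted above holds for your inputs (the function name `runs_for_budget` is hypothetical):

```python
import math

COST_PER_RUN_USD = 0.042  # approximate figure quoted above; actual cost varies with inputs


def runs_for_budget(budget_usd: float, cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Number of whole runs that fit within a budget at a fixed per-run cost."""
    return math.floor(budget_usd / cost_per_run)


print(runs_for_budget(1))   # 23
print(runs_for_budget(10))  # 238
```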

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 44 seconds. The predict time for this model varies significantly based on the inputs.

Readme

LTX-Video by Lightricks

This model card describes the LTX-Video model; the codebase is available on GitHub.

LTX-Video is the first DiT-based video generation model capable of generating high-quality videos in real-time. It produces 24 FPS videos at a 768x512 resolution faster than they can be watched. Trained on a large-scale dataset of diverse videos, the model generates high-resolution videos with realistic and varied content. We provide a model for both text-to-video and image+text-to-video use cases.

Model Details

Developed by: Lightricks
Model type: Diffusion-based text-to-video and image-to-video generation model
Language(s): English

Usage

Direct use

You can use the model for purposes permitted under the license.

General tips:

The model works on resolutions that are divisible by 32 and on frame counts of the form 8n + 1 (e.g. 257). If the resolution or frame count does not meet these constraints, the input is padded with -1 and then cropped to the desired resolution and number of frames.
The model works best on resolutions under 720 x 1280 and frame counts below 257.
Prompts should be in English. The more elaborate, the better. A good prompt looks like this: "The turquoise waves crash against the dark, jagged rocks of the shore, sending white foam spraying into the air. The scene is dominated by the stark contrast between the bright blue water and the dark, almost black rocks. The water is a clear, turquoise color, and the waves are capped with white foam. The rocks are dark and jagged, and they are covered in patches of green moss. The shore is lined with lush green vegetation, including trees and bushes. In the background, there are rolling hills covered in dense forest. The sky is cloudy, and the light is dim."
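
The size constraints in the tips above can be applied before submitting a request, so that no padding and cropping is needed. The following is a small sketch with hypothetical helper names; rounding up, rather than down or to the nearest value, is a choice made here and not something the README prescribes.

```python
def round_up_dimension(pixels: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= pixels (and at least one multiple)."""
    return max(multiple, -(-pixels // multiple) * multiple)


def round_up_frames(frames: int) -> int:
    """Smallest frame count of the form 8n + 1 that is >= frames."""
    if frames <= 1:
        return 1
    return -(-(frames - 1) // 8) * 8 + 1


print(round_up_dimension(500))  # 512
print(round_up_dimension(768))  # 768
print(round_up_frames(100))     # 105
print(round_up_frames(97))      # 97
```

The default `length` of 97 already has the required form (8 × 12 + 1), as does the 257 mentioned in the tips (8 × 32 + 1).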

ComfyUI

To use our model with ComfyUI, please follow the instructions in the dedicated ComfyUI repo.

Limitations

This model is not intended or able to provide factual information.
As a statistical model, this checkpoint might amplify existing societal biases.
The model may fail to generate videos that match the prompt perfectly.
Prompt following is heavily influenced by the prompting style.