jd7h/open-sora-512 | Run with an API on Replicate

jd7h / open-sora-512

Open-Sora: Democratizing Efficient Video Production for All. This is the 16x512x512 video generation variant.

Cold

Public
560 runs
L40S
GitHub
Paper
License


Input

prompt

*string

The vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals and dark brown centers, are in full bloom, creating a stunning contrast against the green leaves and stems. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. The sun is shining brightly, casting a warm glow on the flowers and highlighting their intricate details. The video is shot from a low angle, looking up at the sunflowers, which adds a sense of grandeur and awe to the scene. The sunflowers are the main focus of the video, with no other objects or people present. The video is a celebration of nature's beauty and the simple joy of a sunny day in the countryside.

Prompt for the video

seed

integer

Seed. Leave blank to randomise

Run this model in Node.js with one line of code:

npx create-replicate --model=jd7h/open-sora-512

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run jd7h/open-sora-512 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "jd7h/open-sora-512:053f96f5e48a544e281cd110479f0af5913bef226bbc74409b803bd788f3485d",
  {
    input: {
      prompt: "The vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals and dark brown centers, are in full bloom, creating a stunning contrast against the green leaves and stems. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. The sun is shining brightly, casting a warm glow on the flowers and highlighting their intricate details. The video is shot from a low angle, looking up at the sunflowers, which adds a sense of grandeur and awe to the scene. The sunflowers are the main focus of the video, with no other objects or people present. The video is a celebration of nature's beauty and the simple joy of a sunny day in the countryside."
    }
  }
);

// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"

// To write the file to disk:
await fs.promises.writeFile("output.mp4", output[0]);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run jd7h/open-sora-512 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "jd7h/open-sora-512:053f96f5e48a544e281cd110479f0af5913bef226bbc74409b803bd788f3485d",
    input={
        "prompt": "The vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals and dark brown centers, are in full bloom, creating a stunning contrast against the green leaves and stems. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. The sun is shining brightly, casting a warm glow on the flowers and highlighting their intricate details. The video is shot from a low angle, looking up at the sunflowers, which adds a sense of grandeur and awe to the scene. The sunflowers are the main focus of the video, with no other objects or people present. The video is a celebration of nature's beauty and the simple joy of a sunny day in the countryside."
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
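The input schema above also lists an optional integer `seed`, and the example logs on this page begin with "Using seed 964085829...". The following is a minimal sketch of passing a fixed seed and saving the resulting video. The helper names `build_input`, `seed_from_logs`, and `run_and_save` are hypothetical, and the sketch assumes a recent version of the Python client in which each output is a file-like object with a `read()` method.

```python
import re

MODEL = "jd7h/open-sora-512:053f96f5e48a544e281cd110479f0af5913bef226bbc74409b803bd788f3485d"


def build_input(prompt, seed=None):
    """Build the input payload; leave `seed` out so the model randomises it."""
    payload = {"prompt": prompt}
    if seed is not None:
        payload["seed"] = int(seed)
    return payload


def seed_from_logs(logs):
    """Extract the seed from a log line such as 'Using seed 964085829...'."""
    match = re.search(r"Using seed (\d+)", logs)
    return int(match.group(1)) if match else None


def run_and_save(prompt, seed=None, path="output.mp4"):
    """Run the model and write the first output video to `path` (needs network access)."""
    import replicate  # imported lazily so the helpers above work without the client

    output = replicate.run(MODEL, input=build_input(prompt, seed))
    with open(path, "wb") as f:
        # Assumption: recent client versions return file-like outputs with .read()
        f.write(output[0].read())
    return path
```

Reusing a seed is only expected to help reproduce a result on the same model version. The example output on this page was created with a different version, so the seed from its logs may not yield the same video here.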

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run jd7h/open-sora-512 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "jd7h/open-sora-512:053f96f5e48a544e281cd110479f0af5913bef226bbc74409b803bd788f3485d",
    "input": {
      "prompt": "The vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals and dark brown centers, are in full bloom, creating a stunning contrast against the green leaves and stems. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. The sun is shining brightly, casting a warm glow on the flowers and highlighting their intricate details. The video is shot from a low angle, looking up at the sunflowers, which adds a sense of grandeur and awe to the scene. The sunflowers are the main focus of the video, with no other objects or people present. The video is a celebration of nature\'s beauty and the simple joy of a sunny day in the countryside."
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
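For readers who prefer not to shell out to curl, the same request can be sketched with Python's standard library. This mirrors the URL, headers, and body of the curl command above. The function names `build_request` and `create_prediction` are illustrative and not part of any Replicate library.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = "jd7h/open-sora-512:053f96f5e48a544e281cd110479f0af5913bef226bbc74409b803bd788f3485d"


def build_request(prompt, token):
    """Assemble the POST request without sending it."""
    body = json.dumps({"version": VERSION, "input": {"prompt": prompt}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
    )


def create_prediction(prompt):
    """Send the request and return the decoded prediction object (needs network access)."""
    request = build_request(prompt, os.environ["REPLICATE_API_TOKEN"])
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The `Prefer: wait` header asks the API to hold the connection open while the model runs. The response may still arrive before the prediction has finished, especially when the model is cold. In that case the prediction object's `urls.get` endpoint can be polled until `status` is `succeeded`.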

Output

{
  "completed_at": "2024-03-25T16:13:41.326017Z",
  "created_at": "2024-03-25T16:03:02.978878Z",
  "data_removed": false,
  "error": null,
  "id": "3i6x2wlbrf47pittnc6d2camju",
  "input": {
    "prompt": "The vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals and dark brown centers, are in full bloom, creating a stunning contrast against the green leaves and stems. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. The sun is shining brightly, casting a warm glow on the flowers and highlighting their intricate details. The video is shot from a low angle, looking up at the sunflowers, which adds a sense of grandeur and awe to the scene. The sunflowers are the main focus of the video, with no other objects or people present. The video is a celebration of nature's beauty and the simple joy of a sunny day in the countryside."
  },
  "logs": "Using seed 964085829...\n  0%|          | 0/100 [00:00<?, ?it/s]\n  1%|          | 1/100 [00:00<01:24,  1.17it/s]\n  2%|▏         | 2/100 [00:01<01:17,  1.26it/s]\n  3%|▎         | 3/100 [00:02<01:15,  1.29it/s]\n  4%|▍         | 4/100 [00:03<01:13,  1.30it/s]\n  5%|▌         | 5/100 [00:03<01:12,  1.31it/s]\n  6%|▌         | 6/100 [00:04<01:11,  1.32it/s]\n  7%|▋         | 7/100 [00:05<01:10,  1.32it/s]\n  8%|▊         | 8/100 [00:06<01:09,  1.33it/s]\n  9%|▉         | 9/100 [00:06<01:08,  1.33it/s]\n 10%|█         | 10/100 [00:07<01:07,  1.33it/s]\n 11%|█         | 11/100 [00:08<01:06,  1.33it/s]\n 12%|█▏        | 12/100 [00:09<01:06,  1.33it/s]\n 13%|█▎        | 13/100 [00:09<01:05,  1.33it/s]\n 14%|█▍        | 14/100 [00:10<01:04,  1.33it/s]\n 15%|█▌        | 15/100 [00:11<01:03,  1.33it/s]\n 16%|█▌        | 16/100 [00:12<01:03,  1.33it/s]\n 17%|█▋        | 17/100 [00:12<01:02,  1.33it/s]\n 18%|█▊        | 18/100 [00:13<01:01,  1.33it/s]\n 19%|█▉        | 19/100 [00:14<01:00,  1.33it/s]\n 20%|██        | 20/100 [00:15<01:00,  1.33it/s]\n 21%|██        | 21/100 [00:15<00:59,  1.33it/s]\n 22%|██▏       | 22/100 [00:16<00:58,  1.33it/s]\n 23%|██▎       | 23/100 [00:17<00:58,  1.33it/s]\n 24%|██▍       | 24/100 [00:18<00:57,  1.33it/s]\n 25%|██▌       | 25/100 [00:18<00:56,  1.33it/s]\n 26%|██▌       | 26/100 [00:19<00:55,  1.33it/s]\n 27%|██▋       | 27/100 [00:20<00:55,  1.33it/s]\n 28%|██▊       | 28/100 [00:21<00:54,  1.33it/s]\n 29%|██▉       | 29/100 [00:21<00:53,  1.33it/s]\n 30%|███       | 30/100 [00:22<00:52,  1.33it/s]\n 31%|███       | 31/100 [00:23<00:52,  1.32it/s]\n 32%|███▏      | 32/100 [00:24<00:51,  1.32it/s]\n 33%|███▎      | 33/100 [00:24<00:50,  1.32it/s]\n 34%|███▍      | 34/100 [00:25<00:49,  1.32it/s]\n 35%|███▌      | 35/100 [00:26<00:49,  1.32it/s]\n 36%|███▌      | 36/100 [00:27<00:48,  1.32it/s]\n 37%|███▋      | 37/100 [00:27<00:47,  1.32it/s]\n 38%|███▊      | 38/100 [00:28<00:46,  1.32it/s]\n 39%|███▉      | 39/100 
[00:29<00:46,  1.32it/s]\n 40%|████      | 40/100 [00:30<00:45,  1.32it/s]\n 41%|████      | 41/100 [00:31<00:44,  1.32it/s]\n 42%|████▏     | 42/100 [00:31<00:43,  1.32it/s]\n 43%|████▎     | 43/100 [00:32<00:43,  1.32it/s]\n 44%|████▍     | 44/100 [00:33<00:42,  1.32it/s]\n 45%|████▌     | 45/100 [00:34<00:41,  1.32it/s]\n 46%|████▌     | 46/100 [00:34<00:40,  1.32it/s]\n 47%|████▋     | 47/100 [00:35<00:40,  1.32it/s]\n 48%|████▊     | 48/100 [00:36<00:39,  1.32it/s]\n 49%|████▉     | 49/100 [00:37<00:38,  1.32it/s]\n 50%|█████     | 50/100 [00:37<00:37,  1.32it/s]\n 51%|█████     | 51/100 [00:38<00:37,  1.32it/s]\n 52%|█████▏    | 52/100 [00:39<00:36,  1.32it/s]\n 53%|█████▎    | 53/100 [00:40<00:35,  1.32it/s]\n 54%|█████▍    | 54/100 [00:40<00:34,  1.32it/s]\n 55%|█████▌    | 55/100 [00:41<00:34,  1.32it/s]\n 56%|█████▌    | 56/100 [00:42<00:33,  1.32it/s]\n 57%|█████▋    | 57/100 [00:43<00:32,  1.32it/s]\n 58%|█████▊    | 58/100 [00:43<00:31,  1.32it/s]\n 59%|█████▉    | 59/100 [00:44<00:31,  1.32it/s]\n 60%|██████    | 60/100 [00:45<00:30,  1.32it/s]\n 61%|██████    | 61/100 [00:46<00:29,  1.32it/s]\n 62%|██████▏   | 62/100 [00:46<00:28,  1.32it/s]\n 63%|██████▎   | 63/100 [00:47<00:28,  1.32it/s]\n 64%|██████▍   | 64/100 [00:48<00:27,  1.32it/s]\n 65%|██████▌   | 65/100 [00:49<00:26,  1.32it/s]\n 66%|██████▌   | 66/100 [00:49<00:25,  1.32it/s]\n 67%|██████▋   | 67/100 [00:50<00:25,  1.32it/s]\n 68%|██████▊   | 68/100 [00:51<00:24,  1.32it/s]\n 69%|██████▉   | 69/100 [00:52<00:23,  1.32it/s]\n 70%|███████   | 70/100 [00:53<00:22,  1.32it/s]\n 71%|███████   | 71/100 [00:53<00:21,  1.32it/s]\n 72%|███████▏  | 72/100 [00:54<00:21,  1.32it/s]\n 73%|███████▎  | 73/100 [00:55<00:20,  1.32it/s]\n 74%|███████▍  | 74/100 [00:56<00:19,  1.32it/s]\n 75%|███████▌  | 75/100 [00:56<00:18,  1.32it/s]\n 76%|███████▌  | 76/100 [00:57<00:18,  1.32it/s]\n 77%|███████▋  | 77/100 [00:58<00:17,  1.32it/s]\n 78%|███████▊  | 78/100 [00:59<00:16,  1.32it/s]\n 79%|███████▉  | 79/100 
[00:59<00:15,  1.32it/s]\n 80%|████████  | 80/100 [01:00<00:15,  1.32it/s]\n 81%|████████  | 81/100 [01:01<00:14,  1.32it/s]\n 82%|████████▏ | 82/100 [01:02<00:13,  1.32it/s]\n 83%|████████▎ | 83/100 [01:02<00:12,  1.32it/s]\n 84%|████████▍ | 84/100 [01:03<00:12,  1.32it/s]\n 85%|████████▌ | 85/100 [01:04<00:11,  1.32it/s]\n 86%|████████▌ | 86/100 [01:05<00:10,  1.32it/s]\n 87%|████████▋ | 87/100 [01:05<00:09,  1.32it/s]\n 88%|████████▊ | 88/100 [01:06<00:09,  1.32it/s]\n 89%|████████▉ | 89/100 [01:07<00:08,  1.32it/s]\n 90%|█████████ | 90/100 [01:08<00:07,  1.32it/s]\n 91%|█████████ | 91/100 [01:08<00:06,  1.32it/s]\n 92%|█████████▏| 92/100 [01:09<00:06,  1.32it/s]\n 93%|█████████▎| 93/100 [01:10<00:05,  1.32it/s]\n 94%|█████████▍| 94/100 [01:11<00:04,  1.32it/s]\n 95%|█████████▌| 95/100 [01:11<00:03,  1.32it/s]\n 96%|█████████▌| 96/100 [01:12<00:03,  1.31it/s]\n 97%|█████████▋| 97/100 [01:13<00:02,  1.31it/s]\n 98%|█████████▊| 98/100 [01:14<00:01,  1.31it/s]\n 99%|█████████▉| 99/100 [01:15<00:00,  1.32it/s]\n100%|██████████| 100/100 [01:15<00:00,  1.32it/s]\n100%|██████████| 100/100 [01:15<00:00,  1.32it/s]\nPrompt: The vibrant beauty of a sunflower field. The sunflowers, with their bright yellow petals and dark brown centers, are in full bloom, creating a stunning contrast against the green leaves and stems. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. The sun is shining brightly, casting a warm glow on the flowers and highlighting their intricate details. The video is shot from a low angle, looking up at the sunflowers, which adds a sense of grandeur and awe to the scene. The sunflowers are the main focus of the video, with no other objects or people present. The video is a celebration of nature's beauty and the simple joy of a sunny day in the countryside.\nSaved to ./outputs/samples/sample_0.mp4",
  "metrics": {
    "predict_time": 78.995378,
    "total_time": 638.347139
  },
  "output": [
    "https://replicate.delivery/pbxt/2RrnUAZXV7bRPxi9PseiufF1H8CNhfv0BwZJyIyBddOolqHlA/sample_0.mp4"
  ],
  "started_at": "2024-03-25T16:12:22.330639Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/3i6x2wlbrf47pittnc6d2camju",
    "cancel": "https://api.replicate.com/v1/predictions/3i6x2wlbrf47pittnc6d2camju/cancel"
  },
  "version": "1355733d9918ee0a738dfe12a936a0b36d7a36cff5e74c0c255dba6b03ba19a8"
}
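The `metrics` in this prediction can be derived from its three timestamps, which is useful for seeing how much of the wall-clock time was spent waiting rather than generating. The sketch below uses only the standard library. The names `parse_ts` and `timing_breakdown` are illustrative, and reading the gap before `started_at` as queueing plus cold start is an assumption.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2024-03-25T16:03:02.978878Z'."""
    # Older Python versions reject a trailing "Z" in fromisoformat()
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction):
    """Split a prediction's wall-clock time into waiting and predict phases."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "waiting": (started - created).total_seconds(),
        "predict": (completed - started).total_seconds(),
        "total": (completed - created).total_seconds(),
    }


# Timestamps copied from the example prediction above
example = {
    "created_at": "2024-03-25T16:03:02.978878Z",
    "started_at": "2024-03-25T16:12:22.330639Z",
    "completed_at": "2024-03-25T16:13:41.326017Z",
}

for name, seconds in timing_breakdown(example).items():
    print(f"{name}: {seconds:.1f} s")
```

For this example the predict phase matches `predict_time` (about 79 seconds) and the total matches `total_time` (about 638 seconds), which leaves roughly 559 seconds between creation and start.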

Generated in 79.0 seconds

This output was created using a different version of the model, jd7h/open-sora-512:1355733d.

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

This demo implements the 16x512x512 inference demo from the Open-Sora readme.

Open-Sora: Democratizing Efficient Video Production for All

We present Open-Sora, an initiative dedicated to efficiently producing high-quality video and making the model, tools, and content accessible to all. By embracing open-source principles, Open-Sora not only democratizes access to advanced video generation techniques, but also offers a streamlined and user-friendly platform that simplifies the complexities of video production. With Open-Sora, we aim to inspire innovation, creativity, and inclusivity in the realm of content creation.

Open-Sora is still at an early stage and under active development.

📰 News

[2024.03.18] 🔥 We release Open-Sora 1.0, a fully open-source project for video generation. Open-Sora 1.0 supports a full pipeline of video data preprocessing, training with Colossal-AI acceleration, inference, and more. Our provided checkpoints can produce 2s 512x512 videos with only 3 days of training.
[2024.03.04] Open-Sora provides training with 46% cost reduction.

🔆 New Features/Updates

📍 Open-Sora-v1 released. Model weights are available here. With only 400K video clips and 200 H800 days (compared with 152M samples in Stable Video Diffusion), we are able to generate 2s 512×512 videos.
✅ Three-stage training from an image diffusion model to a video diffusion model. We provide the weights for each stage.
✅ Support for training acceleration, including an accelerated transformer, faster T5 and VAE, and sequence parallelism. Open-Sora improves training speed by 55% when training on 64x512x512 videos. Details are in acceleration.md.
✅ We provide a data preprocessing pipeline, including downloading, video cutting, and captioning tools. Our data collection plan can be found in datasets.md.
✅ We find that the VQ-VAE from VideoGPT has low quality and thus adopt a better VAE from Stability-AI. We also find that patching in the time dimension degrades quality. See our report for more discussion.
✅ We investigate different architectures including DiT, Latte, and our proposed STDiT. Our STDiT achieves a better trade-off between quality and speed. See our report for more discussion.
✅ Support for CLIP and T5 text conditioning.
✅ By viewing images as one-frame videos, our project supports training DiT on both images and videos (e.g., ImageNet & UCF101). See command.md for more instructions.
✅ Support inference with official weights from DiT, Latte, and PixArt.

Model Weights

Resolution	Data	#iterations	Batch Size	GPU days (H800)	URL
16×256×256	366K	80k	8×64	117	:link:
16×256×256	20K HQ	24k	8×64	45	:link:
16×512×512	20K HQ	20k	2×64	35	:link:

Our model’s weights are partially initialized from PixArt-α. The number of parameters is 724M. More information about training can be found in our report. More about the dataset can be found in datasets.md. HQ means high quality.
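As a rough cross-check of the table, the per-stage figures can be added up and compared with the "200 H800 days" quoted earlier in this readme. The sketch below copies its numbers from the table. It assumes that a batch size written as "8×64" means 8 × 64 samples per iteration, which the source does not state, so the sample counts are estimates only.

```python
# Rows copied from the Model Weights table:
# (resolution, iterations, batch size factors, H800 GPU days)
STAGES = [
    ("16x256x256", 80_000, (8, 64), 117),
    ("16x256x256", 24_000, (8, 64), 45),
    ("16x512x512", 20_000, (2, 64), 35),
]


def total_gpu_days(stages):
    """Sum the GPU days over all training stages."""
    return sum(days for *_, days in stages)


def samples_seen(iterations, batch):
    """Estimate samples processed in a stage.

    Assumption: a batch size of "a x b" means a * b samples per iteration.
    """
    a, b = batch
    return iterations * a * b


print("total GPU days:", total_gpu_days(STAGES))
for resolution, iterations, batch, _ in STAGES:
    print(resolution, "samples seen (estimate):", samples_seen(iterations, batch))
```

The three stages sum to 197 GPU days, which is consistent with the rounded figure of 200 H800 days.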

LIMITATION: Our model is trained on a limited budget. The quality and text alignment are relatively poor. The model performs especially badly at generating human beings and cannot follow detailed instructions. We are working on improving the quality and text alignment.

Acknowledgement

DiT: Scalable Diffusion Models with Transformers.
OpenDiT: An acceleration framework for DiT training. We adopt valuable training acceleration strategies from OpenDiT.
PixArt: An open-source DiT-based text-to-image model.
Latte: An attempt to efficiently train DiT for video.
StabilityAI VAE: A powerful image VAE model.
CLIP: A powerful text-image embedding model.
T5: A powerful text encoder.
LLaVA: A powerful image captioning model based on Yi-34B.

We are grateful for their exceptional work and generous contribution to open source.

Citation

@software{opensora,
  author = {Zangwei Zheng and Xiangyu Peng and Yang You},
  title = {Open-Sora: Democratizing Efficient Video Production for All},
  month = {March},
  year = {2024},
  url = {https://github.com/hpcaitech/Open-Sora}
}

Zangwei Zheng and Xiangyu Peng contributed equally to this work during their internship at HPC-AI Tech.