pschaldenbrand / text2video

Method for generating bizarre-looking videos from a series of language descriptions. From the Bot Intelligence Group at CMU: Peter Schaldenbrand, Zhixuan Liu, and Jean Oh

Input

  • string (required): Text descriptions separated by &
  • number: How much the frames change from one to the next; 100 = a great deal, 0 = barely at all. Default: 30
  • integer: Video width in pixels. Default: 640
  • integer: Video height in pixels. Default: 360
  • integer: Number of video frames dedicated to each prompt. Default: 20
  • integer: Frames per second of the output video. Default: 8
  • boolean: Faster video generation at the cost of some quality. Default: true
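
A hedged sketch of how these inputs fit together with the Replicate Python client follows. The input names (prompts, frame_change, width, height, frames_per_prompt, fps, fast) are guesses inferred from the field descriptions above, not confirmed names; check the model's API schema for the real ones.

```python
import replicate

# NOTE: the input names below are hypothetical, inferred from the field
# descriptions on this page; consult the model's API schema before use.
output = replicate.run(
    "pschaldenbrand/text2video",  # pin a specific version hash in practice
    input={
        "prompts": "a dark forest & the forest on fire & smoldering ashes",
        "frame_change": 30,        # 0 = barely any change, 100 = a great deal
        "width": 640,
        "height": 360,
        "frames_per_prompt": 20,
        "fps": 8,
        "fast": True,              # trade some quality for speed
    },
)
print(output)  # URL of the generated video file
```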

Run time and cost

This model costs approximately $0.039 to run on Replicate, or 25 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 3 minutes, though predict time varies significantly with the inputs.

Readme

This is a method for generating videos from language descriptions. The video is generated by looping through the given text prompts, and frames are generated at roughly one frame per second.
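
The defaults above make the output size easy to estimate. A minimal sketch (the variable names are illustrative, mirroring the input descriptions):

```python
# Estimate output length from the inputs described in the Input section.
prompts = "a dark forest & the forest on fire".split("&")
frames_per_prompt = 20  # default
fps = 8                 # default

total_frames = len(prompts) * frames_per_prompt  # 2 * 20 = 40 frames
duration_sec = total_frames / fps                # 40 / 8 = 5.0 seconds
print(total_frames, duration_sec)
```

At roughly one generated frame per second, those 40 frames take on the order of 40 seconds to produce, consistent with the predict times quoted above.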

More info here: https://pschaldenbrand.github.io/text2video/

Fast Text2Video

This method approaches real-time video generation by optimizing the pixels of the video’s frames directly, rather than sampling from a pre-trained generator model. An image-to-image translation model is then used to denoise the directly optimized frames.
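
As a rough illustration of direct pixel optimization, here is a minimal PyTorch sketch assuming a CLIP-style text-image similarity loss, a common choice for this kind of optimization; the authors' actual loss, optimizer settings, and denoising network may differ.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Encode the text prompt once; no gradients needed on the text side.
with torch.no_grad():
    tokens = clip.tokenize(["a fire-breathing dragon"]).to(device)
    text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

# The frame itself is the parameter being optimized: no generator network.
frame = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([frame], lr=0.03)

for step in range(200):
    img = (frame.clamp(0, 1) - mean) / std
    img_feat = model.encode_image(img)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat * text_feat).sum()  # maximize cosine similarity
    opt.zero_grad()
    loss.backward()
    opt.step()

# The optimized pixels are typically noisy; per the description above, an
# image-to-image translation model then denoises the frame (omitted here).
```

For video, each new frame would presumably be initialized from the previous frame's pixels, which is how a frame-to-frame change control like the one in the Input section could operate.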

From the Bot Intelligence Group at Carnegie Mellon University

This method was featured at the 2022 NeurIPS Workshop on Machine Learning for Creativity and Design.