cjwbw / videocrafter

Text-to-Video Generation and Editing


Run time and cost

Predictions run on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.

VideoCrafter: A Toolkit for Text-to-Video Generation and Editing

🤗🤗🤗 VideoCrafter is an open-source video generation and editing toolbox for crafting video content.
It currently includes the following THREE types of models:

1. Base T2V: Generic Text-to-video Generation

We provide a base text-to-video (T2V) generation model built on latent video diffusion models (LVDM). It synthesizes realistic videos from input text descriptions.

- "Campfire at night in a snowy forest with starry sky in the background."
- "Cars running on the highway at night."
- "close up of a clown fish swimming. 4K"
- "astronaut riding a horse"
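As a rough sketch, a prediction like the ones above can be requested through the Replicate Python client. The input field name `prompt` and the opt-in environment variable `RUN_VIDEOCRAFTER_DEMO` are assumptions for illustration; check the model's API tab for the exact input schema and version hash.

```python
# Hedged sketch: calling the model via the Replicate Python client.
# The "prompt" field name is an assumption; consult the model's API
# schema on Replicate for the real parameter names.
import os


def build_t2v_input(prompt: str) -> dict:
    """Assemble the input payload for a text-to-video prediction."""
    return {"prompt": prompt}


if __name__ == "__main__" and os.environ.get("RUN_VIDEOCRAFTER_DEMO"):
    # Requires `pip install replicate` and REPLICATE_API_TOKEN to be set.
    import replicate

    output = replicate.run(
        "cjwbw/videocrafter",  # model identifier from this page
        input=build_t2v_input(
            "Campfire at night in a snowy forest with starry sky in the background."
        ),
    )
    print(output)  # URL(s) of the generated video
```

The network call is gated behind an environment variable so the helper can be imported and inspected without an API token.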

2. VideoLoRA: Personalized Text-to-Video Generation with LoRA

Starting from the pretrained LVDM, you can create your own video generation models by finetuning it on a set of video clips or images depicting a certain concept.

We adopt LoRA for finetuning because it is easy to train and requires relatively few computational resources.

Below are generation results from our four VideoLoRA models that are trained on four different styles of video clips.

By providing a sentence describing the video content along with a LoRA trigger word (specified during LoRA training), the model generates videos in the desired style (or with the desired subject/concept).

Results of inputting "A monkey is playing a piano, ${trigger_word}" to the four VideoLoRA models:

- "Loving Vincent style"
- "frozenmovie style"
- "MakotoShinkaiYourName style"
- "coco style"

The trigger word for each VideoLoRA is annotated below the generation result.
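The prompt template above can be sketched as a small helper that appends a trigger word to the content description. The comma-separated format mirrors the "A monkey is playing a piano, ${trigger_word}" template in the text; the trigger strings are the four listed in this section.

```python
# Sketch of composing a VideoLoRA prompt: content description followed by
# the trigger word that was specified during LoRA training.
LORA_TRIGGER_WORDS = [
    "Loving Vincent style",
    "frozenmovie style",
    "MakotoShinkaiYourName style",
    "coco style",
]


def lora_prompt(content: str, trigger_word: str) -> str:
    """Fill the template: content description, comma, LoRA trigger word."""
    return f"{content}, {trigger_word}"


# One prompt per VideoLoRA model, as in the comparison above.
prompts = [lora_prompt("A monkey is playing a piano", t) for t in LORA_TRIGGER_WORDS]
```

Each resulting string can then be passed as the text input to the corresponding VideoLoRA model.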