CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Setup can take a long time because the container is 63 GB.
Image prompts are supported thanks to a contribution from nev.
Stage 1 outputs a few frames. Stage 2 interpolates them into a longer video and performs DSR (direct super-resolution) resampling.
When running both stages, the stage 1 output renders as soon as it is ready, and the stage 2 output follows once it completes.
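The two-stage flow can be pictured with a small toy sketch. This is not the CogVideo code or its API: the names stage1, stage2, generate, and both_stages are hypothetical, frames are plain lists of numbers, and stage 2 is reduced to midpoint interpolation with the super-resolution step left out. It only illustrates how results can be yielded stage by stage.

```python
from typing import Iterator, List, Tuple

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values


def stage1(prompt: str, num_frames: int = 4) -> List[Frame]:
    """Hypothetical stand-in for stage 1: return a few key frames."""
    # Deterministic dummy frames; a real model would sample them from the prompt.
    return [[float(i)] * 2 for i in range(num_frames)]


def stage2(frames: List[Frame]) -> List[Frame]:
    """Hypothetical stand-in for stage 2: insert a midpoint frame between neighbours."""
    if not frames:
        return []
    out: List[Frame] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append([(x + y) / 2 for x, y in zip(a, b)])  # interpolated frame
    out.append(frames[-1])
    return out


def generate(prompt: str, both_stages: bool = True) -> Iterator[Tuple[str, List[Frame]]]:
    """Yield each stage's result as soon as it is ready."""
    key_frames = stage1(prompt)
    yield "stage1", key_frames  # available before stage 2 starts
    if both_stages:
        yield "stage2", stage2(key_frames)


if __name__ == "__main__":
    for name, frames in generate("a cat playing piano"):
        print(name, len(frames))
```

Because generate is a generator, a caller can display the stage 1 frames while stage 2 is still running, which mirrors the behaviour described above.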
Please see the official CogVideo repo for more information: https://github.com/THUDM/CogVideo