Predictions run on Nvidia A100 GPU hardware and typically complete within 32 minutes, although the predict time for this model varies significantly based on the inputs.
Image prompts are supported thanks to a contribution from nev.
Stage 1 outputs a few frames; stage 2 interpolates them into a longer video and performs dsr resampling.
When running both stages, the stage 1 output renders as soon as it is ready, and the stage 2 output follows once it is complete.
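
If you consume the results programmatically, the two-stage behavior can be treated as a sequence of outputs that arrive one after another. The following is only a minimal sketch under stated assumptions: `label_stage_outputs` and `fake_prediction` are hypothetical helpers, the file names are placeholders, and no real client API is called. It assumes the first item to arrive is the stage 1 result and anything after it comes from stage 2.

```python
from typing import Iterable, Iterator, Tuple


def label_stage_outputs(outputs: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Pair each output with the stage assumed to have produced it.

    Assumption: the first item is the stage 1 result and every later
    item comes from stage 2, matching the order described above.
    """
    for index, item in enumerate(outputs):
        stage = "stage 1" if index == 0 else "stage 2"
        yield stage, item


def fake_prediction() -> Iterator[str]:
    """Hypothetical stand-in for a prediction that yields outputs as they finish."""
    yield "stage1_frames.mp4"  # placeholder name for the short stage 1 clip
    yield "stage2_video.mp4"   # placeholder name for the longer stage 2 video


if __name__ == "__main__":
    for stage, path in label_stage_outputs(fake_prediction()):
        print(f"{stage}: {path}")
```

Because the helper is a generator, each output can be handled as soon as it arrives instead of waiting for stage 2 to finish.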
Please see the official CogVideo repo for more information: