This is a method for generating videos from language descriptions. The video is generated by looping through the given text prompts, and frames are produced at roughly one frame per second.
More info here: https://pschaldenbrand.github.io/text2video/
By optimizing the pixels of the video's frames directly, rather than using a pre-trained generator model, this method achieves near real-time video generation. An image-to-image translation model is then used to denoise the directly optimized frames.
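As a rough illustration of what optimizing pixels directly means, here is a minimal toy sketch in plain Python. It is not the project's implementation. The objective that scores a frame against a prompt is not described here, so `prompt_target` and the squared-error loss are hypothetical stand-ins, as are all function names and parameters. The denoising step is left out. The sketch only shows the general shape: the pixel values are the parameters being updated, and each frame starts from the previous one while the loop moves through the prompts.

```python
# Toy sketch: gradient descent applied to the pixel values themselves.
# ASSUMPTION: prompt_target and the squared-error loss are hypothetical
# stand-ins for a real text-to-image matching objective.

def prompt_target(prompt, n_pixels):
    """Hypothetical stand-in: map a prompt to a fixed target image in [0, 1]."""
    seed = sum(ord(c) for c in prompt)
    return [((seed * (i + 1)) % 256) / 255.0 for i in range(n_pixels)]

def loss_and_grad(pixels, target):
    """Half the sum of squared errors, and its gradient for each pixel."""
    loss = 0.5 * sum((p - t) ** 2 for p, t in zip(pixels, target))
    grad = [p - t for p, t in zip(pixels, target)]
    return loss, grad

def optimize_frame(pixels, target, steps=5, lr=0.25):
    """Update the pixels directly; no generator network is involved."""
    pixels = list(pixels)
    for _ in range(steps):
        _, grad = loss_and_grad(pixels, target)
        # Gradient step, then clamp to the valid intensity range.
        pixels = [min(1.0, max(0.0, p - lr * g)) for p, g in zip(pixels, grad)]
    return pixels

def generate_video(prompts, frames_per_prompt=3, n_pixels=16):
    """Loop through the prompts; each frame is warm-started from the last."""
    frame = [0.5] * n_pixels  # flat grey starting frame
    video = []
    for prompt in prompts:
        target = prompt_target(prompt, n_pixels)
        for _ in range(frames_per_prompt):
            frame = optimize_frame(frame, target)
            video.append(frame)
    return video

video = generate_video(["a cat", "a dog"])
print(len(video), "frames of", len(video[0]), "pixels")
```

Because each frame continues from the previous one, only a few optimization steps are spent per frame, which is one plausible reason a direct-pixel approach can run quickly.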
From the Bot Intelligence Group at Carnegie Mellon University
This method is to be featured at the 2022 NeurIPS Workshop on Machine Learning for Creativity and Design.