A diffusion model for generating human motion video from a text prompt
Want to make some of these videos yourself?