A diffusion model for generating human motion video from a text prompt