Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds long at 15 FPS and 720p resolution from a simple text prompt.
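
As a rough sizing aid, the stated limits translate into an upper bound on frame count and pixel volume per clip. This is a minimal sketch that assumes "720p" means a 16:9 frame of 1280×720 and that frame count is simply duration × frame rate; the model's actual output length and dimensions may differ.

```python
def frame_budget(duration_s: float, fps: int, width: int = 1280, height: int = 720) -> dict:
    """Upper-bound frame and pixel counts for a clip (assumes 16:9 720p by default)."""
    frames = int(duration_s * fps)  # e.g. 6 s at 15 FPS
    pixels_per_frame = width * height
    return {
        "frames": frames,
        "pixels_per_frame": pixels_per_frame,
        "total_pixels": frames * pixels_per_frame,
    }

print(frame_budget(6, 15))
# → {'frames': 90, 'pixels_per_frame': 921600, 'total_pixels': 82944000}
```
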
This model runs on NVIDIA A100 (80 GB) GPU hardware. We don't yet have enough runs of this model to provide performance information.

Generate short videos from text prompts. Optionally condition on a start image to create image-to-video clips, and include up to 4 reference images as scene elements. Choose a 5 s or 10 s duration with 720p output at 30 fps, and set the aspect ratio to 16:9, 9:16, or 1:1. Negative prompts are supported for steering content; the output is a video.

Generate short videos from a start image and a text prompt. Produce 5- or 10-second clips at 24 fps in 720p (standard mode) or 1080p (pro mode). In pro mode, optionally supply an end image to guide the final frame or to interpolate between the start and end images.
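
The standard/pro rules above can be expressed as a small pre-flight check. This is a minimal sketch: the field names (`mode`, `duration`, `end_image`) are hypothetical and not the model's documented input schema; only the durations, frame rate, output heights, and the pro-only end image come from the description.

```python
from typing import Optional

# Output height per mode, as stated: 720p for standard, 1080p for pro.
MODE_HEIGHT = {"standard": 720, "pro": 1080}
FPS = 24

def plan_clip(mode: str, duration: int, end_image: Optional[str] = None) -> dict:
    """Validate a hypothetical request and return the implied output spec."""
    if mode not in MODE_HEIGHT:
        raise ValueError(f"mode must be one of {sorted(MODE_HEIGHT)}")
    if duration not in (5, 10):
        raise ValueError("duration must be 5 or 10 seconds")
    if end_image is not None and mode != "pro":
        raise ValueError("end_image is only supported in pro mode")
    return {
        "height": MODE_HEIGHT[mode],
        "frames": duration * FPS,
        "interpolate": end_image is not None,  # start-to-end guidance requested
    }

print(plan_clip("pro", 5, end_image="last.png"))
```

Rejecting an end image outside pro mode up front avoids submitting a request that the described model would not honor.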

Generate short videos from text prompts or a starting image. Produce 2–12 second clips at 24 fps in up to 1080p resolution across aspect ratios including 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, and 9:21. Guide subjects, style, and multi-character interactions with 1–4 reference images for character, clothing, and environment consistency. Optionally lock the camera, set a random seed for reproducibility, and anchor start/end frames with first- and last-frame images. Outputs a video.
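
One way to keep requests within the ranges listed above is a small request object that validates itself. This is a sketch under stated assumptions: the field names (`duration`, `aspect_ratio`, `reference_images`, `seed`, `camera_fixed`) and the 5-second default are hypothetical; it also assumes reference images are optional, so an empty list passes, while more than four is rejected.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ASPECT_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "9:21"}

@dataclass
class VideoRequest:
    # Hypothetical field names; only the value ranges come from the description.
    prompt: str
    duration: int = 5  # assumed default, in seconds
    aspect_ratio: str = "16:9"
    reference_images: List[str] = field(default_factory=list)
    seed: Optional[int] = None  # fixed seed for reproducible output
    camera_fixed: bool = False  # "lock the camera"

    def validate(self) -> None:
        if not 2 <= self.duration <= 12:
            raise ValueError("duration must be between 2 and 12 seconds")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if len(self.reference_images) > 4:
            raise ValueError("at most 4 reference images are allowed")

req = VideoRequest(prompt="two dancers on a rooftop at dusk", duration=8, aspect_ratio="21:9", seed=42)
req.validate()
print(req.duration * 24)  # 192 frames at 24 fps
```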

Generate 5–10 second videos from a text prompt or a single starting image. Provide a required prompt and an optional first-frame image to get short clips with fluid motion, stable frames, and coherent pacing. Refined conditioning preserves color, lighting, and mood across frames, and the model follows multi-step, causal instructions for complex camera moves. Suited for marketing assets, creator shorts, film and animation previsualization, and educational explainers.

Generate videos with sound from a text prompt or from a reference image plus prompt. Create 4–8 second clips at 720p or 1080p in 16:9 or 9:16, with optional native audio generation. Run fast and cost-efficiently for text-to-video storytelling, product spins, concept shots, and image-to-video animations.

Generate 5–10 second 1080p videos from a text prompt. Provide a start or end image to anchor the first or last frame (one is required) and optionally add up to four reference images as scene elements. Select 16:9, 9:16, or 1:1 aspect ratios and refine results with negative prompts.
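
The frame-anchor requirement above is easy to get wrong, so a pre-flight check can help. This minimal sketch uses hypothetical parameter names (`start_image`, `end_image`, `reference_images`, `aspect_ratio`); it reads "one is required" as "at least one of the start and end images" and assumes supplying both is also accepted.

```python
from typing import List, Optional

SUPPORTED_RATIOS = {"16:9", "9:16", "1:1"}

def check_anchored_request(
    prompt: str,
    duration: int,
    start_image: Optional[str] = None,
    end_image: Optional[str] = None,
    reference_images: Optional[List[str]] = None,
    aspect_ratio: str = "16:9",
) -> None:
    """Raise ValueError if a hypothetical request breaks the stated rules."""
    if not prompt:
        raise ValueError("a text prompt is required")
    if not 5 <= duration <= 10:
        raise ValueError("duration must be between 5 and 10 seconds")
    # At least one frame anchor (first or last frame) must be supplied.
    if start_image is None and end_image is None:
        raise ValueError("provide a start_image or an end_image")
    if reference_images is not None and len(reference_images) > 4:
        raise ValueError("at most four reference images are allowed")
    if aspect_ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")

check_anchored_request("a paper boat drifting downstream", 5, start_image="boat.png")
```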

Generate 5–10 second 720p videos from a text prompt. Optionally animate from a starting image by using it as the first frame (image-to-video). Select aspect ratios 16:9, 9:16, or 1:1. Outputs silent video.

Generate 480p videos from a text prompt with accelerated inference. Accept a prompt plus optional LoRA weights and scaling to apply styles or characters, with controls for aspect ratio (16:9, 9:16), seed for reproducibility, negative prompt, guidance scale, sampling steps, and flow shift. Include a safety checker with an option to disable it. Output is a silent video.