# OmniHuman 1.5
OmniHuman 1.5 produces character-driven video by combining an input image, an audio track, and an optional text prompt. Compared to earlier versions, it adds:
- Support for text prompts.
- Unrestricted camera and character motion.
- Audio-aware action generation, so on-screen behavior stays coherent and expressive.
## Capabilities
- Audio comprehension – character behavior and expressions follow audio semantics.
- Camera and character control – supports multiple, sequential actions and free camera movement.
- Emotion performance – recognizes and performs nuanced emotions and micro-expressions.
- Multi-character scenes – specify who speaks and manage background reactions.
- Diverse subjects – supports humans, animals, and stylized or animated characters.
## Typical Use Cases
| Scenario | Description |
|---|---|
| Film & TV / Short Video | Character dialogue, dramatic and emotional scenes, narrative shots. |
| Fantasy Vlog | Realistic or surreal selfie-style recordings with controllable events and dynamics. |
| AI Music Video | Rhythm-driven actions, expressive camera motion, music emotion alignment. |
| UGC / Creative | Stylized or non-human avatars, pixel-style content, creative virtual scenes. |
## Prompt Writing Guide
### Core principles
- Write prompts as short, natural storylines.
- Focus on dynamic actions, not static attributes already in the image.
- Use clear, step-by-step, non-contradictory language.
### Recommended structure
[Camera movement] + [Emotion] + [Speaking state] + [Specific actions] + [Optional background actions]
### Example
> “The camera slowly moves from the side to a medium front shot.
> A young woman sits by the window, calm, smiling as she talks to the camera.
> A boy beside her looks at her, then turns to the camera and smiles.”
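
As a minimal sketch, the recommended structure can be treated as a plain template. The helper below is hypothetical (the function name and parameters are illustrative, not part of any OmniHuman API); it simply joins the components into one short, natural storyline:

```python
# Hypothetical helper (not part of any OmniHuman API): joins the prompt
# components from the recommended structure into one short storyline.
def build_prompt(camera, emotion, speaking_state, actions, background_actions=""):
    parts = [camera, emotion, speaking_state, actions, background_actions]
    # Keep non-empty parts, normalize trailing periods, and join as sentences.
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())

prompt = build_prompt(
    camera="The camera slowly moves from the side to a medium front shot",
    emotion="A young woman sits by the window, calm",
    speaking_state="smiling as she talks to the camera",
    actions="She gestures gently while speaking",
    background_actions="A boy beside her looks at her, then turns to the camera and smiles",
)
print(prompt)
```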
### Tips
- Include verbs like *talks* or *sings* to improve lip-sync.
- Use sequence words (*first*, *then*) for multi-step actions.
- Avoid keeping the subject out of frame for long stretches, which may break continuity.
- High-resolution, clear input images yield better results.
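
Putting the pieces together, here is a hedged sketch of submitting the three inputs (image, audio, prompt). The endpoint URL, field names, and response handling below are assumptions for illustration only; consult the actual OmniHuman 1.5 service documentation for the real API:

```python
# Hypothetical request sketch: the endpoint, field names, and response shape
# are assumptions, not a documented OmniHuman 1.5 API.
import requests

ENDPOINT = "https://api.example.com/v1/omnihuman/generate"  # placeholder URL

with open("character.png", "rb") as image, open("speech.wav", "rb") as audio:
    response = requests.post(
        ENDPOINT,
        files={"image": image, "audio": audio},  # assumed multipart field names
        data={"prompt": "The camera slowly moves from the side to a medium "
                        "front shot. A young woman sits by the window, calm, "
                        "smiling as she talks to the camera."},
        timeout=600,
    )
response.raise_for_status()

with open("output.mp4", "wb") as out:
    out.write(response.content)  # assumes the service returns the video bytes
```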