Wan 2.7 Image Pro
Generate and edit high-quality images using Alibaba’s Wan 2.7 Pro model. This is the professional variant with 4K support and thinking mode — for faster generation, see wan-video/wan-2.7-image.
What it can do
Text-to-image — Describe what you want and get a high-quality image. Supports resolutions up to 4K (4096×4096) with flexible aspect ratios. Thinking mode enhances the model’s reasoning for better prompt interpretation and image quality.
Image editing — Provide up to 9 reference images along with a text prompt to edit, restyle, or fuse images together. The model can apply style transfer, swap elements between images, and blend multiple references into a single output.
Image set generation — Create a coherent set of related images from a single prompt. Useful for generating the same character across different scenes, product shots from different angles, or storyboard sequences. Generates up to 12 images per request.
Wan 2.7 Image vs Image Pro
| | Image | Image Pro |
|---|---|---|
| Max resolution | 2K (2048×2048) | 4K (4096×4096) |
| Thinking mode | ✔️ | ✔️ |
| 4K text-to-image | ✖ | ✔️ |
| Speed vs. quality | Faster | Higher quality |
| Image editing | Up to 9 images | Up to 9 images |
| Image sets | Up to 12 | Up to 12 |
Inputs
- prompt — Text description of what you want to generate or how to edit the input images. Supports up to 5,000 characters.
- images — Optional input images for editing, style transfer, or multi-reference generation. Up to 9 images.
- size — Output resolution: 1K (~1024×1024), 2K (~2048×2048), 4K (~4096×4096, text-to-image only), or custom dimensions.
- num_outputs — Number of images to generate (1-4, or 1-12 in image set mode).
- image_set_mode — Enable coherent image set generation.
- thinking_mode — Enhanced reasoning for improved image quality. Enabled by default for text-to-image. Increases generation time.
- seed — For reproducible results.
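The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_input` is a hypothetical helper, the input names are taken from the list above, and passing images as URL strings is an assumption. Custom `size` values are passed through unchecked because their format is not specified here.

```python
from typing import Any, Optional, Sequence

# Limits as stated in the Inputs list above.
MAX_PROMPT_CHARS = 5000
MAX_INPUT_IMAGES = 9


def build_input(
    prompt: str,
    images: Optional[Sequence[str]] = None,  # assumption: images given as URL strings
    size: Optional[str] = None,
    num_outputs: int = 1,
    image_set_mode: bool = False,
    thinking_mode: Optional[bool] = None,
    seed: Optional[int] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate arguments and return an input dict."""
    image_list = list(images or [])
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if len(image_list) > MAX_INPUT_IMAGES:
        raise ValueError(f"at most {MAX_INPUT_IMAGES} input images are supported")

    # 1-4 outputs normally, 1-12 in image set mode.
    max_outputs = 12 if image_set_mode else 4
    if not 1 <= num_outputs <= max_outputs:
        raise ValueError(f"num_outputs must be between 1 and {max_outputs}")

    # 4K is text-to-image only: no input images, no image set mode.
    if size == "4K" and (image_list or image_set_mode):
        raise ValueError("4K requires no input images and no image set mode")

    payload: dict[str, Any] = {
        "prompt": prompt,
        "num_outputs": num_outputs,
        "image_set_mode": image_set_mode,
    }
    # Optional fields are omitted when unset so the model's defaults apply.
    if image_list:
        payload["images"] = image_list
    if size is not None:
        payload["size"] = size
    if thinking_mode is not None:
        payload["thinking_mode"] = thinking_mode
    if seed is not None:
        payload["seed"] = seed
    return payload
```

The returned dict could then be handed to whichever client you use, for example as the `input` argument of `replicate.run` in the Replicate Python client, together with this model's identifier.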
Tips
- 4K resolution is only available for text-to-image (no input images, no image set mode).
- Thinking mode works best for complex prompts where the model needs to reason about composition, spatial relationships, or multiple elements.
- For image editing, put the editing instruction in the prompt and pass the source image(s) in the images input.
- Image set mode works best with structured prompts that describe each image in the set.
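As one illustration of a structured prompt for image set mode, the sketch below joins a shared description with one numbered line per image. `build_set_prompt` is a hypothetical helper, and the "Image N:" layout is an assumption rather than a format the model is documented to require. Only the 12-image and 5,000-character limits come from this README.

```python
from typing import Sequence

MAX_PROMPT_CHARS = 5000  # prompt limit stated in the Inputs section
MAX_SET_IMAGES = 12      # image set limit stated above


def build_set_prompt(shared: str, scenes: Sequence[str]) -> str:
    """Hypothetical helper: one shared description plus one line per image."""
    if not 1 <= len(scenes) <= MAX_SET_IMAGES:
        raise ValueError(f"an image set needs 1 to {MAX_SET_IMAGES} scenes")
    lines = [shared.strip()]
    for index, scene in enumerate(scenes, start=1):
        # Assumed layout: number each image so the set stays ordered.
        lines.append(f"Image {index}: {scene.strip()}")
    prompt = "\n".join(lines)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return prompt


prompt = build_set_prompt(
    "A ginger cat in watercolor style.",
    ["sleeping on a windowsill", "chasing a butterfly"],
)
print(prompt)
```

When using a prompt like this, setting num_outputs to the number of scenes and enabling image_set_mode seems the natural pairing, though that correspondence is an assumption too.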