Qwen-Image-2512
Qwen-Image-2512 is the December 2025 update to Alibaba Cloud’s text-to-image model. It generates photorealistic images with improved human rendering, natural textures, and accurate text, especially in Chinese.
In more than 10,000 rounds of blind evaluation on AI Arena, Qwen-Image-2512 ranked as the strongest open-source image generation model while staying competitive with closed-source systems.
What’s improved
More realistic people
The model dramatically reduces the “AI-generated” look in human portraits. It captures age-appropriate details like wrinkles, skin texture, and natural facial features. Hair strands are rendered individually instead of blurred together, and subtle expressions come through more naturally.
Finer natural detail
Landscapes, animal fur, and organic textures are rendered with significantly more detail. The model better captures the complexity of natural surfaces and environmental context.
Better text rendering
Text quality has improved across the board: better layout, more accurate character rendering, and more faithful composition when mixing text and images. The gains are most visible in Chinese text, where thousands of complex characters each need pixel-perfect accuracy.
Improved prompt following
The model better follows semantic instructions in your prompts. If you specify “body leaning slightly forward,” the model actually captures that posture. Details in your prompt translate more reliably to the final image.
Example outputs
Here are some examples showing what the model can do:
Photorealistic portraits
The model handles detailed human features, natural lighting, and environmental context. Skin texture, hair detail, and subtle expressions all come through clearly.
Text rendering in images
Whether you need English or Chinese text, the model integrates typography seamlessly into the scene. Complex layouts with multiple text elements maintain readability and visual coherence.
Natural scenes
Fine details in landscapes, animal fur, and organic textures show notable improvement over the August release.
How to use it
Give the model a detailed text prompt describing what you want to see. The more specific you are about composition, lighting, style, and details, the better your results will be.
For best results with text in images, be explicit about what the text should say and where it should appear in the scene.
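The prompting advice above can be sketched as a small helper that assembles a prompt from explicit parts. This is only an illustration: `build_prompt` and its parameter names are hypothetical, not part of any Qwen or Replicate API, and the model does not require this comma-separated phrasing.

```python
def build_prompt(subject, composition="", lighting="", style="",
                 text=None, text_placement=""):
    """Join the descriptive parts of a prompt into one comma-separated string.

    Hypothetical helper for illustration; the model accepts free-form text.
    """
    parts = [subject.strip()]
    # Add each optional detail only if the caller supplied it.
    for detail in (composition, lighting, style):
        if detail.strip():
            parts.append(detail.strip())
    # Be explicit about what the text says and where it appears.
    if text:
        placement = f" {text_placement.strip()}" if text_placement.strip() else ""
        parts.append(f'the text "{text}" appears{placement}')
    return ", ".join(parts)


prompt = build_prompt(
    subject="an elderly fisherman mending a net",
    composition="medium shot, body leaning slightly forward",
    lighting="soft morning light",
    style="photorealistic",
    text="Harbor Cafe",
    text_placement="on a wooden sign behind him",
)
print(prompt)
```

Quoting the exact wording and naming its location keeps the text instruction separate from the rest of the scene description.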
The model supports various aspect ratios and artistic styles—from photorealistic to impressionistic to anime aesthetics.
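If you run the model somewhere that expects a width and height rather than a ratio, a helper like the one below can convert between the two. It is a sketch under stated assumptions: the one-megapixel budget and the rounding to multiples of 16 are illustrative defaults, not documented requirements of Qwen-Image-2512, and `dimensions_for_ratio` is a hypothetical name.

```python
import math


def dimensions_for_ratio(ratio, target_pixels=1024 * 1024, multiple=16):
    """Return (width, height) close to target_pixels for a ratio like "16:9".

    target_pixels and multiple are assumed defaults, not model requirements.
    """
    w_part, h_part = (int(p) for p in ratio.split(":"))
    if w_part <= 0 or h_part <= 0:
        raise ValueError("ratio parts must be positive integers")
    aspect = w_part / h_part
    # Solve width * height = target_pixels with width / height = aspect.
    height = math.sqrt(target_pixels / aspect)
    width = height * aspect

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


print(dimensions_for_ratio("1:1"))
print(dimensions_for_ratio("16:9"))
print(dimensions_for_ratio("9:16"))
```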
Technical details
Qwen-Image-2512 is built on a 20-billion-parameter Multimodal Diffusion Transformer (MMDiT) architecture. The base Qwen-Image model was released in August 2025, and this December update improves on it across human realism, natural detail, text rendering, and prompt following.
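To get a feel for what 20 billion parameters means in practice, here is a back-of-the-envelope estimate of the memory needed just to hold the transformer weights. The precisions listed are assumptions for illustration (the text above does not say which precision the weights ship in), and the figures exclude the text encoder, the image decoder, and activations.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 20e9  # parameter count stated above

# Assumed storage precisions and their size in bytes per parameter.
for name, nbytes in {"float32": 4, "bfloat16": 2, "int8": 1}.items():
    print(f"{name}: {weight_memory_gb(NUM_PARAMS, nbytes):.0f} GB")
```

At 2 bytes per parameter the weights alone come to about 40 GB, which is one reason a hosted service is a convenient way to try the model.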
The model was developed by Alibaba Cloud’s Qwen team and is released under the Apache 2.0 license.
For more details, check out the model card on Hugging Face.
Try it yourself
You can experiment with Qwen-Image-2512 in the Replicate Playground at replicate.com/playground.
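If you would rather call the model from code than from the Playground, a minimal sketch with the Replicate Python client might look like the following. The model reference `qwen/qwen-image-2512` and the input keys `prompt` and `aspect_ratio` are assumptions; check the model's page on Replicate for the actual identifier and input schema. The client reads your credentials from the `REPLICATE_API_TOKEN` environment variable.

```python
import os

# Assumed model reference; confirm the real one on the model's Replicate page.
MODEL_REF = "qwen/qwen-image-2512"


def build_input(prompt, aspect_ratio="16:9"):
    """Build the input payload. Key names are assumed, not confirmed."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"prompt": prompt, "aspect_ratio": aspect_ratio}


def generate(prompt, aspect_ratio="16:9"):
    """Run the model through the Replicate client (pip install replicate)."""
    import replicate  # imported here so the helpers above work without it

    return replicate.run(MODEL_REF, input=build_input(prompt, aspect_ratio))


# Only call the API when a token is configured.
if __name__ == "__main__" and os.environ.get("REPLICATE_API_TOKEN"):
    output = generate(
        'a bookshop window at dusk, the text "Open Late" on a neon sign'
    )
    print(output)
```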