Qwen-Image-Layered
Decompose images into editable layers
Qwen-Image-Layered takes a single image and breaks it down into multiple RGBA layers—each representing a different part of the scene. You can then edit, move, resize, or recolor each layer independently without affecting the others.
How it works
Most image editing tools struggle with consistency because everything in a regular image is merged into one flat canvas. When you try to edit something, you risk changing other parts of the image too.
Qwen-Image-Layered solves this by separating an image into distinct RGBA layers, similar to how professional design tools like Photoshop work. Each layer contains a specific element from the original image—a person, an object, text, or the background. Since each layer has transparency information (the alpha channel in RGBA), you can edit one layer while leaving everything else perfectly intact.
The model can decompose images into anywhere from 2 to 8 layers depending on your needs. You can even decompose a layer further if you need more granular control.
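For a concrete sense of what the layered output means in practice, here is a minimal Pillow sketch that re-composites a stack of RGBA layers back into a single image. The filenames are placeholders, and it assumes each layer is a full-canvas RGBA image aligned with the original (rather than a tight crop).

```python
from PIL import Image

# Placeholder filenames; the model returns one RGBA image per layer.
layer_paths = ["layer_0_background.png", "layer_1_subject.png", "layer_2_text.png"]

# Start from a fully transparent canvas, then alpha-composite each
# layer on top in order, back to front.
layers = [Image.open(p).convert("RGBA") for p in layer_paths]
canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
for layer in layers:
    canvas = Image.alpha_composite(canvas, layer)

canvas.convert("RGB").save("recomposed.png")
```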
What you can do with it
Once you have your layers, you can:
- Recolor objects without touching anything else
- Replace elements, such as swapping one person for another
- Edit text while keeping the rest of the design unchanged
- Delete objects cleanly with no artifacts left behind
- Resize objects without distortion
- Move elements around the canvas freely
- Adjust individual layers using standard image editing tools
The layered structure means edits stay consistent because you're working directly with isolated parts of the image rather than a single flat canvas.
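As a small illustration of that isolation, the sketch below recolors one layer and composites it back over an untouched background layer. It makes the same assumptions as the earlier snippet: the filenames are placeholders and each layer is a full-canvas RGBA image.

```python
from PIL import Image

# Placeholder filenames; each layer is a full-canvas RGBA image.
background = Image.open("layer_0_background.png").convert("RGBA")
subject = Image.open("layer_1_subject.png").convert("RGBA")

# Recolor only the subject: boost its red channel, keep its alpha mask as-is.
r, g, b, a = subject.split()
recolored_subject = Image.merge("RGBA", (r.point(lambda v: min(255, v + 60)), g, b, a))

# Composite the edited layer back over the untouched background.
result = Image.alpha_composite(background, recolored_subject)
result.convert("RGB").save("recolored.png")
```

Because the background never passes through the edit, there is nothing to accidentally change outside the layer you touched.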
Example use cases
Design iteration: Take a product photo and quickly try different background colors or object placements without recreating the entire composition.
Content creation: Decompose a complex scene into layers, then mix and match elements to create variations for social media or marketing materials.
Photo editing: Isolate people or objects in photos to adjust them individually—change clothing colors, swap backgrounds, or remove unwanted items without complex masking.
Template creation: Break down designs into reusable layers that can be customized for different contexts or clients.
How the model was built
The model uses three main components:
An RGBA-VAE creates a shared latent space for both regular RGB images and RGBA images with transparency. This lets the model work seamlessly with layered representations.
A variable-layer decomposition architecture generates different numbers of layers from the same image, which makes the model flexible across use cases.
The third piece is the training process: the team started with a pretrained image generation model and progressively adapted it to decompose images into layers. They also built a pipeline to extract and annotate multilayer images from real Photoshop documents, which provided professional-quality layered data for training.
Tips for best results
- Start with 3-4 layers for most images—this usually captures the main elements without over-segmenting
- Use more layers (6-8) for complex scenes with many distinct objects
- If you need finer control over a specific layer, you can decompose that layer again
- The model works best with clear, well-defined objects and backgrounds
- For editing individual layers after decomposition, you can use image editing tools that support RGBA
Technical details
The model outputs standard RGBA images for each layer (RGB color channels plus an alpha transparency channel). These can be opened in any image editor that supports transparency, or programmatically manipulated using image processing libraries.
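For example, a short sketch of that kind of programmatic manipulation, using NumPy and Pillow with a placeholder filename, might crop an element out of its layer by treating the alpha channel as a mask:

```python
import numpy as np
from PIL import Image

# Load one decomposed layer (placeholder filename) as an RGBA array.
layer = np.array(Image.open("layer_1_subject.png").convert("RGBA"))
alpha = layer[..., 3]

# The alpha channel doubles as a mask: find the bounding box of the
# non-transparent pixels, then crop the element out of the full canvas.
ys, xs = np.nonzero(alpha > 0)
top, bottom, left, right = ys.min(), ys.max(), xs.min(), xs.max()
cutout = layer[top:bottom + 1, left:right + 1]

# A 4-channel uint8 array round-trips back to an RGBA image.
Image.fromarray(cutout).save("subject_cutout.png")
```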
Qwen-Image-Layered was developed by the Qwen team and is licensed under Apache 2.0. The research paper provides more details about the architecture and training approach: https://arxiv.org/abs/2512.15603
You can try the model on the Replicate playground at replicate.com/playground
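If you'd rather call the model from your own code, a minimal sketch with the Replicate Python client could look like the following. The model identifier and input names here are assumptions for illustration, not the confirmed schema, so check the model's API page on Replicate for the actual parameters.

```python
import replicate

# Sketch only: the model slug and input names below are assumptions,
# not the confirmed API.
output = replicate.run(
    "qwen/qwen-image-layered",  # hypothetical model identifier
    input={
        "image": open("product_photo.png", "rb"),  # hypothetical input name
        "num_layers": 4,                           # hypothetical layer-count input
    },
)

# Assuming the output is a list with one RGBA layer per entry.
for i, layer in enumerate(output):
    print(i, layer)
```

If one layer still bundles too much together, you could save it and run the same call on that layer alone to decompose it further.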