Qwen-Image-Layered
Decompose images into editable layers
Qwen-Image-Layered takes a single image and breaks it down into multiple RGBA layers—each representing a different part of the scene. You can then edit, move, resize, or recolor each layer independently without affecting the others.
How it works
Most image editing tools struggle with consistency because everything in a regular image is merged into one flat canvas. When you try to edit something, you risk changing other parts of the image too.
Qwen-Image-Layered solves this by separating an image into distinct RGBA layers, similar to how professional design tools like Photoshop work. Each layer contains a specific element from the original image—a person, an object, text, or the background. Since each layer has transparency information (the alpha channel in RGBA), you can edit one layer while leaving everything else perfectly intact.
The model can decompose images into anywhere from 2 to 8 layers depending on your needs. You can even decompose a layer further if you need more granular control.
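For a concrete sense of what the layered output means in practice, here is a minimal Pillow sketch that re-composites a stack of RGBA layers back into a single image. The filenames are placeholders, and it assumes each layer is a full-canvas RGBA image aligned with the original (rather than a tight crop).

```python
from PIL import Image

# Placeholder filenames; the model returns one RGBA image per layer.
layer_paths = ["layer_0_background.png", "layer_1_subject.png", "layer_2_text.png"]

# Start from a fully transparent canvas, then alpha-composite each
# layer on top in order, back to front.
layers = [Image.open(p).convert("RGBA") for p in layer_paths]
canvas = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
for layer in layers:
    canvas = Image.alpha_composite(canvas, layer)

canvas.convert("RGB").save("recomposed.png")
```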
What you can do with it
Once you have your layers, you can:
- Recolor objects without touching anything else
- Replace elements, such as swapping one person for another
- Edit text while keeping the rest of the design unchanged
- Delete objects cleanly with no artifacts left behind
- Resize objects without distortion
- Move elements around the canvas freely
- Adjust individual layers using standard image editing tools
The layered structure means edits stay consistent because you're working directly with isolated parts of the image rather than a single flat canvas.
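As a small illustration of that isolation, the sketch below recolors one layer and composites it back over an untouched background layer. It makes the same assumptions as the earlier snippet: the filenames are placeholders and each layer is a full-canvas RGBA image.

```python
from PIL import Image

# Placeholder filenames; each layer is a full-canvas RGBA image.
background = Image.open("layer_0_background.png").convert("RGBA")
subject = Image.open("layer_1_subject.png").convert("RGBA")

# Recolor only the subject: boost its red channel, keep its alpha mask as-is.
r, g, b, a = subject.split()
recolored_subject = Image.merge("RGBA", (r.point(lambda v: min(255, v + 60)), g, b, a))

# Composite the edited layer back over the untouched background.
result = Image.alpha_composite(background, recolored_subject)
result.convert("RGB").save("recolored.png")
```

Because the background never passes through the edit, there is nothing to accidentally change outside the layer you touched.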
Example use cases
Design iteration: Take a product photo and quickly try different background colors or object placements without recreating the entire composition.
Content creation: Decompose a complex scene into layers, then mix and match elements to create variations for social media or marketing materials.
Photo editing: Isolate people or objects in photos to adjust them individually—change clothing colors, swap backgrounds, or remove unwanted items without complex masking.
Template creation: Break down designs into reusable layers that can be customized for different contexts or clients.
How the model was built
The model uses three main components:
An RGBA-VAE creates a shared latent space for both regular RGB images and RGBA images with transparency. This lets the model work seamlessly with layered representations.
A variable-layer decomposition architecture generates different numbers of layers from the same image, which makes the model flexible across use cases.
The third piece is the training process: the team started with a pretrained image generation model and progressively adapted it to decompose images into layers. They also built a pipeline to extract and annotate multilayer images from real Photoshop documents, which provided professional-quality layered data for training.
Tips for best results
- Start with 3-4 layers for most images—this usually captures the main elements without over-segmenting
- Use more layers (6-8) for complex scenes with many distinct objects
- If you need finer control over a specific layer, you can decompose that layer again
- The model works best with clear, well-defined objects and backgrounds
- For editing individual layers after decomposition, you can use image editing tools that support RGBA
Technical details
The model outputs standard RGBA images for each layer (RGB color channels plus an alpha transparency channel). These can be opened in any image editor that supports transparency, or programmatically manipulated using image processing libraries.
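For example, a short sketch of that kind of programmatic manipulation, using NumPy and Pillow with a placeholder filename, might crop an element out of its layer by treating the alpha channel as a mask:

```python
import numpy as np
from PIL import Image

# Load one decomposed layer (placeholder filename) as an RGBA array.
layer = np.array(Image.open("layer_1_subject.png").convert("RGBA"))
alpha = layer[..., 3]

# The alpha channel doubles as a mask: find the bounding box of the
# non-transparent pixels, then crop the element out of the full canvas.
ys, xs = np.nonzero(alpha > 0)
top, bottom, left, right = ys.min(), ys.max(), xs.min(), xs.max()
cutout = layer[top:bottom + 1, left:right + 1]

# A 4-channel uint8 array round-trips back to an RGBA image.
Image.fromarray(cutout).save("subject_cutout.png")
```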
Qwen-Image-Layered was developed by the Qwen team and is licensed under Apache 2.0. The research paper provides more details about the architecture and training approach: https://arxiv.org/abs/2512.15603
You can try the model on the Replicate playground at replicate.com/playground
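If you'd rather call the model from your own code, a minimal sketch with the Replicate Python client could look like the following. The model identifier and input names here are assumptions for illustration, not the confirmed schema, so check the model's API page on Replicate for the actual parameters.

```python
import replicate

# Sketch only: the model slug and input names below are assumptions,
# not the confirmed API.
output = replicate.run(
    "qwen/qwen-image-layered",  # hypothetical model identifier
    input={
        "image": open("product_photo.png", "rb"),  # hypothetical input name
        "num_layers": 4,                           # hypothetical layer-count input
    },
)

# Assuming the output is a list with one RGBA layer per entry.
for i, layer in enumerate(output):
    print(i, layer)
```

If one layer still bundles too much together, you could save it and run the same call on that layer alone to decompose it further.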