Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
[Project Page]
The model
This model lets you edit images with text by performing text-guided image-to-image translation. You can either provide your own image, or use a separate text prompt to generate an initial image with Stable Diffusion and then translate that image using the translation prompt.
- To translate your own image, set the `input_image` argument and leave `generation_prompt` empty.
- To first generate an image from text, leave `input_image` empty and put your text prompt in `generation_prompt`. In this case, the generated input is returned as the first output.
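The two input modes can be sketched as a small validation helper. This is a minimal sketch, not part of the model's code: `build_inputs` is a hypothetical name, and only the argument names `input_image` and `generation_prompt` come from the description above. Rejecting the case where both (or neither) are set is an assumption made here for clarity; the model itself may handle it differently.

```python
from typing import Optional


def build_inputs(input_image: Optional[str] = None,
                 generation_prompt: str = "") -> dict:
    """Return the input arguments for one of the two modes.

    Hypothetical helper: it only illustrates that exactly one of the
    two arguments is expected to be filled in.
    """
    has_image = bool(input_image)
    has_prompt = bool(generation_prompt.strip())
    if has_image == has_prompt:
        # Assumption: setting both or neither is treated as an error here.
        raise ValueError("Set exactly one of input_image or generation_prompt.")
    if has_image:
        # Mode 1: translate a user-provided image.
        return {"input_image": input_image, "generation_prompt": ""}
    # Mode 2: generate the source image from text first.
    return {"input_image": None, "generation_prompt": generation_prompt}
```

For example, `build_inputs(input_image="horse.png")` selects the first mode, while `build_inputs(generation_prompt="a photo of a horse")` selects the second.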
1. Feature extraction
- From `input_image`: The input image is first inverted, producing a noise map that can be transformed back into the original image using Stable Diffusion. The intermediate Stable Diffusion features of this generation are saved.
- From text: An image is generated by Stable Diffusion from the text prompt, and the intermediate features are saved.
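The bookkeeping in this step, running a generation and saving the intermediate features at every sampling step, can be illustrated with a toy loop. This is only a sketch: `toy_denoise_step` is a made-up stand-in for one Stable Diffusion denoising step, and the "features" it returns are placeholder numbers rather than real network activations. All names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def toy_denoise_step(latent: Vector, t: int) -> Tuple[Vector, Vector]:
    """Stand-in for one denoising step; NOT the real Stable Diffusion network.

    Returns the updated latent and some made-up "intermediate features".
    """
    features = [v * (t + 1) for v in latent]  # placeholder activations
    new_latent = [0.9 * v for v in latent]    # placeholder update rule
    return new_latent, features


def generate_and_save_features(
    latent: Vector,
    timesteps: List[int],
    step: Callable[[Vector, int], Tuple[Vector, Vector]] = toy_denoise_step,
) -> Tuple[Vector, Dict[int, Vector]]:
    """Run the sampling loop and keep a copy of the features per timestep."""
    saved: Dict[int, Vector] = {}
    for t in timesteps:
        latent, features = step(latent, t)
        saved[t] = list(features)  # copy so later steps cannot overwrite it
    return latent, saved


final_latent, saved_features = generate_and_save_features([1.0, 2.0], [2, 1, 0])
```

After the loop, `saved_features` holds one entry per timestep, which is the kind of cache the translation step can later read from.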
2. Image translation
A new translated image is generated using the translation text and the saved spatial features. In the config parameters, you can control the following aspects of the translation:
- Structure preservation can be controlled by the `feature_injection_threshold` parameter (a higher value allows better structure preservation but can also leak details from the source image; ~80% of the total sampling steps generally gives a good tradeoff).
- Deviation from the guidance image can be controlled through the `scale`, `negative_prompt_alpha` and `negative_prompt_schedule` parameters (see the sample config files for details). The effect of negative prompting is minor for realistic guidance images, but it can help significantly for minimalistic and abstract guidance images (e.g. segmentations).
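As a rough illustration of the first two controls, the sketch below derives a threshold from the ~80% rule of thumb, marks which sampling steps would receive injected features, and shows the standard classifier-free guidance combination. It rests on two assumptions that are not stated above: that injection covers the first `feature_injection_threshold` steps (the noisiest ones), and that `scale` is the classifier-free guidance scale as commonly used with Stable Diffusion. The function names and `num_steps` are hypothetical, and the negative-prompt parameters are left out because their exact semantics are defined in the sample config files.

```python
from typing import List


def threshold_from_fraction(num_steps: int, fraction: float = 0.8) -> int:
    """Pick a threshold as a fraction of the total sampling steps."""
    return round(num_steps * fraction)


def injection_mask(num_steps: int, feature_injection_threshold: int) -> List[bool]:
    """Return one flag per sampling step: True where features are injected.

    Assumption: injection is active for the first
    `feature_injection_threshold` steps and switched off afterwards.
    """
    if not 0 <= feature_injection_threshold <= num_steps:
        raise ValueError("threshold must lie between 0 and num_steps")
    return [i < feature_injection_threshold for i in range(num_steps)]


def guided_noise(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Classifier-free guidance: move from the unconditional prediction
    toward the text-conditioned one, amplified by `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


threshold = threshold_from_fraction(50)  # 40 of 50 steps
mask = injection_mask(50, threshold)
```

Under these assumptions, a larger threshold keeps the source structure for more of the sampling process, while a larger `scale` pushes each step further toward the translation prompt.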
Note that you can run a batch of translations by providing multiple target prompts in the `prompts` parameter.
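A batch configuration might be assembled along these lines. This is a sketch only: the keys `prompts`, `feature_injection_threshold` and `scale` are the parameter names mentioned above, but `make_translation_config` is a hypothetical helper, the values in the usage line are illustrative, and the real sample config files may be structured differently.

```python
from typing import Dict, List


def make_translation_config(prompts: List[str],
                            feature_injection_threshold: int,
                            scale: float) -> Dict[str, object]:
    """Assemble a translation config holding one or more target prompts."""
    cleaned = [p.strip() for p in prompts if p.strip()]
    if not cleaned:
        raise ValueError("Provide at least one target prompt.")
    return {
        "prompts": cleaned,  # one translation is run per target prompt
        "feature_injection_threshold": feature_injection_threshold,
        "scale": scale,
    }


config = make_translation_config(
    ["a bronze horse statue", "a robot horse"],  # illustrative prompts
    feature_injection_threshold=40,              # illustrative value
    scale=10.0,                                  # illustrative value
)
```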