Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
This model lets you edit images with text by performing text-guided image-to-image translation. You can either provide your own image, or use a text prompt to generate an initial image with Stable Diffusion and then translate it using the translation prompt.
- To translate your own image, set the `input_image` argument and leave `generation_prompt` empty.
- To first generate an image from text, leave `input_image` empty and set your text prompt at `generation_prompt`. In this case, the generated input image is returned as the first output.
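As a minimal illustration of how the two modes are selected (the parameter names `input_image` and `generation_prompt` come from this model; the helper function itself is hypothetical, not part of the model's code):

```python
def select_mode(input_image=None, generation_prompt=""):
    """Hypothetical helper: decide which mode a prediction request uses.

    - input_image set   -> translate the provided image
    - input_image empty -> generate the guidance image from generation_prompt
    """
    if input_image is not None:
        return "translate-own-image"
    if generation_prompt:
        return "generate-then-translate"
    raise ValueError("Provide either input_image or generation_prompt")

# Translate your own image:
print(select_mode(input_image="guidance.png"))            # translate-own-image
# Generate the guidance image from text first:
print(select_mode(generation_prompt="a photo of a cat"))  # generate-then-translate
```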
1. Feature extraction
- From `input_image`: The input image is first inverted, producing a noise map that can be transformed back into the original image with Stable Diffusion. The intermediate Stable Diffusion features from this generation are saved.
- From text: An image is generated by Stable Diffusion from the text prompt, and the intermediate features are saved.
2. Image translation
A new translated image is generated using the translation text and the saved spatial features. In the config parameters, you can control the following aspects of the translation:
- Structure preservation can be controlled with the `feature_injection_threshold` parameter (a higher value preserves structure better but can also leak details from the source image; ~80% of the total sampling steps generally gives a good trade-off).
- Deviation from the guidance image can be controlled through the `negative_prompt_schedule` parameters (see the sample config files for details). Negative prompting has only a minor effect for realistic guidance images, but it can help significantly with minimalistic or abstract guidance images (e.g. segmentation maps).
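The role of `feature_injection_threshold` can be sketched with the same toy setup: saved features are injected only during the early (high-noise) sampling steps, so a higher threshold imposes the source structure for longer. Again, `toy_unet` and the numeric updates are placeholders, not the model's actual code.

```python
def translate(noise_map, saved_features, num_steps=50,
              feature_injection_threshold=40):
    """Toy sketch of step 2: sample with the translation prompt while
    injecting the saved spatial features for the early steps only.

    With num_steps=50, a threshold of 40 injects features for the first
    ~80% of the steps, matching the trade-off described above."""
    def toy_unet(x, t, injected=None):
        # If features are injected, they override the U-Net's own features.
        feats = injected if injected is not None else x * 0.5
        return (x + feats) * 0.05, feats  # stand-in noise prediction

    x = noise_map
    for step, t in enumerate(range(num_steps, 0, -1)):
        injected = saved_features[t] if step < feature_injection_threshold else None
        noise_pred, _ = toy_unet(x, t, injected)
        x = x - noise_pred / num_steps  # stand-in denoising update
    return x
```

Raising `feature_injection_threshold` extends the injected span toward the final steps, which preserves structure but also lets more source detail through.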
Note that you can run a batch of translations by providing multiple target prompts in the