daanelson / plug_and_play_image_translation

Edit an image using features from diffusion models

  • Public
  • 8.5K runs
  • A100 (80GB)
  • GitHub
  • Paper

Input

input_image
file

Image to edit (used instead of generation_prompt).

generation_prompt
string

Instead of input_image, generate an image from a text prompt (input_image is ignored if this is supplied).

Default: ""

prompts
string

Translation prompts: a list of edit texts separated by ';'. One image is output for each edit text.

Default: "A photo of a robot horse"

scale
number

Unconditional guidance scale. A higher value encourages deviation from the source image (10 is the default for translation from an image, 7.5 for translation from text).

Default: 10

feature_injection_threshold
number
(minimum: 0, maximum: 1)

Controls the level of structure preservation: the timestep at which to stop injecting the saved features into the translation diffusion process (0 is the first timestep, 1 is the final timestep; a higher value means more preservation).

Default: 0.8

negative_prompt
string

Controls the level of deviation from the source image.

Default: ""

negative_prompt_alpha
number
(minimum: 0, maximum: 1)

Strength of the effect of the negative prompt (lower is stronger).

Default: 1
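
Below is a minimal sketch of calling the model with the Replicate Python client, assuming the parameter names listed above and a hypothetical local file horse.jpg; pin a specific model version if you need reproducible results.

    # pip install replicate; set REPLICATE_API_TOKEN in your environment.
    import replicate

    # Translate an existing image: supply input_image and leave
    # generation_prompt empty.
    output = replicate.run(
        "daanelson/plug_and_play_image_translation",
        input={
            "input_image": open("horse.jpg", "rb"),   # hypothetical local file
            "prompts": "A photo of a robot horse",    # translation prompt(s)
            "scale": 10,                              # unconditional guidance scale
            "feature_injection_threshold": 0.8,       # structure preservation
            "negative_prompt_alpha": 1,
        },
    )

    # One output image is returned per translation prompt.
    for url in output:
        print(url)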

Output


This example was created by a different version, daanelson/plug_and_play_image_translation:f41455dc.

Run time and cost

This model costs approximately $0.28 to run on Replicate, or 3 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 4 minutes.

Readme

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation

[Project Page] arXiv

The model

This model lets you edit images with text by performing text-guided image-to-image translation. You can either provide your own image, or supply a text prompt to generate an initial image with Stable Diffusion, and then translate it using the translation prompt(s).

  • To translate your own image, set the input_image argument and leave generation_prompt empty.
  • To first generate an image from text, leave input_image empty and put your text prompt in generation_prompt. In this case, the generated input is returned as the first output (see the sketch after this list).
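
A minimal sketch of the second mode, again assuming the parameter names above:

    import replicate

    # Generate the source image from text, then translate it; the generated
    # source image is returned as the first output.
    output = replicate.run(
        "daanelson/plug_and_play_image_translation",
        input={
            "generation_prompt": "A photo of a horse in a field",
            "prompts": "A photo of a robot horse",
        },
    )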

1. Feature extraction

  • From input_image: the input image is first inverted, producing a noise map that can be transformed back into the original image by Stable Diffusion. The intermediate Stable Diffusion features from this generation are saved.
  • From text: an image is generated by Stable Diffusion from the text prompt, and the intermediate features are saved. A sketch of capturing such features follows below.
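
The following is an illustrative sketch (not this model's actual implementation) of saving intermediate UNet features during Stable Diffusion sampling with PyTorch forward hooks; it assumes the diffusers library, and both the checkpoint and the choice of decoder block are placeholders.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",  # example checkpoint, not necessarily the one used here
        torch_dtype=torch.float16,
    ).to("cuda")

    saved_features = []  # one entry per denoising step

    def save_hook(module, inputs, output):
        # Detach and move to CPU so saved activations do not accumulate GPU memory.
        saved_features.append(output.detach().cpu())

    # Hook one UNet decoder block; which layers to save (and when to inject
    # them later) is a design choice of the method, and the block chosen here
    # is purely illustrative.
    handle = pipe.unet.up_blocks[1].register_forward_hook(save_hook)

    _ = pipe("A photo of a horse", num_inference_steps=50)
    handle.remove()

    print(len(saved_features))  # 50: one capture per denoising step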

2. Image translation

A new translated image is generated using the translation text and the saved spatial features. Through the config parameters, you can control the following aspects of the translation:

  • Structure preservation can be controlled by the feature_injection_threshold parameter: a higher value allows better structure preservation but can also leak details from the source image; ~80% of the total sampling steps generally gives a good tradeoff (see the sketch after this list).
  • Deviation from the guidance image can be controlled through the scale, negative_prompt_alpha and negative_prompt_schedule parameters (see the sample config files for details). The effect of negative prompting is minor for realistic guidance images, but it can help significantly for minimalistic and abstract guidance images (e.g. segmentations).
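
As a rough illustration of how the normalized feature_injection_threshold could map onto a concrete sampling schedule (the model's exact mapping may differ):

    # Illustrative only: with 50 sampling steps, a threshold of 0.8 would mean
    # injecting the saved features during roughly the first 40 steps.
    num_inference_steps = 50
    feature_injection_threshold = 0.8
    stop_step = int(feature_injection_threshold * num_inference_steps)
    print(stop_step)  # 40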

Note that you can run a batch of translations by providing multiple target prompts, separated by ';', in the prompts parameter, as in the sketch below.
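
A short sketch of a batched call, again assuming the parameter names above and a hypothetical local file horse.jpg:

    import replicate

    # Three translation prompts separated by ';' yield three output images.
    output = replicate.run(
        "daanelson/plug_and_play_image_translation",
        input={
            "input_image": open("horse.jpg", "rb"),
            "prompts": "a robot horse; a bronze statue of a horse; a watercolor painting of a horse",
        },
    )
    for i, url in enumerate(output):
        print(i, url)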