Prompt-to-Prompt
Stable Diffusion Implementation
Code for the demo is available here: https://github.com/chenxwh/prompt-to-prompt
Tips for the demo input above:
Prompt-to-Prompt lets you edit an image generated by Stable Diffusion (original_image, generated from original_prompt) by editing only the prompt. The modified text is supplied as edited_prompt.
If you do not already have an original_image / original_prompt pair to experiment with, you can generate one by providing a value only for original_prompt and leaving the other inputs set to None. It is best to set a seed (or note the random seed that was assigned), because the same seed is used when generating images with edited_prompt.
Now, with original_prompt (and original_image) in mind, there are three options for editing the prompt. Refer to the instructions below for each type of edit. If you choose Re-weight, provide in the edited_prompt field only the weights assigned to words from original_prompt, in the format [list of words] | [list of weights]. The example gallery may come in handy!
Additionally, there is the local_edit option, in the format words in original_prompt | words in edited_prompt, which lets you specify the only words (semantics) that will be edited.
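The two "|"-separated input formats described above could be parsed along the following lines. This is a minimal sketch, not the demo's actual parsing code: the helper names parse_reweight and parse_local_edit are hypothetical, and the sketch assumes that words and weights are separated by whitespace or commas, which the text above does not specify.

```python
import re


def _split_items(text):
    """Split on commas and/or whitespace (assumed separators), dropping empties."""
    return [item for item in re.split(r"[,\s]+", text.strip()) if item]


def _split_on_bar(text):
    """Split 'left | right' at the first '|' and fail loudly if it is missing."""
    left, sep, right = text.partition("|")
    if not sep:
        raise ValueError("expected input of the form 'left | right'")
    return left, right


def parse_reweight(edited_prompt):
    """Parse '[list of words] | [list of weights]' into (words, weights).

    Hypothetical helper: assumes exactly one weight per word.
    """
    words_part, weights_part = _split_on_bar(edited_prompt)
    words = _split_items(words_part)
    weights = [float(w) for w in _split_items(weights_part)]
    if len(words) != len(weights):
        raise ValueError("expected exactly one weight per word")
    return words, weights


def parse_local_edit(local_edit):
    """Parse 'words in original_prompt | words in edited_prompt' into two lists."""
    original_part, edited_part = _split_on_bar(local_edit)
    return _split_items(original_part), _split_items(edited_part)
```

For example, under these assumptions parse_reweight("night | 2.5") yields the word list ["night"] with the weight list [2.5], and parse_local_edit("burger | lasagna") yields one word list per prompt.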
Prompt Edits
In our notebooks, the main logic is implemented by subclassing the abstract class AttentionControl, which has the following form:
import abc

class AttentionControl(abc.ABC):
    @abc.abstractmethod
    def forward(self, attn, is_cross: bool, place_in_unet: str):
        raise NotImplementedError
The forward method is called in each attention layer of the diffusion model during image generation, and we use it to modify the attention weights. Our method (see Section 3 of our paper) edits images through this mechanism, and each type of prompt edit modifies the attention weights in a different manner.
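To make the interface concrete, here is a minimal sketch of a subclass. RecordingControl is a hypothetical controller that is not part of the notebooks: it only logs each call and returns the attention unchanged. The sketch assumes that whatever forward returns is what the attention layer goes on to use, and that place_in_unet takes values such as "down", "mid" and "up"; a toy nested list stands in for the attention tensor.

```python
import abc


class AttentionControl(abc.ABC):
    @abc.abstractmethod
    def forward(self, attn, is_cross: bool, place_in_unet: str):
        raise NotImplementedError


class RecordingControl(AttentionControl):
    """Hypothetical controller: logs each call and leaves the attention untouched."""

    def __init__(self):
        self.calls = []

    def forward(self, attn, is_cross: bool, place_in_unet: str):
        kind = "cross" if is_cross else "self"
        self.calls.append((place_in_unet, kind))
        # An editing controller would return a modified map here instead.
        return attn


controller = RecordingControl()
attn = [[0.25, 0.75], [0.5, 0.5]]  # toy map: rows are pixels, columns are tokens
out = controller.forward(attn, is_cross=True, place_in_unet="down")
```

The edit types described below differ only in what they return from forward in place of the unmodified map.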
Replacement
In this case, the user swaps tokens of the original prompt with others, e.g., editing the prompt "A painting of a squirrel eating a burger" to "A painting of a squirrel eating a lasagna" or "A painting of a lion eating a burger". For this we define the class AttentionReplace.
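As a rough illustration of a replacement edit, the sketch below injects the original prompt's cross-attention map into the edited generation for an initial fraction of the denoising steps and uses the edited prompt's own map afterwards. It is a simplification, not the AttentionReplace class itself: replace_cross_attention is a hypothetical name, nested lists stand in for tensors, steps are assumed to be counted from 0 at the start of denoising, and both prompts are assumed to have the same number of tokens so that the maps line up column by column.

```python
def replace_cross_attention(attn_base, attn_edit, step, num_steps, cross_replace_steps):
    """Return the cross-attention map the edited generation uses at this step.

    attn_base / attn_edit: maps for the original and edited prompt
    (rows = pixels, columns = tokens). Assumes equal token counts.
    cross_replace_steps: fraction of the steps during which maps are injected.
    """
    if len(attn_base[0]) != len(attn_edit[0]):
        raise ValueError("this sketch assumes prompts with the same number of tokens")
    inject_until = int(cross_replace_steps * num_steps)
    # Early steps: reuse the original layout. Later steps: let the new token act.
    return attn_base if step < inject_until else attn_edit


attn_base = [[0.875, 0.125]]  # toy map from the original prompt
attn_edit = [[0.25, 0.75]]    # toy map from the edited prompt
early = replace_cross_attention(attn_base, attn_edit, step=0, num_steps=10, cross_replace_steps=0.5)
late = replace_cross_attention(attn_base, attn_edit, step=5, num_steps=10, cross_replace_steps=0.5)
```

Here early is the original prompt's map and late is the edited prompt's map, since step 5 falls outside the first half of the 10 steps.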
Refinement
In this case, the user adds new tokens to the prompt, e.g., editing the prompt "A painting of a squirrel eating a burger" to "A watercolor painting of a squirrel eating a burger". For this we define the class AttentionEditRefine.
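A refinement edit needs to know which tokens the two prompts share, so that shared tokens can take the original prompt's attention maps while newly added tokens keep the edited prompt's maps. The sketch below illustrates that idea under loud assumptions: it aligns whitespace-separated words rather than real tokenizer output, it uses Python's difflib.SequenceMatcher as a stand-in alignment method, and align_tokens and refine_cross_attention are hypothetical names rather than parts of AttentionEditRefine.

```python
from difflib import SequenceMatcher


def align_tokens(original_words, edited_words):
    """For each edited-word index, give the matching original-word index or None."""
    alignment = [None] * len(edited_words)
    matcher = SequenceMatcher(a=original_words, b=edited_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            alignment[block.b + offset] = block.a + offset
    return alignment


def refine_cross_attention(attn_base, attn_edit, alignment):
    """Inject original maps for shared tokens; keep edited maps for new tokens.

    attn_base / attn_edit: rows = pixels, columns = tokens of each prompt.
    """
    return [
        [
            edit_row[j] if src is None else base_row[src]
            for j, src in enumerate(alignment)
        ]
        for base_row, edit_row in zip(attn_base, attn_edit)
    ]


original = "A painting of a squirrel eating a burger".split()
edited = "A watercolor painting of a squirrel eating a burger".split()
alignment = align_tokens(original, edited)
```

With these prompts, only "watercolor" has no counterpart in the original prompt, so its alignment entry is None and its attention column is taken from the edited generation.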
Re-weight
In this case, the user changes the weight of certain tokens in the prompt, e.g., for the prompt "A photo of a poppy field at night", strengthening or weakening the extent to which the word night affects the resulting image. For this we define the class AttentionReweight.
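The core of a re-weighting edit can be sketched as scaling one token's column of the cross-attention map. This is an illustration only: reweight_cross_attention is a hypothetical name, the coefficient 2.0 is an arbitrary choice, words stand in for tokenizer tokens, and a nested list stands in for the attention tensor.

```python
def reweight_cross_attention(attn, equalizer):
    """Scale each token's column of a cross-attention map by its coefficient.

    attn: rows = pixels, columns = tokens. equalizer: one coefficient per token.
    """
    if any(len(row) != len(equalizer) for row in attn):
        raise ValueError("equalizer needs exactly one coefficient per token")
    return [[value * coeff for value, coeff in zip(row, equalizer)] for row in attn]


prompt_words = "A photo of a poppy field at night".split()
# Hypothetical choice: amplify "night" by 2x and leave every other word at 1.0.
equalizer = [2.0 if word == "night" else 1.0 for word in prompt_words]
attn = [[0.125] * len(prompt_words)]  # one pixel with uniform toy attention
scaled = reweight_cross_attention(attn, equalizer)
```

A coefficient above 1.0 strengthens a word's influence and a coefficient below 1.0 weakens it, while words left at 1.0 are unaffected.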
Attention Control Options
cross_replace_steps: specifies the fraction of steps during which the cross-attention maps are edited. Can also be set to a dictionary [str:float] that specifies fractions for different words in the prompt.
self_replace_steps: specifies the fraction of steps during which the self-attention maps are replaced.
local_blend (optional): a LocalBlend object used to make local edits. LocalBlend is initialized with the words from each prompt that correspond to the region of the image we want to edit.
equalizer: used for attention re-weighting only. A vector of coefficients by which each cross-attention weight is multiplied.
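The sketch below shows one way a step fraction such as cross_replace_steps could be turned into a concrete number of steps, covering both the single-float form and the per-word dictionary form. It is not taken from the notebooks: replace_until is a hypothetical helper, the fraction is assumed to count from the first denoising step, and the default fallback for words missing from the dictionary is an assumption of this example.

```python
def replace_until(cross_replace_steps, word, num_steps, default=1.0):
    """Number of leading denoising steps during which `word`'s map is edited.

    cross_replace_steps: a single fraction, or a dict mapping words to fractions.
    default: fraction used for words missing from the dict (hypothetical fallback).
    """
    if isinstance(cross_replace_steps, dict):
        fraction = cross_replace_steps.get(word, default)
    else:
        fraction = cross_replace_steps
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fractions must lie in [0, 1]")
    return int(fraction * num_steps)


# One fraction for every word: edit during the first 80% of 50 steps.
uniform = replace_until(0.8, "burger", num_steps=50)

# Per-word fractions: "lasagna" is edited for fewer steps than the other words.
per_word = {"lasagna": 0.4}
short = replace_until(per_word, "lasagna", num_steps=50)
fallback = replace_until(per_word, "squirrel", num_steps=50)
```

The same conversion would apply to self_replace_steps, which takes a single fraction.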
Citation
@article{hertz2022prompt,
title = {Prompt-to-Prompt Image Editing with Cross Attention Control},
author = {Hertz, Amir and Mokady, Ron and Tenenbaum, Jay and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
journal = {arXiv preprint arXiv:2208.01626},
year = {2022},
}
Disclaimer
This is not an officially supported Google product.