adirik / prompt-to-prompt-realvisxl-3.0

Image editing with Prompt-to-Prompt for RealVisXL-v3.0


Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware.

Readme

RealVisXL V3.0 Prompt-to-Prompt

An implementation of Prompt-to-Prompt for RealVisXL V3.0, a model trained to generate photorealistic images. See the model page of RealVisXL V3.0 for details. The original implementation of Prompt-to-Prompt with SDXL can be found here.

Prompt-to-Prompt is an image editing framework that leverages the self-attention and cross-attention mechanisms of the diffusion process, so edits require no external tools. It generates the original image and a modified version at the same time, driven by a change in the prompt, such as going from “a pink bear” to “a pink dragon”. During diffusion, the technique blends the attention maps of the two prompts, which preserves the original image’s style and layout while substituting “bear” with “dragon”.
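The blending described above can be pictured with a toy sketch. This is not the model's code: attention maps are plain nested lists, and `inject_attention` and `replace_fraction` are hypothetical names. It assumes a scheme in which the edited branch reuses the original branch's attention maps for an early fraction of the denoising steps and its own maps afterwards.

```python
from typing import List

# Rows are image patches, columns are prompt tokens (toy representation).
AttentionMap = List[List[float]]


def inject_attention(
    original_attn: AttentionMap,
    edited_attn: AttentionMap,
    step: int,
    num_steps: int,
    replace_fraction: float,
) -> AttentionMap:
    """Pick the attention map the edited branch would use at `step`.

    Assumption: replacement is active for the first
    int(num_steps * replace_fraction) steps, counting from step 0.
    """
    if not 0.0 <= replace_fraction <= 1.0:
        raise ValueError("replace_fraction must be between 0 and 1.0")
    cutoff = int(num_steps * replace_fraction)
    # Early steps fix layout and style, so they reuse the original maps.
    return original_attn if step < cutoff else edited_attn


if __name__ == "__main__":
    bear = [[0.9, 0.1], [0.2, 0.8]]
    dragon = [[0.5, 0.5], [0.4, 0.6]]
    for step in (0, 4, 5, 9):
        used = inject_attention(bear, dragon, step, num_steps=10, replace_fraction=0.5)
        print(step, "original" if used is bear else "edited")
```

A larger fraction keeps the edited image closer to the original; a smaller one gives the new prompt more freedom.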

There are 3 types of editing:

  • Replacement: In this case, the user swaps tokens of the original prompt with others, e.g., editing the prompt “A painting of a squirrel eating a burger” to “A painting of a squirrel eating a lasagna” or “A painting of a lion eating a burger”.

  • Refinement: In this case, the user adds new tokens to the prompt, e.g., editing the prompt “A painting of a squirrel eating a burger” to “A watercolor painting of a squirrel eating a burger”.

  • Re-weight: In this case, the user changes the weight of certain tokens in the prompt, e.g., for the prompt “A photo of a poppy field at night”, strengthening or weakening the extent to which the word “night” affects the resulting image.
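To make the distinction between the three edit types concrete, here is a rough word-level heuristic that guesses which type fits a pair of prompts. `suggest_edit_type` is a hypothetical helper, not part of the model or its API, and it rests on two assumptions: that a Replacement keeps the word count unchanged, and that a Refinement only inserts words. Real tokenizers split text differently, so treat the result as a hint.

```python
from difflib import SequenceMatcher


def suggest_edit_type(original_prompt: str, edited_prompt: str) -> str:
    """Guess an edit type by comparing the prompts word by word."""
    original_words = original_prompt.split()
    edited_words = edited_prompt.split()

    # Identical prompts: only token weights can change.
    if original_words == edited_words:
        return "Re-weight"

    matcher = SequenceMatcher(None, original_words, edited_words, autojunk=False)
    changes = [tag for tag, *_ in matcher.get_opcodes() if tag != "equal"]

    # Only new words were added to the original prompt.
    if all(tag == "insert" for tag in changes):
        return "Refinement"

    # Words were swapped one for one, keeping the prompt length.
    if len(original_words) == len(edited_words) and all(
        tag == "replace" for tag in changes
    ):
        return "Replacement"

    return "Unclear"


if __name__ == "__main__":
    base = "A painting of a squirrel eating a burger"
    print(suggest_edit_type(base, "A painting of a lion eating a burger"))
    print(suggest_edit_type(base, "A watercolor painting of a squirrel eating a burger"))
    print(suggest_edit_type(base, base))
```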

See the original paper, project page and repository for more details.

How to use the API

To edit images with RealVisXL V3.0 Prompt-to-Prompt, you must provide input parameters that define the edit. The “original_prompt” and “prompt_edit_type” parameters are always required. The “edited_prompt” parameter is also required unless “prompt_edit_type” is “Re-weight”. The API input arguments are as follows:

  • original_prompt: The prompt used to generate an image with RealVisXL V3.0. This is the starting point for any image editing operation.

  • prompt_edit_type: Specifies the type of prompt editing to be applied. Options include Replacement, Refinement, or Re-weight. This choice determines how the edited prompt influences the original RealVisXL V3.0 output.

  • edited_prompt: The prompt used for editing the original RealVisXL V3.0 output image. This parameter is relevant for Replacement and Refinement edit types. For Re-weight, this can be left empty.

  • local_edit: Indicates specific areas to be edited, represented by comma-separated words. If left as None, the entire image is subject to change.

  • cross_replace_steps: The fraction of diffusion steps during which cross attention should be replaced. This is a value between 0 and 1.0.

  • self_replace_steps: The fraction of diffusion steps during which self attention should be replaced. Like cross_replace_steps, this is a value between 0 and 1.0.

  • equalizer_words: Words to be re-weighted (enhanced or diminished) during the editing process, given as a comma-separated list. Required when prompt_edit_type is Re-weight; otherwise leave it empty.

  • equalizer_strengths: Strengths associated with the words to be re-weighted. These can be positive (for enhancement) or negative (for diminishment). Provide them as a comma-separated list in the same order as equalizer_words. Required when prompt_edit_type is Re-weight; otherwise leave it empty.

  • scheduler: The sampling algorithm RealVisXL V3.0 uses for image generation. Different schedulers can affect the quality and characteristics of the output.

  • num_inference_steps: This parameter defines the number of denoising steps in the image generation process of RealVisXL V3.0.

  • guidance_scale: The guidance scale parameter adjusts the influence of the classifier-free guidance in the generation process of RealVisXL V3.0. Higher values will make the model focus more on the prompt.

  • seed: A random seed for generating the original output. Leaving this blank randomizes the seed.
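The rules above can be collected into a small helper that builds and checks the input before a request is sent. This is a sketch under stated assumptions: `build_input` and `run_edit` are hypothetical names, the checks only mirror the requirements listed above, and the call uses `replicate.run` from the Replicate Python client, which needs the `replicate` package and a `REPLICATE_API_TOKEN` environment variable. The model reference is passed in by the caller because the exact version string is not given here.

```python
from typing import Any, Dict, Optional, Sequence

EDIT_TYPES = ("Replacement", "Refinement", "Re-weight")


def build_input(
    original_prompt: str,
    prompt_edit_type: str,
    edited_prompt: Optional[str] = None,
    equalizer_words: Sequence[str] = (),
    equalizer_strengths: Sequence[float] = (),
    cross_replace_steps: Optional[float] = None,
    self_replace_steps: Optional[float] = None,
) -> Dict[str, Any]:
    """Validate the documented rules and return an input dictionary."""
    if not original_prompt:
        raise ValueError("original_prompt is required")
    if prompt_edit_type not in EDIT_TYPES:
        raise ValueError(f"prompt_edit_type must be one of {EDIT_TYPES}")

    payload: Dict[str, Any] = {
        "original_prompt": original_prompt,
        "prompt_edit_type": prompt_edit_type,
    }

    if prompt_edit_type == "Re-weight":
        if not equalizer_words or len(equalizer_words) != len(equalizer_strengths):
            raise ValueError("Re-weight needs one strength per equalizer word")
        # The API expects comma-separated strings for both lists.
        payload["equalizer_words"] = ",".join(equalizer_words)
        payload["equalizer_strengths"] = ",".join(str(s) for s in equalizer_strengths)
    else:
        if not edited_prompt:
            raise ValueError("edited_prompt is required for this edit type")
        if equalizer_words or equalizer_strengths:
            raise ValueError("equalizer inputs must be empty unless re-weighting")
        payload["edited_prompt"] = edited_prompt

    for name, value in (
        ("cross_replace_steps", cross_replace_steps),
        ("self_replace_steps", self_replace_steps),
    ):
        if value is None:
            continue  # omit it so the API default applies
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1.0")
        payload[name] = value

    return payload


def run_edit(model_ref: str, payload: Dict[str, Any]) -> Any:
    """Send the request; `model_ref` is the caller's model reference string."""
    import replicate  # imported lazily so build_input works without it

    return replicate.run(model_ref, input=payload)
```

For example, `build_input("A photo of a poppy field at night", "Re-weight", equalizer_words=["night"], equalizer_strengths=[2.0])` yields a dictionary with `equalizer_words` set to `"night"` and no `edited_prompt` key. Optional inputs such as scheduler, num_inference_steps, guidance_scale, and seed can be added to the returned dictionary before calling `run_edit`.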

Model Details

Original Model: https://civitai.com/models/139562?modelVersionId=268861

Some important usage tips from the original model page:

  • Best performance comes with the scheduler “DPM++ SDE Karras” which is the default value in the API.

  • The classifier-free guidance (guidance scale) value should be between 1.5 and 3.

Citation

@article{hertz2022prompt,
  title={Prompt-to-prompt image editing with cross attention control},
  author={Hertz, Amir and Mokady, Ron and Tenenbaum, Jay and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2208.01626},
  year={2022}
}