adirik / local-prompt-mixing

Generating object-level shape variations with Stable Diffusion (Updated 1 year, 6 months ago)

  • Public
  • 86 runs
  • GitHub
  • Paper

Input

  • image (file, required): input image
  • prompt (string, required): a simple description of the image
  • object_of_interest (string, required): object of interest to be modified
  • proxy_words (string, default: ""): your own proxy words; if left empty, auto-generated proxy words are used
  • objects_to_preserve (string, default: ""): objects to preserve, comma separated; if left empty, no objects are preserved
  • number_of_variations (integer, minimum 1, maximum 5, default: 1): number of auto variations to generate
  • steps (integer, minimum 1, maximum 100, default: 50): number of denoising steps
  • start_prompt_range (integer, default: 7): nth step where prompt mixing begins
  • end_prompt_range (integer, default: 17): nth step where prompt mixing ends
  • guidance_scale (number, minimum 0, maximum 20, default: 7.5): guidance scale
  • seed (integer, minimum 0, maximum 100, default: 10): random seed

Output

The output is a set of generated variations of the object plus a grid image containing all of them, returned in .jpg format.

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Local Prompt Mixing

Local Prompt Mixing is an image-to-image model built on Stable Diffusion 1.4. It generates variations of a chosen object in an image while preserving the other elements of the image. See the original repository, project page, or paper for details.

How to use the API

To use Local Prompt Mixing, upload an image (.jpg or .png) containing the object you want to modify, provide a simple description of the image (prompt), and name the object to be modified. The outputs (the variations of the object and a grid image containing all of them) are returned in .jpg format. The API input arguments are as follows:

  • prompt: a simple description of the image
  • object_of_interest: the object you want to generate variations of
  • proxy_words: the object(s) to generate in place of object_of_interest
  • objects_to_preserve: the object(s) to preserve in the image
  • number_of_variations: the number of auto-generated variations if you didn't provide any proxy_words
  • steps: the number of denoising steps for Stable Diffusion
  • start_prompt_range: the nth step where prompt mixing begins
  • end_prompt_range: the nth step where prompt mixing ends
  • guidance_scale: the guidance scale of Stable Diffusion
  • seed: random seed for reproducibility; the default is 10, and any fixed value yields deterministic generation
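As an illustration, a request payload with the documented defaults can be assembled as below. The `build_input` helper is ours, not part of the API, and the commented-out call shows how such a payload would typically be passed to the Replicate Python client:

```python
def build_input(prompt, object_of_interest, **overrides):
    """Assemble an input payload using the documented defaults."""
    payload = {
        "prompt": prompt,
        "object_of_interest": object_of_interest,
        "proxy_words": "",
        "objects_to_preserve": "",
        "number_of_variations": 1,
        "steps": 50,
        "start_prompt_range": 7,
        "end_prompt_range": 17,
        "guidance_scale": 7.5,
        "seed": 10,
    }
    payload.update(overrides)  # override any default with a keyword argument
    return payload

inputs = build_input("a table below a lamp", "lamp", proxy_words="light")

# To actually run the model (requires `pip install replicate` and
# REPLICATE_API_TOKEN in the environment):
# import replicate
# with open("room.jpg", "rb") as image:
#     output = replicate.run(
#         "adirik/local-prompt-mixing",
#         input={"image": image, **inputs},
#     )
```

Overrides let you change only the parameters you care about while keeping everything else at the page's defaults.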

Important Notes

  • Your prompt must contain the word used for object_of_interest (e.g. prompt: “a table below a lamp”, object_of_interest: “lamp”); otherwise the API will not work properly.
  • The API offers two options for proxy words. If you provide your own words, Stable Diffusion will try to generate variations of the object of interest according to them; semantically closer words (e.g. lamp -> light) give better results. If you don’t provide proxy words, the API selects words semantically close to your object of interest, with the number of auto-generated words determined by the number_of_variations parameter. Please choose one of the two approaches.
  • The start_prompt_range and end_prompt_range parameters must be smaller than steps, and start_prompt_range must be smaller than end_prompt_range.
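These constraints can be checked client-side before submitting a request. The following pre-flight helper is a hypothetical sketch (the function name and error messages are ours, not part of the API):

```python
def validate_request(prompt, object_of_interest, steps,
                     start_prompt_range, end_prompt_range):
    """Return a list of constraint violations; empty means the request is valid."""
    errors = []
    # The prompt must mention the object of interest.
    if object_of_interest not in prompt:
        errors.append("prompt must contain object_of_interest")
    # Prompt mixing must start before it ends.
    if start_prompt_range >= end_prompt_range:
        errors.append("start_prompt_range must be smaller than end_prompt_range")
    # Both range bounds must fall within the denoising schedule.
    if end_prompt_range >= steps:
        errors.append("start_prompt_range and end_prompt_range must be smaller than steps")
    return errors

print(validate_request("a table below a lamp", "lamp", 50, 7, 17))   # valid: []
print(validate_request("a table below a lamp", "sofa", 50, 17, 7))   # two violations
```

Running such a check locally avoids wasting a prediction on a request the model cannot handle.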

References

@InProceedings{patashnik2023localizing,
  author    = {Patashnik, Or and Garibi, Daniel and Azuri, Idan and Averbuch-Elor, Hadar and Cohen-Or, Daniel},
  title     = {Localizing Object-level Shape Variations with Text-to-Image Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2023}
}