adirik / local-prompt-mixing

Generating object-level shape variations with Stable Diffusion

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Local Prompt Mixing

Local Prompt Mixing is an image-to-image model built on Stable Diffusion 1.4. It generates variations of an object in an image while preserving the other elements of the image. See the original repository, project page or paper for details.

How to use the API

To use Local Prompt Mixing, upload an image (.jpg or .png) containing the object you want to modify. Provide a simple description of the image (prompt) and the name of the object to be modified. The outputs (the variations of the object and a grid image containing all of them) are in .jpg format. The API input arguments are as follows:

  • prompt: a simple description of the image
  • object_of_interest: the object you want to generate variations of
  • proxy_words: the object(s) you want to generate in place of object_of_interest
  • objects_to_preserve: the object(s) that you want to preserve in the image
  • number_of_variations: the number of auto-generated proxy words, used if you do not provide any proxy_words
  • steps: the number of denoising steps for Stable Diffusion
  • start_prompt_range: nth step where the prompt mixing begins
  • end_prompt_range: nth step where the prompt mixing ends
  • guidance_scale: the guidance scale of Stable Diffusion
  • seed: random seed for reproducibility (default: 10). Keep it at any fixed value to get deterministic generation.
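
The arguments above can be collected into a single input dictionary. The following is a minimal sketch, not the model's official client code: the helper name `build_inputs`, the `image` key, the comma-separated string format for `proxy_words` and `objects_to_preserve`, and every default except `seed` are assumptions made for illustration. Check the model's API schema for the actual names, types and defaults.

```python
from typing import Any, Dict, Optional


def build_inputs(
    image: Any,                        # file handle or URL; the key name "image" is an assumption
    prompt: str,
    object_of_interest: str,
    proxy_words: Optional[str] = None,          # assumed format: comma-separated words
    objects_to_preserve: Optional[str] = None,  # assumed format: comma-separated words
    number_of_variations: int = 3,     # illustrative value, not the model's documented default
    steps: int = 50,                   # illustrative value
    start_prompt_range: int = 7,       # illustrative value
    end_prompt_range: int = 17,        # illustrative value
    guidance_scale: float = 7.5,       # illustrative value
    seed: int = 10,                    # default stated in the README
) -> Dict[str, Any]:
    """Assemble the input arguments described in the README into one dict."""
    inputs: Dict[str, Any] = {
        "image": image,
        "prompt": prompt,
        "object_of_interest": object_of_interest,
        "steps": steps,
        "start_prompt_range": start_prompt_range,
        "end_prompt_range": end_prompt_range,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }
    if objects_to_preserve:
        inputs["objects_to_preserve"] = objects_to_preserve
    if proxy_words:
        # User-supplied proxy words: number_of_variations is left out.
        inputs["proxy_words"] = proxy_words
    else:
        # No proxy words: ask for auto-generated ones instead.
        inputs["number_of_variations"] = number_of_variations
    return inputs


example = build_inputs(
    image="lamp.jpg",
    prompt="a table below a lamp",
    object_of_interest="lamp",
    proxy_words="light",
    objects_to_preserve="table",
)
```

A dictionary like `example` could then be passed as the `input` argument of `replicate.run` in the Replicate Python client, together with the model reference and version shown on the model page.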

Important Notes

  • Your prompt must contain the word given as object_of_interest (e.g. prompt: “a table below a lamp”, object_of_interest: “lamp”); otherwise the API will not work properly.
  • The API offers two options for proxy words, so choose one of them. If you provide your own words, Stable Diffusion will try to generate variations of the object of interest according to them; results are better when the words are semantically close to the object of interest (e.g. lamp -> light). If you do not provide proxy words, the API selects words that are semantically close to your object of interest, and the number of auto-generated words is determined by the number_of_variations parameter.
  • The parameters “start_prompt_range” and “end_prompt_range” must both be smaller than “steps”. In addition, “start_prompt_range” must be smaller than “end_prompt_range”.
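
The constraints in these notes can be checked on the client side before sending a request. This is a minimal sketch under stated assumptions: `check_inputs` is a hypothetical helper, not part of the API, and it matches the object of interest as a whole word, case-insensitively, which may be stricter or looser than what the model itself does.

```python
import re
from typing import List


def check_inputs(
    prompt: str,
    object_of_interest: str,
    steps: int,
    start_prompt_range: int,
    end_prompt_range: int,
) -> List[str]:
    """Return a list of violated constraints (empty if the inputs look valid)."""
    problems: List[str] = []

    # The prompt must contain the word for object_of_interest.
    pattern = r"\b" + re.escape(object_of_interest) + r"\b"
    if not re.search(pattern, prompt, flags=re.IGNORECASE):
        problems.append("prompt does not contain object_of_interest")

    # Both range bounds must be smaller than the number of denoising steps.
    if start_prompt_range >= steps:
        problems.append("start_prompt_range must be smaller than steps")
    if end_prompt_range >= steps:
        problems.append("end_prompt_range must be smaller than steps")

    # Mixing must begin before it ends.
    if start_prompt_range >= end_prompt_range:
        problems.append("start_prompt_range must be smaller than end_prompt_range")

    return problems


valid = check_inputs("a table below a lamp", "lamp", 50, 7, 17)
invalid = check_inputs("a table below a lamp", "chair", 50, 20, 10)
```

Here `valid` is an empty list, while `invalid` reports two problems: the missing object word and the reversed mixing range.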

References

@InProceedings{patashnik2023localizing,
  author    = {Patashnik, Or and Garibi, Daniel and Azuri, Idan and Averbuch-Elor, Hadar and Cohen-Or, Daniel},
  title     = {Localizing Object-level Shape Variations with Text-to-Image Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2023}
}