omerbt / multidiffusion

Fusing Diffusion Paths for Controlled Image Generation

  • Public
  • 2.2K runs
  • A100 (80GB)
  • GitHub
  • Paper

Input

Prompt (string)
Input prompt.
Default: "a photo of the dolomites"

Negative prompt (string)
The prompt or prompts not to guide the image generation (what you do not want to see in the generation). Ignored when not using guidance.

Width (integer)
Width of the output image. Lower this setting if you run out of memory.
Default: 4096

Height (integer)
Height of the output image. Lower this setting if you run out of memory.
Default: 512

Number of outputs (integer)
Number of images to output.
Default: 1

Number of denoising steps (integer, minimum: 1, maximum: 500)
Default: 50

Guidance scale (number, minimum: 1, maximum: 20)
Scale for classifier-free guidance.
Default: 7.5

Scheduler (string)
Choose a scheduler.
Default: "DDIM"

Seed (integer)
Random seed. Leave blank to randomize the seed.
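
The sketch below shows how these inputs could be passed from the Replicate Python client. The input field names (prompt, negative_prompt, width, and so on) are assumptions inferred from the descriptions above and common Stable Diffusion conventions; check the API schema on this page for the authoritative names and the current version identifier.

# Minimal sketch, assuming the Replicate Python client (pip install replicate)
# and REPLICATE_API_TOKEN set in the environment. The input field names below
# are assumptions; verify them against this model's API schema before use.
import replicate

output = replicate.run(
    "omerbt/multidiffusion",  # you may need to pin a version: "omerbt/multidiffusion:<version id>"
    input={
        "prompt": "a photo of the dolomites",  # assumed name for "Prompt"
        "negative_prompt": "",                 # assumed name for the negative prompt
        "width": 4096,                         # panorama width (default)
        "height": 512,                         # panorama height (default)
        "num_outputs": 1,
        "num_inference_steps": 50,
        "guidance_scale": 7.5,
        "scheduler": "DDIM",
        # "seed": 1234,                        # omit to randomize the seed
    },
)
print(output)  # typically a list of generated image URLs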

Run time and cost

This model costs approximately $0.068 to run on Replicate, or 14 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 49 seconds. The predict time for this model varies significantly based on the inputs.

Readme

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Cog Implementation: https://github.com/chenxwh/cog-themed-diffusion/tree/MultiDiffusion

This page demonstrates MultiDiffusion Text2Panorama using the Stable Diffusion model.

MultiDiffusion is a unified framework that enables versatile and controllable image generation using a pre-trained text-to-image diffusion model, without any further training or fine-tuning, as described in the paper (arXiv:2302.08113).

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image and fast adaptation to new tasks still remain an open challenge, currently mostly addressed by costly and long re-training and fine-tuning, or by ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation using a pre-trained text-to-image diffusion model, without any further training or fine-tuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high-quality and diverse images that adhere to user-provided controls, such as a desired aspect ratio (e.g., panorama) and spatial guiding signals, ranging from tight segmentation masks to bounding boxes.
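
In the Text2Panorama setting shown on this page, that shared optimization has a simple closed form: at every denoising step, each overlapping crop of the panorama latent is denoised by the pre-trained model, and the panorama is updated with the least-squares fusion of those per-crop results, which for equally weighted crops is just their average on overlapping pixels. Below is a minimal sketch of that fusion step, assuming a diffusers-style UNet and scheduler; the window size, stride, and all names are illustrative rather than the authors' implementation.

# Minimal sketch of the per-step crop fusion used for panorama generation.
# Assumes a diffusers-style UNet2DConditionModel and scheduler; sizes and names
# are illustrative, not the official implementation.
import torch

def get_views(latent_h, latent_w, window=64, stride=8):
    # Top-left corners of overlapping window x window crops in latent space.
    return [(i, j)
            for i in range(0, latent_h - window + 1, stride)
            for j in range(0, latent_w - window + 1, stride)]

@torch.no_grad()
def multidiffusion_step(latent, t, unet, scheduler, text_emb, uncond_emb,
                        guidance_scale=7.5, window=64):
    # latent: (1, 4, H/8, W/8) panorama latent at timestep t
    value = torch.zeros_like(latent)
    count = torch.zeros_like(latent)
    for (i, j) in get_views(latent.shape[2], latent.shape[3], window):
        crop = latent[:, :, i:i + window, j:j + window]
        # Classifier-free guidance on each crop (often batched in practice).
        noise_uncond = unet(crop, t, encoder_hidden_states=uncond_emb).sample
        noise_text = unet(crop, t, encoder_hidden_states=text_emb).sample
        noise = noise_uncond + guidance_scale * (noise_text - noise_uncond)
        # One denoising step for this view, then accumulate its result.
        denoised = scheduler.step(noise, t, crop).prev_sample
        value[:, :, i:i + window, j:j + window] += denoised
        count[:, :, i:i + window, j:j + window] += 1
    # Fuse: average the per-view updates wherever views overlap.
    return value / count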

Citation

@article{bar2023multidiffusion,
  title={MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation},
  author={Bar-Tal, Omer and Yariv, Lior and Lipman, Yaron and Dekel, Tali},
  journal={arXiv preprint arXiv:2302.08113},
  year={2023}
}