adirik / syncdiffusion

Generate panoramic images with text prompts

  • Public
  • 117 runs
  • GitHub
  • Paper
  • License

Run time and cost

This model costs approximately $0.19 to run on Replicate, or 5 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 4 minutes.

Readme

SyncDiffusion

SyncDiffusion, built on Stable Diffusion 2.0, introduces an approach to generating seamless panoramas. Conventional joint-diffusion methods often produce disjointed montages that inappropriately blend different scenes; SyncDiffusion instead synchronizes the denoising of overlapping windows through gradient descent on a perceptual similarity (LPIPS) loss. This keeps the panorama coherent and visually consistent while staying true to the input prompt and maintaining overall image fidelity.
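The core idea can be sketched as follows: at selected denoising steps, each window's predicted clean image is compared against an anchor window with a perceptual loss, and the window latents are nudged by gradient descent to reduce that loss. Below is a minimal, hypothetical sketch, not the reference implementation; the `decode_x0` callable, tensor shapes, and update rule are assumptions, and the real code runs inside the Stable Diffusion denoising loop.

```python
# Hypothetical sketch of a SyncDiffusion-style synchronization step.
# Assumptions: `latents` holds one latent per window, `decode_x0` maps a
# latent to its predicted clean image (differentiably), and the `lpips`
# package provides the perceptual similarity loss used by the paper.
import torch
import lpips

perceptual_loss = lpips.LPIPS(net="vgg")

def sync_step(latents: list[torch.Tensor], decode_x0, weight: float):
    """Pull every window's prediction toward the anchor window (index 0)."""
    with torch.no_grad():
        anchor_img = decode_x0(latents[0])  # anchor stays fixed
    synced = [latents[0]]
    for z in latents[1:]:
        z = z.detach().requires_grad_(True)
        loss = perceptual_loss(decode_x0(z), anchor_img).mean()
        (grad,) = torch.autograd.grad(loss, z)
        synced.append((z - weight * grad).detach())  # one gradient-descent step
    return synced
```

In the full model, `weight` corresponds to the `sync_weight` input and decays by `sync_decay_rate`, and the step runs every `sync_freq` denoising steps until `sync_threshold` is reached (see the API arguments below).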

For further information and technical details, you can refer to the original project page, paper and repository.

How to use the API

To use SyncDiffusion, you need to provide a text prompt. You can create horizontal or vertical panoramic images; the output file is in .png format. The API input arguments are as follows (an example call follows the list):

  • prompt: Provide a descriptive prompt for the image you want to generate. This is the primary driver of the content in the generated panorama.
  • negative_prompt: Use this to specify what you don’t want in the image, helping to refine the results.
  • width: Set the width of the output image.
  • height: Set the height of the output image.
  • guidance_scale: Classifier-free guidance scale. Higher values make the output adhere more closely to the prompt.
  • sync_weight: Sets the strength of the SyncDiffusion gradient-descent update in the image generation process.
  • sync_decay_rate: Sets the decay rate of the SyncDiffusion weight scheduler, so the sync weight shrinks over the denoising steps.
  • sync_freq: Specifies how often (in denoising steps) the SyncDiffusion gradient descent is applied.
  • sync_threshold: Defines the denoising step up to which SyncDiffusion is applied; later steps run as standard diffusion.
  • num_inference_steps: Sets the number of inference steps for the diffusion process.
  • stride: Determines the window stride in the latent space for the diffusion.
  • seed: Random seed for reproducible results.
  • loop_closure: Enable or disable loop closure, which blends the two ends of the panorama so it wraps around seamlessly.
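As a minimal sketch, here is what a call might look like with the Replicate Python client. The input values are illustrative rather than tuned defaults, and the exact output handling depends on your client version:

```python
# Hypothetical example call; requires `pip install replicate` and the
# REPLICATE_API_TOKEN environment variable.
import replicate

output = replicate.run(
    "adirik/syncdiffusion",  # pin an exact version hash in production
    input={
        "prompt": "a panoramic view of snowy mountains at sunrise",
        "negative_prompt": "blurry, low quality",
        "width": 3072,             # wide horizontal panorama
        "height": 512,
        "guidance_scale": 7.5,
        "sync_weight": 20.0,       # illustrative values, not verified defaults
        "sync_decay_rate": 0.95,
        "sync_freq": 1,
        "sync_threshold": 5,
        "num_inference_steps": 50,
        "stride": 16,
        "seed": 42,
        "loop_closure": False,
    },
)
print(output)  # URL (or file handle) for the generated .png
```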

Important Notes

  • SyncDiffusion is applied only in the early inference steps, as determined by “sync_threshold”. Because these synchronized steps are more computationally expensive than plain diffusion steps, the time-remaining estimate shown at the start of the logs is significantly longer than the actual run time. More SyncDiffusion steps yield more consistent images, but also longer processing times (a rough illustration follows this list).
  • Generating square images (e.g., 2048x2048) with SyncDiffusion takes substantially longer compared to creating vertical or horizontal panoramic images.
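As a rough illustration of the speed/consistency trade-off noted above, you can estimate how many expensive synchronized steps a run performs. The scheduling formula here is an assumption for intuition, not confirmed from the source:

```python
# Back-of-the-envelope count of expensive synchronized steps, assuming
# SyncDiffusion runs every `sync_freq` steps until `sync_threshold`.
import math

def approx_sync_steps(sync_threshold: int, sync_freq: int) -> int:
    return math.ceil(sync_threshold / sync_freq)

print(approx_sync_steps(5, 1))  # 5 synchronized steps
print(approx_sync_steps(5, 2))  # 3 -> faster, potentially less coherent
```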

References

@article{lee2023syncdiffusion,
  title={SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions},
  author={Yuseung Lee and Kunho Kim and Hyunjin Kim and Minhyuk Sung},
  journal={arXiv preprint arXiv:2306.05178},
  year={2023}
}