cjwbw / ledits

Real Image Editing with DDPM Inversion and Semantic Guidance

Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

project page: https://editing-images-project.hf.space/index.html

DDPM Inversion X SEGA

The exceptional realism and diversity of text-guided diffusion models in image synthesis have sparked significant interest and ongoing research into using these models for image editing. Recent work on intuitive text-based editing showed that synthesized images can be manipulated using text alone. Brack et al. introduced semantic guidance (SEGA) for diffusion models, demonstrating sophisticated image composition and editing capabilities without additional training or external guidance.

Text-guided editing of a real image with state-of-the-art tools requires inverting the given image and textual prompt, that is, finding a sequence of noise vectors that reproduces the input image when fed into the diffusion process together with the prompt. Huberman-Spiegelglas et al. proposed a novel inversion method for DDPM that computes noise maps which encode image structure more strongly and generates diverse, state-of-the-art results for text-based editing tasks. In this work we demonstrate the extended editing capabilities obtained by combining the two techniques.

How Does it Work?

Our integration consists of a simple modification to the SEGA scheme of the diffusion denoising process. This modification allows editing with both methods while still maintaining complete control over the editing effect of each component. First, we apply DDPM inversion to the input image to estimate the latent code associated with it. Then, to apply the editing operations, we run the denoising loop so that each timestep repeats the logic used in SEGA, but with the DDPM scheme and the pre-computed noise vectors.
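The two stages described above can be sketched in a few lines of NumPy. This is an illustrative toy under stated assumptions, not the LEDITS implementation: `eps_model` is a hypothetical stand-in for the text-conditioned noise predictor, the linear `alpha_bar` schedule is chosen only for simplicity, plain vectors stand in for image latents and prompt embeddings, and the SEGA term is reduced to a scale and a quantile mask (warm-up and momentum are omitted). All function and parameter names are invented for this example.

```python
import numpy as np

def make_schedule(T):
    # Assumed toy schedule: alpha_bar[0] stays just below 1 so that
    # every step, including the last one, has sigma > 0.
    return np.linspace(0.9999, 0.02, T + 1)

def eps_model(x, t, cond):
    # Hypothetical stand-in for a U-Net noise predictor (ignores t).
    return 0.1 * np.tanh(x + cond)

def guided_eps(x_t, t, src, uncond, scale, edits):
    # Classifier-free guidance toward the source prompt ...
    e_u = eps_model(x_t, t, uncond)
    eps = e_u + scale * (eps_model(x_t, t, src) - e_u)
    # ... plus one simplified SEGA-style term per edit concept.
    for cond, edit_scale, quantile in edits:
        psi = eps_model(x_t, t, cond) - e_u
        # Keep only the largest-magnitude elements of the edit direction.
        mask = np.abs(psi) >= np.quantile(np.abs(psi), quantile)
        eps = eps + edit_scale * mask * psi
    return eps

def step_mean_sigma(x_t, eps, t, abar):
    # Mean and standard deviation of the stochastic (eta = 1) reverse step.
    a_t, a_prev = abar[t], abar[t - 1]
    sigma = np.sqrt((1 - a_prev) / (1 - a_t)) * np.sqrt(1 - a_t / a_prev)
    x0_pred = (x_t - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)
    dir_coef = np.sqrt(max(1 - a_prev - sigma**2, 0.0))
    return np.sqrt(a_prev) * x0_pred + dir_coef * eps, sigma

def ddpm_invert(x0, src, uncond, scale, abar, rng):
    T = len(abar) - 1
    # Noise x0 independently at every timestep; xs[0] is the clean input.
    xs = [x0] + [np.sqrt(abar[t]) * x0
                 + np.sqrt(1 - abar[t]) * rng.standard_normal(x0.shape)
                 for t in range(1, T + 1)]
    # Solve each reverse step for the noise vector that maps xs[t] to xs[t-1].
    zs = [None] * (T + 1)
    for t in range(1, T + 1):
        eps = guided_eps(xs[t], t, src, uncond, scale, edits=[])
        mu, sigma = step_mean_sigma(xs[t], eps, t, abar)
        zs[t] = (xs[t - 1] - mu) / sigma
    return xs[T], zs

def denoise_with_edits(x_T, zs, src, uncond, scale, edits, abar):
    # Denoising loop: SEGA-style guidance at each step, with the
    # pre-computed noise vectors in place of freshly sampled noise.
    x = x_T
    for t in range(len(abar) - 1, 0, -1):
        eps = guided_eps(x, t, src, uncond, scale, edits)
        mu, sigma = step_mean_sigma(x, eps, t, abar)
        x = mu + sigma * zs[t]
    return x
```

With an empty `edits` list the loop retraces the inverted trajectory and returns the input up to floating-point error. Passing one or more `(concept, edit_scale, quantile)` tuples changes the noise estimate at every step while reusing the same noise vectors, which is what lets the strength of each component be set independently in this sketch.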

BibTeX

@article{tsaban2023ledits,
  author    = {Linoy Tsaban and Apolinário Passos},
  title     = {LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance},
  year      = {2023},
  eprint    = {2307.00522},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}