cjwbw / lambda-eclipse

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

  • Public
  • 174 runs
  • L40S
  • Paper
  • License

Input

string

The prompt to guide the image generation.

Default: "a cat wearing glasses at the beach"

string

The negative prompt, describing what to exclude from the generated image.

Default: "over-exposure, under-exposure, saturated, duplicate, out of frame, lowres, cropped, worst quality, low quality, jpeg artifacts, morbid, mutilated, ugly, bad anatomy, bad proportions, deformed, blurry"

*file
image1

The image for the first subject.

file
image2

Optional. The image for the second subject.

string

The subject category of the first image.

Default: "dog"

string

Optional. The subject category of the second image.

Default: ""

integer
(minimum: 1, maximum: 500)

The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

Default: 25

number
(minimum: 1, maximum: 20)

Scale for classifier-free guidance.

Default: 7.5

integer

Random seed. Leave blank to randomize the seed.
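
For reference, here is a minimal sketch of calling this model through the official Replicate Python client. The input keys other than image1 and image2 are assumptions inferred from the field descriptions above, so check the model's API schema for the exact names.

```python
# Minimal sketch: running cjwbw/lambda-eclipse via the Replicate Python client.
# Keys other than image1/image2 are assumed from the form above, not confirmed.
import replicate

output = replicate.run(
    "cjwbw/lambda-eclipse",  # optionally pin a version: "cjwbw/lambda-eclipse:<version>"
    input={
        "prompt": "a cat wearing glasses at the beach",
        "negative_prompt": "lowres, cropped, worst quality, low quality",
        "image1": open("subject1.png", "rb"),  # required: first subject image
        "subject1": "cat",                     # assumed key for the first subject category
        "num_inference_steps": 25,             # 1-500; more steps = higher quality, slower
        "guidance_scale": 7.5,                 # 1-20; classifier-free guidance strength
        # "seed": 42,                          # omit to randomize
    },
)
print(output)
```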

Output


Run time and cost

This model costs approximately $0.060 to run on Replicate, or 16 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 62 seconds. The predict time for this model varies significantly based on the inputs.

Readme

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

This repository contains the inference code for our paper, λ-ECLIPSE.

  • λ-ECLIPSE is a lightweight approach to multi-concept personalization: a tiny text-to-image (T2I) prior model designed for the Kandinsky v2.2 diffusion image generator (a sketch of this two-stage setup follows this list).

  • λ-ECLIPSE extends the ECLIPSE prior by incorporating image-text interleaved data.

  • λ-ECLIPSE shows that personalized T2I (P-T2I) models do not need to be trained with massive resources. For instance, λ-ECLIPSE is trained in a mere 74 A100 GPU hours, compared to its counterparts BLIP-Diffusion (2,304 GPU hours) and Kosmos-G (12,300 GPU hours).
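
To make the two-stage design concrete, below is a rough sketch (not the authors' code) of how prior-generated CLIP image embeddings feed the unchanged Kandinsky v2.2 decoder from diffusers. The λ-ECLIPSE prior itself is stubbed out with a stand-in tensor here; see the paper's repository for the actual prior implementation.

```python
# Sketch of the prior + decoder split behind lambda-ECLIPSE (stand-ins noted).
import torch
from diffusers import KandinskyV22Pipeline

# Stage 2: the frozen, off-the-shelf Kandinsky v2.2 decoder.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

# Stage 1 stand-in: in practice these 1280-dim CLIP image embeddings come from
# the tiny lambda-ECLIPSE prior, conditioned on the prompt and subject images.
image_embeds = torch.randn(1, 1280, dtype=torch.float16, device="cuda")
negative_image_embeds = torch.zeros_like(image_embeds)  # crude negative stand-in

# The decoder consumes the embeddings as-is; only the small prior is trained,
# which is what keeps the training budget at tens of GPU hours.
image = decoder(
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
image.save("output.png")
```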

Qualitative Examples

Quantitative Comparisons

Acknowledgement

We would like to acknowledge the excellent open-source text-to-image models (Karlo and Kandinsky), without which this work would not have been possible. We also thank Hugging Face for streamlining access to T2I models.