jack000 / glid-3-xl

A 1.4B parameter text2im model from CompVis, finetuned on CLIP text embeds and curated data.

  • Public
  • 45.5K runs
  • T4

Input

string

Your text prompt.

Default: ""

string

(optional) Text to steer away from. The model's prediction for this text is subtracted from its prediction for the target text.

Default: ""

file

(optional) Initial image to use for the model's prediction.

number
(minimum: 0, maximum: 1)

Fraction of sampling steps to skip when using an init image.

Default: 0

integer

Batch size.

Default: 4

integer

Target width.

Default: 256

integer

Target height.

Default: 256

integer
(minimum: -1, maximum: 4294967295)

Seed for random number generator.

Default: -1

number
(minimum: -20, maximum: 100)

Classifier-free guidance scale. Higher values push the output closer to the caption, with diminishing returns. Try values between 1.0 and 40.0.

Default: 5
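As a rough illustration of what the guidance scale does, here is a minimal sketch of the standard classifier-free guidance blend. It assumes the model produces one prediction for the target prompt and one for an empty (or negative) prompt; the function name `guided_prediction` and the toy lists are hypothetical, and the real model works on tensors rather than Python lists.

```python
def guided_prediction(cond, uncond, scale):
    """Blend two noise predictions with classifier-free guidance.

    cond   -- prediction conditioned on the target prompt
    uncond -- prediction for the empty (or negative) prompt (assumption)
    scale  -- guidance scale; 1.0 returns cond unchanged
    """
    # Move away from the unconditional prediction, toward the conditional one.
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy values standing in for real model outputs.
cond = [0.5, -1.0, 2.0]
uncond = [0.0, 0.0, 1.0]

print(guided_prediction(cond, uncond, 1.0))  # identical to cond
print(guided_prediction(cond, uncond, 5.0))  # pushed further from uncond
```

With a scale of 1.0 the conditional prediction passes through unchanged; larger values exaggerate the difference between the two predictions, which is why returns diminish at high scales.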

integer
(minimum: 15, maximum: 250)

Number of diffusion steps to run.

Default: 50
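The skip fraction and the step count interact when an init image is supplied. The sketch below shows the arithmetic under the assumption that the skipped count is rounded down; the helper name `steps_to_run` is hypothetical and the model's actual rounding may differ.

```python
def steps_to_run(total_steps, skip_fraction):
    """Return how many diffusion steps remain after skipping a fraction.

    Assumes the number of skipped steps is rounded down (an assumption,
    not confirmed by the model's documentation).
    """
    if not 0.0 <= skip_fraction <= 1.0:
        raise ValueError("skip_fraction must be between 0 and 1")
    skipped = int(total_steps * skip_fraction)
    return total_steps - skipped


print(steps_to_run(50, 0.0))   # no init image: all steps run
print(steps_to_run(50, 0.5))   # half of the steps are skipped
```

A higher fraction keeps the result closer to the init image, because fewer denoising steps are left to change it.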

Output


This example was created by a different version, jack000/glid-3-xl:d17813f3.

Run time and cost

This model costs approximately $0.067 to run on Replicate, or 14 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
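The runs-per-dollar figure follows from the per-run cost. A minimal sketch of that arithmetic, assuming the page rounds down to whole runs (the helper name `runs_per_dollar` is hypothetical):

```python
import math


def runs_per_dollar(cost_per_run):
    """Whole number of runs that fit in one dollar at the given cost."""
    return math.floor(1.0 / cost_per_run)


print(runs_per_dollar(0.067))
```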

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 5 minutes. The predict time for this model varies significantly based on the inputs.

Readme

GLID-3-XL

GLID-3-XL is the 1.4B latent diffusion model from CompVis, back-ported to the guided diffusion codebase.

The model has been split into three checkpoints. This lets us fine-tune the diffusion model on new datasets and for additional tasks like inpainting and super-resolution.

Download model files

# text encoder (required)
wget https://dall-3.com/models/glid-3-xl/bert.pt

# ldm first stage (required)
wget https://dall-3.com/models/glid-3-xl/kl-f8.pt

# there are several diffusion models to choose from:

# original diffusion model from CompVis
wget https://dall-3.com/models/glid-3-xl/diffusion.pt

# new model fine-tuned on a cleaner dataset (will not generate watermarks, split images or blurry images)
wget https://dall-3.com/models/glid-3-xl/finetune.pt

# inpaint
wget https://dall-3.com/models/glid-3-xl/inpaint.pt
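
The commands above always fetch the two required files plus one diffusion checkpoint of your choice. As a small sketch of that selection logic, the snippet below builds the list of URLs for a chosen variant. The variant labels (`original`, `finetune`, `inpaint`) and the helper `checkpoint_urls` are hypothetical names introduced for illustration; only the URLs come from the commands above.

```python
BASE_URL = "https://dall-3.com/models/glid-3-xl"

# Text encoder and LDM first stage are always required.
REQUIRED = ["bert.pt", "kl-f8.pt"]

# Hypothetical labels mapped to the diffusion checkpoints listed above.
DIFFUSION_VARIANTS = {
    "original": "diffusion.pt",
    "finetune": "finetune.pt",
    "inpaint": "inpaint.pt",
}


def checkpoint_urls(variant="finetune"):
    """Return the download URLs for the required files plus one variant."""
    if variant not in DIFFUSION_VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    files = REQUIRED + [DIFFUSION_VARIANTS[variant]]
    return [f"{BASE_URL}/{name}" for name in files]


for url in checkpoint_urls("inpaint"):
    print(url)
```

The printed URLs can be passed to wget or any other downloader.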