Predictions run on NVIDIA T4 GPU hardware and typically complete within 13 minutes, although prediction time varies significantly with the inputs.
GLID-3-xl is the 1.4B-parameter latent diffusion model from CompVis, back-ported to the guided diffusion codebase.
The model has been split into three checkpoints: the text encoder, the LDM first stage, and the diffusion model. This lets us fine-tune the diffusion model on new datasets and for additional tasks such as inpainting and super-resolution.
```sh
# text encoder (required)
wget https://dall-3.com/models/glid-3-xl/bert.pt

# LDM first stage (required)
wget https://dall-3.com/models/glid-3-xl/kl-f8.pt

# there are several diffusion models to choose from:

# original diffusion model from CompVis
wget https://dall-3.com/models/glid-3-xl/diffusion.pt

# new model fine-tuned on a cleaner dataset (will not generate watermarks, split images or blurry images)
wget https://dall-3.com/models/glid-3-xl/finetune.pt

# inpainting model
wget https://dall-3.com/models/glid-3-xl/inpaint.pt
```
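Because the text encoder and the first stage are always needed while only one diffusion checkpoint is chosen, the download step can be wrapped so it fetches exactly one variant alongside the two required files. This is a minimal sketch, assuming `wget` is installed and the `dall-3.com` URLs above are still reachable; `glid3_urls` and `glid3_download` are hypothetical helper names, not part of GLID-3-xl.

```sh
#!/bin/sh
# Hypothetical helpers (not part of GLID-3-xl); assumes the base URL listed above.
BASE_URL="https://dall-3.com/models/glid-3-xl"

# Print the three checkpoint URLs needed for one diffusion variant.
glid3_urls() {
  variant="$1"  # one of: diffusion, finetune, inpaint
  case "$variant" in
    diffusion|finetune|inpaint) ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  # The text encoder and the LDM first stage are always required.
  echo "$BASE_URL/bert.pt"
  echo "$BASE_URL/kl-f8.pt"
  # Exactly one diffusion model checkpoint.
  echo "$BASE_URL/$variant.pt"
}

# Download the checkpoints for one variant; wget -nc (--no-clobber)
# skips files that already exist in the current directory.
glid3_download() {
  urls=$(glid3_urls "$1") || return 1
  for url in $urls; do
    wget -nc "$url" || return 1
  done
}
```

For example, `glid3_download finetune` would fetch `bert.pt`, `kl-f8.pt` and `finetune.pt`, and rerunning it would not re-download files that are already present.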