Generate 768px images from text using CompVis `retrieval-augmented-diffusion`
Generate an image from text by guiding a denoising diffusion model. Inference is somewhat slow.
CompVis `latent-diffusion text2im` finetuned for inpainting.
GLIDE-text2im with humans and experimental style prompts.
Guide a StyleGAN3 trained on pictures of mannequins with CLIP.
GLIDE (filtered), the predecessor to DALL-E 2, with faster PRK/PLMS sampling.
Use Stable Diffusion and aesthetic CLIP embeddings to guide boring outputs toward more aesthetically pleasing results.
Generate speech from text and clone voices from MP3 files. From James Betker, AKA "neonbjb".