Explore

replicate/vicuna-13b
A large language model that's been fine-tuned on ChatGPT interactions

suno-ai/bark
🔊 Text-Prompted Generative Audio Model

meronym/speaker-transcription
Whisper transcription plus speaker diarization

stability-ai/stablelm-tuned-alpha-7b
7 billion parameter version of Stability AI's language model

ai-forever/kandinsky-2
A text-to-image model trained on LAION HighRes and fine-tuned on internal datasets

stability-ai/stable-diffusion
A latent text-to-image diffusion model capable of generating photo-realistic images given any text input
Collections
Audio generation
Models to generate and modify audio
riffusion/riffusion, afiaka87/tortoise-tts, suno-ai/bark, allenhung1025/looptest, haoheliu/audio-ldm...
ControlNet
Control diffusion models
jagilley/controlnet-scribble, jagilley/controlnet-hough, jagilley/controlnet-canny, jagilley/controlnet-hed, jagilley/controlnet-depth2img...
Diffusion models
Image and video generation models trained with diffusion processes
stability-ai/stable-diffusion, cjwbw/anything-v3-better-vae, cjwbw/anything-v4.0, cjwbw/waifu-diffusion, tommoore515/material_stable_diffusion...
Embedding Models
Models that generate embeddings from inputs
andreasjansson/clip-features, replicate/all-mpnet-base-v2, daanelson/imagebind...
Image restoration
Models that improve or restore images by deblurring, colorizing, and removing noise
tencentarc/gfpgan, jingyunliang/swinir, microsoft/bringing-old-photos-back-to-life, cjwbw/bigcolor, google-research/maxim...
Image to text
Models that generate text prompts and captions from images
salesforce/blip, andreasjansson/blip-2, methexis-inc/img2prompt, rmokady/clip_prefix_caption, pharmapsychotic/clip-interrogator...
Language models
Models that can understand and generate text
replicate/flan-t5-xl, stability-ai/stablelm-tuned-alpha-7b, replicate/vicuna-13b, replicate/llama-7b, replicate/dolly-v2-12b...
ML makeovers
Models that let you change facial features
orpatashnik/styleclip, yuval-alaluf/sam, rinongal/stylegan-nada, mchong6/jojogan, yuval-alaluf/restyle_encoder...
Style transfer
Models that take a content image and a style reference to produce a new image
paper11667/clipstyler, huage001/adaattn, ptran1203/pytorch-animegan, sanzgiri/cartoonify_video, ariel415el/gpdm...
Super resolution
Upscaling models that create high-quality images from low-quality images
nightmareai/real-esrgan, jingyunliang/swinir, mv-lab/swin2sr, cjwbw/rudalle-sr, cjwbw/real-esrgan...
Text to image
Models that generate images from text prompts
stability-ai/stable-diffusion, pixray/text2image, cjwbw/waifu-diffusion, kuprel/min-dalle, laion-ai/erlich...
Videos
Models that create and edit videos
deforum/deforum_stable_diffusion, andreasjansson/stable-diffusion-animation, nateraw/stable-diffusion-videos, nightmareai/cogvideo, arielreplicate/stable_diffusion_infinite_zoom...
Popular models
Practical face restoration algorithm for old photos or AI-generated faces
Fill in masked parts of images with Stable Diffusion
Generate detailed images from scribbled drawings
Real-ESRGAN with optional face correction and adjustable upscale
Robust face restoration algorithm for old photos / AI-generated faces
Latest models
Edits clothing within an image using a state-of-the-art clothing segmentation algorithm
An instruction-tuned LLM that allows you to constrain syllable patterns
Regression of musical arousal and valence values
Classification of music approachability and engagement
An EfficientNet for music style classification across 400 styles from the Discogs taxonomy
A personal experiment with Stable Diffusion
Transcribes any audio file (base64 or URL) with speaker diarization
Transformers implementation of the LLaMA language model
A large language model that's been fine-tuned on ChatGPT interactions
A multi-input ControlNet model. Pass in control images and set the weights.
Generate subtitles (.srt and .vtt) from audio files using OpenAI's Whisper models.
ControlNet 1.1 lineart with Realistic Vision v2.0
Tuning-Free Multi-Subject Image Generation with Localized Attention
An instruction-tuned multi-modal model based on BLIP-2 and Vicuna-13B
Image captioning via vision-language models with instruction tuning
Generate Pokémon from a text description
A model for text, audio, and image embeddings in one space
A model that generates text in response to an input image and prompt
ControlNet annotators: generate the control image that is fed into a Stable Diffusion pipeline with ControlNet
Generate a new image given any input text with RPG V4
Generate a new image from an input image with Edge Of Realism - EOR v2.0
Generate a new image from an input image with Deliberate v2
Generate a new image from an input image with Realistic Vision V2.0
Stylized Audio-Driven Single Image Talking Face Animation
An instruction-tuned multimodal large language model that generates text based on user-provided prompts and images
Generate a new image given any input text with URPM v1.3
Generate a new image given any input text with Deliberate v2
Generate a new image given any input text with Realistic Vision V2.0
Generate a new image given any input text with Edge Of Realism - EOR v2.0
Generate a new image given any input text with Babes 2.0
A 7B parameter LLM fine-tuned to support contexts with more than 65K tokens
7B parameter base version of Stability AI's language model
Consistent character views with ControlNet and Stable Diffusion, fine-tuned on Ready Player Me characters based on OpenJourney V4
3B parameter base version of Stability AI's language model