Explore
Featured models

anthropic / claude-3.5-sonnet
Anthropic's most intelligent language model to date, with a 200K token context window and image understanding (claude-3-5-sonnet-20241022)

minimax / video-01-director
Generate videos with specific camera movements

google / imagen-3-fast
A faster and cheaper Imagen 3 model, for when price or speed matters more than final image quality

google / imagen-3
Google's highest quality text-to-image model, capable of generating images with detail, rich lighting and beauty

deepseek-ai / deepseek-r1
A reasoning model trained with reinforcement learning, on par with OpenAI o1

tencent / hunyuan-video
A state-of-the-art text-to-video generation model capable of creating high-quality videos with realistic motion from text descriptions

playht / play-dialog
End-to-end AI speech model designed for natural-sounding conversational speech synthesis, with support for context-aware prosody, intonation, and emotional expression.

zsxkib / mmaudio
Add sound to video. An advanced AI model that synthesizes high-quality audio from video content, enabling seamless video-to-audio transformation

recraft-ai / recraft-v3
Recraft V3 (code-named red_panda) is a text-to-image model that can render long passages of text within images and generate images in a wide range of styles. As of today, it is SOTA in image generation, as measured by the Text-to-Image Benchmark from Artificial Analysis
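Any of the featured models above can be run with a single API call. Below is a minimal sketch using the Replicate Python client, assuming the replicate package is installed and REPLICATE_API_TOKEN is set in your environment; the model identifier matches the listing above, but the prompt and input fields are illustrative, and each model's exact input schema is documented on its model page.

import replicate

# Run one of the featured models above. The input schema varies per model,
# so check the model's page for the exact fields it accepts.
output = replicate.run(
    "google/imagen-3-fast",
    input={"prompt": "a red panda drinking tea, soft studio lighting"},
)
print(output)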
I want to…
Generate images
Models that generate images from text prompts
Generate videos
Models that create and edit videos
Caption images
Models that generate text from images
Transcribe speech
Models that convert speech to text
Generate text
Models that can understand and generate text
Upscale images
Upscaling models that create high-quality images from low-quality images
Use official models
Official models are always on, maintained, and have predictable pricing.
Restore images
Models that improve or restore images by deblurring, colorizing, and removing noise
Enhance videos
Models that enhance videos with super-resolution, sound effects, motion capture and other useful production effects.
Generate speech
Convert text to speech
Caption videos
Models that generate text from videos
Remove backgrounds
Models that remove backgrounds from images and videos
Use handy tools
Toolbelt-type models for videos and images.
Detect objects
Models that detect or segment objects in images and videos.
Generate music
Models to generate and modify music
Sing with voices
Voice-to-voice cloning and musical prosody
Make 3D stuff
Models that generate 3D objects, scenes, radiance fields, textures and multi-views.
Chat with images
Ask language models about images
Use a face to make images
Make realistic images of people instantly
Extract text from images
Optical character recognition (OCR) and text extraction
Get embeddings
Models that generate embeddings from inputs
Use the FLUX family of models
The FLUX family of text-to-image models from Black Forest Labs
Use FLUX fine-tunes
Browse the diverse range of fine-tunes the community has custom-trained on Replicate
Control image generation
Guide image generation with more than just text. Use edge detection, depth maps, and sketches to get the results you want.
Edit images
Tools for manipulating images.
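Each "I want to…" entry above corresponds to a collection of models. As a sketch, assuming the Python client's collections API and the text-to-image collection slug, you can list the models in a collection programmatically:

import replicate

# Fetch a collection by slug and print its models.
# The slug "text-to-image" is an assumed example; browse replicate.com/collections
# for the actual slugs behind each category above.
collection = replicate.collections.get("text-to-image")
for model in collection.models:
    print(f"{model.owner}/{model.name}")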
Popular models
SDXL-Lightning by ByteDance: a fast text-to-image model that makes high-quality images in 4 steps
Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification
Return CLIP features for the clip-vit-large-patch14 model
A simple OCR model that can easily extract text from an image.
Real-ESRGAN with optional face correction and adjustable upscale
A text-to-image generative AI model that creates beautiful images
Latest models
BLIP3 (XGen-MM) is a series of foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research
Transcribe audio using OpenAI's Whisper, with timestamps stabilized by the stable-ts Python package.
Use a face to instantly make images. Uses SDXL Lightning checkpoints.
Cog to turn minimally formatted plaintext into PDFs (using TeX on the backend)
Dark Sushi Mix 2.25D Model with vae-ft-mse-840000-ema (Text2Img, Img2Img and Inpainting)
DeepSeek LLM, an advanced language model comprising 67 billion parameters. Trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese
A llama-3 based moderation and safeguarding language model
InstantID. ControlNets. More base SDXL models. And ByteDance's latest ⚡️SDXL-Lightning!⚡️
The img2img pipeline that makes an anime-style image of a person. It uses an SD 1.5 model as a base, depth estimation as a ControlNet, and an IP-Adapter model for face consistency.
Consistent Self-Attention for Long-Range Image and Video Generation
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Robust face restoration algorithm for old photos / AI-generated faces (adapted to work with video inputs)
Just some good ole BeautifulSoup URL-scraping magic. (Some sites don't work because they block scraping, but it's still useful.)
PyTorch implementation of AnimeGAN for fast photo animation
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
AbsoluteReality V1.8.1 Model (Text2Img, Img2Img and Inpainting)
Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets
Newest reranker model from BAAI (https://huggingface.co/BAAI/bge-reranker-v2-m3). FP16 inference enabled. Normalize param available
Generate a video that morphs between subjects, with an optional style
An efficient, intelligent, and truly open-source language model
Make stickers with AI. Generates graphics with transparent backgrounds.
yuan2.0-2b-mars is the March 2024 release of the Yuan 2.0-2B model. Yuan 2.0 is the new generation of foundation language models released by Inspur Information. All three models are open-sourced: Yuan 2.0-102B, Yuan 2.0-51B, and Yuan 2.0-2B, along with scripts for pretraining, fine-tuning, and inference serving so developers can build on them. Compared with Yuan 1.0, Yuan 2.0 is trained on more diverse, higher-quality pretraining data and instruction fine-tuning datasets, giving the model stronger understanding across semantics, mathematics, reasoning, code, and knowledge.
Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
FlashFace: Human Image Personalization with High-fidelity Identity Preservation