

andreasjansson / clip-features
Return CLIP features for the clip-vit-large-patch14 model
150.9M runs


jaaari / kokoro-82m
Kokoro v1.0 - text-to-speech (82M params, based on StyleTTS2)
88M runs


vaibhavs10 / incredibly-fast-whisper
whisper-large-v3, incredibly fast, powered by Hugging Face Transformers! 🤗
31.2M runs


prunaai / p-image-edit
A sub-1-second, $0.01 multi-image editing model built for production use cases. For image generation, check out p-image here: https://replicate.com/prunaai/p-image
27M runs

anthropic / claude-opus-4.7
Anthropic's most capable model with a step-change improvement in agentic coding, better vision, and stronger multi-step reasoning
2.6K runs

google / gemini-3.1-flash-tts
Google's fast, expressive text-to-speech model with 30 voices and 70+ language support
3.7K runs

Generate full-length songs or instrumentals from a text prompt, with optional auto-generated lyrics
1.1K runs
bytedance / seedance-2.0
ByteDance's multimodal video generation model with native audio, multimodal reference inputs, and intelligent duration control.
64.2K runs

Google's cost-efficient video generation model with native audio, optimized for high-volume applications
11.1K runs

bytedance / seedream-5-lite
Seedream 5.0 lite: image generation with built-in reasoning, example-based editing, and deep domain knowledge
1.3M runs

Google's most intelligent model, with improved reasoning and a new medium thinking level
395.5K runs
runwayml / gen-4.5
State-of-the-art video motion quality, prompt adherence and visual fidelity
116.4K runs
Generate videos using xAI's Grok Imagine Video model
520.6K runs

openai / gpt-image-1.5
OpenAI's latest image generation model with better instruction following and adherence to prompts
9.4M runs

The highest fidelity image model from Black Forest Labs
1.7M runs

Google's fast image generation model with conversational editing, multi-image fusion, and character consistency
6.5M runs
Official models are always on, maintained, and have predictable pricing.

High-accuracy lip-sync: replace or dub audio on any video with avatar-inference lip sync

Fast lip-sync: replace or dub audio on any video with quick audio-driven lip sync

Reimagine any song in a different style — change voice, instruments, genre, and arrangement while keeping the original melody

Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.

Take a flat graphic, remove text, and get structured text layers back for editing and recomposing
A faster variant of Seedance 2.0 for quicker video generation with multimodal inputs and native audio.
Generate full-length songs up to 3 minutes from text prompts or images with Lyria 3 Pro, Google's most capable music generation model

Generate 30-second music clips from text prompts or images with Lyria 3, Google's music generation model

Generate and edit high-quality images with Alibaba's Wan 2.7 Pro with 4K output, thinking mode, text-to-image, multi-image editing, and image set generation

Generate and edit images with Alibaba's Wan 2.7

Edit videos with natural language instructions using Alibaba's Wan 2.7 VideoEdit model

Generate videos from reference images or clips while preserving subject identity using Alibaba's Wan 2.7 reference-to-video model

Generate videos from images, with support for first-and-last-frame control, clip continuation, and audio synchronization using Alibaba's Wan 2.7 model
Generate videos with audio from text prompts using Alibaba's Wan 2.7 model. 1080p, up to 15 seconds, with audio synchronization.
Generate videos guided by reference images using xAI's Grok Imagine Video model
Use AI to generate images & photos with an API
Use AI to understand, describe, and caption videos with an API
Use AI for text-to-speech or to clone your voice via API
Use AI to generate images from a face with an API
Use AI to generate videos with an API
Use AI to upscale and enhance images with an API
Use AI to generate music with an API
Use AI to edit any image via API
Use AI to transcribe speech to text with an API
Use AI for optical character recognition (OCR) to extract text from images via API
Use AI to remove backgrounds from images and videos with an API
FLUX AI models by Black Forest Labs: image generation & editing via API
Use AI to restore images via API
Use AI to upscale, restore, extend, and enhance videos with an API
Detect NSFW content in images and text
Classify text by sentiment, topic, intent, or safety
Identify speakers from audio and video inputs
Replace faces across images with natural-looking results.
Transform rough sketches into polished visuals
Generate custom emojis from text or images
Create anime-style characters, scenes, and animations
Chat with images — visual Q&A, analysis, and reasoning via API
Use AI to generate captions and descriptions from images with an API
Use AI to edit, restyle, extend, and remix videos with an API
Use AI to generate videos from images with an API
WAN family of models: open-source video, image, and audio generation
Generate 3D objects, meshes, and textures from text or images with an API
Official models are always on, predictably priced, and have a stable API.
Explore Large Language Models (LLMs) for chat, generation & NLP tasks via API
Try AI Models for free: video generation, image generation, upscaling, and photo restoration
Use AI to generate lipsync videos with an API
Use AI to control image generation with an API
Embedding models for AI search and analysis
Use AI object detection and segmentation models to distinguish objects in images & videos
Flux fine-tunes: build and run custom AI image models via API
Kontext fine-tunes: build custom AI image models with an API
Create songs with voice cloning models via API
AI media utilities: auto-caption, watermark, frame extraction & more via API
Browse the diverse range of qwen-image fine-tunes the community has custom-trained on Replicate.


diannaadel-droid / ddcontractors
14 runs


diannaadel-droid / poolplastersocial
24 runs

heygen / lipsync-precision
High-accuracy lip-sync: replace or dub audio on any video with avatar-inference lip sync
36 runs

heygen / lipsync-speed
Fast lip-sync: replace or dub audio on any video with quick audio-driven lip sync
27 runs


ben-jackson1 / pickscore-v1
Rank images against a prompt using PickScore v1 and return scores, probabilities, and the best match.
6 runs

anthropic / claude-opus-4.7
Anthropic's most capable model with a step-change improvement in agentic coding, better vision, and stronger multi-step reasoning
2.6K runs

google / gemini-3.1-flash-tts
Google's fast, expressive text-to-speech model with 30 voices and 70+ language support
3.7K runs


t-irwin-neiu / irwin-image-lora
30 runs


prunaai / ernie-image
ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu
258 runs


prunaai / ernie-image-turbo
ERNIE-Image is an open text-to-image generation model developed by the ERNIE-Image team at Baidu
545 runs


cynthiachehayeb / pixy-yolo
Object recognition model
41 runs


evancnavarro / enterprise-glass-v1
A visual style focused on modern enterprise architecture—reflective blue glass skyscrapers captured from upward perspectives, emphasizing symmetry, scale, and a clean, premium corporate aesthetic.
17 runs