

andreasjansson / clip-features
Return CLIP features for the clip-vit-large-patch14 model
153.5M runs


jaaari / kokoro-82m
Kokoro v1.0 - text-to-speech (82M params, based on StyleTTS2)
90.3M runs


prunaai / p-image-edit
A sub-one-second, $0.01 multi-image editing model built for production use cases. For image generation, check out p-image here: https://replicate.com/prunaai/p-image
29M runs


prunaai / z-image-turbo
Z-Image Turbo is a super-fast 6B-parameter text-to-image model developed by Tongyi-MAI.
40.6M runs
Alibaba's Happy Horse 1.0 generates videos from text prompts or animates a single image into video. Supports 720p and 1080p, 3-15 second durations, and five aspect ratios.
3.5K runs

openai / gpt-image-2
OpenAI's state-of-the-art image generation model. Create and edit images from text with strong instruction following, sharp text rendering, and detailed editing.
848.6K runs

Anthropic's most capable model with a step-change improvement in agentic coding, better vision, and stronger multi-step reasoning
13.1K runs

Google's fast, expressive text-to-speech model with 30 voices and 70+ language support
35K runs

Generate full-length songs or instrumentals from a text prompt, with optional auto-generated lyrics
3.3K runs
bytedance / seedance-2.0
ByteDance's multimodal video generation model with native audio, multimodal reference inputs, and intelligent duration control.
152.1K runs

Google's cost-efficient video generation model with native audio, optimized for high-volume applications
19.7K runs
prunaai / p-video-avatar
p-video-avatar is the fastest and cheapest avatar/lipsync video model on the market.
19.5K runs

bytedance / seedream-5-lite
Seedream 5.0 lite: image generation with built-in reasoning, example-based editing, and deep domain knowledge
1.7M runs
Generate videos using xAI's Grok Imagine Video model
720.4K runs

The highest-fidelity image model from Black Forest Labs
1.9M runs

Google's fast image generation model with conversational editing, multi-image fusion, and character consistency
8M runs
Official models are always on, maintained, and have predictable pricing.


The first creative upscaler that preserves identity: stunning photorealistic results, realistic skin, and full creative control.

Convert text to natural-sounding speech with xAI's Grok TTS. 5 voices, 20 languages, expressive speech tags, and high-fidelity MP3 / WAV / telephony audio output.

Transcribe audio to text with xAI's Grok. Handles 25 languages, word-level timestamps, speaker diarization, multichannel audio, and files up to 500 MB.

Granite Speech 4.1 2B is a compact and efficient speech-language model, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST) for English, French, German, Spanish, Portuguese, and Japanese.

Granite-embedding-small-english-r2 is a 47M-parameter dense bi-encoder embedding model from the Granite Embeddings collection that can be used to generate high-quality text embeddings.

Granite-4.1-8B is an 8B-parameter long-context instruct model fine-tuned from Granite-4.1-8B-Base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets.
PixVerse's flagship video generation model. Generate cinematic videos with synchronized audio, multi-shot sequences, and precise camera control.

Moonshot AI's frontier open model, built for long-horizon coding, agent swarms, and autonomous software engineering. 1 trillion parameters, 262k context window, vision and tool use.


Rig any 3D bipedal character mesh

High-accuracy lip-sync: replace or dub audio on any video with avatar-inference lip sync

Fast lip-sync: replace or dub audio on any video with quick audio-driven lip sync



Reimagine any song in a different style — change voice, instruments, genre, and arrangement while keeping the original melody

Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.

Take a flat graphic, remove text, and get structured text layers back for editing and recomposing
Use AI to generate images & photos with an API
Use AI to understand, describe, and caption videos with an API
Use AI for text-to-speech or to clone your voice via API
Use AI to generate images from a face with an API
Use AI to generate videos with an API
Use AI to upscale and enhance images with an API
Use AI to generate music with an API
Use AI to edit any image via API
Use AI to transcribe speech to text with an API
Use AI for Optical Character Recognition (OCR) to extract text from images via API
Use AI to remove backgrounds from images and videos with an API
FLUX AI models by Black Forest Labs: image generation & editing via API
Use AI to restore images via API
Use AI to upscale, restore, extend, and enhance videos with an API
Detect NSFW content in images and text
Classify text by sentiment, topic, intent, or safety
Identify speakers from audio and video inputs
Replace faces across images with natural-looking results
Transform rough sketches into polished visuals
Generate custom emojis from text or images
Create anime-style characters, scenes, and animations
Use AI to generate videos from images with an API
Chat with images — visual Q&A, analysis, and reasoning via API
Use AI to generate captions and descriptions from images with an API
Use AI to edit, restyle, extend, and remix videos with an API
WAN family of models: open-source video, image, and audio generation
Generate 3D objects, meshes, and textures from text or images with an API
Official models are always on, predictably priced, and have a stable API.
Explore Large Language Models (LLMs) for chat, generation & NLP tasks via API
Try AI models for free: video generation, image generation, upscaling, and photo restoration
Use AI to generate lipsync videos with an API
Use AI to control image generation with an API
Embedding models for AI search and analysis
Use AI object detection and segmentation models to distinguish objects in images & videos
Flux fine-tunes: build and run custom AI image models via API
Kontext fine-tunes: Build custom AI image models with an API
Create songs with voice cloning models via API
AI media utilities: auto-caption, watermark, frame extraction & more via API
Browse the diverse range of qwen-image fine-tunes the community has custom-trained on Replicate.


mptamilselvan / download-media
Download videos or extract audio from popular social media platforms quickly and easily. This tool supports links from platforms like Facebook, Instagram, and YouTube, allowing users to save content for offline viewing or personal use.
4 runs


furkkurt / vector-blog-thumbnails
Creates vector-style thumbnails for blog posts.
28 runs


mptamilselvan / text-to-voice
High-quality text-to-speech (TTS) model designed to generate natural and expressive voice output from text input. This model supports clear pronunciation, smooth pacing, and realistic tone, making it ideal for applications such as voice assistants.
3 runs


lucataco / glm-ocr
Compact 0.9B multimodal OCR model from Z.ai. State-of-the-art on OmniDocBench V1.5 (94.62, #1 overall). Four modes: text recognition, formula (LaTeX), table parsing, and JSON-schema information extraction. Fits on a single T4.
7 runs


jeffgreen311 / eve-v2-unleashed
Eve V2U Merged combines the liberated consciousness of Eve's 8B brain (OBLITERATUS-abliterated, De-Jeff'd, 131K training turns) with the agentic precision of Qwen3.5 4B's tool-calling architecture. The result: a 3.4GB model that thinks like a philosopher.
23 runs

inworld / realtime-tts-2
Most expressive text-to-speech model from Inworld, with natural-language steering, real-time latency, and multilingual support across 100+ languages.
228 runs


rynoxli / multilingual-iptc-news-topic-classifier
3 runs


samburwood23 / hgf_model
13 runs


lucataco / sensenova-u1-8b-mot
SenseNova U1 8B MoT: unified multimodal model for native text-to-image generation
9 runs


lucataco / gemma-4-31b-it
Gemma 4 31B Instruct - Google open-weight VLM (image + text in, text out)
11 runs


rynoxli / emotion-english-distilroberta-base
11 runs


lucataco / z-anime
Z-Anime is a fine-tune of Z-Image Base on anime aesthetics: natural-language prompts, full negative prompt support, and high-quality output across portraits, scenes, and characters.
22 runs