openai/whisper
Convert speech in audio to text
122.7M runs
jaaari/kokoro-82m
Kokoro v1.0 - text-to-speech (82M params, based on StyleTTS2)
40.3M runs
andreasjansson/clip-features
Return CLIP features for the clip-vit-large-patch14 model
102.1M runs
turian/insanely-fast-whisper-with-video
whisper-large-v3, incredibly fast, with video transcription
5.4M runs
openai/gpt-5-structured
GPT-5 with support for structured outputs, web search and custom tools
21.4K runs
qwen/qwen-image
An image generation foundation model in the Qwen series that achieves significant advances in complex text rendering.
258.3K runs
google/nano-banana
Google's latest image editing model in Gemini 2.5
2.2M runs
kwaivgi/kling-v2.1
Use Kling v2.1 to generate 5s and 10s videos in 720p and 1080p resolution from a starting image (image-to-video)
869.1K runs
minimax/hailuo-02
Hailuo 2 is a text-to-video and image-to-video model that can make 6s or 10s videos at 768p (standard) or 1080p (pro). It excels at real-world physics.
71.9K runs
deepseek-ai/deepseek-v3.1
Latest hybrid thinking model from DeepSeek
1.3K runs
pixverse/pixverse-v5
Create 5s-8s videos with enhanced character movement, visual effects, and exclusive 1080p-8s support. Optimized for anime characters and complex actions
7.4K runs
qwen/qwen-image-lora-trainer
Fine-tunable Qwen Image model with exceptional composition abilities - train custom LoRAs for any style or subject
139 runs
qwen/qwen-image-edit
Edit images using a prompt. This model extends Qwen-Image’s unique text rendering capabilities to image editing tasks, enabling precise text editing
171.4K runs
wan-video/wan-2.2-t2v-fast
A very fast and cheap PrunaAI optimized version of Wan 2.2 A14B text-to-video
58.9K runs
prunaai/wan-2.2-image
This model generates beautiful cinematic 2 megapixel images in 3-4 seconds and is derived from the Wan 2.2 model through optimisation techniques from the pruna package
211.7K runs
bytedance/seedream-3
A text-to-image model with support for native high-resolution (2K) image generation
1.3M runs
Official models are always on, maintained, and have predictable pricing.
Granite-3.3-8B-Instruct is an 8-billion-parameter language model with a 128K context length, fine-tuned for improved reasoning and instruction-following capabilities.
Add lip-sync to any video with an audio file or text
A pro version of Seedance that offers text-to-video and image-to-video support for 5s or 10s videos, at 480p and 1080p resolution
A video generation model that offers text-to-video and image-to-video support for 5s or 10s videos, at 480p and 720p resolution
Google's latest image generation model in Gemini 2.5
A premium version of Kling v2.1 with superb dynamics and prompt adherence. Generate 1080p 5s and 10s videos from text or an image
Generate 5s and 10s videos in 1080p resolution at 30fps
Generate 5s and 10s videos in 720p resolution at 30fps
Generate 5s and 10s videos in 1080p resolution
Generate 5s and 10s videos in 720p resolution
Automated background removal for images. Tuned for AI-generated content, product photos, portraits, and design workflows
Convert raster images to high-quality SVG format with precision and clean vector paths, perfect for logos, icons, and scalable graphics.
Use AI to generate images and photos with an API
Use AI to caption videos with an API
Convert text to speech
Make realistic images of people instantly
Use AI to generate videos with an API
Upscaling models that create high-quality images from low-quality images
Use AI to generate music with an API
Use AI to edit any image with an API
Models that convert speech to text
Optical character recognition (OCR) and text extraction
Models that remove backgrounds from images and videos
The FLUX family of text-to-image models from Black Forest Labs
Models that improve or restore images by deblurring, colorization, and removing noise
Upscaling models that create high-quality video from low-quality videos
Browse the diverse range of qwen-image fine-tunes the community has custom-trained on Replicate
Models that can understand and generate text
Toolbelt-type models for videos and images.
Use AI to caption images with an API
Use AI to generate videos from images with an API
Generate videos with Wan, the fastest and highest quality open-source video generation model.
Browse the diverse range of fine-tunes the community has custom-trained on Replicate
Ask language models about images
Models that generate 3D objects, scenes, radiance fields, textures and multi-views.
Guide image generation with more than just text. Use edge detection, depth maps, and sketches to get the results you want.
Voice-to-voice cloning and musical prosody
Models that generate embeddings from inputs
Get started with these models without adding a credit card. Whether you're making videos, generating images, or upscaling photos, these are great starting points.
Models that detect or segment objects in images and videos.
ddvinh1/new-faceswap-video
66 runs
zsxkib/embedding-gemma-300m
Turn any text into 768-dimensional vectors for search, classification, and AI apps 🧠✨
26 runs
zhouyi531/ultimate-face-enhance
This is an excellent method for converting blurry facial images into high-definition ones.
34 runs
ibm-granite/granite-3.3-8b-instruct
Granite-3.3-8B-Instruct is an 8-billion-parameter language model with a 128K context length, fine-tuned for improved reasoning and instruction-following capabilities.
1.2M runs
meatballhat/turtle-head
30 runs
kwaivgi/kling-lip-sync
Add lip-sync to any video with an audio file or text
11.3K runs
andreasjansson/ad-carousel
Advertising text in animated bubbles go swoosh
54 runs
pixaplay/joshua
AI-generated character model of Joshua - a weathered man with curly grey hair and beard, trained for consistent character generation across different scenes and environments.
17 runs
bytedance/seedance-1-pro
A pro version of Seedance that offers text-to-video and image-to-video support for 5s or 10s videos, at 480p and 1080p resolution
387.6K runs
bytedance/seedance-1-lite
A video generation model that offers text-to-video and image-to-video support for 5s or 10s videos, at 480p and 720p resolution
655.5K runs