Explore

Official models

Official models are always on, maintained, and have predictable pricing.

inworld / realtime-tts-2

Most expressive text-to-speech model from Inworld, with natural-language steering, real-time latency, and multilingual support across 100+ languages.

228 runs
Official

philz1337x / clarity-pro-upscaler

The first creative upscaler which keeps identity. Stunning photorealistic results, realistic skin, and full creative control.

748 runs
Official

xai / grok-text-to-speech

Convert text to natural-sounding speech with xAI's Grok TTS. 5 voices, 20 languages, expressive speech tags, and high-fidelity MP3 / WAV / telephony audio output.

460 runs
Official

xai / grok-speech-to-text

Transcribe audio to text with xAI's Grok. Handles 25 languages, word-level timestamps, speaker diarization, multichannel audio, and files up to 500 MB.

142 runs
Official

ibm-granite / granite-speech-4.1-2b

Granite Speech 4.1 2B is a compact and efficient speech-language model, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST) for English, French, German, Spanish, Portuguese and Jap

14 runs
Official

alibaba / happyhorse-1.0

Alibaba's Happy Horse 1.0 generates videos from text prompts or animates a single image into video. Supports 720p and 1080p, 3-15 second durations, and five aspect ratios.

3.5K runs
Official

ibm-granite / granite-embedding-small-english-r2

Granite-embedding-small-english-r2 is a 47M-parameter dense bi-encoder embedding model from the Granite Embeddings collection that can be used to generate high-quality text embeddings.

3 runs
Official

ibm-granite / granite-4.1-8b

Granite-4.1-8B is an 8B-parameter long-context instruct model fine-tuned from Granite-4.1-8B-Base using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets.

132 runs
Official

pixverse / pixverse-v6

PixVerse's flagship video generation model. Generate cinematic videos with synchronized audio, multi-shot sequences, and precise camera control.

6.4K runs
Official

moonshotai / kimi-k2.6

Moonshot AI's frontier open model, built for long-horizon coding, agent swarms, and autonomous software engineering. 1 trillion parameters, 262k context window, vision and tool use.

782 runs
Official

openai / gpt-image-2

OpenAI's state-of-the-art image generation model. Create and edit images from text with strong instruction following, sharp text rendering, and detailed editing.

848.6K runs
Official

uthana / create-character-v1

Rig any 3D bipedal character mesh

51 runs
Official

heygen / lipsync-precision

High-accuracy lip-sync: replace or dub audio on any video with avatar-inference lip sync

469 runs
Official

heygen / lipsync-speed

Fast lip-sync: replace or dub audio on any video with quick audio-driven lip sync

256 runs
Official

anthropic / claude-opus-4.7

Anthropic's most capable model with a step-change improvement in agentic coding, better vision, and stronger multi-step reasoning

13.1K runs
Official

google / gemini-3.1-flash-tts

Google's fast, expressive text-to-speech model with 30 voices and 70+ language support

35K runs
Official

minimax / music-cover

Reimagine any song in a different style — change voice, instruments, genre, and arrangement while keeping the original melody

530 runs
Official

minimax / music-2.6

Generate full-length songs or instrumentals from a text prompt, with optional auto-generated lyrics

3.3K runs
Official

decart / lucy-edit-2

Edit and transform videos with text prompts and reference images. Style transfers, object replacement, character transformation, and more.

65 runs
Official

ideogram-ai / layerize

Take a flat graphic, remove text, and get structured text layers back for editing and recomposing

5.7K runs
Official

OCR to extract text from images

Use AI for optical character recognition (OCR) to extract text from images via API

Classify text

Classify text by sentiment, topic, intent, or safety

Create realistic face swaps

Replace faces across images with natural-looking results.

Vision models

Chat with images — visual Q&A, analysis, and reasoning via API

Caption images

Use AI to generate captions and descriptions from images with an API

Edit your videos

Use AI to edit, restyle, extend, and remix videos with an API

WAN family of models

WAN family of models: open-source video, image, and audio generation

Create 3D content

Generate 3D objects, meshes, and textures from text or images with an API

Official models

Official models are always on, predictably priced, and have a stable API.

Large Language Models (LLMs)

Explore Large Language Models (LLMs) for chat, generation & NLP tasks via API

Try AI models for free

Try AI models for free: video generation, image generation, upscaling, and photo restoration

Object detection and segmentation

Use AI object detection and segmentation models to distinguish objects in images & videos

Flux fine-tunes

Flux fine-tunes: build and run custom AI image models via API

Media utilities

AI media utilities: auto-caption, watermark, frame extraction & more via API

Qwen-Image fine-tunes

Browse the diverse range of Qwen-Image fine-tunes the community has custom-trained on Replicate.

Latest models