Pricing

You only pay for what you use on Replicate, billed by the second. When you don't run anything, it scales to zero and you don't pay a thing.

| Hardware | SKU | Price | GPU | CPU | GPU RAM | RAM |
|----------|-----|-------|-----|-----|---------|-----|
| CPU | cpu | $0.000100/sec ($0.36/hr) | - | 4x | - | 8GB |
| Nvidia A100 (80GB) GPU | gpu-a100-large | $0.001400/sec ($5.04/hr) | 1x | 10x | 80GB | 144GB |
| 2x Nvidia A100 (80GB) GPU | gpu-a100-large-2x | $0.002800/sec ($10.08/hr) | 2x | 20x | 160GB | 288GB |
| 4x Nvidia A100 (80GB) GPU | gpu-a100-large-4x | $0.005600/sec ($20.16/hr) | 4x | 40x | 320GB | 576GB |
| 8x Nvidia A100 (80GB) GPU | gpu-a100-large-8x | $0.011200/sec ($40.32/hr) | 8x | 80x | 640GB | 960GB |
| Nvidia L40S GPU | gpu-l40s | $0.000975/sec ($3.51/hr) | 1x | 10x | 48GB | 65GB |
| 2x Nvidia L40S GPU | gpu-l40s-2x | $0.001950/sec ($7.02/hr) | 2x | 20x | 96GB | 144GB |
| 4x Nvidia L40S GPU | gpu-l40s-4x | $0.003900/sec ($14.04/hr) | 4x | 40x | 192GB | 288GB |
| 8x Nvidia L40S GPU | gpu-l40s-8x | $0.007800/sec ($28.08/hr) | 8x | 80x | 384GB | 576GB |
| Nvidia T4 GPU | gpu-t4 | $0.000225/sec ($0.81/hr) | 1x | 4x | 16GB | 16GB |

Additional hardware

| Hardware | SKU | Price |
|----------|-----|-------|
| Nvidia H100 GPU | gpu-h100 | $0.001525/sec ($5.49/hr) |
| 2x Nvidia H100 GPU | gpu-h100-2x | $0.003050/sec ($10.98/hr) |
| 4x Nvidia H100 GPU | gpu-h100-4x | $0.006100/sec ($21.96/hr) |
| 8x Nvidia H100 GPU | gpu-h100-8x | $0.012200/sec ($43.92/hr) |

Flux fine-tunes run on H100s; additional H100 capacity is reserved for committed spend contracts.
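
Your bill is simply the per-second rate multiplied by the time you're billed for. Here's a minimal sketch of that arithmetic in Python, using a few of the per-second rates from the table above (the 30-second run is just an illustrative duration):

```python
# Per-second rates (USD) taken from the hardware table above.
HARDWARE_PER_SECOND = {
    "cpu": 0.000100,
    "gpu-t4": 0.000225,
    "gpu-l40s": 0.000975,
    "gpu-a100-large": 0.001400,
    "gpu-h100": 0.001525,
}

def estimated_cost(hardware: str, billed_seconds: float) -> float:
    """Estimated charge: per-second rate multiplied by billed time."""
    return HARDWARE_PER_SECOND[hardware] * billed_seconds

# The hourly prices in the table are just the per-second rate times 3600.
for sku, rate in HARDWARE_PER_SECOND.items():
    print(f"{sku}: ${rate * 3600:.2f}/hr")

# For example, a 30-second run on a single A100:
print(f"30s on gpu-a100-large ≈ ${estimated_cost('gpu-a100-large', 30):.3f}")  # ≈ $0.042
```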

Public models

Thousands of open-source machine learning models have been contributed by our community, and more are added every day. When running or training one of these models, you only pay for the time it takes to process your request.

Each model runs on different hardware and takes a different amount of time to run. You'll find estimates for how much they cost under "Run time and cost" on the model's page. For example, for stability-ai/sdxl:

This model costs approximately $0.0036 to run on Replicate, or 277 runs per $1, but this varies depending on your inputs.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 4 seconds.
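
Those numbers line up with the hardware table: the per-run estimate is roughly the typical predict time multiplied by the L40S per-second rate. A quick sanity check of that arithmetic (actual billed time varies with your inputs):

```python
# Estimate for stability-ai/sdxl using the figures quoted above:
# ~4 seconds per prediction on gpu-l40s hardware.
L40S_PER_SECOND = 0.000975     # $/sec, from the hardware table
typical_predict_time = 4.0     # seconds

cost_per_run = typical_predict_time * L40S_PER_SECOND
print(f"~${cost_per_run:.4f} per run")         # ~$0.0039, close to the ~$0.0036 shown on the model page
print(f"~{1 / cost_per_run:.0f} runs per $1")  # ~256 runs
```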

Image models

Replicate hosts some image models that are priced per output image.

| Model | Output |
|-------|--------|
| black-forest-labs/flux-1.1-pro | $0.040 / image |
| black-forest-labs/flux-1.1-pro-ultra | $0.060 / image |
| black-forest-labs/flux-1.1-pro-ultra-finetuned | $0.070 / image |
| black-forest-labs/flux-canny-dev | $0.025 / image |
| black-forest-labs/flux-canny-pro | $0.050 / image |
| black-forest-labs/flux-depth-dev | $0.025 / image |
| black-forest-labs/flux-depth-pro | $0.050 / image |
| black-forest-labs/flux-dev | $0.025 / image |
| black-forest-labs/flux-dev-lora | $0.032 / image |
| black-forest-labs/flux-fill-dev | $0.040 / image |
| black-forest-labs/flux-fill-pro | $0.050 / image |
| black-forest-labs/flux-pro | $0.055 / image |
| black-forest-labs/flux-redux-dev | $0.025 / image |
| black-forest-labs/flux-redux-schnell | $0.003 / image |
| black-forest-labs/flux-schnell | $0.003 / image |
| black-forest-labs/flux-schnell-lora | $0.020 / image |
| easel/advanced-face-swap | $0.040 / image |
| easel/ai-avatars | $0.050 / image |
| google/imagen-3 | $0.050 / image |
| google/imagen-3-fast | $0.025 / image |
| ideogram-ai/ideogram-v2 | $0.080 / image |
| ideogram-ai/ideogram-v2a | $0.040 / image |
| ideogram-ai/ideogram-v2a-turbo | $0.025 / image |
| ideogram-ai/ideogram-v2-turbo | $0.050 / image |
| ideogram-ai/ideogram-v3-balanced | $0.070 / image |
| ideogram-ai/ideogram-v3-quality | $0.100 / image |
| ideogram-ai/ideogram-v3-turbo | $0.040 / image |
| luma/photon | $0.030 / image |
| luma/photon-flash | $0.010 / image |
| minimax/image-01 | $0.010 / image |
| recraft-ai/recraft-20b | $0.022 / image |
| recraft-ai/recraft-20b-svg | $0.044 / image |
| recraft-ai/recraft-creative-upscale | $0.300 / image |
| recraft-ai/recraft-crisp-upscale | $0.006 / image |
| recraft-ai/recraft-v3 | $0.040 / image |
| recraft-ai/recraft-v3-svg | $0.080 / image |
| stability-ai/stable-diffusion-3 | $0.035 / image |
| stability-ai/stable-diffusion-3.5-large | $0.065 / image |
| stability-ai/stable-diffusion-3.5-large-turbo | $0.040 / image |
| stability-ai/stable-diffusion-3.5-medium | $0.035 / image |
| topazlabs/image-upscale | $0.300 / image |

Audio models

Replicate hosts some audio models that are priced either per audio file or per second of audio generated by the model.

| Model | Output |
|-------|--------|
| minimax/music-01 | $0.035 / audio file |

| Model | Output |
|-------|--------|
| playht/play-dialog | $0.001 / second of audio |

Video models

Replicate hosts some video models that are priced either per video or per second of video generated by the model.

| Model | Output |
|-------|--------|
| luma/ray | $0.45 / video |
| minimax/video-01 | $0.50 / video |
| minimax/video-01-director | $0.50 / video |
| minimax/video-01-live | $0.50 / video |

| Model | Output |
|-------|--------|
| google/veo-2 | $0.500 / second of video |
| haiper-ai/haiper-video-2 | $0.050 / second of video |
| kwaivgi/kling-v1.6-pro | $0.098 / second of video |
| kwaivgi/kling-v1.6-standard | $0.056 / second of video |
| kwaivgi/kling-v2.0 | $0.280 / second of video |
| luma/ray-2-540p | $0.100 / second of video |
| luma/ray-2-720p | $0.180 / second of video |
| luma/ray-flash-2-540p | $0.033 / second of video |
| luma/ray-flash-2-720p | $0.060 / second of video |
| topazlabs/video-upscale | $0.100 / second of video |
| wavespeedai/hunyuan-video-fast | $0.200 / second of video |
| wavespeedai/wan-2.1-i2v-480p | $0.090 / second of video |
| wavespeedai/wan-2.1-i2v-720p | $0.250 / second of video |
| wavespeedai/wan-2.1-t2v-480p | $0.070 / second of video |
| wavespeedai/wan-2.1-t2v-720p | $0.240 / second of video |

Training models

Replicate hosts some training models that are priced per training step.

| Model | Input |
|-------|-------|
| black-forest-labs/flux-pro-trainer | $0.014 / training step |

Language models

Replicate hosts some language models that are priced per token.

| Model | Input | Output |
|-------|-------|--------|
| anthropic/claude-3.5-haiku | $1.00 / 1M tokens | $5.00 / 1M tokens |
| anthropic/claude-3.5-sonnet | $3.75 / 1M tokens | $18.75 / 1M tokens |
| anthropic/claude-3.7-sonnet | $3.00 / 1M tokens | $15.00 / 1M tokens |
| deepseek-ai/deepseek-r1 | $3.75 / 1M tokens | $10.00 / 1M tokens |
| deepseek-ai/deepseek-v3 | $1.45 / 1M tokens | $1.45 / 1M tokens |
| ibm-granite/granite-20b-code-instruct-8k | $0.10 / 1M tokens | $0.50 / 1M tokens |
| ibm-granite/granite-3.0-2b-instruct | $0.03 / 1M tokens | $0.25 / 1M tokens |
| ibm-granite/granite-3.0-8b-instruct | $0.05 / 1M tokens | $0.25 / 1M tokens |
| ibm-granite/granite-3.1-2b-instruct | $0.03 / 1M tokens | $0.25 / 1M tokens |
| ibm-granite/granite-3.1-8b-instruct | $0.03 / 1M tokens | $0.25 / 1M tokens |
| ibm-granite/granite-3.2-8b-instruct | $0.03 / 1M tokens | $0.25 / 1M tokens |
| ibm-granite/granite-3.3-8b-instruct | $0.03 / 1M tokens | $0.25 / 1M tokens |
| ibm-granite/granite-8b-code-instruct-128k | $0.05 / 1M tokens | $0.25 / 1M tokens |
| meta/llama-2-13b | $0.10 / 1M tokens | $0.50 / 1M tokens |
| meta/llama-2-13b-chat | $0.10 / 1M tokens | $0.50 / 1M tokens |
| meta/llama-2-70b | $0.65 / 1M tokens | $2.75 / 1M tokens |
| meta/llama-2-70b-chat | $0.65 / 1M tokens | $2.75 / 1M tokens |
| meta/llama-2-7b | $0.05 / 1M tokens | $0.25 / 1M tokens |
| meta/llama-2-7b-chat | $0.05 / 1M tokens | $0.25 / 1M tokens |
| meta/llama-4-maverick-instruct | $0.25 / 1M tokens | $0.95 / 1M tokens |
| meta/llama-4-scout-instruct | $0.17 / 1M tokens | $0.65 / 1M tokens |
| meta/meta-llama-3.1-405b-instruct | $9.50 / 1M tokens | $9.50 / 1M tokens |
| meta/meta-llama-3-70b | $0.65 / 1M tokens | $2.75 / 1M tokens |
| meta/meta-llama-3-70b-instruct | $0.65 / 1M tokens | $2.75 / 1M tokens |
| meta/meta-llama-3-8b | $0.05 / 1M tokens | $0.25 / 1M tokens |
| meta/meta-llama-3-8b-instruct | $0.05 / 1M tokens | $0.25 / 1M tokens |
| mistralai/mistral-7b-v0.1 | $0.05 / 1M tokens | $0.25 / 1M tokens |
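
The cost of a request to one of these models is the input and output token counts multiplied by their respective per-million-token rates. A small sketch using the meta/llama-4-maverick-instruct rates from the table above (the token counts are illustrative):

```python
# Per-token pricing: input and output tokens are billed at separate rates.
INPUT_PER_MILLION = 0.25     # $ per 1M input tokens (meta/llama-4-maverick-instruct)
OUTPUT_PER_MILLION = 0.95    # $ per 1M output tokens

input_tokens = 1_200         # illustrative request
output_tokens = 400

cost = (input_tokens / 1_000_000) * INPUT_PER_MILLION + (output_tokens / 1_000_000) * OUTPUT_PER_MILLION
print(f"${cost:.6f} for this request")   # $0.000680
```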

Private models

You aren't limited to the public models on Replicate: you can deploy your own custom models using Cog, our open-source tool for packaging machine learning models.
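
At its core, a Cog model is a predict.py that exposes a predictor class, alongside a cog.yaml that declares its environment and dependencies. The sketch below only shows the shape of that interface; the string-reversing "model" is a stand-in for real inference code:

```python
# predict.py -- a minimal Cog predictor. A real model would load weights in
# setup() and run inference in predict(); this stand-in just transforms text.
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Runs once when an instance boots: load models, weights, tokenizers here.
        self.prefix = "echo"

    def predict(self, text: str = Input(description="Text to transform")) -> str:
        # Runs for every prediction request; this is the work you're billed for.
        return f"{self.prefix}: {text[::-1]}"
```

Once packaged, you push the model to Replicate with cog push and choose the hardware it runs on, which determines the per-second rate you pay.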

Unlike public models, most private models (with the exception of fast booting models) run on dedicated hardware, so you don't share a queue with anyone else. This means you pay for all of the time that instances of the model are online: the time they spend setting up; the time they spend idle, waiting for requests; and the time they spend active, processing your requests. If you get a ton of traffic, we automatically scale instances up and down to handle the demand.

For fast booting models, you're only billed for the time the model is active and processing your requests, so you don't pay for idle time as you would with other private models. Fast booting versions of models are labeled as such in the model's version list.
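
To make that difference concrete, here's a back-of-the-envelope comparison of the two billing modes, using the gpu-l40s rate from the hardware table; the traffic figures are purely illustrative:

```python
L40S_PER_SECOND = 0.000975   # $/sec for gpu-l40s, from the hardware table

# Dedicated hardware: billed for all time instances are online
# (setup + idle + active).
online_seconds_per_day = 8 * 3600    # one instance online for 8 hours a day
dedicated_per_day = online_seconds_per_day * L40S_PER_SECOND

# Fast booting model: billed only for active processing time.
requests_per_day = 2_000
seconds_per_request = 3.0
fast_boot_per_day = requests_per_day * seconds_per_request * L40S_PER_SECOND

print(f"dedicated: ${dedicated_per_day:.2f}/day")   # $28.08
print(f"fast boot: ${fast_boot_per_day:.2f}/day")   # $5.85
```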

As with public models, if you would like more control over how a private model is run, you can use a deployment.
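
For example, with the Replicate Python client you can send predictions to a deployment instead of directly to the model; the deployment name below is a placeholder, and this assumes the client's deployments interface:

```python
import replicate

# Look up an existing deployment ("acme/my-private-model" is a placeholder).
deployment = replicate.deployments.get("acme/my-private-model")

# Predictions created against the deployment run on the hardware and
# autoscaling settings you configured for it.
prediction = deployment.predictions.create(
    input={"prompt": "an example input"}
)
prediction.wait()
print(prediction.output)
```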

Learn more

For a deeper dive, check out how billing works on Replicate.

Enterprise & volume discounts

If you need more support or have complex requirements, we can offer:

  • Dedicated account manager
  • Priority support
  • Higher GPU limits
  • Performance SLAs
  • Help with onboarding, custom models, and optimizations
  • Single sign-on

We also offer volume discounts for high levels of spend. Email us at sales@replicate.com to learn more.