Pipeline models

Explanation of pipeline models on Replicate


Pipeline models are a new kind of ephemeral CPU model that runs on Replicate using a dedicated runtime that’s optimized for speed. These models work like serverless functions: they run once and are then discarded, without any setup steps. The key feature is that they can call other Replicate models directly using the replicate Python client library.

These models make a lot of new things possible, because Replicate has a huge library of models that you can pipe together. Pipe a FLUX LoRA output into Kling for stylized text-to-video. Add a prompt upscaler with Claude Sonnet, add sound with mmaudio, etc. It’s just code! No complex orchestration, just plain Python. You can add whatever preprocessing or glue code you want.

Getting Started

Just like other Replicate models, you can run pipelines in the web UI or from the API, and you pay per second of compute and for the cost of the downstream models your pipeline calls.

🐇 Eager to create your first pipeline model? Check out the quickstart guide.

Running pipeline models

Just like other Replicate models, you can run pipelines in the web UI or from the API. A pipeline itself is just Python code that calls other models:

import replicate

flux_dev = replicate.use("black-forest-labs/flux-dev")
claude = replicate.use("anthropic/claude-4-sonnet")

def main() -> None:
    images = flux_dev(prompt="a cat wearing an amusing hat")
    result = claude(prompt="describe this image for me", image=images[0])

    print(str(result)) # "This shows an image of a cat wearing a hat ..."
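
Once your pipeline is pushed to Replicate, you can call it from your own code like any other model. A minimal sketch, assuming your pipeline is published under the placeholder name your-username/your-pipeline:

import replicate

# "your-username/your-pipeline" is a hypothetical placeholder for your published pipeline
output = replicate.run(
    "your-username/your-pipeline",
    input={"prompt": "a cat wearing an amusing hat"},
)
print(output)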

Hardware

Pipeline models run on CPU hardware, specifically CPU 1x 2GB. See hardware pricing for more details.

The downstream models used by your pipeline model will run on a variety of different hardware types, depending on the model.

Billing

You pay per second of compute and for the cost of the downstream models your pipeline calls.

For both public and private pipelines, you only pay for the time it’s actively processing your requests. Setup and idle time for the model is free.

If a pipeline model fails, it will be billed for the duration of the run, plus the cost of the downstream models that were called.

See pricing for more details.

Cancellation

If you cancel a prediction request made to a pipeline that calls other models, we will cancel all downstream predictions that are queued or haven’t started. If a downstream prediction is already running, that prediction will continue to run to completion.
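
If you're calling a pipeline from your own code, here's a minimal sketch of triggering that cancellation with the replicate Python client. The pipeline name is a hypothetical placeholder, and the exact calls are an assumption based on the regular predictions API:

import replicate

# Look up the (hypothetical) pipeline and start a prediction without waiting for it
model = replicate.models.get("your-username/your-pipeline")
prediction = replicate.predictions.create(
    version=model.latest_version,
    input={"prompt": "a cat wearing an amusing hat"},
)

# Cancelling the top-level prediction also cancels queued downstream predictions;
# downstream predictions that are already running continue to completion
prediction.cancel()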

Stack depth

Pipeline models can call other pipeline models, up to a limit of 250 layers deep.
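
One pipeline calls another pipeline the same way it calls any other model. A minimal sketch, using placeholder names, where each nested call adds one layer toward the limit:

import replicate

# "your-username/stylize-pipeline" is a hypothetical placeholder for another pipeline model
stylize = replicate.use("your-username/stylize-pipeline")

def main(prompt: str) -> str:
    # Calling the nested pipeline counts as one additional layer of stack depth
    return stylize(prompt=prompt)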

Creating pipeline models

You can create pipeline models in the web UI or from the API.

To get started creating your first pipeline model, check out the quickstart guide.

To develop a pipeline model on your own machine with your preferred editor and tools, check out the guide to building pipeline models locally.

Pipeline model features

Input & output types

Below are the Python annotations that pipeline models understand today and how each one shows up in the web UI and API.

| Annotation | UI control / JSON type | Typical use-case | Notes |
| --- | --- | --- | --- |
| str | Text field | Prompts, IDs, file URLs | Empty string "" allowed |
| int, float | Number field | Steps, CFG, FPS, seed | Add min= / max= via cog.Input |
| bool | Checkbox | Feature flags, safety toggles | Defaults to false |
| cog.Path | File upload or presigned URL | Images, videos, audio, weights | Returned paths are real files locally, saved in the tmp/ directory |
| list[str] | Repeating text field | Batch prompts, stop words | Max 64 KB per item |
| dict, typing.Any | Raw JSON | Arbitrary config blobs | Passed through untouched |
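
To make the table concrete, here's a minimal sketch of a pipeline entry point that uses several of these annotations. The parameter names are illustrative placeholders, and the cog import style is an assumption:

from cog import Path

def main(
    prompt: str,                          # text field
    steps: int = 30,                      # number field
    upscale: bool = False,                # checkbox
    image: Path | None = None,            # file upload or presigned URL
    stop_words: list[str] | None = None,  # repeating text field
) -> str:
    ...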

Here's an example of passing a hint function to replicate.use, which types the downstream model's inputs and output:

import replicate
from pathlib import Path

# Flux takes a required prompt string and optional image and seed.
def hint(*, prompt: str, image: Path | None = None, seed: int | None = None) -> str: ...

flux_dev = replicate.use("black-forest-labs/flux-dev", hint=hint)

def main() -> None:
    output1 = flux_dev() # will warn that `prompt` is missing
    output2 = flux_dev(prompt="str") # output2 will be typed as `str`

Supported Python packages

There are limitations around what packages are available when running a pipeline on Replicate. Supported packages include:

anyio==4.9.0
certifi==2025.6.15
charset-normalizer==3.4.2
coglet @ https://github.com/replicate/cog-runtime/releases/download/v0.1.0-alpha31/coglet-0.1.0a31-py3-none-any.whl
decorator==5.2.1
h11==0.16.0
httpcore==1.0.9
httpx==0.28.1
idna==3.10
imageio==2.37.0
imageio-ffmpeg==0.6.0
joblib==1.5.1
moviepy==2.2.1
numpy==2.3.1
packaging==25.0
pillow==11.2.1
pip==25.1.1
proglog==0.1.12
pydantic==1.10.22
python-dotenv==1.1.1
replicate==1.1.0b2
requests==2.32.4
scikit-learn==1.7.0
scipy==1.16.0
sniffio==1.3.1
threadpoolctl==3.6.0
tqdm==4.67.1
typing-extensions==4.14.0
urllib3==2.5.0

For an up-to-date list of supported packages, see pipelines-runtime.replicate.delivery/requirements.txt.

Passing files between models

Outputs annotated as cog.Path behave like local files and feed directly into the next model:

import replicate

upscale = replicate.use("stability-ai/sd-xl-upscale")
caption = replicate.use("anthropic/claude-4-sonnet")

hi_res  = upscale(image="dog.jpg")
summary = caption(prompt="Describe the image", image=hi_res)

Running models in parallel

You can run multiple models in parallel by starting each prediction with .create(), which returns immediately, and collecting the results afterwards with .output():

import replicate

function1 = replicate.use("my-name/function1")
function2 = replicate.use("my-name/function2")

# .create() starts each prediction and returns immediately
run1 = function1.create(input1=value1, input2=value2)
run2 = function2.create(input1=value1, input2=value2)

# .output() waits for and returns each result
output1 = run1.output()
output2 = run2.output()

Streaming outputs in real-time

Display partial tokens, images, or status updates as soon as they’re produced:

import replicate

claude = replicate.use("anthropic/claude-4-sonnet", streaming=True)
output = claude(prompt="Summarize War and Peace in emojis")

for chunk in output:
    print(chunk)

Async predictions for high concurrency

Await predictions inside asyncio apps (FastAPI, Quart, etc.) for better throughput:

import asyncio
import replicate

flux   = replicate.use("black-forest-labs/flux-dev", use_async=True)
claude = replicate.use("anthropic/claude-4-sonnet", use_async=True)

async def handler():
    # Both calls run concurrently; gather returns the results in call order
    image, song = await asyncio.gather(
        flux(prompt="astronaut playing guitar on Mars"),
        claude(prompt="Write a song about Mars"),
    )
    return song

Getting logs

To see the logs of downstream models, call .logs() on the prediction returned by .create():

import replicate

claude = replicate.use("anthropic/claude-4-sonnet")

def main() -> None:
    prediction = claude.create(prompt="Give me a recipe for tasty smashed avocado on sourdough toast that could feed all of California.")
    print(prediction.logs())   # get the current logs (WIP)
    print(prediction.output()) # get the output

LLM-friendly docs

We maintain an LLM-friendly version of the pipeline models documentation that you can use in your AI-powered code-editing tools like Cursor, Copilot, or Claude to give them extensive knowledge of how pipeline models work, and how to author them.

Feed this URL into your preferred AI editor to give it context about pipeline models:

https://replicate.com/docs/reference/pipelines/llms.txt