anthropic / claude-3.5-haiku

Anthropic's fastest, most cost-effective model, with a 200K token context window (claude-3-5-haiku-20241022)

Input

Run this model in Node.js with one line of code:

npx create-replicate --model=anthropic/claude-3.5-haiku

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run anthropic/claude-3.5-haiku using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const input = {
  prompt: "Write a meta-haiku",
  max_tokens: 8192,
  system_prompt: ""
};

for await (const event of replicate.stream("anthropic/claude-3.5-haiku", { input })) {
  process.stdout.write(event.toString());
}

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run anthropic/claude-3.5-haiku using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

# The anthropic/claude-3.5-haiku model can stream output as it's running.
for event in replicate.stream(
    "anthropic/claude-3.5-haiku",
    input={
        "prompt": "Write a meta-haiku",
        "max_tokens": 8192,
        "system_prompt": ""
    },
):
    print(str(event), end="")
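
When the full response is needed as a single string rather than printed chunk by chunk, the streamed events can be accumulated. The following is a minimal sketch that assumes only what the loop above shows, namely that each event converts to its text via str(). The helper name collect_stream is hypothetical and is not part of the Replicate client.

```python
from typing import Iterable


def collect_stream(events: Iterable[object]) -> str:
    """Concatenate streamed events into one string.

    Hypothetical helper: it relies only on str(event) yielding the text
    chunk, as in the print(str(event), end="") loop above.
    """
    return "".join(str(event) for event in events)


# Offline demonstration with stand-in chunks instead of a live stream.
sample_chunks = ["Syllables ", "dancing"]
print(collect_stream(sample_chunks))
```

With the client installed and the token set, passing the result of replicate.stream("anthropic/claude-3.5-haiku", input={...}) to collect_stream should return the whole completion at once, at the cost of losing the incremental display.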

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run anthropic/claude-3.5-haiku with curl using Replicate’s HTTP API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "input": {
      "prompt": "Write a meta-haiku",
      "max_tokens": 8192,
      "system_prompt": ""
    }
  }' \
  https://api.replicate.com/v1/models/anthropic/claude-3.5-haiku/predictions
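
For readers who prefer not to shell out to curl, the same request can be assembled with Python's standard library. This is a sketch under stated assumptions: the URL, headers, and body mirror the curl command above, while the function name build_prediction_request is hypothetical. The request is built but not sent, so the snippet runs without network access or a valid token.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/models/anthropic/claude-3.5-haiku/predictions"


def build_prediction_request(
    prompt: str, token: str, max_tokens: int = 8192
) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    body = {
        "input": {
            "prompt": prompt,
            "max_tokens": max_tokens,
            "system_prompt": "",
        }
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # same header as the curl example
        },
        method="POST",
    )


request = build_prediction_request(
    "Write a meta-haiku",
    os.environ.get("REPLICATE_API_TOKEN", "<paste-your-token-here>"),
)
print(request.get_method(), request.full_url)
```

Sending it is a matter of passing the request to urllib.request.urlopen and decoding the JSON body; the shape of that response is described in the HTTP API reference rather than assumed here.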

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

Here's a meta-haiku about haiku itself:

Syllables dancing
Five, seven, five—poetry
Speaks of its own form

Generated in: 1.2 seconds
Tokens per second: 29.48
Time to first token: 14 milliseconds


Pricing

This model is priced by how many input tokens are sent and how many output tokens are generated.

Type	Per unit	Per $1
Input	$1.00 / 1M tokens	1M tokens / $1
Output	$5.00 / 1M tokens	200K tokens / $1

For example, for $10 you can run around 2,857 predictions where the input is a sentence or two (15 tokens) and the output is a few paragraphs (700 tokens).
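
The arithmetic behind that estimate can be checked with a short calculation. The sketch below uses the per-token prices from the table above; the function names are hypothetical. Note that 2,857 matches the result when only output tokens are counted, and including the 15 input tokens gives a slightly lower figure.

```python
INPUT_PRICE_PER_MILLION = 1.00   # USD per 1M input tokens (from the pricing table)
OUTPUT_PRICE_PER_MILLION = 5.00  # USD per 1M output tokens (from the pricing table)


def prediction_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one prediction at the listed per-token prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


def predictions_per_budget(budget_usd: float, input_tokens: int, output_tokens: int) -> int:
    """Whole number of predictions that fit in the budget."""
    return int(budget_usd // prediction_cost(input_tokens, output_tokens))


print(f"{prediction_cost(15, 700):.6f}")    # 0.003515
print(predictions_per_budget(10, 15, 700))  # 2844 (input and output tokens)
print(predictions_per_budget(10, 0, 700))   # 2857 (output tokens only)
```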

Check out our docs for more information about how per-token pricing works on Replicate.

Readme

Claude 3.5 Haiku

Claude 3.5 Haiku is Anthropic’s fastest model, delivering advanced coding, tool use, and reasoning capabilities at an accessible price point.

This Replicate model is built on claude-3-5-haiku-20241022 and is served through the Anthropic and Google Cloud Vertex AI APIs.

Key features

Improved performance across all skill sets compared to Claude 3 Haiku
Surpasses Claude 3 Opus on many intelligence benchmarks
Enhanced instruction following and tool use accuracy
Rapid inference speed and response times
Cost-effective pricing structure
Strong reasoning capabilities
Accurate code generation with multi-turn refinement
Natural writing style and enhanced conversational abilities

Use cases

Code completions and development workflows
Interactive chatbots and customer service
Data extraction and automated labeling
Real-time content moderation
High-volume user interactions
Personalized experience generation

Limitations

No persistent memory between conversations
Cannot access real-time information
May occasionally produce inaccurate responses
Knowledge is limited to its training data
No ability to learn from interactions
Performance may vary with complex or ambiguous tasks

Safety

Extensive safety evaluations across multiple languages
Enhanced handling of sensitive content
Rigorous safety standards maintained
Comprehensive policy domain testing
Regular safety assessments

Getting started

To use Claude 3.5 Haiku via Replicate:

import replicate

output = replicate.run(
    "anthropic/claude-3.5-haiku",
    input={
        "prompt": "Your prompt here"
    }
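)

# For language models the output typically arrives as a sequence of text
# chunks; joining them gives the full response.
print("".join(output))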

Privacy policy & license

Data from this model is sent to Anthropic and Google Cloud Vertex AI.

Usage of this model is subject to Anthropic’s terms of service. Please refer to their website for full terms and conditions.