How to run Yi chat models with an API

Posted by @nateraw

The Yi series models are large language models trained from scratch by developers at 01.AI. Today, they’ve released two new models: Yi-6B-Chat and Yi-34B-Chat. These models extend the base models, Yi-6B and Yi-34B, and are fine-tuned for chat completion.

Yi-34B currently tops most open LLM benchmarks, outperforming larger models like Llama 2 70B.

How to run Yi-34B-Chat with an API

Yi-34B-Chat is on Replicate and you can run it in the cloud with a few lines of code.

You can run it with our JavaScript client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

const output = await replicate.run(
  "01-ai/yi-34b-chat:914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46",
  {
    input: {
      prompt:
        "Write a poem about Parmigiano Reggiano.",
    },
  }
);

Or, our Python client:

import replicate
output = replicate.run(
    "01-ai/yi-34b-chat:914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46",
    input={"prompt": "Write a poem about Parmigiano Reggiano."}
)
# The 01-ai/yi-34b-chat model can stream output as it's running.
# The run method returns an iterator, and you can iterate over that output.
for item in output:
    print(item, end="")
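Because the output is an iterator of text chunks, you can also collect the whole response into a single string instead of printing as it arrives. A minimal sketch (the chunks below are stand-ins for streamed model output, not real predictions):

```python
# Stand-in for the iterator returned by replicate.run (hypothetical chunks)
chunks = ["Oh Parmigiano, ", "aged and ", "golden."]

# Join the streamed chunks into the complete response text
full_text = "".join(chunks)
print(full_text)
```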

Or, you can call the HTTP API directly with tools like cURL:

curl -s -X POST \
  -d '{"version": "914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46", "input": {"prompt": "Write a poem..."}}' \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.replicate.com/v1/predictions"
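If you're calling the HTTP API from code rather than cURL, the request body is plain JSON with `version` and `input` keys. Here's a sketch of building that payload in Python, matching the cURL example above (the prompt is just a placeholder):

```python
import json

# Build the same request body the cURL example sends
payload = {
    "version": "914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46",
    "input": {"prompt": "Write a poem about Parmigiano Reggiano."},
}

# Serialize to a JSON string to use as the POST body
body = json.dumps(payload)
print(body)
```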

You can also run Yi chat models using other client libraries for Go, Swift, Elixir, and others.

Next steps