How to run Yi chat models with an API

Posted November 23, 2023

The Yi series models are large language models trained from scratch by developers at 01.AI. Today, they've released two new models: Yi-6B-Chat and Yi-34B-Chat. These models extend the base models, Yi-6B and Yi-34B, and are fine-tuned for chat completion.

Yi-34B currently tops most open LLM benchmarks, outperforming much larger models like Llama 2 70B.

How to run Yi-34B-Chat with an API

Yi-34B-Chat is on Replicate and you can run it in the cloud with a few lines of code.

You can run it with our JavaScript client:

import Replicate from "replicate";
 
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
 
const output = await replicate.run(
  "01-ai/yi-34b-chat:914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46",
  {
    input: {
      prompt:
        "Write a poem about Parmigiano Reggiano.",
    },
  }
);

Or, our Python client:

import replicate
output = replicate.run(
    "01-ai/yi-34b-chat:914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46",
    input={"prompt": "Write a poem about Parmigiano Reggiano."}
)
# The 01-ai/yi-34b-chat model can stream output as it's running.
# replicate.run() returns an iterator, and you can iterate over that output.
for item in output:
    print(item, end="")

Or, you can call the HTTP API directly with tools like cURL:

curl -s -X POST \
  -d '{"version": "914692bbe8a8e2b91a4e44203e70d170c9c5ccc1359b283c84b0ec8d47819a46", "input": {"prompt": "Write a poem..."}}' \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  "https://api.replicate.com/v1/predictions"

You can also run Yi chat models using other client libraries for Go, Swift, Elixir, and others.

Next steps