kcaverly/openchat-3.5-1210-gguf:0d142640 – Run with an API on Replicate

Input

prompt

*string

Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?

Instruction for model

prompt_template

string

GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: 

Template to pass to model. Override if you are providing multi-turn instructions.

Default: "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: "
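As a rough illustration of how the template and prompt fit together, the sketch below fills the `{prompt}` placeholder and builds a multi-turn template by prepending earlier turns. It assumes the model substitutes the prompt by plain string replacement and that earlier turns follow the same role labels as the default template. Neither is confirmed by this page, and `render_prompt` and `build_multi_turn_template` are hypothetical helper names.

```python
DEFAULT_TEMPLATE = "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: "


def render_prompt(prompt: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill the {prompt} placeholder (assumed to be a plain string substitution)."""
    return template.replace("{prompt}", prompt)


def build_multi_turn_template(history: list[tuple[str, str]]) -> str:
    """Prepend earlier (user, assistant) turns to the default template.

    The turn layout mirrors the default template's role labels; this is an
    assumption, not something the page documents.
    """
    parts = []
    for user_message, assistant_message in history:
        parts.append(f"GPT Correct User: {user_message}<|end_of_turn|>")
        parts.append(f"GPT4 Correct Assistant: {assistant_message}<|end_of_turn|>")
    # The final turn keeps the {prompt} placeholder for the new user message.
    parts.append(DEFAULT_TEMPLATE)
    return "".join(parts)


if __name__ == "__main__":
    history = [("Hi, who are you?", "I am an assistant.")]
    template = build_multi_turn_template(history)
    print(render_prompt("How many sisters does Sally have?", template))
```

The resulting string from `build_multi_turn_template` would be passed as `prompt_template`, with the newest user message passed as `prompt`.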

max_new_tokens

integer

Maximum new tokens to generate.

Default: -1

repeat_penalty

number

Penalizes tokens that have already appeared in the text, which discourages the model from repeating itself. Values above 1.0 make repeated words and phrases less likely, so a higher penalty gives more varied output; a value of 1.0 typically applies no penalty. A lower penalty suits scenarios where consistency and predictability are preferred.

Default: 1.1
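To make the effect of the penalty concrete, here is a minimal sketch of one commonly used repetition-penalty rule: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative. This page does not say which rule the model actually uses, so treat the rule and the `apply_repeat_penalty` helper as assumptions.

```python
def apply_repeat_penalty(
    logits: list[float],
    previous_token_ids: list[int],
    penalty: float = 1.1,
) -> list[float]:
    """Return a copy of `logits` with already-seen tokens made less likely.

    Assumed rule: positive logits are divided by the penalty, negative ones
    are multiplied by it, so both move toward "less likely" when penalty > 1.
    """
    penalized = list(logits)
    for token_id in set(previous_token_ids):
        if penalized[token_id] > 0:
            penalized[token_id] /= penalty
        else:
            penalized[token_id] *= penalty
    return penalized


if __name__ == "__main__":
    # Tokens 0 and 1 were generated before; token 2 was not and is left alone.
    print(apply_repeat_penalty([2.0, -1.0, 0.5], [0, 1], penalty=1.1))
```

With a penalty of 1.0 the logits come back unchanged, which matches the idea that 1.0 means no penalty.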

temperature

number

Controls the randomness of the model's sampling. A higher value leads to more creative and diverse responses, while a lower value results in safer, more conservative answers that stay closer to the most likely continuation. Adjust it to balance novelty against accuracy and coherence for a given task.

Default: 0.7
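The usual way temperature enters sampling is by dividing the logits by the temperature before the softmax. The sketch below shows that standard formulation with made-up logits; it is an illustration of the general technique, not the model's confirmed implementation, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float = 0.7) -> list[float]:
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(scaled)
    exponentials = [math.exp(value - largest) for value in scaled]
    total = sum(exponentials)
    return [value / total for value in exponentials]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.0]  # illustrative values only
    for temperature in (0.2, 0.7, 1.5):
        probabilities = softmax_with_temperature(example_logits, temperature)
        print(temperature, [round(p, 3) for p in probabilities])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures spread it across more tokens.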

Run this model in Node.js with one line of code:

npx create-replicate --model=kcaverly/openchat-3.5-1210-gguf

or set up a project from scratch

Install Replicate’s Node.js client library

npm install replicate

Set the REPLICATE_API_TOKEN environment variable

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run kcaverly/openchat-3.5-1210-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "kcaverly/openchat-3.5-1210-gguf:0d1426400ae23540eef130c0cd6cbd7184ac47cffee9dfd16fdf02d065df123b",
  {
    input: {
      prompt: "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?",
      temperature: 0.7,
      max_new_tokens: -1,
      repeat_penalty: 1.1,
      prompt_template: "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: "
    }
  }
);
console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library

pip install replicate

Set the REPLICATE_API_TOKEN environment variable

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client

import replicate

Run kcaverly/openchat-3.5-1210-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "kcaverly/openchat-3.5-1210-gguf:0d1426400ae23540eef130c0cd6cbd7184ac47cffee9dfd16fdf02d065df123b",
    input={
        "prompt": "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?",
        "temperature": 0.7,
        "max_new_tokens": -1,
        "repeat_penalty": 1.1,
        "prompt_template": "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: "
    }
)

# The kcaverly/openchat-3.5-1210-gguf model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/kcaverly/openchat-3.5-1210-gguf/api#output-schema
    print(item, end="")

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run kcaverly/openchat-3.5-1210-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "0d1426400ae23540eef130c0cd6cbd7184ac47cffee9dfd16fdf02d065df123b",
    "input": {
      "prompt": "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?",
      "temperature": 0.7,
      "max_new_tokens": -1,
      "repeat_penalty": 1.1,
      "prompt_template": "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: "
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

Since Sally is the only girl in her family, she must be considered as one of the "sisters" mentioned. Therefore, Sally has 2 sisters (including herself).

Generated in 0.9 seconds
