top_p: 0.95
prompt: Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?
temperature: 0.7
system_prompt: You are a helpful assistant
length_penalty: 1
max_new_tokens: 512
stop_sequences: <|end_of_text|>,<|eot_id|>
prompt_template: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
presence_penalty: 0

{
  "top_p": 0.95,
  "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
  "temperature": 0.7,
  "system_prompt": "You are a helpful assistant",
  "length_penalty": 1,
  "max_new_tokens": 512,
  "stop_sequences": "<|end_of_text|>,<|eot_id|>",
  "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "presence_penalty": 0
}
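
The prompt_template input contains two placeholders, {system_prompt} and {prompt}. The following is a minimal sketch of how such a template could be filled in, assuming plain string substitution; the model's actual server-side code is not shown on this page, and render_prompt is a hypothetical helper name. The result has the same shape as the "Formatted prompt" entry in the prediction logs.

```python
# Sketch only: assumes the template is filled by simple placeholder substitution.
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)


def render_prompt(template: str, system_prompt: str, prompt: str) -> str:
    """Fill the two placeholders (hypothetical helper, not part of any library)."""
    # str.replace is used instead of str.format so that stray braces in the
    # user's text do not raise formatting errors.
    filled = template.replace("{system_prompt}", system_prompt)
    return filled.replace("{prompt}", prompt)


formatted = render_prompt(
    PROMPT_TEMPLATE,
    system_prompt="You are a helpful assistant",
    prompt="Johnny has 8 billion parameters. His friend Tommy has 70 billion "
    "parameters. What does this mean when it comes to speed?",
)
print(formatted)
```

The template ends with the assistant header followed by a blank line, so the model's generation starts where the assistant's reply would begin.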

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const input = {
  top_p: 0.95,
  prompt: "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
  temperature: 0.7,
  system_prompt: "You are a helpful assistant",
  length_penalty: 1,
  max_new_tokens: 512,
  stop_sequences: "<|end_of_text|>,<|eot_id|>",
  prompt_template: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  presence_penalty: 0
};

for await (const event of replicate.stream("meta/meta-llama-3-8b-instruct", { input })) {
  process.stdout.write(event.toString());
}

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

# The meta/meta-llama-3-8b-instruct model can stream output as it's running.
for event in replicate.stream(
    "meta/meta-llama-3-8b-instruct",
    input={
        "top_p": 0.95,
        "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
        "temperature": 0.7,
        "system_prompt": "You are a helpful assistant",
        "length_penalty": 1,
        "max_new_tokens": 512,
        "stop_sequences": "<|end_of_text|>,<|eot_id|>",
        "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
        "presence_penalty": 0
    },
):
    print(str(event), end="")

To learn more, take a look at the guide on getting started with Python.

Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "input": {
      "top_p": 0.95,
      "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
      "temperature": 0.7,
      "system_prompt": "You are a helpful assistant",
      "length_penalty": 1,
      "max_new_tokens": 512,
      "stop_sequences": "<|end_of_text|>,<|eot_id|>",
      "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n",
      "presence_penalty": 0
    }
  }' \
  https://api.replicate.com/v1/models/meta/meta-llama-3-8b-instruct/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
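
For readers who prefer Python without the client library, the same HTTP call can be sketched with the standard library. This is an illustration under assumptions: it reuses the URL and headers from the curl command above, build_request is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/models/meta/meta-llama-3-8b-instruct/predictions"


def build_request(token: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request; hypothetical helper."""
    # The API expects the model inputs nested under an "input" key.
    body = json.dumps({"input": model_input}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Prefer": "wait",  # same header as in the curl command
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


request = build_request(
    os.environ.get("REPLICATE_API_TOKEN", "<your-token>"),
    {
        "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion "
        "parameters. What does this mean when it comes to speed?",
        "max_new_tokens": 512,
    },
)
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen and decode the JSON response body.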

Output

The number of parameters in a neural network can impact its speed, but it's not the only factor.

In general, a larger number of parameters can lead to:

1. Increased computational complexity: More parameters mean more calculations are required to process the data.
2. Increased memory requirements: Larger models require more memory to store their parameters, which can impact system performance.

However, it's worth noting that the relationship between the number of parameters and speed is not always linear. Other factors, such as:

* Model architecture
* Optimizer choice
* Hyperparameter tuning

can also impact the speed of a neural network.

In the case of Johnny and Tommy, it's difficult to say which one's model will be faster without more information about the models themselves.

{
  "completed_at": "2024-05-03T13:45:15.445073Z",
  "created_at": "2024-05-03T13:45:13.788000Z",
  "data_removed": false,
  "error": null,
  "id": "855g9wxd7hrgp0cf7tsv1ewzgc",
  "input": {
    "top_p": 0.95,
    "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
    "temperature": 0.7,
    "system_prompt": "You are a helpful assistant",
    "length_penalty": 1,
    "max_new_tokens": 512,
    "stop_sequences": "<|end_of_text|>,<|eot_id|>",
    "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    "presence_penalty": 0
  },
  "logs": "Random seed used: `57440`\nNote: Random seed will not impact output if greedy decoding is used.\nFormatted prompt: `<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nJohnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n`Random seed used: `57440`\nNote: Random seed will not impact output if greedy decoding is used.\nFormatted prompt: `<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nJohnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n`",
  "metrics": {
    "total_time": 1.657073,
    "input_token_count": 39,
    "tokens_per_second": 92.80206135476371,
    "output_token_count": 149,
    "predict_time": 1.652461,
    "time_to_first_token": 0.060728942999999994
  },
  "output": [
    "The",
    " number",
    " of",
    " parameters",
    " in",
    " a",
    " neural",
    " network",
    " can",
    " impact",
    " its",
    " speed",
    ",",
    " but",
    " it",
    "'s",
    " not",
    " the",
    " only",
    " factor",
    ".\n\n",
    "In",
    " general",
    ",",
    " a",
    " larger",
    " number",
    " of",
    " parameters",
    " can",
    " lead",
    " to",
    ":\n\n",
    "1",
    ".",
    " Increased",
    " computational",
    " complexity",
    ":",
    " More",
    " parameters",
    " mean",
    " more",
    " calculations",
    " are",
    " required",
    " to",
    " process",
    " the",
    " data",
    ".\n",
    "2",
    ".",
    " Increased",
    " memory",
    " requirements",
    ":",
    " Larger",
    " models",
    " require",
    " more",
    " memory",
    " to",
    " store",
    " their",
    " parameters",
    ",",
    " which",
    " can",
    " impact",
    " system",
    " performance",
    ".\n\n",
    "However",
    ",",
    " it",
    "'s",
    " worth",
    " noting",
    " that",
    " the",
    " relationship",
    " between",
    " the",
    " number",
    " of",
    " parameters",
    " and",
    " speed",
    " is",
    " not",
    " always",
    " linear",
    ".",
    " Other",
    " factors",
    ",",
    " such",
    " as",
    ":\n\n",
    "*",
    " Model",
    " architecture",
    "\n",
    "*",
    " Optim",
    "izer",
    " choice",
    "\n",
    "*",
    " Hyper",
    "parameter",
    " tuning",
    "\n\n",
    "can",
    " also",
    " impact",
    " the",
    " speed",
    " of",
    " a",
    " neural",
    " network",
    ".\n\n",
    "In",
    " the",
    " case",
    " of",
    " Johnny",
    " and",
    " Tommy",
    ",",
    " it",
    "'s",
    " difficult",
    " to",
    " say",
    " which",
    " one",
    "'s",
    " model",
    " will",
    " be",
    " faster",
    " without",
    " more",
    " information",
    " about",
    " the",
    " models",
    " themselves",
    "."
  ],
  "started_at": "2024-05-03T13:45:13.792612Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://streaming-api.svc.us.c.replicate.net/v1/streams/hscsfwedhigbbnorfpq7c4i3lbac5srhaqgvb4b5m2iuof3rotwq",
    "get": "https://api.replicate.com/v1/predictions/855g9wxd7hrgp0cf7tsv1ewzgc",
    "cancel": "https://api.replicate.com/v1/predictions/855g9wxd7hrgp0cf7tsv1ewzgc/cancel"
  },
  "version": "hidden"
}
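
The prediction record returns the output as a list of token strings alongside timestamps. A minimal Python sketch of post-processing such a record follows, using an abbreviated copy of the record above (the output list is truncated). That the durations match total_time and predict_time is an observation about this one record, not a documented guarantee.

```python
from datetime import datetime

# Abbreviated copy of the prediction record; "output" is truncated for brevity.
prediction = {
    "created_at": "2024-05-03T13:45:13.788000Z",
    "started_at": "2024-05-03T13:45:13.792612Z",
    "completed_at": "2024-05-03T13:45:15.445073Z",
    "output": ["The", " number", " of", " parameters", " in", " a",
               " neural", " network", " can", " impact", " its", " speed"],
}


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    # Older Python versions do not accept 'Z' in fromisoformat, so swap it out.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Tokens already carry their own leading spaces, so join with an empty string.
text = "".join(prediction["output"])

completed = parse_ts(prediction["completed_at"])
predict_time = (completed - parse_ts(prediction["started_at"])).total_seconds()
total_time = (completed - parse_ts(prediction["created_at"])).total_seconds()

print(text)
print(f"{predict_time:.6f}")  # 1.652461, matching metrics.predict_time in this record
print(f"{total_time:.6f}")    # 1.657073, matching metrics.total_time in this record
```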

Generated in: 1.7 seconds
Input tokens: 39
Output tokens: 140
Tokens per second: 92.80 tokens / second
Time to first token: 61 milliseconds
