{
  "length_penalty": 1,
  "max_new_tokens": 512,
  "presence_penalty": 0,
  "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
  "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "stop_sequences": "<|end_of_text|>,<|eot_id|>",
  "system_prompt": "You are a helpful assistant",
  "temperature": 0.7,
  "top_p": 0.95
}

Install Replicate’s Node.js client library:

npm install replicate
Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_Doi**********************************
This is your API token. Keep it to yourself.
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const input = {
  length_penalty: 1,
  max_new_tokens: 512,
  presence_penalty: 0,
  prompt: "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
  prompt_template: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  stop_sequences: "<|end_of_text|>,<|eot_id|>",
  system_prompt: "You are a helpful assistant",
  temperature: 0.7,
  top_p: 0.95
};

for await (const event of replicate.stream("meta/meta-llama-3-8b-instruct", { input })) {
  process.stdout.write(event.toString());
}
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate
Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_Doi**********************************
This is your API token. Keep it to yourself.
import replicate
Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
# The meta/meta-llama-3-8b-instruct model can stream output as it's running.
for event in replicate.stream(
    "meta/meta-llama-3-8b-instruct",
    input={
        "length_penalty": 1,
        "max_new_tokens": 512,
        "presence_penalty": 0,
        "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
        "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
        "stop_sequences": "<|end_of_text|>,<|eot_id|>",
        "system_prompt": "You are a helpful assistant",
        "temperature": 0.7,
        "top_p": 0.95
    },
):
    print(str(event), end="")
To learn more, take a look at the guide on getting started with Python.
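The prompt_template field controls how system_prompt and prompt are assembled into the raw text the model actually sees. As a sketch of what happens server-side (assumption: plain placeholder substitution, which matches the "Formatted prompt" shown in the prediction logs further down this page):

```python
# Sketch only: assumes the server does straightforward placeholder
# substitution into prompt_template. Matches the "Formatted prompt"
# visible in the prediction logs.
prompt_template = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

formatted = prompt_template.format(
    system_prompt="You are a helpful assistant",
    prompt=(
        "Johnny has 8 billion parameters. His friend Tommy has 70 "
        "billion parameters. What does this mean when it comes to speed?"
    ),
)
print(formatted)
```

Leaving prompt_template at its default is usually right; overriding it is mainly useful when a model expects a different chat format.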
Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_Doi**********************************
This is your API token. Keep it to yourself.
Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "input": {
      "length_penalty": 1,
      "max_new_tokens": 512,
      "presence_penalty": 0,
      "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
      "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n",
      "stop_sequences": "<|end_of_text|>,<|eot_id|>",
      "system_prompt": "You are a helpful assistant",
      "temperature": 0.7,
      "top_p": 0.95
    }
  }' \
  https://api.replicate.com/v1/models/meta/meta-llama-3-8b-instruct/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
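One detail that is easy to miss: stop_sequences is a single comma-separated string, not a JSON array. A sketch of the presumable interpretation (assumption: the server splits the string on commas and stops generation when any piece is emitted):

```python
# stop_sequences as passed in the request body above.
stop_sequences = "<|end_of_text|>,<|eot_id|>"

# Assumption: the server splits the comma-separated string into
# individual stop strings; generation halts when any one appears.
stops = stop_sequences.split(",")
print(stops)  # ['<|end_of_text|>', '<|eot_id|>']
```

A practical consequence of this format is that a stop string containing a literal comma cannot be expressed this way.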
The number of parameters in a neural network can impact its speed, but it's not the only factor.

In general, a larger number of parameters can lead to:

1. Increased computational complexity: More parameters mean more calculations are required to process the data.
2. Increased memory requirements: Larger models require more memory to store their parameters, which can impact system performance.

However, it's worth noting that the relationship between the number of parameters and speed is not always linear. Other factors, such as:

* Model architecture
* Optimizer choice
* Hyperparameter tuning

can also impact the speed of a neural network.

In the case of Johnny and Tommy, it's difficult to say which one's model will be faster without more information about the models themselves.
{
"id": "855g9wxd7hrgp0cf7tsv1ewzgc",
"model": "meta/meta-llama-3-8b-instruct",
"version": "hidden",
"input": {
"length_penalty": 1,
"max_new_tokens": 512,
"presence_penalty": 0,
"prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
"prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"stop_sequences": "<|end_of_text|>,<|eot_id|>",
"system_prompt": "You are a helpful assistant",
"temperature": 0.7,
"top_p": 0.95
},
"logs": "Random seed used: `57440`\nNote: Random seed will not impact output if greedy decoding is used.\nFormatted prompt: `<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nJohnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n`Random seed used: `57440`\nNote: Random seed will not impact output if greedy decoding is used.\nFormatted prompt: `<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nJohnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n`",
"output": [
"The",
" number",
" of",
" parameters",
" in",
" a",
" neural",
" network",
" can",
" impact",
" its",
" speed",
",",
" but",
" it",
"'s",
" not",
" the",
" only",
" factor",
".\n\n",
"In",
" general",
",",
" a",
" larger",
" number",
" of",
" parameters",
" can",
" lead",
" to",
":\n\n",
"1",
".",
" Increased",
" computational",
" complexity",
":",
" More",
" parameters",
" mean",
" more",
" calculations",
" are",
" required",
" to",
" process",
" the",
" data",
".\n",
"2",
".",
" Increased",
" memory",
" requirements",
":",
" Larger",
" models",
" require",
" more",
" memory",
" to",
" store",
" their",
" parameters",
",",
" which",
" can",
" impact",
" system",
" performance",
".\n\n",
"However",
",",
" it",
"'s",
" worth",
" noting",
" that",
" the",
" relationship",
" between",
" the",
" number",
" of",
" parameters",
" and",
" speed",
" is",
" not",
" always",
" linear",
".",
" Other",
" factors",
",",
" such",
" as",
":\n\n",
"*",
" Model",
" architecture",
"\n",
"*",
" Optim",
"izer",
" choice",
"\n",
"*",
" Hyper",
"parameter",
" tuning",
"\n\n",
"can",
" also",
" impact",
" the",
" speed",
" of",
" a",
" neural",
" network",
".\n\n",
"In",
" the",
" case",
" of",
" Johnny",
" and",
" Tommy",
",",
" it",
"'s",
" difficult",
" to",
" say",
" which",
" one",
"'s",
" model",
" will",
" be",
" faster",
" without",
" more",
" information",
" about",
" the",
" models",
" themselves",
"."
],
"data_removed": false,
"error": null,
"source": "web",
"status": "succeeded",
"created_at": "2024-05-03T13:45:13.788Z",
"started_at": "2024-05-03T13:45:13.792612Z",
"completed_at": "2024-05-03T13:45:15.445073Z",
"urls": {
"cancel": "https://api.replicate.com/v1/predictions/855g9wxd7hrgp0cf7tsv1ewzgc/cancel",
"get": "https://api.replicate.com/v1/predictions/855g9wxd7hrgp0cf7tsv1ewzgc",
"stream": "https://streaming-api.svc.us.c.replicate.net/v1/streams/hscsfwedhigbbnorfpq7c4i3lbac5srhaqgvb4b5m2iuof3rotwq",
"web": "https://replicate.com/p/855g9wxd7hrgp0cf7tsv1ewzgc"
},
"metrics": {
"input_token_count": 39,
"output_token_count": 149,
"predict_time": 1.652461,
"time_to_first_token": 0.060728942999999994,
"tokens_per_second": 92.80206135476371,
"total_time": 1.657073
}
}
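The "output" field of the prediction is a list of token strings in generation order, so the full completion is simply their concatenation. A minimal sketch, using a truncated sample of the tokens above:

```python
# Truncated sample of the "output" array from the prediction above.
output = ["The", " number", " of", " parameters", " in", " a",
          " neural", " network", " can", " impact", " its", " speed", "."]

# The full text is the concatenation of the streamed tokens; each token
# carries its own leading whitespace, so a plain join is all that's needed.
text = "".join(output)
print(text)  # The number of parameters in a neural network can impact its speed.
```

The same join is what the client libraries' streaming loops effectively perform as they print each event.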