{
  "length_penalty": 1,
  "max_new_tokens": 512,
  "presence_penalty": 0,
  "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
  "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  "stop_sequences": "<|end_of_text|>,<|eot_id|>",
  "system_prompt": "You are a helpful assistant",
  "temperature": 0.7,
  "top_p": 0.95
}

Install Replicate’s Node.js client library:

npm install replicate
Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_Doi**********************************
This is your API token. Keep it to yourself.
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const input = {
  length_penalty: 1,
  max_new_tokens: 512,
  presence_penalty: 0,
  prompt: "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
  prompt_template: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
  stop_sequences: "<|end_of_text|>,<|eot_id|>",
  system_prompt: "You are a helpful assistant",
  temperature: 0.7,
  top_p: 0.95
};

for await (const event of replicate.stream("meta/meta-llama-3-8b-instruct", { input })) {
  process.stdout.write(event.toString());
}
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate
Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_Doi**********************************
This is your API token. Keep it to yourself.
import replicate
Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
# The meta/meta-llama-3-8b-instruct model can stream output as it's running.
for event in replicate.stream(
    "meta/meta-llama-3-8b-instruct",
    input={
        "length_penalty": 1,
        "max_new_tokens": 512,
        "presence_penalty": 0,
        "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
        "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
        "stop_sequences": "<|end_of_text|>,<|eot_id|>",
        "system_prompt": "You are a helpful assistant",
        "temperature": 0.7,
        "top_p": 0.95
    },
):
    print(str(event), end="")
To learn more, take a look at the guide on getting started with Python.
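The prompt_template field controls how system_prompt and prompt are assembled into the raw text the model actually sees. As a sketch of what happens server-side (assumption: plain placeholder substitution, which matches the "Formatted prompt" shown in the prediction logs further down this page):

```python
# Sketch only: assumes the server does straightforward placeholder
# substitution into prompt_template. Matches the "Formatted prompt"
# visible in the prediction logs.
prompt_template = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

formatted = prompt_template.format(
    system_prompt="You are a helpful assistant",
    prompt=(
        "Johnny has 8 billion parameters. His friend Tommy has 70 "
        "billion parameters. What does this mean when it comes to speed?"
    ),
)
print(formatted)
```

Leaving prompt_template at its default is usually right; overriding it is mainly useful when a model expects a different chat format.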
Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_Doi**********************************
This is your API token. Keep it to yourself.
Run meta/meta-llama-3-8b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "input": {
      "length_penalty": 1,
      "max_new_tokens": 512,
      "presence_penalty": 0,
      "prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
      "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\\n\\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n",
      "stop_sequences": "<|end_of_text|>,<|eot_id|>",
      "system_prompt": "You are a helpful assistant",
      "temperature": 0.7,
      "top_p": 0.95
    }
  }' \
  https://api.replicate.com/v1/models/meta/meta-llama-3-8b-instruct/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
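One detail that is easy to miss: stop_sequences is a single comma-separated string, not a JSON array. A sketch of the presumable interpretation (assumption: the server splits the string on commas and stops generation when any piece is emitted):

```python
# stop_sequences as passed in the request body above.
stop_sequences = "<|end_of_text|>,<|eot_id|>"

# Assumption: the server splits the comma-separated string into
# individual stop strings; generation halts when any one appears.
stops = stop_sequences.split(",")
print(stops)  # ['<|end_of_text|>', '<|eot_id|>']
```

A practical consequence of this format is that a stop string containing a literal comma cannot be expressed this way.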
The number of parameters in a neural network can impact its speed, but it's not the only factor.

In general, a larger number of parameters can lead to:

1. Increased computational complexity: More parameters mean more calculations are required to process the data.
2. Increased memory requirements: Larger models require more memory to store their parameters, which can impact system performance.

However, it's worth noting that the relationship between the number of parameters and speed is not always linear. Other factors, such as:

* Model architecture
* Optimizer choice
* Hyperparameter tuning

can also impact the speed of a neural network.

In the case of Johnny and Tommy, it's difficult to say which one's model will be faster without more information about the models themselves.
{
"id": "855g9wxd7hrgp0cf7tsv1ewzgc",
"model": "meta/meta-llama-3-8b-instruct",
"version": "hidden",
"input": {
"length_penalty": 1,
"max_new_tokens": 512,
"presence_penalty": 0,
"prompt": "Johnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?",
"prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"stop_sequences": "<|end_of_text|>,<|eot_id|>",
"system_prompt": "You are a helpful assistant",
"temperature": 0.7,
"top_p": 0.95
},
"logs": "Random seed used: `57440`\nNote: Random seed will not impact output if greedy decoding is used.\nFormatted prompt: `<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nJohnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n`Random seed used: `57440`\nNote: Random seed will not impact output if greedy decoding is used.\nFormatted prompt: `<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nJohnny has 8 billion parameters. His friend Tommy has 70 billion parameters. What does this mean when it comes to speed?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n`",
"output": [
"The",
" number",
" of",
" parameters",
" in",
" a",
" neural",
" network",
" can",
" impact",
" its",
" speed",
",",
" but",
" it",
"'s",
" not",
" the",
" only",
" factor",
".\n\n",
"In",
" general",
",",
" a",
" larger",
" number",
" of",
" parameters",
" can",
" lead",
" to",
":\n\n",
"1",
".",
" Increased",
" computational",
" complexity",
":",
" More",
" parameters",
" mean",
" more",
" calculations",
" are",
" required",
" to",
" process",
" the",
" data",
".\n",
"2",
".",
" Increased",
" memory",
" requirements",
":",
" Larger",
" models",
" require",
" more",
" memory",
" to",
" store",
" their",
" parameters",
",",
" which",
" can",
" impact",
" system",
" performance",
".\n\n",
"However",
",",
" it",
"'s",
" worth",
" noting",
" that",
" the",
" relationship",
" between",
" the",
" number",
" of",
" parameters",
" and",
" speed",
" is",
" not",
" always",
" linear",
".",
" Other",
" factors",
",",
" such",
" as",
":\n\n",
"*",
" Model",
" architecture",
"\n",
"*",
" Optim",
"izer",
" choice",
"\n",
"*",
" Hyper",
"parameter",
" tuning",
"\n\n",
"can",
" also",
" impact",
" the",
" speed",
" of",
" a",
" neural",
" network",
".\n\n",
"In",
" the",
" case",
" of",
" Johnny",
" and",
" Tommy",
",",
" it",
"'s",
" difficult",
" to",
" say",
" which",
" one",
"'s",
" model",
" will",
" be",
" faster",
" without",
" more",
" information",
" about",
" the",
" models",
" themselves",
"."
],
"data_removed": false,
"error": null,
"source": "web",
"status": "succeeded",
"created_at": "2024-05-03T13:45:13.788Z",
"started_at": "2024-05-03T13:45:13.792612Z",
"completed_at": "2024-05-03T13:45:15.445073Z",
"urls": {
"cancel": "https://api.replicate.com/v1/predictions/855g9wxd7hrgp0cf7tsv1ewzgc/cancel",
"get": "https://api.replicate.com/v1/predictions/855g9wxd7hrgp0cf7tsv1ewzgc",
"stream": "https://streaming-api.svc.us.c.replicate.net/v1/streams/hscsfwedhigbbnorfpq7c4i3lbac5srhaqgvb4b5m2iuof3rotwq",
"web": "https://replicate.com/p/855g9wxd7hrgp0cf7tsv1ewzgc"
},
"metrics": {
"input_token_count": 39,
"output_token_count": 149,
"predict_time": 1.652461,
"time_to_first_token": 0.060728942999999994,
"tokens_per_second": 92.80206135476371,
"total_time": 1.657073
}
}
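The "output" field of the prediction is a list of token strings in generation order, so the full completion is simply their concatenation. A minimal sketch, using a truncated sample of the tokens above:

```python
# Truncated sample of the "output" array from the prediction above.
output = ["The", " number", " of", " parameters", " in", " a",
          " neural", " network", " can", " impact", " its", " speed", "."]

# The full text is the concatenation of the streamed tokens; each token
# carries its own leading whitespace, so a plain join is all that's needed.
text = "".join(output)
print(text)  # The number of parameters in a neural network can impact its speed.
```

The same join is what the client libraries' streaming loops effectively perform as they print each event.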