meta / meta-llama-3.1-405b-instruct
Meta's flagship 405-billion-parameter language model, fine-tuned for chat completions
Prediction
meta/meta-llama-3.1-405b-instruct
Input
Run meta/meta-llama-3.1-405b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "input": {
      "top_p": 0.9,
      "prompt": "Tina has one brother and one sister. How many sisters do Tina\'s siblings have?",
      "max_tokens": 1024,
      "min_tokens": 0,
      "temperature": 0.6,
      "system_prompt": "You are a helpful assistant.",
      "presence_penalty": 0,
      "frequency_penalty": 0
    }
  }' \
  https://api.replicate.com/v1/models/meta/meta-llama-3.1-405b-instruct/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
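If you would rather call the API from Python than from curl, the same request can be sketched with the requests library. This is a minimal, unofficial sketch: it mirrors the input payload from the curl example above, assumes REPLICATE_API_TOKEN is exported in your environment, and keeps the Prefer: wait header so the API holds the connection open until the prediction finishes.

# Minimal sketch of the curl request above, using the requests library.
# Assumes REPLICATE_API_TOKEN is set in the environment.
import os
import requests

resp = requests.post(
    "https://api.replicate.com/v1/models/meta/meta-llama-3.1-405b-instruct/predictions",
    headers={
        "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
        "Content-Type": "application/json",
        "Prefer": "wait",  # block until the prediction has finished
    },
    json={
        "input": {
            "top_p": 0.9,
            "prompt": "Tina has one brother and one sister. "
                      "How many sisters do Tina's siblings have?",
            "max_tokens": 1024,
            "min_tokens": 0,
            "temperature": 0.6,
            "system_prompt": "You are a helpful assistant.",
            "presence_penalty": 0,
            "frequency_penalty": 0,
        }
    },
)
resp.raise_for_status()
prediction = resp.json()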
Output
{ "completed_at": "2024-07-23T15:04:24.090090Z", "created_at": "2024-07-23T15:04:21.227000Z", "data_removed": false, "error": null, "id": "zjq6szssxdrm20cgw0fsntgvbg", "input": { "top_p": 0.9, "prompt": "Tina has one brother and one sister. How many sisters do Tina's siblings have?", "max_tokens": 1024, "min_tokens": 0, "temperature": 0.6, "system_prompt": "You are a helpful assistant.", "presence_penalty": 0, "frequency_penalty": 0 }, "logs": "INFO 07-23 15:04:21 async_llm_engine.py:173] Added request 3cd670c9b87b469ea9721ddfbcc6b67a.\nINFO 07-23 15:04:21 metrics.py:396] Avg prompt throughput: 7.8 tokens/s, Avg generation throughput: 17.7 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.\nINFO 07-23 15:04:24 async_llm_engine.py:140] Finished request 3cd670c9b87b469ea9721ddfbcc6b67a.\nGeneration took 1721746720.08sFormatted prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nTina has one brother and one sister. How many sisters do Tina's siblings have?<|eot_id|><|start_header_id|>assistant<|end_header_id|>", "metrics": { "batch_size": 1.000229398266064, "total_time": 2.86309, "input_token_count": 26, "tokens_per_second": 29.74134035953616, "output_token_count": 84, "predict_time_share": 2.8200430870056152, "predict_time": 2.826425468, "time_to_first_token": 0.112068943 }, "output": [ "T", "ina", " has", " one", " brother", " and", " one", " sister", ".", " From", " the", " brother", "'s", " perspective", ",", " he", " has", " one", " sister", ",", " Tina", ",", " and", " also", " the", " other", " sister", ".", " So", ",", " Tina", "'s", " brother", " has", " ", "2", " sisters", ".\n\n", "From", " the", " sister", "'s", " perspective", ",", " she", " also", " has", " one", " sister", ",", " Tina", ".", " \n\n", "So", ",", " Tina", "'s", " siblings", " have", " a", " total", " of", " ", "2", " sisters", " (", "from", " the", " brother", "'s", " perspective", ")", " and", " ", "1", " sister", " (", "from", " the", " sister", "'s", " perspective", ").", "" ], "started_at": "2024-07-23T15:04:21.263665Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.sea.v.replicate.net/v1/streams/6zoqtc6qgpxby7xzauee36qv6bktfaxugelxfqs6ifgg6q4s77sa", "get": "https://api.replicate.com/v1/predictions/zjq6szssxdrm20cgw0fsntgvbg", "cancel": "https://api.replicate.com/v1/predictions/zjq6szssxdrm20cgw0fsntgvbg/cancel" }, "version": "hidden" }
Input tokens: 26
Output tokens: 84
Tokens per second: 29.74
Time to first token: 0.11 s
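Note that the output field in the response is the completion broken into streamed token strings, in order; join them to recover the full text, and read the token counts and throughput from the metrics block. A small sketch, assuming prediction holds the parsed JSON response above:

# Reassemble the completion from the streamed token list.
text = "".join(prediction["output"])
print(text)

# Token counts and throughput live under "metrics".
print(prediction["metrics"]["output_token_count"])  # 84
print(prediction["metrics"]["tokens_per_second"])   # ~29.7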
Prediction
meta/meta-llama-3.1-405b-instruct
Input
Run meta/meta-llama-3.1-405b-instruct using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "input": {
      "top_p": 0.9,
      "prompt": "Invent a stack-based Turing-complete programming language with minimal syntax, write a PLY parser for it, a language runtime, and show some examples.",
      "max_tokens": 1024,
      "min_tokens": 0,
      "temperature": 0.6,
      "system_prompt": "You are an expert programmer, talking to another expert programmer. Be terse.",
      "presence_penalty": 0,
      "frequency_penalty": 0
    }
  }' \
  https://api.replicate.com/v1/models/meta/meta-llama-3.1-405b-instruct/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
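The official replicate Python client wraps the same HTTP API, and a rough equivalent of this request is sketched below. It assumes the client is installed (pip install replicate) and that it picks up REPLICATE_API_TOKEN from the environment; for language models like this one, run() yields the output as a sequence of string chunks that can be joined into the final text.

# Rough equivalent of the curl request, via Replicate's Python client.
# Assumes `pip install replicate` and REPLICATE_API_TOKEN in the environment.
import replicate

output = replicate.run(
    "meta/meta-llama-3.1-405b-instruct",
    input={
        "top_p": 0.9,
        "prompt": "Invent a stack-based Turing-complete programming language "
                  "with minimal syntax, write a PLY parser for it, a language "
                  "runtime, and show some examples.",
        "max_tokens": 1024,
        "min_tokens": 0,
        "temperature": 0.6,
        "system_prompt": "You are an expert programmer, talking to another "
                         "expert programmer. Be terse.",
        "presence_penalty": 0,
        "frequency_penalty": 0,
    },
)
print("".join(output))  # chunks arrive as strings; join them for the full text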
Output
{ "completed_at": "2024-07-23T15:20:12.568195Z", "created_at": "2024-07-23T15:19:47.546000Z", "data_removed": false, "error": null, "id": "jzxx22twb9rm20cgw0przfgqzm", "input": { "top_p": 0.9, "prompt": "Invent a stack-based Turing-complete programming language with minimal syntax, write a PLY parser for it, a language runtime, and show some examples.", "max_tokens": 1024, "min_tokens": 0, "temperature": 0.6, "system_prompt": "You are an expert programmer, talking to another expert programmer. Be terse.", "presence_penalty": 0, "frequency_penalty": 0 }, "logs": "INFO 07-23 15:19:47 async_llm_engine.py:173] Added request 997e070bd8ea42f798e91bc890de6c56.\nINFO 07-23 15:19:47 metrics.py:396] Avg prompt throughput: 0.2 tokens/s, Avg generation throughput: 0.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%.\nINFO 07-23 15:19:52 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 30.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.0%, CPU KV cache usage: 0.0%.\nINFO 07-23 15:19:57 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 30.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 1.7%, CPU KV cache usage: 0.0%.\nINFO 07-23 15:20:02 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 30.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 2.4%, CPU KV cache usage: 0.0%.\nINFO 07-23 15:20:07 metrics.py:396] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 30.1 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 3.1%, CPU KV cache usage: 0.0%.\nINFO 07-23 15:20:12 async_llm_engine.py:140] Finished request 997e070bd8ea42f798e91bc890de6c56.\nGeneration took 1721745335.56sFormatted prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are an expert programmer, talking to another expert programmer. 
Be terse.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nInvent a stack-based Turing-complete programming language with minimal syntax, write a PLY parser for it, a language runtime, and show some examples.<|eot_id|><|start_header_id|>assistant<|end_header_id|>", "metrics": { "batch_size": 2.000051610448683, "total_time": 25.022195, "input_token_count": 45, "tokens_per_second": 30.060659635637204, "output_token_count": 751, "predict_time_share": 12.490856170654297, "predict_time": 24.985711232, "time_to_first_token": 0.112341854 }, "output": [ "**", "Language", " Definition", "**\n\n", "Name", ":", " `", "ST", "K", "`\n\n", "*", " `", "push", "`", " instruction", ":", " `", "p", "`\n", "*", " `", "pop", "`", " instruction", ":", " `", "o", "`\n", "*", " `", "add", "`", " instruction", ":", " `", "+", "`\n", "*", " `", "sub", "`", " instruction", ":", " `-", "`\n", "*", " `", "mul", "`", " instruction", ":", " `", "*", "`\n", "*", " `", "div", "`", " instruction", ":", " `/", "`\n", "*", " `", "jump", "`", " instruction", ":", " `", "j", "`\n", "*", " `", "halt", "`", " instruction", ":", " `", "h", "`\n\n", "**", "PLY", " Parser", "**\n\n", "```", "python", "\n", "import", " ply", ".lex", " as", " lex", "\n", "import", " ply", ".y", "acc", " as", " yacc", "\n\n", "#", " Lexer", "\n", "tokens", " =", " ['", "P", "USH", "',", " '", "POP", "',", " '", "ADD", "',", " '", "SUB", "',", " '", "M", "UL", "',", " '", "DIV", "',", " '", "J", "UMP", "',", " '", "HAL", "T", "',", " '", "NUMBER", "']\n\n", "t", "_PUSH", " =", " r", "'", "p", "'\n", "t", "_POP", " =", " r", "'o", "'\n", "t", "_ADD", " =", " r", "'\\", "+", "'\n", "t", "_SUB", " =", " r", "'-", "'\n", "t", "_MUL", " =", " r", "'\\", "*", "'\n", "t", "_DIV", " =", " r", "'/", "'\n", "t", "_JUMP", " =", " r", "'", "j", "'\n", "t", "_H", "ALT", " =", " r", "'h", "'\n", "t", "_NUMBER", " =", " r", "'\\", "d", "+", "'\n\n", "#", " Parser", "\n", "def", " p", "_program", "(p", "):\n", " ", " '''", "program", " :", " instruction", "_list", "'''\n", " ", " p", "[", "0", "]", " =", " p", "[", "1", "]\n\n", "def", " p", "_instruction", "_list", "(p", "):\n", " ", " '''", "instruction", "_list", " :", " instruction", " instruction", "_list", "\n", " ", " |", " instruction", "'''\n", " ", " if", " len", "(p", ")", " ==", " ", "3", ":\n", " ", " p", "[", "0", "]", " =", " [", "p", "[", "1", "]]", " +", " p", "[", "2", "]\n", " ", " else", ":\n", " ", " p", "[", "0", "]", " =", " [", "p", "[", "1", "]]\n\n", "def", " p", "_instruction", "(p", "):\n", " ", " '''", "instruction", " :", " PUSH", " NUMBER", "\n", " ", " |", " POP", "\n", " ", " |", " ADD", "\n", " ", " |", " SUB", "\n", " ", " |", " M", "UL", "\n", " ", " |", " DIV", "\n", " ", " |", " J", "UMP", " NUMBER", "\n", " ", " |", " HAL", "T", "'''\n", " ", " if", " len", "(p", ")", " ==", " ", "3", ":\n", " ", " p", "[", "0", "]", " =", " (", "p", "[", "1", "],", " int", "(p", "[", "2", "]))\n", " ", " else", ":\n", " ", " p", "[", "0", "]", " =", " (", "p", "[", "1", "],", ")\n\n", "#", " Build", " parser", "\n", "lexer", " =", " lex", ".lex", "()\n", "parser", " =", " yacc", ".y", "acc", "()\n", "``", "`\n\n", "**", "Language", " Runtime", "**\n\n", "```", "python", "\n", "class", " ST", "K", "Runtime", ":\n", " ", " def", " __", "init", "__(", "self", ",", " program", "):\n", " ", " self", ".program", " =", " program", "\n", " ", " self", ".pc", " =", " ", "0", "\n", " ", " self", ".stack", " =", " []\n\n", " ", " def", " run", "(self", "):\n", " ", " while", " self", ".pc", " <", " 
len", "(self", ".program", "):\n", " ", " op", ",", " *", "args", " =", " self", ".program", "[self", ".pc", "]\n", " ", " self", ".pc", " +=", " ", "1", "\n", " ", " if", " op", " ==", " '", "p", "':\n", " ", " self", ".stack", ".append", "(args", "[", "0", "])\n", " ", " elif", " op", " ==", " '", "o", "':\n", " ", " self", ".stack", ".pop", "()\n", " ", " elif", " op", " ==", " '+", "':\n", " ", " b", " =", " self", ".stack", ".pop", "()\n", " ", " a", " =", " self", ".stack", ".pop", "()\n", " ", " self", ".stack", ".append", "(a", " +", " b", ")\n", " ", " elif", " op", " ==", " '-", "':\n", " ", " b", " =", " self", ".stack", ".pop", "()\n", " ", " a", " =", " self", ".stack", ".pop", "()\n", " ", " self", ".stack", ".append", "(a", " -", " b", ")\n", " ", " elif", " op", " ==", " '*", "':\n", " ", " b", " =", " self", ".stack", ".pop", "()\n", " ", " a", " =", " self", ".stack", ".pop", "()\n", " ", " self", ".stack", ".append", "(a", " *", " b", ")\n", " ", " elif", " op", " ==", " '/", "':\n", " ", " b", " =", " self", ".stack", ".pop", "()\n", " ", " a", " =", " self", ".stack", ".pop", "()\n", " ", " self", ".stack", ".append", "(a", " //", " b", ")\n", " ", " elif", " op", " ==", " '", "j", "':\n", " ", " self", ".pc", " =", " args", "[", "0", "]\n", " ", " elif", " op", " ==", " '", "h", "':\n", " ", " break", "\n\n", " ", " def", " get", "_result", "(self", "):\n", " ", " return", " self", ".stack", "[-", "1", "]", " if", " self", ".stack", " else", " None", "\n\n", "#", " Example", " usage", "\n", "program", " =", " parser", ".parse", "('", "p", " ", "2", " p", " ", "3", " +", "')\n", "runtime", " =", " ST", "K", "Runtime", "(program", ")\n", "runtime", ".run", "()\n", "print", "(runtime", ".get", "_result", "())", " ", " #", " Output", ":", " ", "5", "\n", "``", "`\n\n", "**", "More", " Examples", "**\n\n", "*", " Fibonacci", " sequence", ":", " `", "p", " ", "0", " p", " ", "1", " p", " ", "10", " j", " ", "5", " h", " ", "5", " o", " o", " +", " p", " ", "1", " j", " ", "3", " h", " ", "3", " o", " o", " p", " ", "1", " +", " j", " ", "5", "`\n", "*", " Factor", "ial", ":", " `", "p", " ", "5", " p", " ", "1", " j", " ", "3", " h", " ", "3", " o", " o", " *", " p", " ", "1", " +", " j", " ", "3", "`", "" ], "started_at": "2024-07-23T15:19:47.582483Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.sea.v.replicate.net/v1/streams/oo3jqmezolvmqhrq4nxquskhrt5ob6hugjjheyrhbuldkejk3rpq", "get": "https://api.replicate.com/v1/predictions/jzxx22twb9rm20cgw0przfgqzm", "cancel": "https://api.replicate.com/v1/predictions/jzxx22twb9rm20cgw0przfgqzm/cancel" }, "version": "hidden" }
Input tokens: 45
Output tokens: 750
Tokens per second: 30.06
Time to first token: 0.11 s
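The urls block in each response also supports an asynchronous flow: without the Prefer: wait header, the create call returns immediately and you poll the prediction's get URL until it reaches a terminal status (succeeded, failed, or canceled); the cancel URL can be POSTed to stop a running prediction. A rough sketch with requests, assuming prediction is the JSON returned by the create call:

# Poll the prediction's "get" URL until it reaches a terminal state.
import os
import time
import requests

headers = {"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"}
while prediction["status"] not in ("succeeded", "failed", "canceled"):
    time.sleep(1)
    prediction = requests.get(prediction["urls"]["get"], headers=headers).json()

if prediction["status"] == "succeeded":
    print("".join(prediction["output"]))
else:
    print(prediction["error"])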