johnnyoshika / llama2-combine-numbers
- Public
- 16 runs
- L40S
Prediction
johnnyoshika/llama2-combine-numbers:3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a
- ID: 27dqpzg7pnrgm0cfhsh9rxr30w
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
Input
- debug: false
- top_p: 0.95
- prompt: What is 10+4?
- temperature: 0.7
- return_logits: false
- max_new_tokens: 128
- min_new_tokens: -1
- repetition_penalty: 1.15
{ "debug": false, "top_p": 0.95, "prompt": "What is 10+4?", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }
Install Replicate’s Node.js client library:
    npm install replicate
Import and set up the client:
    import Replicate from "replicate";
    const replicate = new Replicate({
      auth: process.env.REPLICATE_API_TOKEN,
    });
Run johnnyoshika/llama2-combine-numbers using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    const output = await replicate.run(
      "johnnyoshika/llama2-combine-numbers:3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a",
      {
        input: {
          debug: false,
          top_p: 0.95,
          prompt: "What is 10+4?",
          temperature: 0.7,
          return_logits: false,
          max_new_tokens: 128,
          min_new_tokens: -1,
          repetition_penalty: 1.15
        }
      }
    );
    console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
    pip install replicate
Import the client:
    import replicate
Run johnnyoshika/llama2-combine-numbers using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    output = replicate.run(
        "johnnyoshika/llama2-combine-numbers:3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a",
        input={
            "debug": False,
            "top_p": 0.95,
            "prompt": "What is 10+4?",
            "temperature": 0.7,
            "return_logits": False,
            "max_new_tokens": 128,
            "min_new_tokens": -1,
            "repetition_penalty": 1.15
        }
    )
    # The johnnyoshika/llama2-combine-numbers model can stream output as it's running.
    # The predict method returns an iterator, and you can iterate over that output.
    for item in output:
        # https://replicate.com/johnnyoshika/llama2-combine-numbers/api#output-schema
        print(item, end="")
To learn more, take a look at the guide on getting started with Python.
Run johnnyoshika/llama2-combine-numbers using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    curl -s -X POST \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "Prefer: wait" \
      -d $'{
        "version": "johnnyoshika/llama2-combine-numbers:3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a",
        "input": {
          "debug": false,
          "top_p": 0.95,
          "prompt": "What is 10+4?",
          "temperature": 0.7,
          "return_logits": false,
          "max_new_tokens": 128,
          "min_new_tokens": -1,
          "repetition_penalty": 1.15
        }
      }' \
      https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
10 + 4 = 14
{ "completed_at": "2024-05-19T01:05:42.738483Z", "created_at": "2024-05-19T01:05:40.277000Z", "data_removed": false, "error": null, "id": "27dqpzg7pnrgm0cfhsh9rxr30w", "input": { "debug": false, "top_p": 0.95, "prompt": "What is 10+4?", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }, "logs": "Your formatted prompt is:\nWhat is 10+4?\ncorrect lora is already loaded\nOverall initialize_peft took 0.000\nExllama: False\nINFO 05-19 01:05:42 async_llm_engine.py:371] Received request 0: prompt: 'What is 10+4?', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 05-19 01:05:42 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%\nINFO 05-19 01:05:42 async_llm_engine.py:111] Finished request 0.\nhostname: model-hp-77dde5d6c56598691b9008f7d123a18d-74856449d8-4m969", "metrics": { "predict_time": 0.265116, "total_time": 2.461483 }, "output": [ "\n", "1", "0", " +", " ", "4", " =", " ", "1", "4", "" ], "started_at": "2024-05-19T01:05:42.473367Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.us.c.replicate.net/v1/streams/cfplbm56v6eavspddix3jwkd4h5ljxeh75pdk26fqpzmh5bu36aa", "get": "https://api.replicate.com/v1/predictions/27dqpzg7pnrgm0cfhsh9rxr30w", "cancel": "https://api.replicate.com/v1/predictions/27dqpzg7pnrgm0cfhsh9rxr30w/cancel" }, "version": "3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a" }
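The "output" field in the prediction record is a list of small text chunks, not a single string. A minimal Python sketch of reassembling it into the displayed answer follows; the list is copied from the record, and the variable names are only illustrative.

```python
# Chunks copied from the "output" field of the first prediction record.
tokens = ["\n", "1", "0", " +", " ", "4", " =", " ", "1", "4", ""]

# Join the chunks in order. No separator is needed because each chunk
# already carries its own leading whitespace where required.
text = "".join(tokens)

print(repr(text))    # raw text, including the leading newline
print(text.strip())  # text as shown on the page
```

The same join applies when iterating over a streamed result, which is why the Python client example prints each item with `end=""`.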
Your formatted prompt is:
What is 10+4?
correct lora is already loaded
Overall initialize_peft took 0.000
Exllama: False
INFO 05-19 01:05:42 async_llm_engine.py:371] Received request 0: prompt: 'What is 10+4?', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-19 01:05:42 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 05-19 01:05:42 async_llm_engine.py:111] Finished request 0.
hostname: model-hp-77dde5d6c56598691b9008f7d123a18d-74856449d8-4m969
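The timestamps in the first prediction record appear to line up with its "metrics" values: "predict_time" matches completed_at minus started_at, and "total_time" matches completed_at minus created_at. The Python sketch below checks that reading using values copied from the record. The interpretation is inferred from the numbers rather than stated on the page, and `wait_time` is an illustrative name for the remainder, which the record does not break down further.

```python
from datetime import datetime

# Timestamp layout used by the record (UTC, microsecond precision).
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

created = datetime.strptime("2024-05-19T01:05:40.277000Z", FMT)
started = datetime.strptime("2024-05-19T01:05:42.473367Z", FMT)
completed = datetime.strptime("2024-05-19T01:05:42.738483Z", FMT)

predict_time = (completed - started).total_seconds()  # time spent in the model
total_time = (completed - created).total_seconds()    # creation to completion
wait_time = (started - created).total_seconds()       # hypothetical label: time before the run started

print(f"predict_time={predict_time:.6f}s")
print(f"total_time={total_time:.6f}s")
print(f"wait_time={wait_time:.6f}s")
```

Under this reading, most of the first run's total time elapsed before the model started; the two later runs on this page show a much smaller gap between "predict_time" and "total_time".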
Prediction
johnnyoshika/llama2-combine-numbers:3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a
- ID: 8vvv4hchj1rgg0cfhsh9bjcf1r
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
Input
- debug: false
- top_p: 0.95
- prompt: What is 398 + 34?
- temperature: 0.7
- return_logits: false
- max_new_tokens: 128
- min_new_tokens: -1
- repetition_penalty: 1.15
{ "debug": false, "top_p": 0.95, "prompt": "What is 398 + 34?", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }
Output
39834
{ "completed_at": "2024-05-19T01:06:15.766992Z", "created_at": "2024-05-19T01:06:15.568000Z", "data_removed": false, "error": null, "id": "8vvv4hchj1rgg0cfhsh9bjcf1r", "input": { "debug": false, "top_p": 0.95, "prompt": "What is 398 + 34?", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }, "logs": "Your formatted prompt is:\nWhat is 398 + 34?\ncorrect lora is already loaded\nOverall initialize_peft took 0.000\nExllama: False\nINFO 05-19 01:06:15 async_llm_engine.py:371] Received request 0: prompt: 'What is 398 + 34?', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 05-19 01:06:15 async_llm_engine.py:111] Finished request 0.\nhostname: model-hp-77dde5d6c56598691b9008f7d123a18d-74856449d8-4m969", "metrics": { "predict_time": 0.170462, "total_time": 0.198992 }, "output": [ "\n", "3", "9", "8", "3", "4", "" ], "started_at": "2024-05-19T01:06:15.596530Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.us.c.replicate.net/v1/streams/6lt2v2ieecdfmx3fueyzj3kt75ummyrbq5ukvqlb3tqr6pzbr4jq", "get": "https://api.replicate.com/v1/predictions/8vvv4hchj1rgg0cfhsh9bjcf1r", "cancel": "https://api.replicate.com/v1/predictions/8vvv4hchj1rgg0cfhsh9bjcf1r/cancel" }, "version": "3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a" }
Your formatted prompt is:
What is 398 + 34?
correct lora is already loaded
Overall initialize_peft took 0.000
Exllama: False
INFO 05-19 01:06:15 async_llm_engine.py:371] Received request 0: prompt: 'What is 398 + 34?', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-19 01:06:15 async_llm_engine.py:111] Finished request 0.
hostname: model-hp-77dde5d6c56598691b9008f7d123a18d-74856449d8-4m969
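For the prompt "What is 398 + 34?" the returned text is 39834, which is the two operands written side by side rather than their arithmetic sum. That would fit the model's name, though the page itself does not describe how the model was trained, so treat that reading as an assumption. A small Python sketch makes the difference concrete; the `operands` helper and its regular expression are illustrative, not part of the model or of Replicate's API.

```python
import re


def operands(prompt: str) -> tuple[str, str]:
    """Extract the two numbers around '+' from a prompt such as 'What is 398 + 34?'."""
    match = re.search(r"(\d+)\s*\+\s*(\d+)", prompt)
    if match is None:
        raise ValueError(f"no 'a + b' expression found in: {prompt!r}")
    return match.group(1), match.group(2)


a, b = operands("What is 398 + 34?")

concatenated = a + b      # string concatenation of the digit strings
total = int(a) + int(b)   # ordinary integer addition

print("concatenated:", concatenated)
print("sum:", total)
```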
Prediction
johnnyoshika/llama2-combine-numbers:3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a
- ID: pafsd7rxq9rgp0cfhshrgqct10
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
Input
- debug: false
- top_p: 0.95
- prompt: What is 12+98?
- temperature: 0.7
- return_logits: false
- max_new_tokens: 128
- min_new_tokens: -1
- repetition_penalty: 1.15
{ "debug": false, "top_p": 0.95, "prompt": "What is 12+98?", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }
Output
1298 = 12 + (98 * 1)
{ "completed_at": "2024-05-19T01:06:51.885516Z", "created_at": "2024-05-19T01:06:51.450000Z", "data_removed": false, "error": null, "id": "pafsd7rxq9rgp0cfhshrgqct10", "input": { "debug": false, "top_p": 0.95, "prompt": "What is 12+98?", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }, "logs": "Your formatted prompt is:\nWhat is 12+98?\ncorrect lora is already loaded\nOverall initialize_peft took 0.000\nExllama: False\nINFO 05-19 01:06:51 async_llm_engine.py:371] Received request 0: prompt: 'What is 12+98?', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 05-19 01:06:51 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%\nINFO 05-19 01:06:51 async_llm_engine.py:111] Finished request 0.\nhostname: model-hp-77dde5d6c56598691b9008f7d123a18d-74856449d8-4m969", "metrics": { "predict_time": 0.429176, "total_time": 0.435516 }, "output": [ "\n", "1", "2", "9", "8", " =", " ", "1", "2", " +", " (", "9", "8", " *", " ", "1", ")", "" ], "started_at": "2024-05-19T01:06:51.456340Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.us.c.replicate.net/v1/streams/t74g32xoqbenh7nqulppfyslblazguycdg4aek4g23axn642fvfq", "get": "https://api.replicate.com/v1/predictions/pafsd7rxq9rgp0cfhshrgqct10", "cancel": "https://api.replicate.com/v1/predictions/pafsd7rxq9rgp0cfhshrgqct10/cancel" }, "version": "3d318c904899fa396a3255078da6a56c0d4f0b7837550159f196eb05932aae0a" }
Your formatted prompt is:
What is 12+98?
correct lora is already loaded
Overall initialize_peft took 0.000
Exllama: False
INFO 05-19 01:06:51 async_llm_engine.py:371] Received request 0: prompt: 'What is 12+98?', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.
INFO 05-19 01:06:51 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 05-19 01:06:51 async_llm_engine.py:111] Finished request 0.
hostname: model-hp-77dde5d6c56598691b9008f7d123a18d-74856449d8-4m969
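Taken together, the three example runs give one arithmetic sum (14) and two results that look like digit concatenation (39834 and 1298). The Python sketch below labels each run accordingly and also checks the equation the model printed for the last prompt. The `classify` function and the tuples are illustrative; the operands and leading answers are copied by hand from the prompts and outputs on this page.

```python
def classify(a: int, b: int, answer: int) -> str:
    """Label an answer as the sum of a and b, their digit concatenation, or neither."""
    if answer == a + b:
        return "sum"
    if str(answer) == f"{a}{b}":
        return "concatenation"
    return "other"


# (first operand, second operand, number the model answered with)
runs = [
    (10, 4, 14),        # "What is 10+4?"     -> "10 + 4 = 14"
    (398, 34, 39834),   # "What is 398 + 34?" -> "39834"
    (12, 98, 1298),     # "What is 12+98?"    -> "1298 = 12 + (98 * 1)"
]

labels = [classify(a, b, answer) for a, b, answer in runs]
for (a, b, answer), label in zip(runs, labels):
    print(f"{a} and {b} -> {answer}: {label}")

# The equation printed for the last prompt does not hold arithmetically:
# the right-hand side evaluates to 110, not 1298.
equation_holds = 1298 == 12 + (98 * 1)
print("1298 == 12 + (98 * 1):", equation_holds)
```

Three samples at temperature 0.7 are too few to say how consistently the model concatenates versus adds; the sketch only describes the runs shown here.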