nateraw / llama-2-7b-paraphrase-v1
- Public
- 113 runs
- L40S
Prediction
nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc
ID: uuys223ba4qjmni5ejfz6u5fjq
Status: Succeeded
Source: Web
Hardware: A40 (Large)
Input
- debug: false
- top_k: 50
- top_p: 0.9
- prompt: The capital of France is Paris
- temperature: 0.75
- max_new_tokens: 128
- min_new_tokens: -1
{
  "debug": false,
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "The capital of France is Paris\n",
  "temperature": 0.75,
  "max_new_tokens": 128,
  "min_new_tokens": -1
}
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
  {
    input: {
      debug: false,
      top_p: 0.9,
      prompt: "The capital of France is Paris\n",
      temperature: 0.75,
      max_new_tokens: 128,
      min_new_tokens: -1
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
    input={
        "debug": False,
        "top_p": 0.9,
        "prompt": "The capital of France is Paris\n",
        "temperature": 0.75,
        "max_new_tokens": 128,
        "min_new_tokens": -1
    }
)

# The nateraw/llama-2-7b-paraphrase-v1 model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/nateraw/llama-2-7b-paraphrase-v1/api#output-schema
    print(item, end="")
To learn more, take a look at the guide on getting started with Python.
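The iterator yields the paraphrase as small text chunks (you can see them in the "output" array of the prediction below). If you want the result as a single string rather than printing chunks as they arrive, here is a minimal sketch; it simply repeats the replicate.run call above and joins the chunks, assuming each item is a string as the output schema shows.

import replicate

# Same call as above, but collect the streamed chunks into one string
# instead of printing them one by one.
output = replicate.run(
    "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
    input={
        "prompt": "The capital of France is Paris\n",
        "temperature": 0.75,
        "top_p": 0.9,
        "max_new_tokens": 128
    }
)

paraphrase = "".join(output)
print(paraphrase)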
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
    "input": {
      "debug": false,
      "top_p": 0.9,
      "prompt": "The capital of France is Paris\\n",
      "temperature": 0.75,
      "max_new_tokens": 128,
      "min_new_tokens": -1
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
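If you would rather call the HTTP API from Python than from the shell, the same request can be made with the requests library. This is only a sketch mirroring the curl command above; it assumes requests is installed and REPLICATE_API_TOKEN is set in your environment.

import os
import requests

# Mirror of the curl request above: create a prediction and, thanks to the
# "Prefer: wait" header, get the finished result back in the same response.
resp = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers={
        "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
        "Content-Type": "application/json",
        "Prefer": "wait",
    },
    json={
        "version": "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
        "input": {
            "prompt": "The capital of France is Paris\n",
            "top_p": 0.9,
            "temperature": 0.75,
            "max_new_tokens": 128,
        },
    },
)
prediction = resp.json()

# The output is returned as a list of text chunks, as shown in the example below.
print(prediction["status"], "".join(prediction.get("output") or []))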
Output
The capital of France is located in Paris.

{
  "completed_at": "2023-11-14T23:28:48.136324Z",
  "created_at": "2023-11-14T23:28:35.938654Z",
  "data_removed": false,
  "error": null,
  "id": "uuys223ba4qjmni5ejfz6u5fjq",
  "input": {
    "debug": false,
    "top_k": 50,
    "top_p": 0.9,
    "prompt": "The capital of France is Paris\n",
    "temperature": 0.75,
    "max_new_tokens": 128,
    "min_new_tokens": -1
  },
  "logs": "Your formatted prompt is:\nThe capital of France is Paris\nprevious weights were different, switching to https://replicate.delivery/pbxt/TVlCb4hSte1nUqEMOrIjrcnitL1WlK3l5vOzb99Voz2fOb4RA/training_output.zip\nDownloading peft weights\nusing https://replicate.delivery/pbxt/TVlCb4hSte1nUqEMOrIjrcnitL1WlK3l5vOzb99Voz2fOb4RA/training_output.zip instead of https://replicate.delivery/pbxt/TVlCb4hSte1nUqEMOrIjrcnitL1WlK3l5vOzb99Voz2fOb4RA/training_output.zip\nDownloaded training_output.zip as 10 824 kB chunks in 0.3440 with 0 retries\nDownloaded peft weights in 0.344\nUnzipped peft weights in 0.010\nInitialized peft model in 0.006\nOverall initialize_peft took 9.265\nExllama: False\nINFO 11-14 23:28:47 async_llm_engine.py:371] Received request 0: prompt: 'The capital of France is Paris\\n', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.75, top_p=0.9, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 11-14 23:28:47 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%\nINFO 11-14 23:28:48 async_llm_engine.py:111] Finished request 0.\nhostname: model-hs-73001d65-da72d39bf79629ac-gpu-a40-85ddd9ccc9-97bzw",
  "metrics": {
    "predict_time": 9.586471,
    "total_time": 12.19767
  },
  "output": [ "The", " capital", " of", " France", " is", " located", " in", " Paris", ".", "" ],
  "started_at": "2023-11-14T23:28:38.549853Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/uuys223ba4qjmni5ejfz6u5fjq",
    "cancel": "https://api.replicate.com/v1/predictions/uuys223ba4qjmni5ejfz6u5fjq/cancel"
  },
  "version": "17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc"
}
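Every prediction response includes a urls.get endpoint, so you can re-fetch its status, metrics, and output later by ID. A hedged sketch with the requests library, using the prediction ID shown above (note that output for old predictions may eventually be removed, as the data_removed field suggests).

import os
import requests

# "get" URL taken from the "urls" field of the prediction above.
url = "https://api.replicate.com/v1/predictions/uuys223ba4qjmni5ejfz6u5fjq"

resp = requests.get(
    url,
    headers={"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"},
)
data = resp.json()

print(data["status"])            # e.g. "succeeded"
print(data["metrics"])           # predict_time and total_time, as shown above
print("".join(data["output"]))   # the paraphrased text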
Prediction
nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc
ID: c3qq3rdb3fpwizk6x6sfq62h4q
Status: Succeeded
Source: Web
Hardware: A40 (Large)
Input
- debug: false
- top_k: 50
- top_p: 0.9
- prompt: I'm going to order some food. Do you want some?
- temperature: 0.75
- max_new_tokens: 128
- min_new_tokens: -1
{
  "debug": false,
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "I'm going to order some food. Do you want some?\n",
  "temperature": 0.75,
  "max_new_tokens": 128,
  "min_new_tokens": -1
}
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
  {
    input: {
      debug: false,
      top_p: 0.9,
      prompt: "I'm going to order some food. Do you want some?\n",
      temperature: 0.75,
      max_new_tokens: 128,
      min_new_tokens: -1
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
    input={
        "debug": False,
        "top_p": 0.9,
        "prompt": "I'm going to order some food. Do you want some?\n",
        "temperature": 0.75,
        "max_new_tokens": 128,
        "min_new_tokens": -1
    }
)

# The nateraw/llama-2-7b-paraphrase-v1 model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/nateraw/llama-2-7b-paraphrase-v1/api#output-schema
    print(item, end="")
To learn more, take a look at the guide on getting started with Python.
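The same call also works in a loop, which is handy if you want to paraphrase several sentences in one script. A small sketch built on the replicate.run call above; the sentence list is just an illustration.

import replicate

MODEL = "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc"

# Illustrative inputs; the model expects the sentence to paraphrase as the prompt.
sentences = [
    "I'm going to order some food. Do you want some?",
    "The capital of France is Paris",
]

for sentence in sentences:
    output = replicate.run(
        MODEL,
        input={
            "prompt": sentence + "\n",
            "temperature": 0.75,
            "max_new_tokens": 128
        }
    )
    print(sentence, "->", "".join(output))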
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
    "input": {
      "debug": false,
      "top_p": 0.9,
      "prompt": "I\'m going to order some food. Do you want some?\\n",
      "temperature": 0.75,
      "max_new_tokens": 128,
      "min_new_tokens": -1
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
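The curl example uses the Prefer: wait header so the request blocks until the prediction finishes. Without that header the API returns immediately and you poll the prediction until it completes. A rough Python sketch of that pattern, again assuming requests and REPLICATE_API_TOKEN; the one-second sleep interval is an arbitrary choice.

import os
import time
import requests

headers = {
    "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
    "Content-Type": "application/json",
}

# Create the prediction without "Prefer: wait"...
resp = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers=headers,
    json={
        "version": "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
        "input": {"prompt": "I'm going to order some food. Do you want some?\n"},
    },
)
prediction = resp.json()

# ...then poll its "get" URL until it reaches a terminal status.
while prediction["status"] not in ("succeeded", "failed", "canceled"):
    time.sleep(1)
    prediction = requests.get(prediction["urls"]["get"], headers=headers).json()

print("".join(prediction["output"] or []))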
Output
I'm going to order some food. Would you like to join me?

{
  "completed_at": "2023-11-14T23:29:49.816625Z",
  "created_at": "2023-11-14T23:29:48.193505Z",
  "data_removed": false,
  "error": null,
  "id": "c3qq3rdb3fpwizk6x6sfq62h4q",
  "input": {
    "debug": false,
    "top_k": 50,
    "top_p": 0.9,
    "prompt": "I'm going to order some food. Do you want some?\n",
    "temperature": 0.75,
    "max_new_tokens": 128,
    "min_new_tokens": -1
  },
  "logs": "Your formatted prompt is:\nI'm going to order some food. Do you want some?\ncorrect lora is already loaded\nOverall initialize_peft took 0.000\nExllama: False\nINFO 11-14 23:29:49 async_llm_engine.py:371] Received request 0: prompt: \"I'm going to order some food. Do you want some?\\n\", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.75, top_p=0.9, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 11-14 23:29:49 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%\nINFO 11-14 23:29:49 async_llm_engine.py:111] Finished request 0.\nhostname: model-hs-73001d65-da72d39bf79629ac-gpu-a40-85ddd9ccc9-97bzw",
  "metrics": {
    "predict_time": 0.414705,
    "total_time": 1.62312
  },
  "output": [ "I", "'", "m", " going", " to", " order", " some", " food", ".", " Would", " you", " like", " to", " join", " me", "?", "" ],
  "started_at": "2023-11-14T23:29:49.401920Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/c3qq3rdb3fpwizk6x6sfq62h4q",
    "cancel": "https://api.replicate.com/v1/predictions/c3qq3rdb3fpwizk6x6sfq62h4q/cancel"
  },
  "version": "17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc"
}
Prediction
nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc
ID: ullqrudbck3zk5l5fsuu7ayliq
Status: Succeeded
Source: Web
Hardware: A40 (Large)
Input
- debug: false
- top_k: 50
- top_p: 0.9
- prompt: My favorite color is red, but I also like black.
- temperature: 0.75
- max_new_tokens: 128
- min_new_tokens: -1
{
  "debug": false,
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "My favorite color is red, but I also like black.\n",
  "temperature": 0.75,
  "max_new_tokens": 128,
  "min_new_tokens": -1
}
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
  {
    input: {
      debug: false,
      top_p: 0.9,
      prompt: "My favorite color is red, but I also like black.\n",
      temperature: 0.75,
      max_new_tokens: 128,
      min_new_tokens: -1
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
    input={
        "debug": False,
        "top_p": 0.9,
        "prompt": "My favorite color is red, but I also like black.\n",
        "temperature": 0.75,
        "max_new_tokens": 128,
        "min_new_tokens": -1
    }
)

# The nateraw/llama-2-7b-paraphrase-v1 model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/nateraw/llama-2-7b-paraphrase-v1/api#output-schema
    print(item, end="")
To learn more, take a look at the guide on getting started with Python.
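temperature, top_p, and top_k control how adventurous the paraphrase is: lower temperature stays closer to the original wording, higher temperature produces more varied rewrites. A small sketch that reruns the same prompt at a few temperatures; the specific values are only examples.

import replicate

MODEL = "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc"
prompt = "My favorite color is red, but I also like black.\n"

# Example temperatures; tune to taste.
for temperature in (0.5, 0.75, 1.0):
    output = replicate.run(
        MODEL,
        input={
            "prompt": prompt,
            "temperature": temperature,
            "top_p": 0.9,
            "max_new_tokens": 128
        }
    )
    print(f"temperature={temperature}: " + "".join(output))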
Run nateraw/llama-2-7b-paraphrase-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "nateraw/llama-2-7b-paraphrase-v1:17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc",
    "input": {
      "debug": false,
      "top_p": 0.9,
      "prompt": "My favorite color is red, but I also like black.\\n",
      "temperature": 0.75,
      "max_new_tokens": 128,
      "min_new_tokens": -1
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
Red is my favorite color, but I also have a soft spot for black.

{
  "completed_at": "2023-11-14T23:30:18.379928Z",
  "created_at": "2023-11-14T23:30:09.612573Z",
  "data_removed": false,
  "error": null,
  "id": "ullqrudbck3zk5l5fsuu7ayliq",
  "input": {
    "debug": false,
    "top_k": 50,
    "top_p": 0.9,
    "prompt": "My favorite color is red, but I also like black.\n",
    "temperature": 0.75,
    "max_new_tokens": 128,
    "min_new_tokens": -1
  },
  "logs": "Your formatted prompt is:\nMy favorite color is red, but I also like black.\ncorrect lora is already loaded\nOverall initialize_peft took 0.000\nExllama: False\nINFO 11-14 23:30:17 async_llm_engine.py:371] Received request 0: prompt: 'My favorite color is red, but I also like black.\\n', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.75, top_p=0.9, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 11-14 23:30:17 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%\nINFO 11-14 23:30:18 async_llm_engine.py:111] Finished request 0.\nhostname: model-hs-73001d65-da72d39bf79629ac-gpu-a40-85ddd9ccc9-97bzw",
  "metrics": {
    "predict_time": 0.414644,
    "total_time": 8.767355
  },
  "output": [ "Red", " is", " my", " favorite", " color", ",", " but", " I", " also", " have", " a", " soft", " spot", " for", " black", ".", "" ],
  "started_at": "2023-11-14T23:30:17.965284Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/ullqrudbck3zk5l5fsuu7ayliq",
    "cancel": "https://api.replicate.com/v1/predictions/ullqrudbck3zk5l5fsuu7ayliq/cancel"
  },
  "version": "17b76fbd699fcd4476b6d7292de8bfd1ee1b219f7ed81f0395da0631d00850cc"
}
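Each prediction also exposes a urls.cancel endpoint, so a long-running prediction can be stopped early. A hedged sketch with requests, using the cancel URL from the response above; this only has an effect while the prediction is still running.

import os
import requests

# "cancel" URL taken from the "urls" field of the prediction above.
cancel_url = "https://api.replicate.com/v1/predictions/ullqrudbck3zk5l5fsuu7ayliq/cancel"

resp = requests.post(
    cancel_url,
    headers={"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"},
)
print(resp.status_code)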
Want to make some of these yourself?
Run this model