nateraw / llama-2-7b-samsum
- Public
- 36 runs
- Hardware: L40S
Prediction
nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397

- ID: 4yew3gdbntrgltnwgtj36eg22y
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
- Total duration: about 6.9 seconds
- Created: 2023-11-28T07:55:48.985119Z

Input
- debug: false
- top_p: 0.95
- prompt: [INST] <<SYS>> Use the Input to provide a summary of a conversation. <</SYS>> Input: Gary: Hey, don't forget about Tom's bday party! Lara: I won't! What time should I show up? Gary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up. Lara: You're such a great boyfriend. He will be so happy! Gary: Yep, I am :) Lara: So I'll just pick up the cake and get the balloons... Gary: Thanks, you're so helpful. I've already paid for the cake. Lara: No problem, see you at 5 pm! Gary: See you! [/INST] Summary:
- temperature: 0.7
- return_logits: false
- max_new_tokens: 128
- min_new_tokens: -1
- repetition_penalty: 1.15
{ "debug": false, "top_p": 0.95, "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
  {
    input: {
      debug: false,
      top_p: 0.95,
      prompt: "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
      temperature: 0.7,
      return_logits: false,
      max_new_tokens: 128,
      min_new_tokens: -1,
      repetition_penalty: 1.15
    }
  }
);

console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    input={
        "debug": False,
        "top_p": 0.95,
        "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
        "temperature": 0.7,
        "return_logits": False,
        "max_new_tokens": 128,
        "min_new_tokens": -1,
        "repetition_penalty": 1.15
    }
)

# The nateraw/llama-2-7b-samsum model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/nateraw/llama-2-7b-samsum/api#output-schema
    print(item, end="")
To learn more, take a look at the guide on getting started with Python.
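replicate.run blocks until the prediction finishes (or streams, as above). If you would rather create the prediction and poll it yourself, the Python client also exposes a predictions API; the sketch below is a rough outline under that assumption, so check the client docs for exact method names and fields.

import time

import replicate

# Create the prediction without blocking on the result.
prediction = replicate.predictions.create(
    version="7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    input={"prompt": "...", "max_new_tokens": 128},  # same input fields as above
)

# Poll until the prediction reaches a terminal state.
while prediction.status not in ("succeeded", "failed", "canceled"):
    time.sleep(1)
    prediction = replicate.predictions.get(prediction.id)

print("".join(prediction.output or []))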
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    "input": {
      "debug": false,
      "top_p": 0.95,
      "prompt": "[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don\'t forget about Tom\'s bday party!\\nLara: I won\'t! What time should I show up?\\nGary: Around 5 pm. He\'s supposed to be back home at 5:30, so we\'ll have just enough time to prep things up.\\nLara: You\'re such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I\'ll just pick up the cake and get the balloons...\\nGary: Thanks, you\'re so helpful. I\'ve already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: ",
      "temperature": 0.7,
      "return_logits": false,
      "max_new_tokens": 128,
      "min_new_tokens": -1,
      "repetition_penalty": 1.15
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
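Each prediction also gets a stable ID (this one is 4yew3gdbntrgltnwgtj36eg22y, as shown in the response below), so you can fetch it again later instead of re-running the model. A small sketch with the Python client, assuming its Prediction object exposes status, output, and metrics as attributes:

import replicate

# Look up the stored prediction from this page by its ID.
prediction = replicate.predictions.get("4yew3gdbntrgltnwgtj36eg22y")

print(prediction.status)            # "succeeded"
print(prediction.metrics)           # e.g. {"predict_time": 1.995227, ...}
print("".join(prediction.output))   # the generated summary text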
Output
Lara will pick up the cake and balloons for Tom's bday party and meet Gary at 5 pm.

{ "completed_at": "2023-11-28T07:55:55.905428Z", "created_at": "2023-11-28T07:55:48.985119Z", "data_removed": false, "error": null, "id": "4yew3gdbntrgltnwgtj36eg22y", "input": { "debug": false, "top_p": 0.95, "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", "temperature": 0.7, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }, "logs": "Your formatted prompt is:\n[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\nSummary:\nprevious weights were different, switching to https://replicate.delivery/pbxt/EY7Ew28BebUf6E4e2e1wgm2SnrQiFW1hFcN6vxZ2kIB8kTzHB/training_output.zip\nDownloading peft weights\nDownloaded training_output.zip as 1 8240 kB chunks in 0.401s with 0 retries\nDownloaded peft weights in 0.620\nUnzipped peft weights in 0.003\nInitialized peft model in 0.060\nOverall initialize_peft took 1.238\nExllama: False\nINFO 11-28 07:55:55 async_llm_engine.py:371] Received request 0: prompt: \"[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don't forget about Tom's bday party!\\nLara: I won't! What time should I show up?\\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\\nLara: You're such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I'll just pick up the cake and get the balloons...\\nGary: Thanks, you're so helpful. I've already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: \", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.7, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 11-28 07:55:55 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%\nINFO 11-28 07:55:55 async_llm_engine.py:111] Finished request 0.\nhostname: model-hs-77dde5d6-e494c37392ea209c-gpu-a40-7f448dcbc8-78nv9", "metrics": { "predict_time": 1.995227, "total_time": 6.920309 }, "output": [ "\n", "L", "ara", " will", " pick", " up", " the", " c", "ake", " and", " bal", "lo", "ons", " for", " Tom", "'", "s", " b", "day", " party", " and", " meet", " Gary", " at", " ", "5", " pm", ".", "" ], "started_at": "2023-11-28T07:55:53.910201Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.us.c.replicate.net/v1/predictions/4yew3gdbntrgltnwgtj36eg22y", "get": "https://api.replicate.com/v1/predictions/4yew3gdbntrgltnwgtj36eg22y", "cancel": "https://api.replicate.com/v1/predictions/4yew3gdbntrgltnwgtj36eg22y/cancel" }, "version": "7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397" }
Generated in 2.0 seconds
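Note that the output field in the prediction JSON above is a list of streamed text chunks (roughly one token each), not a single string. To reconstruct the summary from the raw response, join the chunks, for example:

import json

# response_text holds the prediction JSON shown above (e.g. from the HTTP API).
prediction = json.loads(response_text)
summary = "".join(prediction["output"]).strip()
print(summary)  # Lara will pick up the cake and balloons for Tom's bday party and meet Gary at 5 pm.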
Prediction
nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397

- ID: bzsonglb4jwxhr2yjswvutd43i
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
- Total duration: about 4.4 seconds
- Created: 2023-11-28T07:57:47.266607Z

Input
- debug: false
- top_p: 0.95
- prompt: [INST] <<SYS>> Use the Input to provide a summary of a conversation. <</SYS>> Input: Gary: Hey, don't forget about Tom's bday party! Lara: I won't! What time should I show up? Gary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up. Lara: You're such a great boyfriend. He will be so happy! Gary: Yep, I am :) Lara: So I'll just pick up the cake and get the balloons... Gary: Thanks, you're so helpful. I've already paid for the cake. Lara: No problem, see you at 5 pm! Gary: See you! [/INST] Summary:
- temperature: 0.3
- return_logits: false
- max_new_tokens: 128
- min_new_tokens: -1
- repetition_penalty: 1.15
{ "debug": false, "top_p": 0.95, "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", "temperature": 0.3, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
  {
    input: {
      debug: false,
      top_p: 0.95,
      prompt: "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
      temperature: 0.3,
      return_logits: false,
      max_new_tokens: 128,
      min_new_tokens: -1,
      repetition_penalty: 1.15
    }
  }
);

console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    input={
        "debug": False,
        "top_p": 0.95,
        "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ",
        "temperature": 0.3,
        "return_logits": False,
        "max_new_tokens": 128,
        "min_new_tokens": -1,
        "repetition_penalty": 1.15
    }
)

# The nateraw/llama-2-7b-samsum model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/nateraw/llama-2-7b-samsum/api#output-schema
    print(item, end="")
To learn more, take a look at the guide on getting started with Python.
Run nateraw/llama-2-7b-samsum using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "nateraw/llama-2-7b-samsum:7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397",
    "input": {
      "debug": false,
      "top_p": 0.95,
      "prompt": "[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don\'t forget about Tom\'s bday party!\\nLara: I won\'t! What time should I show up?\\nGary: Around 5 pm. He\'s supposed to be back home at 5:30, so we\'ll have just enough time to prep things up.\\nLara: You\'re such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I\'ll just pick up the cake and get the balloons...\\nGary: Thanks, you\'re so helpful. I\'ve already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: ",
      "temperature": 0.3,
      "return_logits": false,
      "max_new_tokens": 128,
      "min_new_tokens": -1,
      "repetition_penalty": 1.15
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
Gary and Lara are going to Tom's bday party. They will arrive at 5 pm. Gary has already paid for the cake and Lara will pick it up and get the balloons.

{ "completed_at": "2023-11-28T07:57:51.619603Z", "created_at": "2023-11-28T07:57:47.266607Z", "data_removed": false, "error": null, "id": "bzsonglb4jwxhr2yjswvutd43i", "input": { "debug": false, "top_p": 0.95, "prompt": "[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\n\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\n\nSummary: ", "temperature": 0.3, "return_logits": false, "max_new_tokens": 128, "min_new_tokens": -1, "repetition_penalty": 1.15 }, "logs": "Your formatted prompt is:\n[INST] <<SYS>>\nUse the Input to provide a summary of a conversation.\n<</SYS>>\nInput:\nGary: Hey, don't forget about Tom's bday party!\nLara: I won't! What time should I show up?\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\nLara: You're such a great boyfriend. He will be so happy!\nGary: Yep, I am :)\nLara: So I'll just pick up the cake and get the balloons...\nGary: Thanks, you're so helpful. I've already paid for the cake.\nLara: No problem, see you at 5 pm!\nGary: See you! [/INST]\nSummary:\ncorrect lora is already loaded\nOverall initialize_peft took 0.000\nExllama: False\nINFO 11-28 07:57:50 async_llm_engine.py:371] Received request 0: prompt: \"[INST] <<SYS>>\\nUse the Input to provide a summary of a conversation.\\n<</SYS>>\\n\\nInput:\\nGary: Hey, don't forget about Tom's bday party!\\nLara: I won't! What time should I show up?\\nGary: Around 5 pm. He's supposed to be back home at 5:30, so we'll have just enough time to prep things up.\\nLara: You're such a great boyfriend. He will be so happy!\\nGary: Yep, I am :)\\nLara: So I'll just pick up the cake and get the balloons...\\nGary: Thanks, you're so helpful. I've already paid for the cake.\\nLara: No problem, see you at 5 pm!\\nGary: See you! [/INST]\\n\\nSummary: \", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.3, top_p=0.95, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None, skip_special_tokens=True), prompt token ids: None.\nINFO 11-28 07:57:50 llm_engine.py:631] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%\nINFO 11-28 07:57:51 async_llm_engine.py:111] Finished request 0.\nhostname: model-hs-77dde5d6-e494c37392ea209c-gpu-a40-568d84f685-pjznp", "metrics": { "predict_time": 1.17975, "total_time": 4.352996 }, "output": [ " Gary", " and", " L", "ara", " are", " going", " to", " Tom", "'", "s", " b", "day", " party", ".", " They", " will", " arrive", " at", " ", "5", " pm", ".", " Gary", " has", " already", " paid", " for", " the", " c", "ake", " and", " L", "ara", " will", " pick", " it", " up", " and", " get", " the", " bal", "lo", "ons", ".", "" ], "started_at": "2023-11-28T07:57:50.439853Z", "status": "succeeded", "urls": { "stream": "https://streaming-api.svc.us.c.replicate.net/v1/predictions/bzsonglb4jwxhr2yjswvutd43i", "get": "https://api.replicate.com/v1/predictions/bzsonglb4jwxhr2yjswvutd43i", "cancel": "https://api.replicate.com/v1/predictions/bzsonglb4jwxhr2yjswvutd43i/cancel" }, "version": "7b38898d18f1ce5a1c51d0433e14542cf771cde1cbca4fcb68061a41c6723397" }
Generated in 1.2 seconds
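The metrics block lets you do a back-of-the-envelope throughput check: this run spent about 1.18 s generating (predict_time) and the output array holds 45 chunks (counting the trailing empty string), so roughly 38 chunks per second. Chunks only approximate tokens, so treat this as a rough figure:

# Rough throughput estimate from the prediction JSON above.
predict_time = 1.17975   # metrics["predict_time"], in seconds
num_chunks = 45          # len(prediction["output"]) for this run
print(f"{num_chunks / predict_time:.1f} chunks/s")  # ~38.1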
Want to make some of these yourself?
Run this model