Readme
This model doesn't have a readme.
Run this model in Node.js. First, install Replicate's Node.js client library:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run moinnadeem/vllm-engine-llama-7b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "moinnadeem/vllm-engine-llama-7b:049b7726f19e862f1654523a3e8567931124effd98db7f1854bf517deeac3033",
  {
    input: {
      debug: false,
      top_k: 50,
      top_p: 0.9,
      prompt: "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n",
      temperature: 0.75,
      max_new_tokens: 128,
      min_new_tokens: -1
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
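Run this model in Python. First, install Replicate's Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import the client: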
import replicate
Run moinnadeem/vllm-engine-llama-7b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "moinnadeem/vllm-engine-llama-7b:049b7726f19e862f1654523a3e8567931124effd98db7f1854bf517deeac3033",
    input={
        "debug": False,
        "top_k": 50,
        "top_p": 0.9,
        "prompt": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n",
        "temperature": 0.75,
        "max_new_tokens": 128,
        "min_new_tokens": -1
    }
)
# The moinnadeem/vllm-engine-llama-7b model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/moinnadeem/vllm-engine-llama-7b/api#output-schema
    print(item, end="")
To learn more, take a look at the guide on getting started with Python.
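The prompt in the examples above follows a fixed layout: an instruction, then "### Input:", "### Context:", and "### Response:" sections. The following is a minimal sketch of how that layout could be filled in for other questions. The names PROMPT_TEMPLATE and build_prompt are hypothetical helpers for illustration, not part of the replicate client.

```python
# Hypothetical helper: PROMPT_TEMPLATE and build_prompt are illustrative names,
# not part of the replicate client. The layout is copied from the example prompt.
PROMPT_TEMPLATE = (
    "You are a powerful text-to-SQL model. Your job is to answer questions "
    "about a database. You are given a question and context regarding one or "
    "more tables. \n\n"
    "You must output the SQL query that answers the question.\n\n"
    "### Input:\n{question}\n\n"
    "### Context:\n{schema}\n\n"
    "### Response:\n"
)


def build_prompt(question: str, schema: str) -> str:
    """Fill the text-to-SQL layout with a question and a CREATE TABLE statement."""
    return PROMPT_TEMPLATE.format(question=question, schema=schema)


# Rebuild the prompt used in the examples on this page.
prompt = build_prompt(
    "What is the total number of decile for the redwood school locality?",
    "CREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)",
)
```

The resulting string can be passed as the "prompt" input to replicate.run in place of the hard-coded literal.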
Run this model over HTTP with cURL. Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run moinnadeem/vllm-engine-llama-7b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "moinnadeem/vllm-engine-llama-7b:049b7726f19e862f1654523a3e8567931124effd98db7f1854bf517deeac3033",
    "input": {
      "debug": false,
      "top_k": 50,
      "top_p": 0.9,
      "prompt": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \\n\\nYou must output the SQL query that answers the question.\\n\\n### Input:\\nWhat is the total number of decile for the redwood school locality?\\n\\n### Context:\\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\\n\\n### Response:\\n",
      "temperature": 0.75,
      "max_new_tokens": 128,
      "min_new_tokens": -1
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
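For readers who prefer to stay in Python without the client library, the cURL call above can be sketched with the standard library alone. This is only an illustration under stated assumptions: build_prediction_request is a hypothetical helper name, and the block builds the request without sending it. The URL, headers, and input fields are copied from the cURL command.

```python
import json
import urllib.request

# Endpoint and version string copied from the cURL example.
API_URL = "https://api.replicate.com/v1/predictions"
MODEL_VERSION = (
    "moinnadeem/vllm-engine-llama-7b:"
    "049b7726f19e862f1654523a3e8567931124effd98db7f1854bf517deeac3033"
)


def build_prediction_request(token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the cURL command issues.

    Hypothetical helper for illustration; not part of any Replicate library.
    """
    payload = {
        "version": MODEL_VERSION,
        "input": {
            "debug": False,
            "top_k": 50,
            "top_p": 0.9,
            "prompt": prompt,
            "temperature": 0.75,
            "max_new_tokens": 128,
            "min_new_tokens": -1,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
        method="POST",
    )
```

Sending the request would then be a matter of passing the returned object to urllib.request.urlopen and decoding the JSON body, with the token read from the REPLICATE_API_TOKEN environment variable.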
{
  "completed_at": "2023-09-25T18:32:09.615526Z",
  "created_at": "2023-09-25T18:32:05.915928Z",
  "data_removed": false,
  "error": null,
  "id": "3mgmzkrb6shggxewdohflfh3i4",
  "input": {
    "debug": false,
    "top_k": 50,
    "top_p": 0.9,
    "prompt": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n",
    "lora_path": "https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip",
    "temperature": 0.75,
    "max_new_tokens": 128
  },
  "logs": "Weights path: https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip\nDownloading peft weights\nusing https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip instead of https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip\nDownloaded sql.zip as 12 519 kB chunks in 1.0834 with 0 retries\nDownloaded peft weights in 1.084\nUnzipped peft weights in 0.083\nData keys: dict_keys(['adapter_config.json', 'adapter_model.bin'])\nPrompt:\nYou are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.\nYou must output the SQL query that answers the question.\n### Input:\nWhat is the total number of decile for the redwood school locality?\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n### Response:\nINFO 09-25 18:32:09 async_llm_engine.py:371] Received request 0: prompt: 'You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \\n\\nYou must output the SQL query that answers the question.\\n\\n### Input:\\nWhat is the total number of decile for the redwood school locality?\\n\\n### Context:\\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\\n\\n### Response:\\n', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.75, top_p=0.9, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None), prompt token ids: None.\nINFO 09-25 18:32:09 llm_engine.py:623] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%\nINFO 09-25 18:32:09 async_llm_engine.py:111] Finished request 0.\nGenerated text: SELECT COUNT(decile) FROM table_name_34 WHERE name = \"redwood school\"\nGenerated 22 tokens in 0.416 seconds (52.948 tokens per second)",
  "metrics": {
    "predict_time": 3.738469,
    "total_time": 3.699598
  },
  "output": [
    "SELECT",
    " COUNT",
    "(",
    "de",
    "cile",
    ")",
    " FROM",
    " table",
    "_",
    "name",
    "_",
    "3",
    "4",
    " WHERE",
    " name",
    " =",
    " \"",
    "red",
    "wood",
    " school",
    "\"",
    ""
  ],
  "started_at": "2023-09-25T18:32:05.877057Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/3mgmzkrb6shggxewdohflfh3i4",
    "cancel": "https://api.replicate.com/v1/predictions/3mgmzkrb6shggxewdohflfh3i4/cancel"
  },
  "version": "559ff8c30789d100c13f9bd0f831210f6f9c6f1c81dab06ff16dc61dbfa94b03"
}
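The "output" field in the response above is a list of token-sized string chunks, not a finished query. A short sketch of how a caller might reassemble it follows, together with a check of the timing fields against the timestamps. The values are copied from the response. The variable names are illustrative, and the observation about how the metrics relate to the timestamps holds for this one response only; it is not a documented guarantee.

```python
from datetime import datetime

# Token chunks copied from the "output" array in the example response.
output = [
    "SELECT", " COUNT", "(", "de", "cile", ")", " FROM", " table", "_",
    "name", "_", "3", "4", " WHERE", " name", " =", " \"", "red", "wood",
    " school", "\"", "",
]

# Concatenating the chunks in order yields the generated SQL query.
sql = "".join(output)
print(sql)


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Timestamps copied from the example response.
created_at = parse_ts("2023-09-25T18:32:05.915928Z")
started_at = parse_ts("2023-09-25T18:32:05.877057Z")
completed_at = parse_ts("2023-09-25T18:32:09.615526Z")

# In this response, these differences match "predict_time" and "total_time".
started_to_completed = (completed_at - started_at).total_seconds()
created_to_completed = (completed_at - created_at).total_seconds()
print(started_to_completed, created_to_completed)
```

The joined string matches the "Generated text" line in the logs, and the list has 22 entries, the same count the logs report as generated tokens.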
Weights path: https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip
Downloading peft weights
using https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip instead of https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip
Downloaded sql.zip as 12 519 kB chunks in 1.0834 with 0 retries
Downloaded peft weights in 1.084
Unzipped peft weights in 0.083
Data keys: dict_keys(['adapter_config.json', 'adapter_model.bin'])
Prompt:
You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.
You must output the SQL query that answers the question.
### Input:
What is the total number of decile for the redwood school locality?
### Context:
CREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)
### Response:
INFO 09-25 18:32:09 async_llm_engine.py:371] Received request 0: prompt: 'You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.75, top_p=0.9, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None), prompt token ids: None.
INFO 09-25 18:32:09 llm_engine.py:623] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%
INFO 09-25 18:32:09 async_llm_engine.py:111] Finished request 0.
Generated text: SELECT COUNT(decile) FROM table_name_34 WHERE name = "redwood school"
Generated 22 tokens in 0.416 seconds (52.948 tokens per second)
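The final log line reports a token count, a duration, and a throughput figure. As a small worked example, the sketch below parses that line and recomputes the rate. The regular expression is an assumption based on this single log line, not a documented log format.

```python
import re

# Summary line copied from the logs.
summary = "Generated 22 tokens in 0.416 seconds (52.948 tokens per second)"

# Assumed pattern, derived from the one line shown here.
match = re.fullmatch(
    r"Generated (\d+) tokens in ([\d.]+) seconds \(([\d.]+) tokens per second\)",
    summary,
)
tokens = int(match.group(1))
seconds = float(match.group(2))
reported_rate = float(match.group(3))

# Recompute throughput from the printed (rounded) duration.
recomputed_rate = tokens / seconds
print(f"{recomputed_rate:.3f} tokens/s recomputed, {reported_rate} reported")
```

The recomputed figure comes out near 52.88 tokens per second, slightly below the reported 52.948. A plausible explanation is that the logged rate was computed from the unrounded duration, while the printed duration is rounded to three decimal places.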
This output was created using a different version of the model, moinnadeem/vllm-engine-llama-7b:559ff8c3.
This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.
This model is cold. You'll get a fast response if the model is warm and already running, and a slower response if the model is cold and starting up.
This model costs approximately $0.0038 to run on Replicate, but this varies depending on your inputs.