Prediction

Model

moinnadeem/vllm-engine-llama-7b:559ff8c3

3mgmzkrb6shggxewdohflfh3i4

Status

Succeeded

Source

Web

Hardware

A100 (40GB)

Total duration

3.7s

Created

about 2 years ago

Webhook

–

Input

lora_path: https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip
prompt: You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. You must output the SQL query that answers the question. ### Input: What is the total number of decile for the redwood school locality? ### Context: CREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR) ### Response:
max_new_tokens: 128
temperature: 0.75
top_p: 0.9
top_k: 50
debug: false

{
  "debug": false,
  "lora_path": "https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip",
  "max_new_tokens": 128,
  "prompt": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n",
  "temperature": 0.75,
  "top_k": 50,
  "top_p": 0.9
}

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_erG**********************************

This is your API token. Keep it to yourself.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run moinnadeem/vllm-engine-llama-7b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "moinnadeem/vllm-engine-llama-7b:559ff8c30789d100c13f9bd0f831210f6f9c6f1c81dab06ff16dc61dbfa94b03",
  {
    input: {
      debug: false,
      lora_path: "https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip",
      max_new_tokens: 128,
      prompt: "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n",
      temperature: 0.75,
      top_k: 50,
      top_p: 0.9
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_erG**********************************

This is your API token. Keep it to yourself.

Import the client:

import replicate

Run moinnadeem/vllm-engine-llama-7b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "moinnadeem/vllm-engine-llama-7b:559ff8c30789d100c13f9bd0f831210f6f9c6f1c81dab06ff16dc61dbfa94b03",
    input={
        "debug": False,
        "lora_path": "https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip",
        "max_new_tokens": 128,
        "prompt": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n",
        "temperature": 0.75,
        "top_k": 50,
        "top_p": 0.9
    }
)

# The moinnadeem/vllm-engine-llama-7b model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/moinnadeem/vllm-engine-llama-7b/api#output-schema
    print(item, end="")

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_erG**********************************

This is your API token. Keep it to yourself.

Run moinnadeem/vllm-engine-llama-7b using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "moinnadeem/vllm-engine-llama-7b:559ff8c30789d100c13f9bd0f831210f6f9c6f1c81dab06ff16dc61dbfa94b03",
    "input": {
      "debug": false,
      "lora_path": "https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip",
      "max_new_tokens": 128,
      "prompt": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \\n\\nYou must output the SQL query that answers the question.\\n\\n### Input:\\nWhat is the total number of decile for the redwood school locality?\\n\\n### Context:\\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\\n\\n### Response:\\n",
      "temperature": 0.75,
      "top_k": 50,
      "top_p": 0.9
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

SELECT COUNT(decile) FROM table_name_34 WHERE name = "redwood school"

{
  "id": "3mgmzkrb6shggxewdohflfh3i4",
  "model": "moinnadeem/vllm-engine-llama-7b",
  "version": "559ff8c30789d100c13f9bd0f831210f6f9c6f1c81dab06ff16dc61dbfa94b03",
  "input": {
    "debug": false,
    "lora_path": "https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip",
    "max_new_tokens": 128,
    "prompt": "You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n",
    "temperature": 0.75,
    "top_k": 50,
    "top_p": 0.9
  },
  "logs": "Weights path: https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip\nDownloading peft weights\nusing https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip instead of https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip\nDownloaded sql.zip as 12 519 kB chunks in 1.0834 with 0 retries\nDownloaded peft weights in 1.084\nUnzipped peft weights in 0.083\nData keys: dict_keys(['adapter_config.json', 'adapter_model.bin'])\nPrompt:\nYou are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.\nYou must output the SQL query that answers the question.\n### Input:\nWhat is the total number of decile for the redwood school locality?\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n### Response:\nINFO 09-25 18:32:09 async_llm_engine.py:371] Received request 0: prompt: 'You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \\n\\nYou must output the SQL query that answers the question.\\n\\n### Input:\\nWhat is the total number of decile for the redwood school locality?\\n\\n### Context:\\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\\n\\n### Response:\\n', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.75, top_p=0.9, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None), prompt token ids: None.\nINFO 09-25 18:32:09 llm_engine.py:623] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%\nINFO 09-25 18:32:09 async_llm_engine.py:111] Finished request 0.\nGenerated text: SELECT COUNT(decile) FROM table_name_34 WHERE name = \"redwood school\"\nGenerated 22 tokens in 0.416 seconds (52.948 tokens per second)",
  "output": [
    "SELECT",
    " COUNT",
    "(",
    "de",
    "cile",
    ")",
    " FROM",
    " table",
    "_",
    "name",
    "_",
    "3",
    "4",
    " WHERE",
    " name",
    " =",
    " \"",
    "red",
    "wood",
    " school",
    "\"",
    ""
  ],
  "data_removed": false,
  "error": null,
  "source": "web",
  "status": "succeeded",
  "created_at": "2023-09-25T18:32:05.915928Z",
  "started_at": "2023-09-25T18:32:05.877057Z",
  "completed_at": "2023-09-25T18:32:09.615526Z",
  "urls": {
    "cancel": "https://api.replicate.com/v1/predictions/3mgmzkrb6shggxewdohflfh3i4/cancel",
    "get": "https://api.replicate.com/v1/predictions/3mgmzkrb6shggxewdohflfh3i4"
  },
  "metrics": {
    "predict_time": 3.738469,
    "total_time": 3.699598
  }
}

Generated in

3.7 seconds

Tweak it Report

Weights path: https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip Downloading peft weights using https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip instead of https://pub-df34620a84bb4c0683fae07a260df1ea.r2.dev/sql.zip Downloaded sql.zip as 12 519 kB chunks in 1.0834 with 0 retries Downloaded peft weights in 1.084 Unzipped peft weights in 0.083 Data keys: dict_keys(['adapter_config.json', 'adapter_model.bin']) Prompt: You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. You must output the SQL query that answers the question. ### Input: What is the total number of decile for the redwood school locality? ### Context: CREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR) ### Response: INFO 09-25 18:32:09 async_llm_engine.py:371] Received request 0: prompt: 'You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables. \n\nYou must output the SQL query that answers the question.\n\n### Input:\nWhat is the total number of decile for the redwood school locality?\n\n### Context:\nCREATE TABLE table_name_34 (decile VARCHAR, name VARCHAR)\n\n### Response:\n', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=1.0, temperature=0.75, top_p=0.9, top_k=50, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['</s>'], ignore_eos=False, max_tokens=128, logprobs=None), prompt token ids: None. INFO 09-25 18:32:09 llm_engine.py:623] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0% INFO 09-25 18:32:09 async_llm_engine.py:111] Finished request 0. Generated text: SELECT COUNT(decile) FROM table_name_34 WHERE name = "redwood school" Generated 22 tokens in 0.416 seconds (52.948 tokens per second)