replicate/vllm | API reference

replicate / vllm

(Updated 10 months, 2 weeks ago)

Public
163 runs


Run replicate/vllm with an API

Use one of our client libraries to get started quickly. In the Playground tab you can tweak the inputs, see the results, and copy the corresponding code into your own project.

Input schema

The fields you can set when running this model with an API. If you omit a field, its default value is used.

Field	Type	Default value	Description
prompt	string		Prompt
system_prompt	string	You are a helpful assistant.	System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models.
min_tokens	integer	0	The minimum number of tokens the model should generate as output.
max_tokens	integer	512	The maximum number of tokens the model should generate as output.
temperature	number	0.6	The value used to modulate the next token probabilities.
top_p	number	0.9	A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
top_k	integer	50	The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
presence_penalty	number	0	Presence penalty
frequency_penalty	number	0	Frequency penalty
stop_sequences	string		A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'.
prompt_template	string		A template to format the prompt with. If not provided, the default prompt template will be used.
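
The top_k and top_p descriptions above can be illustrated with a small sketch. This is a toy version of top-k and nucleus filtering over a hand-written probability table, not the model server's actual sampling code; the function names and the `toy` distribution are made up for illustration, and the real implementation may order the filters or break ties differently.

```python
def top_k_filter(probs, k):
    """Keep the k highest-probability tokens; k <= 0 disables the filter."""
    if k <= 0 or k >= len(probs):
        return dict(probs)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    if p >= 1.0:
        return dict(probs)
    kept = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept


def renormalize(probs):
    """Rescale the surviving probabilities so they sum to 1 again."""
    total = sum(probs.values())
    return {token: prob / total for token, prob in probs.items()}


# Hypothetical next-token distribution, for illustration only.
toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}

after_k = top_k_filter(toy, 2)    # keeps "the" and "a"
after_p = top_p_filter(toy, 0.9)  # keeps "the", "a", and "an"
candidates = renormalize(after_p)
```

With the defaults (top_k = 50, top_p = 0.9), both filters are active; setting top_k to 0 or top_p to 1.0 turns the corresponding filter off in this sketch, mirroring the "If > 0" and "If < 1.0" conditions in the field descriptions.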

{
  "type": "object",
  "title": "Input",
  "properties": {
    "top_k": {
      "type": "integer",
      "title": "Top K",
      "default": 50,
      "x-order": 6,
      "description": "The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering)."
    },
    "top_p": {
      "type": "number",
      "title": "Top P",
      "default": 0.9,
      "x-order": 5,
      "description": "A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)."
    },
    "prompt": {
      "type": "string",
      "title": "Prompt",
      "default": "",
      "x-order": 0,
      "description": "Prompt"
    },
    "max_tokens": {
      "type": "integer",
      "title": "Max Tokens",
      "default": 512,
      "x-order": 3,
      "description": "The maximum number of tokens the model should generate as output."
    },
    "min_tokens": {
      "type": "integer",
      "title": "Min Tokens",
      "default": 0,
      "x-order": 2,
      "description": "The minimum number of tokens the model should generate as output."
    },
    "temperature": {
      "type": "number",
      "title": "Temperature",
      "default": 0.6,
      "x-order": 4,
      "description": "The value used to modulate the next token probabilities."
    },
    "system_prompt": {
      "type": "string",
      "title": "System Prompt",
      "default": "You are a helpful assistant.",
      "x-order": 1,
      "description": "System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models."
    },
    "stop_sequences": {
      "type": "string",
      "title": "Stop Sequences",
      "x-order": 9,
      "description": "A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of 'end' or '<stop>'."
    },
    "prompt_template": {
      "type": "string",
      "title": "Prompt Template",
      "x-order": 10,
      "description": "A template to format the prompt with. If not provided, the default prompt template will be used."
    },
    "presence_penalty": {
      "type": "number",
      "title": "Presence Penalty",
      "default": 0,
      "x-order": 7,
      "description": "Presence penalty"
    },
    "frequency_penalty": {
      "type": "number",
      "title": "Frequency Penalty",
      "default": 0,
      "x-order": 8,
      "description": "Frequency penalty"
    }
  }
}
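
As a rough sketch of how the schema above might be used on the client side, the snippet below builds an input payload from the documented defaults and checks each value against the documented type. The helper `build_input` and its `stop` argument are hypothetical conveniences, not part of any client library, and rejecting commas inside a stop sequence is an assumption based on the field being comma-separated.

```python
# Defaults copied from the input schema. stop_sequences and prompt_template
# have no default there, so they are only sent when explicitly provided.
DEFAULTS = {
    "prompt": "",
    "system_prompt": "You are a helpful assistant.",
    "min_tokens": 0,
    "max_tokens": 512,
    "temperature": 0.6,
    "top_p": 0.9,
    "top_k": 50,
    "presence_penalty": 0,
    "frequency_penalty": 0,
}

# JSON Schema type of every field in the input schema.
FIELD_TYPES = {
    "prompt": "string",
    "system_prompt": "string",
    "min_tokens": "integer",
    "max_tokens": "integer",
    "temperature": "number",
    "top_p": "number",
    "top_k": "integer",
    "presence_penalty": "number",
    "frequency_penalty": "number",
    "stop_sequences": "string",
    "prompt_template": "string",
}


def _matches(value, json_type):
    """Check a Python value against a JSON Schema primitive type."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it
        return False
    if json_type == "string":
        return isinstance(value, str)
    if json_type == "integer":
        return isinstance(value, int)
    return isinstance(value, (int, float))  # "number"


def build_input(prompt, stop=None, **overrides):
    """Hypothetical helper: merge overrides into the defaults and validate types."""
    unknown = set(overrides) - set(FIELD_TYPES)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "prompt": prompt}
    if stop is not None:
        if any("," in s for s in stop):
            raise ValueError("stop sequences cannot contain commas")
        payload["stop_sequences"] = ",".join(stop)
    for name, value in payload.items():
        if not _matches(value, FIELD_TYPES[name]):
            raise TypeError(f"{name} must be of type {FIELD_TYPES[name]}")
    return payload


payload = build_input("Hello", max_tokens=64, stop=["<end>", "<stop>"])
```

Sending the defaults explicitly is optional, since omitted fields fall back to their defaults on the server; doing so here only makes the final payload easy to inspect.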

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{
  "type": "array",
  "items": {
    "type": "string"
  },
  "title": "Output",
  "x-cog-array-type": "iterator",
  "x-cog-array-display": "concatenate"
}
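
The output schema describes an iterator of strings that is displayed by concatenation. A minimal sketch of consuming such output follows; `generate` and `fake_run` are hypothetical names, and `fake_run` stands in for a real client call so the example runs offline. With the `replicate` Python client, passing `replicate.run` as `run` should work the same way, assuming it returns an iterable of string chunks for this model.

```python
def collect_output(chunks):
    """Join streamed string chunks into the full generated text."""
    return "".join(chunks)


def generate(prompt, run):
    """Call the model through `run` and concatenate the chunks it yields.

    `run` is any callable taking (model_ref, input=...) and returning an
    iterable of strings.
    """
    chunks = run("replicate/vllm", input={"prompt": prompt})
    return collect_output(chunks)


def fake_run(model_ref, input):
    """Offline stand-in for a client call; yields made-up chunks."""
    yield from ["Hello", ", ", "world"]


text = generate("Say hello", fake_run)
```

Iterating over the chunks one at a time instead of joining them lets a caller display partial output as it arrives.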