replicate/vllm (Public, 134 runs)
Run replicate/vllm with an API
Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
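As a rough illustration of what a client-library call can look like, here is a minimal Python sketch. It assumes the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set; the helper name `run_vllm` and its injectable `run` parameter are hypothetical and exist only so the function can be exercised without network access. Depending on the model, the reference may need an explicit `:<version>` suffix, which is not shown on this page.

```python
from typing import Any, Callable, Iterable, Optional


def run_vllm(
    prompt: str,
    run: Optional[Callable[..., Iterable[str]]] = None,
    **options: Any,
) -> str:
    """Run the model and return the generated text as a single string.

    `run` defaults to `replicate.run`; pass a stand-in callable for testing.
    """
    if run is None:
        # Imported lazily so the sketch can be loaded without the package.
        # Assumes REPLICATE_API_TOKEN is set in the environment.
        import replicate

        run = replicate.run

    # Assumption: the bare "replicate/vllm" reference resolves; some models
    # require "owner/name:<version>" instead.
    chunks = run("replicate/vllm", input={"prompt": prompt, **options})

    # The output is an array of strings meant to be concatenated.
    return "".join(chunks)
```

A call such as `run_vllm("Write a haiku about GPUs.", max_tokens=64)` would send only the fields you pass; the remaining fields fall back to the defaults listed under the input schema.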
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
Field | Type | Default value | Description |
---|---|---|---|
prompt | string | | Prompt |
system_prompt | string | You are a helpful assistant. | System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models. |
min_tokens | integer | 0 | The minimum number of tokens the model should generate as output. |
max_tokens | integer | 512 | The maximum number of tokens the model should generate as output. |
temperature | number | 0.6 | The value used to modulate the next token probabilities. |
top_p | number | 0.9 | A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751). |
top_k | integer | 50 | The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering). |
presence_penalty | number | 0 | Presence penalty |
frequency_penalty | number | 0 | Frequency penalty |
stop_sequences | string | | A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'. |
prompt_template | string | | A template to format the prompt with. If not provided, the default prompt template will be used. |
seed | integer | | Random seed. Leave blank to randomize the seed. |
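The `top_k` and `top_p` descriptions above can be made concrete with a small pure-Python sketch. It is illustrative only: the function name `filter_top_k_top_p`, the order in which the two filters are applied, and the final renormalization are assumptions, and the model server itself operates on logit tensors rather than dictionaries.

```python
from typing import Dict


def filter_top_k_top_p(
    probs: Dict[str, float], top_k: int = 50, top_p: float = 0.9
) -> Dict[str, float]:
    """Keep candidate tokens using top-k, then nucleus (top-p) filtering."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # Top-k filtering: if top_k > 0, keep only the k most probable tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Nucleus filtering: if top_p < 1.0, keep the smallest prefix whose
    # cumulative probability reaches top_p.
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept

    # Renormalize so the surviving probabilities sum to 1 (an assumption
    # about how sampling proceeds after filtering).
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}
```

With `probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}`, `top_k=3`, and `top_p=0.7`, top-k first drops `d`, and nucleus filtering then stops after `b` because 0.5 + 0.3 already reaches 0.7, leaving only `a` and `b` as candidates.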
{
  "type": "object",
  "title": "Input",
  "properties": {
    "seed": {
      "type": "integer",
      "title": "Seed",
      "x-order": 11,
      "description": "Random seed. Leave blank to randomize the seed."
    },
    "top_k": {
      "type": "integer",
      "title": "Top K",
      "default": 50,
      "x-order": 6,
      "description": "The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering)."
    },
    "top_p": {
      "type": "number",
      "title": "Top P",
      "default": 0.9,
      "x-order": 5,
      "description": "A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)."
    },
    "prompt": {
      "type": "string",
      "title": "Prompt",
      "default": "",
      "x-order": 0,
      "description": "Prompt"
    },
    "max_tokens": {
      "type": "integer",
      "title": "Max Tokens",
      "default": 512,
      "x-order": 3,
      "description": "The maximum number of tokens the model should generate as output."
    },
    "min_tokens": {
      "type": "integer",
      "title": "Min Tokens",
      "default": 0,
      "x-order": 2,
      "description": "The minimum number of tokens the model should generate as output."
    },
    "temperature": {
      "type": "number",
      "title": "Temperature",
      "default": 0.6,
      "x-order": 4,
      "description": "The value used to modulate the next token probabilities."
    },
    "system_prompt": {
      "type": "string",
      "title": "System Prompt",
      "default": "You are a helpful assistant.",
      "x-order": 1,
      "description": "System prompt to send to the model. This is prepended to the prompt and helps guide system behavior. Ignored for non-chat models."
    },
    "stop_sequences": {
      "type": "string",
      "title": "Stop Sequences",
      "x-order": 9,
      "description": "A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of '<end>' or '<stop>'."
    },
    "prompt_template": {
      "type": "string",
      "title": "Prompt Template",
      "x-order": 10,
      "description": "A template to format the prompt with. If not provided, the default prompt template will be used."
    },
    "presence_penalty": {
      "type": "number",
      "title": "Presence Penalty",
      "default": 0,
      "x-order": 7,
      "description": "Presence penalty"
    },
    "frequency_penalty": {
      "type": "number",
      "title": "Frequency Penalty",
      "default": 0,
      "x-order": 8,
      "description": "Frequency penalty"
    }
  }
}
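The JSON above lists properties in no particular order, while each carries an `x-order` value that appears to match the row order of the field table. A short sketch of reading the schema programmatically follows, using a trimmed copy with three of the twelve properties; the helper names `ordered_fields` and `defaults` are hypothetical.

```python
import json

# Trimmed copy of the input schema (descriptions and titles omitted).
SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "seed":   {"type": "integer", "x-order": 11},
    "top_k":  {"type": "integer", "default": 50, "x-order": 6},
    "prompt": {"type": "string",  "default": "", "x-order": 0}
  }
}
""")


def ordered_fields(schema: dict) -> list:
    """Return property names sorted by their "x-order" value."""
    props = schema["properties"]
    return sorted(props, key=lambda name: props[name]["x-order"])


def defaults(schema: dict) -> dict:
    """Collect the default value of every property that declares one."""
    return {
        name: spec["default"]
        for name, spec in schema["properties"].items()
        if "default" in spec
    }
```

Properties without a `default` key, such as `seed`, are simply absent from the result of `defaults`, which mirrors the empty cells in the table's default column.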
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{
  "type": "array",
  "items": {
    "type": "string"
  },
  "title": "Output",
  "x-cog-array-type": "iterator",
  "x-cog-array-display": "concatenate"
}
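The schema describes the output as an iterator of strings intended to be concatenated. A minimal sketch of consuming such a stream incrementally is shown below; `fake_stream` is a hypothetical stand-in for the real output, and `stream_text` is an illustrative helper rather than part of any client library.

```python
from typing import Callable, Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for the model's output iterator."""
    yield from ["The", " quick", " fox"]


def stream_text(chunks: Iterable[str], on_chunk: Callable[[str], None]) -> str:
    """Pass each chunk to `on_chunk` as it arrives, then return the full text."""
    parts = []
    for chunk in chunks:
        on_chunk(chunk)       # e.g. print it immediately for live display
        parts.append(chunk)
    # Concatenate, as suggested by "x-cog-array-display": "concatenate".
    return "".join(parts)


if __name__ == "__main__":
    text = stream_text(fake_stream(), lambda c: print(c, end="", flush=True))
    print()
```

Joining without a separator matters here: chunks carry their own leading whitespace, so inserting spaces between them would corrupt the text.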