technillogue/vllm-hf-test:2d163c88 | Run with an API on Replicate

You're looking at a specific version of this model. Jump to the model overview.

The fields you can use to run this model with an API. If you don’t give a value for a field its default value will be used.

Field	Type	Default value	Description
prompt	string		Prompt
min_tokens	integer	0	The minimum number of tokens the model should generate as output.
max_tokens	integer	512	The maximum number of tokens the model should generate as output.
temperature	number	0.6	The value used to modulate the next token probabilities.
top_p	number	0.9	A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
top_k	integer	50	The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).
presence_penalty	number	0	Presence penalty
frequency_penalty	number	0	Frequency penalty
stop_sequences	string		A comma-separated list of sequences to stop generation at. For example, '<end>,<stop>' will stop generation at the first instance of 'end' or '<stop>'.
prompt_template	string	<s>{prompt}	Prompt template. The string `{prompt}` will be substituted for the input prompt. If you want to generate dialog output, use this template as a starting point and construct the prompt string manually, leaving `prompt_template={prompt}`.

The shape of the response you’ll get when you run this model with an API.

Schema

{'items': {'type': 'string'},
 'title': 'Output',
 'type': 'array',
 'x-cog-array-display': 'concatenate',
 'x-cog-array-type': 'iterator'}