nateraw/llama-2-70b-chat-awq
llama-2-70b-chat quantized with AWQ and served with vLLM
Run nateraw/llama-2-70b-chat-awq with an API
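A minimal sketch of calling this model with the official Replicate Python client (`pip install replicate`). The prompt is only an illustration; the input keys mirror the schema documented below, and the call is skipped unless a `REPLICATE_API_TOKEN` is set in the environment.

```python
# Sketch: run nateraw/llama-2-70b-chat-awq via the Replicate Python client.
# The prompt and parameter values here are illustrative, not required defaults.
import os

model = "nateraw/llama-2-70b-chat-awq"
model_input = {
    "prompt": "Explain AWQ quantization in one paragraph.",
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.8,
    "max_new_tokens": 512,
    "presence_penalty": 1,
}

if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # only needed when actually calling the API

    # Language models on Replicate stream output as an iterator of strings.
    output = replicate.run(model, input=model_input)
    print("".join(output))
```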
Input schema
top_k
The number of highest-probability tokens to consider when generating the output. If > 0, only the top k tokens with the highest probability are kept (top-k filtering).
- Default: 50
top_p
A probability threshold for generating the output. If < 1.0, only the smallest set of most-probable tokens whose cumulative probability is >= top_p is kept (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).
- Default: 0.95
temperature
The value used to modulate the next-token probabilities. Values below 1.0 sharpen the distribution (more deterministic output); values above 1.0 flatten it (more random output).
- Default: 0.8
max_new_tokens
The maximum number of tokens the model should generate as output.
- Default: 512
presence_penalty
Penalizes tokens based on whether they have already appeared in the generated text so far. Values > 0 encourage the model to introduce new tokens; values < 0 encourage it to repeat tokens.
- Default: 1
Output schema
- Type: string