nateraw / llama-2-70b-chat-awq

llama-2-70b-chat quantized with AWQ and served with vLLM

  • Public
  • 77 runs
  • L40S
  • GitHub
  • Paper
  • License

Run nateraw/llama-2-70b-chat-awq with an API
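A minimal sketch of a request body for calling this model over Replicate's HTTP API. The model-scoped predictions endpoint path and the prompt text are assumptions for illustration; the input fields and defaults come from the schema below. Check the model page's API tab for the exact call and authentication header.

```python
import json

# Hypothetical endpoint path, following Replicate's model-scoped
# predictions API; verify against the model page before use.
url = "https://api.replicate.com/v1/models/nateraw/llama-2-70b-chat-awq/predictions"

body = {
    "input": {
        # The prompt to send; "message" has no default and must be provided.
        "message": "Explain AWQ quantization in one sentence.",
        # Remaining fields shown at their schema defaults.
        "top_k": 50,
        "top_p": 0.95,
        "temperature": 0.8,
        "max_new_tokens": 512,
        "presence_penalty": 1,
    }
}

# Serialize to JSON for the POST request (sending it requires an
# Authorization header with your REPLICATE_API_TOKEN).
payload = json.dumps(body)
print(payload)
```

Omitting any of the optional fields from `input` makes the server fall back to the same defaults listed in the schema.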

Input schema

top_k (integer)

The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering).

Default
50
top_p (number)

A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).

Default
0.95
message (string)

The prompt message to send to the model.

temperature (number)

The value used to modulate the next token probabilities.

Default
0.8
max_new_tokens (integer)

The maximum number of tokens the model should generate as output.

Default
512
presence_penalty (number)

Penalizes tokens based on whether they already appear in the generated text; positive values encourage the model to move on to new tokens and topics.

Default
1

Output schema

Type
string
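The output is a plain string. When a client streams the prediction, the reply typically arrives as a sequence of string chunks; joining them reconstructs the full text. The chunks below are hypothetical:

```python
# Hypothetical streamed pieces of a single string output.
streamed = ["The ", "answer ", "is 42."]

# Concatenate the chunks to recover the complete reply.
final = "".join(streamed)
print(final)  # → The answer is 42.
```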