technillogue/llama-2-7b-mlc:e2e8d428 | Run with an API on Replicate

You're looking at a specific version of this model.

These are the input fields you can use when running this model through the API. If you don't provide a value for a field, its default value is used.

Field	Type	Default value (limits)	Description
prompt	string		Prompt to send to the model.
max_new_tokens	integer	128 Min: 1	Maximum number of tokens to generate. A word is generally 2-3 tokens.
min_new_tokens	integer	-1 Min: -1	Minimum number of tokens to generate. To disable, set to -1. A word is generally 2-3 tokens.
temperature	number	0.7 Min: 0.01 Max: 5.0	Adjusts the randomness of outputs. Values greater than 1 are highly random, and values close to 0 are nearly deterministic; 0.75 is a good starting value.
top_p	number	0.95 Max: 1.0	When decoding text, samples only from the smallest set of most likely tokens whose cumulative probability reaches p. Lower it to ignore less likely tokens.
repetition_penalty	number	1.15	Controls how repetitive the text can be. Lower values allow more repetition; higher values discourage it. Set to 1.0 to disable.
stop_sequences	string		A comma-separated list of sequences at which to stop generation. For example, '<end>,<stop>' stops generation at the first instance of '<end>' or '<stop>'.
seed	integer		Random seed. Leave blank to randomize the seed.
debug	boolean	False	Provide debugging output in logs.
return_logits	boolean	False	If set, only return logits for the first token. Only useful for testing, etc.
replicate_weights	string		Path to fine-tuned weights produced by a Replicate fine-tune job.
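The temperature and top_p fields describe the usual two-stage way of picking the next token. The sketch below is a generic, plain-Python illustration of how those two settings are commonly applied to one step's logits; it is not taken from this model's MLC runtime, whose sampler may differ in details (for example, where repetition_penalty is applied). The function name sample_token is hypothetical.

```python
import math
import random
from typing import Optional, Sequence


def sample_token(
    logits: Sequence[float],
    temperature: float = 0.7,
    top_p: float = 0.95,
    rng: Optional[random.Random] = None,
) -> int:
    """Return one token index drawn with temperature scaling plus nucleus (top-p) filtering.

    Hypothetical helper for illustration; not this model's actual sampler.
    """
    if rng is None:
        rng = random.Random()
    # Temperature divides the logits: below 1 sharpens the distribution, above 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the max logit for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus: keep the smallest set of most likely tokens whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Draw from the kept tokens; random.choices treats weights as relative, so no renormalising is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Passing a seeded random.Random as rng makes the draw reproducible, which mirrors the intent of the seed field at the API level.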

This is the shape of the response you get when you run this model through the API.

Schema

{
  "items": {"type": "string"},
  "title": "Output",
  "type": "array",
  "x-cog-array-display": "concatenate",
  "x-cog-array-type": "iterator"
}
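Putting the input fields and the output schema together, here is a minimal Python sketch of a call. It assumes the official replicate client package (pip install replicate) with a REPLICATE_API_TOKEN set in the environment; the helper names build_input, collect_output and run_model and the BOUNDS table are hypothetical. Because the output is an array of strings yielded as an iterator (x-cog-array-type: iterator) and meant to be concatenated (x-cog-array-display: concatenate), the sketch joins the items into one string. The version id e2e8d428 is the abbreviated one shown on this page; a real call will likely need the full version hash.

```python
from typing import Iterable, Optional, Sequence

# Model reference as shown on this page. The version id is abbreviated here;
# a real call likely needs the full version hash (assumption).
MODEL = "technillogue/llama-2-7b-mlc:e2e8d428"

# Hypothetical client-side bounds copied from the input table (None = no documented bound).
BOUNDS = {
    "max_new_tokens": (1, None),
    "min_new_tokens": (-1, None),
    "temperature": (0.01, 5.0),
    "top_p": (None, 1.0),
}


def build_input(prompt: str, stop_sequences: Optional[Sequence[str]] = None, **params) -> dict:
    """Build the input payload; unset (None) fields are omitted so server defaults apply."""
    payload = {"prompt": prompt}
    if stop_sequences:
        # The API takes a single comma-separated string, e.g. "<end>,<stop>".
        payload["stop_sequences"] = ",".join(stop_sequences)
    for name, value in params.items():
        if value is None:
            continue  # leave the field out entirely
        low, high = BOUNDS.get(name, (None, None))
        if low is not None and value < low:
            raise ValueError(f"{name}={value} is below the documented minimum {low}")
        if high is not None and value > high:
            raise ValueError(f"{name}={value} is above the documented maximum {high}")
        payload[name] = value
    return payload


def collect_output(chunks: Iterable[str]) -> str:
    """Concatenate the string items (a list or a lazy iterator) into the full text."""
    return "".join(chunks)


def run_model(prompt: str, **params) -> str:
    """Call the model through the replicate client and return the joined text."""
    import replicate  # third-party: pip install replicate; reads REPLICATE_API_TOKEN

    output = replicate.run(MODEL, input=build_input(prompt, **params))
    return collect_output(output)
```

For example, run_model("Write a haiku about llamas.", max_new_tokens=64, stop_sequences=["<end>"]) sends only the fields you set, so every other field keeps its default. Since stop_sequences is a single comma-separated string, a stop sequence that itself contains a comma presumably cannot be expressed this way.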