
prunaai /gemma-4-26b-a4b-fast:007dac37

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

| Field | Type | Default value | Range | Description |
|---|---|---|---|---|
| `message` | string | `Explain vLLM in one sentence` | | The user message to send to the model |
| `images` | array | `[]` | | Images to send to the model |
| `video` | string | | | Video file to send to the model |
| `video_fps` | number | `2` | 0.1–30 | Frames per second to sample from the video |
| `system_prompt` | string | | | Optional system prompt to set the model's behavior |
| `max_tokens` | integer | `2048` | 1–16384 | Maximum number of tokens to generate |
| `enable_thinking` | boolean | `False` | | Enable thinking mode (the model reasons internally before answering) |
| `temperature` | number | `0.6` | max 2 | Sampling temperature (higher = more creative, lower = more deterministic) |
| `top_p` | number | `0.95` | max 1 | Nucleus sampling: only consider tokens with cumulative probability up to this value |
| `presence_penalty` | number | `1` | 1–1.5 | Presence penalty: penalize new tokens based on their presence in the text |
| `max_visual_tokens` | integer | `280` | 70–1120 | Vision token budget per image (higher = more detail, more compute). Supported values: 70, 140, 280, 560, 1120 |
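As a sketch of how these fields map onto a request, here is a minimal invocation using the Replicate Python client. The payload values are simply the schema defaults listed above; the actual API call is commented out because it requires a `REPLICATE_API_TOKEN` and network access.

```python
# Model version identifier from this page.
MODEL = "prunaai/gemma-4-26b-a4b-fast:007dac37"

# Any field omitted from the payload falls back to its schema default.
payload = {
    "message": "Explain vLLM in one sentence",
    "max_tokens": 2048,        # 1-16384
    "temperature": 0.6,        # up to 2
    "top_p": 0.95,             # up to 1
    "enable_thinking": False,  # reason internally before answering
}

# Uncomment to make the actual API call:
# import replicate
# output = replicate.run(MODEL, input=payload)
# print("".join(output))
```

`replicate.run` returns the model's output; since this model's output is an iterator of strings, joining the pieces yields the full response.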

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{
  "type": "array",
  "items": { "type": "string" },
  "title": "Output",
  "x-cog-array-type": "iterator",
  "x-cog-array-display": "concatenate"
}
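Because `x-cog-array-type` is `iterator` and the display mode is `concatenate`, the response arrives as a stream of string chunks meant to be joined into one text. A minimal sketch of consuming such an output (the chunk values here are illustrative, not real model output):

```python
def collect(chunks):
    """Join an iterator of string chunks into the full model response."""
    return "".join(chunks)

# Illustrative chunks; a real run streams token pieces from the model.
example = iter(["vLLM is a high-throughput ", "inference engine for LLMs."])
print(collect(example))
```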