
prunaai/gemma-4-26b-a4b-fast:6992a45e

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.

| Field | Type | Default | Range | Description |
|---|---|---|---|---|
| message | string | "Explain vLLM in one sentence" | — | The user message to send to the model |
| image | string | — | — | Image file to send to the model |
| video | string | — | — | Video file to send to the model |
| video_fps | number | 2 | 0.1–30 | Frames per second to sample from the video |
| system_prompt | string | — | — | Optional system prompt to set the model's behavior |
| max_tokens | integer | 2048 | 1–16384 | Maximum number of tokens to generate |
| enable_thinking | boolean | false | — | Enable thinking mode (the model reasons internally before answering) |
| temperature | number | 0.6 | ≤ 2 | Sampling temperature (higher = more creative, lower = more deterministic) |
| top_p | number | 0.95 | ≤ 1 | Nucleus sampling: only consider tokens with cumulative probability up to this value |
| presence_penalty | number | 1 | 1–1.5 | Presence penalty: penalizes tokens already present in the text |
| max_visual_tokens | integer | 280 | 70–1120 | Vision token budget per image (higher = more detail, more compute); supported values: 70, 140, 280, 560, 1120 |
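The numeric bounds in the schema can be checked client-side before sending a request, which avoids a round trip on an invalid payload. A minimal sketch, assuming the bounds listed above; the `validate_input` helper and the `BOUNDS` table are illustrative, not part of any client library:

```python
# (min, max) per numeric field, taken from the input schema above.
# None means the schema states no bound on that side.
BOUNDS = {
    "video_fps": (0.1, 30),
    "max_tokens": (1, 16384),
    "temperature": (None, 2),
    "top_p": (None, 1),
    "presence_penalty": (1, 1.5),
    "max_visual_tokens": (70, 1120),
}

# max_visual_tokens additionally only accepts these discrete values.
ALLOWED_VISUAL_TOKENS = {70, 140, 280, 560, 1120}

def validate_input(payload):
    """Return a list of constraint violations (empty if payload is valid)."""
    errors = []
    for field, (lo, hi) in BOUNDS.items():
        if field not in payload:
            continue  # omitted fields fall back to their defaults
        value = payload[field]
        if lo is not None and value < lo:
            errors.append(f"{field}={value} is below the minimum {lo}")
        if hi is not None and value > hi:
            errors.append(f"{field}={value} is above the maximum {hi}")
    if ("max_visual_tokens" in payload
            and payload["max_visual_tokens"] not in ALLOWED_VISUAL_TOKENS):
        errors.append("max_visual_tokens must be one of 70, 140, 280, 560, 1120")
    return errors

# A payload using only in-range values passes; an out-of-range one does not.
payload = {"message": "Explain vLLM in one sentence",
           "max_tokens": 2048, "temperature": 0.6}
assert validate_input(payload) == []
assert validate_input({"temperature": 3.5})  # above Max: 2
```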

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{
  "items": {"type": "string"},
  "title": "Output",
  "type": "array",
  "x-cog-array-display": "concatenate",
  "x-cog-array-type": "iterator"
}
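Per the schema, the output is an iterator of strings (`x-cog-array-type: iterator`) that the client is expected to concatenate (`x-cog-array-display: concatenate`) to recover the full response text. A minimal sketch of that assembly step; the `chunks` list below stands in for a real stream:

```python
def assemble(chunks):
    """Concatenate streamed output chunks, in order, into the final text."""
    return "".join(chunks)

# Simulated stream of string chunks as the iterator output would yield them.
chunks = ["vLLM is a high-throughput ", "inference engine ", "for LLMs."]
print(assemble(chunks))  # -> vLLM is a high-throughput inference engine for LLMs.
```

With the official Replicate Python client, the same join applies to the iterable returned by running the model, e.g. `"".join(output)`.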