villesau/whisper-timestamped:f7d5b9ce – Run with an API on Replicate

You're looking at a specific version of this model. Jump to the model overview.

The fields you can use to run this model with an API. If you don’t give a value for a field its default value will be used.

Field	Type	Default value	Description
audio_file	string		Audio file to transcribe
language	string	auto	Language code (e.g., 'en') or 'auto' for auto-detect
task	string (enum)	transcribe Options: transcribe, translate	Task to perform
vad	boolean	False	Use Voice Activity Detection
detect_disfluencies	boolean	False	Detect speech disfluencies
compute_word_confidence	boolean	True	Compute word confidence scores
temperature	number	0	Temperature for sampling
best_of	integer		Number of candidates when sampling with non-zero temperature
beam_size	integer		Number of beams in beam search, only applicable when temperature is zero
patience	number		Optional patience value to use in beam decoding
length_penalty	number		Optional token length penalty coefficient (alpha) as in https://arxiv.org/abs/1609.08144
suppress_tokens	string	-1	Comma-separated list of token ids to suppress during sampling
initial_prompt	string		Optional text to provide as a prompt for the first window
condition_on_previous_text	boolean	True	Whether to condition on previous text
no_speech_threshold	number	0.6	Threshold for no speech probability
compression_ratio_threshold	number	2.4	Threshold for compression ratio
logprob_threshold	number	-1	Threshold for average log probability
verbose	boolean	False	Whether to display the text being decoded

The shape of the response you’ll get when you run this model with an API.

Schema

{'title': 'Output', 'type': 'object'}