
shreejalmaharjan-27/tiktok-short-captions:681cd564

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

video (string)
Path to the video file to caption.

caption_size (integer, default: 30)
The maximum number of words to generate in each window.

model (default: large-v3)
Whisper model size (currently only large-v3 is supported).

language (default: auto)
Language spoken in the audio; specify 'auto' for automatic language detection.

temperature (number, default: 0)
Temperature to use for sampling.

patience (number)
Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search.

suppress_tokens (string, default: -1)
Comma-separated list of token ids to suppress during sampling; '-1' suppresses most special characters except common punctuation.

initial_prompt (string)
Optional text to provide as a prompt for the first window.

condition_on_previous_text (boolean, default: True)
If True, provide the previous output of the model as a prompt for the next window; disabling this may make the text inconsistent across windows, but also makes the model less prone to getting stuck in a failure loop.

temperature_increment_on_fallback (number, default: 0.2)
Temperature to increase on fallback when decoding fails to meet either of the thresholds below.

compression_ratio_threshold (number, default: 2.4)
If the gzip compression ratio is higher than this value, treat the decoding as failed.

logprob_threshold (number, default: -1)
If the average log probability is lower than this value, treat the decoding as failed.

no_speech_threshold (number, default: 0.6)
If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence.
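For reference, here is a minimal sketch of a run using the official replicate Python client. It assumes REPLICATE_API_TOKEN is set in the environment; the video URL is a placeholder, and only a few fields are set explicitly (the rest fall back to their defaults shown above).

# Minimal sketch using the `replicate` Python client (pip install replicate).
import replicate

output = replicate.run(
    "shreejalmaharjan-27/tiktok-short-captions:681cd564",
    input={
        "video": "https://example.com/clip.mp4",  # placeholder; a local file handle also works
        "caption_size": 30,
        "language": "auto",
        "temperature": 0,
    },
)
print(output)  # per the output schema below, a URI string for the captioned video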

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"format": "uri", "title": "Output", "type": "string"}
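Since the response is a single URI string pointing to the generated file, fetching the result is a one-liner. A minimal sketch, where the URL and local filename are placeholders:

import urllib.request

# Placeholder: substitute the URI string returned by the run above.
output_url = "https://example.com/output/captioned.mp4"
urllib.request.urlretrieve(output_url, "captioned.mp4")  # local filename is arbitrary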