
thomasmol/whisper-diarization:3ff22700

Input

string

Either provide: a Base64-encoded audio file

string

Or provide: A direct audio file URL

file

Or an audio file
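
The three audio inputs above are alternatives, so a caller would normally supply exactly one of them. The following is a minimal sketch of preparing the Base64 or URL variant with Python's standard library. The key names "file_string" and "file_url" are assumptions made for illustration; the parameter names are not shown in this listing.

```python
import base64
from typing import Optional


def encode_audio_base64(audio_bytes: bytes) -> str:
    """Return raw audio bytes as an ASCII Base64 string."""
    return base64.b64encode(audio_bytes).decode("ascii")


def build_audio_input(audio_bytes: Optional[bytes] = None,
                      url: Optional[str] = None) -> dict:
    """Build an input dict containing exactly one audio source.

    The keys "file_string" and "file_url" are hypothetical placeholders,
    not names confirmed by the model page.
    """
    # Reject both "neither given" and "both given".
    if (audio_bytes is None) == (url is None):
        raise ValueError("provide exactly one of audio_bytes or url")
    if audio_bytes is not None:
        return {"file_string": encode_audio_base64(audio_bytes)}
    return {"file_url": url}
```

In practice the bytes would come from reading the audio file in binary mode, for example `open(path, "rb").read()`.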

boolean

Group consecutive segments from the same speaker that are less than 2 seconds apart

Default: true
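
To illustrate what grouping means here, the sketch below merges consecutive segments when the speaker is unchanged and the silence between them is under 2 seconds. It is not the model's own implementation; the segment shape (dicts with "speaker", "start", "end" and "text") is an assumption made for this example.

```python
def group_segments(segments, max_gap=2.0):
    """Merge consecutive same-speaker segments separated by less than max_gap seconds.

    Each segment is assumed to be a dict with "speaker", "start", "end"
    (seconds) and "text" keys; this shape is hypothetical.
    """
    grouped = []
    for seg in segments:
        last = grouped[-1] if grouped else None
        same_speaker = last is not None and last["speaker"] == seg["speaker"]
        if same_speaker and seg["start"] - last["end"] < max_gap:
            # Extend the previous group instead of starting a new one.
            last["end"] = seg["end"]
            last["text"] = f'{last["text"]} {seg["text"]}'
        else:
            # Copy so the caller's segment dicts are never mutated.
            grouped.append(dict(seg))
    return grouped
```

With grouping disabled, each segment would instead be kept as its own entry.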

string

Specify the format of the transcript output: individual words with timestamps, full text of segments, or a combination of both.

Default: "both"

integer
(minimum: 1, maximum: 50)

Number of speakers. Leave empty to autodetect.

string

Language of the spoken words as a language code such as 'en'. Leave empty to auto-detect the language.

string

Vocabulary: provide names, acronyms and loanwords in a list. Use punctuation for best accuracy.
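
One plausible reading of "a list" with punctuation is a comma-separated string that ends with a period. The helper below builds such a string from a list of terms; the exact format the model expects is not specified in this listing, so treat the separator and the trailing period as assumptions.

```python
def build_vocabulary_prompt(terms):
    """Join vocabulary terms into a punctuated, comma-separated string.

    Blank entries are skipped and duplicates are dropped while keeping
    the original order. The output format is an assumption.
    """
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    if not cleaned:
        return ""
    return ", ".join(cleaned) + "."
```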

integer
(minimum: 0)

Offset in seconds, used for chunked inputs

Default: 0
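
When a long recording is split into chunks and each chunk is transcribed separately, every chunk's timestamps start at zero, and the offset is what maps them back onto the original timeline. The sketch below shows that arithmetic under stated assumptions: fixed-length chunks, and segments shaped as dicts with "start" and "end" in seconds. Both helper names are hypothetical, and the model presumably applies the shift itself when the offset is supplied.

```python
def chunk_offsets(total_seconds: int, chunk_seconds: int) -> list:
    """Return the start offset, in whole seconds, of each fixed-length chunk."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    return list(range(0, total_seconds, chunk_seconds))


def shift_segments(segments, offset_seconds):
    """Return copies of the segments with start and end shifted by the offset."""
    return [
        {**seg,
         "start": seg["start"] + offset_seconds,
         "end": seg["end"] + offset_seconds}
        for seg in segments
    ]
```

For a 25-minute recording cut into 10-minute chunks, `chunk_offsets(1500, 600)` gives the offsets to pass for the first, second and third chunk.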

Output
