You're looking at a specific version of this model.
elevenlabs/scribe-v2:5cd433d1
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio | string | | Audio or video file to transcribe. Supports MP3, WAV, M4A, FLAC, OGG, OPUS, WebM, AAC, MP4, MOV, MKV, AVI, and more. Max 3 GB, up to 10 hours. |
| language_code | string | auto | Language of the audio as an ISO-639-1 (e.g. 'en') or ISO-639-3 (e.g. 'eng') code. Set to 'auto' to detect the language automatically. Setting a specific language can improve accuracy for noisy or unusual audio. |
| diarize | boolean | False | Identify and label different speakers in the audio. When enabled, each word in the output includes a 'speaker_id'. Supports up to 32 speakers. |
| num_speakers | integer | 0 (max: 32) | Maximum number of speakers expected in the audio. Helps the model with diarization. Set to 0 to let the model decide. Only used when 'diarize' is true. |
| timestamps_granularity | string (enum) | word | Granularity of word timestamps in the output. 'word' returns start/end times for each word, 'character' adds per-character timing, 'none' omits timestamps. |
| tag_audio_events | boolean | True | Tag non-speech sounds in the transcription, like (laughter), (footsteps), or (applause). |
| no_verbatim | boolean | False | Remove filler words ('um', 'uh'), false starts, and disfluencies from the transcript. Produces a cleaner, more readable output. |
| keyterms | string | | Comma-separated list of words or phrases to bias transcription towards. Useful for product names, technical terms, or proper nouns. Up to 1000 terms, max 50 characters each. |
| temperature | number | -1 (min: -1, max: 2) | Sampling temperature. Higher values produce more diverse, less deterministic output. Set to -1 to use the model default (usually 0). |
| seed | integer | -1 (min: -1, max: 2147483647) | Random seed for reproducible outputs. Set to -1 to use a non-deterministic seed. |
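
The limits in the table above (speaker count, temperature and seed ranges, key term counts and lengths) can be checked on the client before a request is sent. The following is a minimal sketch of that idea, not part of the model's API: the helper name `build_input`, the list-based `keyterms` argument, the lower bound of 0 on `num_speakers`, and the example URL are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Limits copied from the input schema table.
MAX_SPEAKERS = 32
MAX_KEYTERMS = 1000
MAX_KEYTERM_CHARS = 50
MAX_SEED = 2147483647
GRANULARITIES = {"none", "word", "character"}


def build_input(
    audio: str,
    language_code: str = "auto",
    diarize: bool = False,
    num_speakers: int = 0,
    timestamps_granularity: str = "word",
    keyterms: Optional[List[str]] = None,
    temperature: float = -1,
    seed: int = -1,
) -> Dict[str, Any]:
    """Hypothetical helper: validate values and return an input dict."""
    # The table only documents a maximum; treating 0 as the minimum is an assumption.
    if not 0 <= num_speakers <= MAX_SPEAKERS:
        raise ValueError(f"num_speakers must be between 0 and {MAX_SPEAKERS}")
    if timestamps_granularity not in GRANULARITIES:
        raise ValueError(f"timestamps_granularity must be one of {sorted(GRANULARITIES)}")
    if not -1 <= temperature <= 2:
        raise ValueError("temperature must be between -1 and 2")
    if not -1 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between -1 and {MAX_SEED}")

    terms = keyterms or []
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} key terms are allowed")
    for term in terms:
        if len(term) > MAX_KEYTERM_CHARS:
            raise ValueError(f"key term longer than {MAX_KEYTERM_CHARS} characters: {term!r}")
        if "," in term:
            # The field is comma-separated, so a comma would split the term.
            raise ValueError(f"key term must not contain a comma: {term!r}")

    payload: Dict[str, Any] = {
        "audio": audio,
        "language_code": language_code,
        "diarize": diarize,
        "timestamps_granularity": timestamps_granularity,
        "temperature": temperature,
        "seed": seed,
    }
    if terms:
        payload["keyterms"] = ",".join(terms)
    if diarize:
        # num_speakers is only used when diarize is true, so omit it otherwise.
        payload["num_speakers"] = num_speakers
    return payload


# Example with a placeholder URL (not a real file).
example = build_input(
    "https://example.com/meeting.mp3",
    diarize=True,
    num_speakers=3,
    keyterms=["Scribe", "ElevenLabs"],
)
```

The returned dict uses the field names from the table, so it could be passed as the input of whichever API client is used to run the model. Fields left out of the dict fall back to their documented defaults.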
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{
    'properties': {
        'duration_seconds': {'nullable': True, 'title': 'Duration Seconds', 'type': 'number'},
        'language_code': {'title': 'Language Code', 'type': 'string'},
        'language_probability': {'title': 'Language Probability', 'type': 'number'},
        'text': {'title': 'Text', 'type': 'string'},
        'words': {
            'items': {
                'properties': {
                    'end': {'nullable': True, 'title': 'End', 'type': 'number'},
                    'speaker_id': {'nullable': True, 'title': 'Speaker Id', 'type': 'string'},
                    'start': {'nullable': True, 'title': 'Start', 'type': 'number'},
                    'text': {'title': 'Text', 'type': 'string'},
                    'type': {'title': 'Type', 'type': 'string'},
                },
                'required': ['text', 'type'],
                'type': 'object',
            },
            'nullable': True,
            'title': 'Words',
            'type': 'array',
        },
    },
    'required': ['text', 'language_code', 'language_probability'],
    'title': 'Output',
    'type': 'object',
}
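
Because `words`, `start`, `end`, and `speaker_id` are all nullable in the schema, code that consumes the output should tolerate missing values. The sketch below groups consecutive words by `speaker_id` into turns. The function name `speaker_turns` is hypothetical, and the sample dict is hand-written for illustration: its `type` values and `speaker_id` format are assumptions, not real model output.

```python
from typing import Any, Dict, List


def speaker_turns(output: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Group consecutive words with the same speaker_id into turns."""
    turns: List[Dict[str, Any]] = []
    # 'words' is nullable, so fall back to an empty list.
    for word in output.get("words") or []:
        speaker = word.get("speaker_id")  # None when diarization is off
        if turns and turns[-1]["speaker_id"] == speaker:
            turn = turns[-1]
        else:
            turn = {"speaker_id": speaker, "start": None, "end": None, "parts": []}
            turns.append(turn)
        turn["parts"].append(word["text"])
        # 'start' and 'end' are nullable; keep the first start and the last end seen.
        if word.get("start") is not None and turn["start"] is None:
            turn["start"] = word["start"]
        if word.get("end") is not None:
            turn["end"] = word["end"]
    for turn in turns:
        # Join with spaces, then collapse runs of whitespace, so the result is
        # the same whether or not whitespace arrives as separate items.
        turn["text"] = " ".join(" ".join(turn.pop("parts")).split())
    return turns


# Hand-written sample shaped like the output schema (values are made up).
sample = {
    "text": "Hello there. Hi!",
    "language_code": "en",
    "language_probability": 0.98,
    "words": [
        {"text": "Hello", "type": "word", "start": 0.0, "end": 0.4, "speaker_id": "speaker_0"},
        {"text": "there.", "type": "word", "start": 0.5, "end": 0.9, "speaker_id": "speaker_0"},
        {"text": "Hi!", "type": "word", "start": 1.2, "end": 1.5, "speaker_id": "speaker_1"},
    ],
}

for turn in speaker_turns(sample):
    print(turn["speaker_id"], turn["start"], turn["end"], turn["text"])
```

When `diarize` is off, every `speaker_id` would be expected to be null, in which case the function returns a single turn covering the whole transcript.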