elevenlabs/scribe-v2:5cd433d1

Input schema

The fields you can set when running this model through the API. If you don’t provide a value for a field, its default value is used.

audio
Type: string
Description: Audio or video file to transcribe. Supports MP3, WAV, M4A, FLAC, OGG, OPUS, WebM, AAC, MP4, MOV, MKV, AVI, and more. Max 3 GB, up to 10 hours.

language_code
Type: string
Default value: auto
Description: Language of the audio as an ISO-639-1 (e.g. 'en') or ISO-639-3 (e.g. 'eng') code. Set to 'auto' to detect the language automatically. Setting a specific language can improve accuracy for noisy or unusual audio.

diarize
Type: boolean
Default value: False
Description: Identify and label different speakers in the audio. When enabled, each word in the output includes a 'speaker_id'. Supports up to 32 speakers.

num_speakers
Type: integer
Default value: 0
Max: 32
Description: Maximum number of speakers expected in the audio. Helps the model with diarization. Set to 0 to let the model decide. Only used when 'diarize' is true.

timestamps_granularity
Type: string (one of 'word', 'character', 'none')
Default value: word
Description: Granularity of timestamps in the output. 'word' returns start/end times for each word, 'character' adds per-character timing, and 'none' omits timestamps.

tag_audio_events
Type: boolean
Default value: True
Description: Tag non-speech sounds in the transcription, like (laughter), (footsteps), or (applause).

no_verbatim
Type: boolean
Default value: False
Description: Remove filler words ('um', 'uh'), false starts, and disfluencies from the transcript. Produces cleaner, more readable output.

keyterms
Type: string
Description: Comma-separated list of words or phrases to bias transcription towards. Useful for product names, technical terms, or proper nouns. Up to 1000 terms, max 50 characters each.

temperature
Type: number
Default value: -1
Min: -1
Max: 2
Description: Sampling temperature. Higher values produce more diverse, less deterministic output. Set to -1 to use the model default (usually 0).

seed
Type: integer
Default value: -1
Min: -1
Max: 2147483647
Description: Random seed for reproducible outputs. Set to -1 to use a non-deterministic seed.
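
The limits listed above (speaker count, temperature and seed ranges, keyterm counts and lengths) can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model's API: `build_input` and the constant names are hypothetical helpers, the lower bound of 0 for `num_speakers` is an assumption (the schema only states the maximum), and the accepted `timestamps_granularity` values are taken from that field's description.

```python
from typing import Any, Iterable, Optional

# Limits copied from the input schema above.
MAX_SPEAKERS = 32
MAX_KEYTERMS = 1000
MAX_KEYTERM_CHARS = 50
MAX_SEED = 2147483647
GRANULARITIES = {"none", "word", "character"}


def build_input(
    audio: str,
    language_code: str = "auto",
    diarize: bool = False,
    num_speakers: int = 0,
    timestamps_granularity: str = "word",
    tag_audio_events: bool = True,
    no_verbatim: bool = False,
    keyterms: Optional[Iterable[str]] = None,
    temperature: float = -1,
    seed: int = -1,
) -> dict[str, Any]:
    """Hypothetical helper: validate values and return an input dict."""
    # Assumption: negative speaker counts are rejected; the schema only gives Max.
    if not 0 <= num_speakers <= MAX_SPEAKERS:
        raise ValueError(f"num_speakers must be between 0 and {MAX_SPEAKERS}")
    if timestamps_granularity not in GRANULARITIES:
        raise ValueError(f"timestamps_granularity must be one of {sorted(GRANULARITIES)}")
    if not -1 <= temperature <= 2:
        raise ValueError("temperature must be between -1 and 2")
    if not -1 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between -1 and {MAX_SEED}")

    payload: dict[str, Any] = {
        "audio": audio,
        "language_code": language_code,
        "diarize": diarize,
        "timestamps_granularity": timestamps_granularity,
        "tag_audio_events": tag_audio_events,
        "no_verbatim": no_verbatim,
        "temperature": temperature,
        "seed": seed,
    }
    # num_speakers is only used when diarization is on, so omit it otherwise.
    if diarize and num_speakers:
        payload["num_speakers"] = num_speakers

    if keyterms is not None:
        terms = [t.strip() for t in keyterms if t.strip()]
        if len(terms) > MAX_KEYTERMS:
            raise ValueError(f"at most {MAX_KEYTERMS} keyterms are allowed")
        for term in terms:
            if len(term) > MAX_KEYTERM_CHARS:
                raise ValueError(f"keyterm too long (max {MAX_KEYTERM_CHARS} chars): {term!r}")
            if "," in term:
                # The field is comma-separated, so a comma would split the term.
                raise ValueError(f"keyterm must not contain a comma: {term!r}")
        if terms:
            payload["keyterms"] = ",".join(terms)
    return payload
```

A dict built this way is the kind of value the Replicate Python client takes as the `input` argument of `replicate.run`; the call itself is left out here, and the audio URL you pass is up to you.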

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'properties': {'duration_seconds': {'nullable': True,
                                     'title': 'Duration Seconds',
                                     'type': 'number'},
                'language_code': {'title': 'Language Code', 'type': 'string'},
                'language_probability': {'title': 'Language Probability',
                                         'type': 'number'},
                'text': {'title': 'Text', 'type': 'string'},
                'words': {'items': {'properties': {'end': {'nullable': True,
                                                           'title': 'End',
                                                           'type': 'number'},
                                                   'speaker_id': {'nullable': True,
                                                                  'title': 'Speaker Id',
                                                                  'type': 'string'},
                                                   'start': {'nullable': True,
                                                             'title': 'Start',
                                                             'type': 'number'},
                                                   'text': {'title': 'Text',
                                                            'type': 'string'},
                                                   'type': {'title': 'Type',
                                                            'type': 'string'}},
                                    'required': ['text', 'type'],
                                    'type': 'object'},
                          'nullable': True,
                          'title': 'Words',
                          'type': 'array'}},
 'required': ['text', 'language_code', 'language_probability'],
 'title': 'Output',
 'type': 'object'}
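
Because `words`, `start`, `end`, and `speaker_id` are all nullable in this schema, code that consumes the output needs to tolerate missing values. The sketch below groups consecutive entries of `words` into per-speaker turns using only the fields the schema declares. `speaker_turns` is a hypothetical helper, and the sample values (including the `type` value "word", which the schema does not enumerate) are made up for illustration.

```python
from typing import Any


def speaker_turns(output: dict[str, Any]) -> list[dict[str, Any]]:
    """Group consecutive word entries that share a speaker_id into turns."""
    turns: list[dict[str, Any]] = []
    # 'words' is nullable, so treat None the same as an empty list.
    for word in output.get("words") or []:
        speaker = word.get("speaker_id")
        start, end = word.get("start"), word.get("end")
        if turns and turns[-1]["speaker_id"] == speaker:
            turn = turns[-1]
            turn["texts"].append(word["text"])
            # Timestamps are nullable: only overwrite with real values.
            if turn["start"] is None:
                turn["start"] = start
            if end is not None:
                turn["end"] = end
        else:
            turns.append(
                {"speaker_id": speaker, "start": start, "end": end, "texts": [word["text"]]}
            )
    return turns


# Hypothetical output shaped like the schema above.
sample_output = {
    "text": "Hello there Hi",
    "language_code": "en",
    "language_probability": 0.99,
    "words": [
        {"text": "Hello", "type": "word", "start": 0.0, "end": 0.4, "speaker_id": "speaker_0"},
        {"text": "there", "type": "word", "start": 0.5, "end": 0.9, "speaker_id": "speaker_0"},
        {"text": "Hi", "type": "word", "start": 1.2, "end": 1.5, "speaker_id": "speaker_1"},
    ],
}

for turn in speaker_turns(sample_output):
    print(turn["speaker_id"], turn["start"], turn["end"], turn["texts"])
```

The turn text is kept as a list of pieces rather than a joined string, since the schema does not say whether whitespace arrives as separate entries in `words`.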