elevenlabs/scribe-v2:5cd433d1 | Run with an API on Replicate

elevenlabs/scribe-v2:5cd433d1

Input schema

The fields you can set when running this model with an API. If you don’t give a value for a field, its default value will be used.

Field	Type	Default value	Description
audio	string		Audio or video file to transcribe. Supports MP3, WAV, M4A, FLAC, OGG, OPUS, WebM, AAC, MP4, MOV, MKV, AVI, and more. Max 3 GB, up to 10 hours.
language_code	string	auto	Language of the audio as an ISO-639-1 (e.g. 'en') or ISO-639-3 (e.g. 'eng') code. Set to 'auto' to detect the language automatically. Setting a specific language can improve accuracy for noisy or unusual audio.
diarize	boolean	False	Identify and label different speakers in the audio. When enabled, each word in the output includes a 'speaker_id'. Supports up to 32 speakers.
num_speakers	integer	0 (max: 32)	Maximum number of speakers expected in the audio. Helps the model with diarization. Set to 0 to let the model decide. Only used when 'diarize' is true.
timestamps_granularity	string (one of 'word', 'character', 'none')	word	Granularity of timestamps in the output. 'word' returns start/end times for each word, 'character' adds per-character timing, 'none' omits timestamps.
tag_audio_events	boolean	True	Tag non-speech sounds in the transcription, like (laughter), (footsteps), or (applause).
no_verbatim	boolean	False	Remove filler words ('um', 'uh'), false starts, and disfluencies from the transcript. Produces a cleaner, more readable output.
keyterms	string		Comma-separated list of words or phrases to bias transcription towards. Useful for product names, technical terms, or proper nouns. Up to 1000 terms, max 50 characters each.
temperature	number	-1 (min: -1, max: 2)	Sampling temperature. Higher values produce more diverse, less deterministic output. Set to -1 to use the model default (usually 0).
seed	integer	-1 (min: -1, max: 2147483647)	Random seed for reproducible outputs. Set to -1 to use a non-deterministic seed.
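
The limits in the table above can be checked before a request is sent. The following is a minimal sketch, not part of the model's API: `build_input` and its constants are hypothetical names, the limits are copied from the table, and joining key terms with a bare comma is an assumption about what "comma-separated" means here.

```python
from typing import Any, Optional

# Limits copied from the input schema table.
MAX_SPEAKERS = 32
MAX_KEYTERMS = 1000
MAX_KEYTERM_CHARS = 50
GRANULARITIES = {"word", "character", "none"}


def build_input(
    audio: str,
    language_code: str = "auto",
    diarize: bool = False,
    num_speakers: int = 0,
    timestamps_granularity: str = "word",
    keyterms: Optional[list] = None,
) -> dict:
    """Hypothetical helper: validate values and return an input dict."""
    if not 0 <= num_speakers <= MAX_SPEAKERS:
        raise ValueError(f"num_speakers must be between 0 and {MAX_SPEAKERS}")
    if timestamps_granularity not in GRANULARITIES:
        raise ValueError(f"timestamps_granularity must be one of {sorted(GRANULARITIES)}")

    # Drop empty entries and surrounding whitespace before checking limits.
    terms = [t.strip() for t in (keyterms or []) if t.strip()]
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} key terms are allowed")
    for term in terms:
        if len(term) > MAX_KEYTERM_CHARS:
            raise ValueError(f"key term longer than {MAX_KEYTERM_CHARS} characters: {term!r}")
        if "," in term:
            raise ValueError(f"key term would be split by the comma separator: {term!r}")

    payload: dict[str, Any] = {
        "audio": audio,
        "language_code": language_code,
        "diarize": diarize,
        "timestamps_granularity": timestamps_granularity,
    }
    # num_speakers is only used when diarize is true, so omit it otherwise.
    if diarize and num_speakers:
        payload["num_speakers"] = num_speakers
    if terms:
        payload["keyterms"] = ",".join(terms)
    return payload
```

Fields left out of the returned dict fall back to their defaults. With the Replicate Python client, the dict would typically be passed as `replicate.run("<model reference from this page>", input=payload)`, where `audio` is a URL to the file; the reference string is left as a placeholder here because the version identifier on this page appears abbreviated.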

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{'properties': {'duration_seconds': {'nullable': True,
                                     'title': 'Duration Seconds',
                                     'type': 'number'},
                'language_code': {'title': 'Language Code', 'type': 'string'},
                'language_probability': {'title': 'Language Probability',
                                         'type': 'number'},
                'text': {'title': 'Text', 'type': 'string'},
                'words': {'items': {'properties': {'end': {'nullable': True,
                                                           'title': 'End',
                                                           'type': 'number'},
                                                   'speaker_id': {'nullable': True,
                                                                  'title': 'Speaker '
                                                                           'Id',
                                                                  'type': 'string'},
                                                   'start': {'nullable': True,
                                                             'title': 'Start',
                                                             'type': 'number'},
                                                   'text': {'title': 'Text',
                                                            'type': 'string'},
                                                   'type': {'title': 'Type',
                                                            'type': 'string'}},
                                    'required': ['text', 'type'],
                                    'type': 'object'},
                          'nullable': True,
                          'title': 'Words',
                          'type': 'array'}},
 'required': ['text', 'language_code', 'language_probability'],
 'title': 'Output',
 'type': 'object'}
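
When `diarize` is enabled, the per-word `speaker_id` in this schema can be used to rebuild speaker turns. The sketch below is one way to do that under stated assumptions: `speaker_turns` is a hypothetical helper, the response is assumed to be already parsed into a Python dict, and item texts are concatenated as-is on the assumption that whitespace arrives as its own entries in `words` (pass `sep=" "` if it does not).

```python
from typing import Any


def speaker_turns(output: dict, sep: str = "") -> list:
    """Group consecutive 'words' items that share a speaker_id into turns."""
    turns: list[dict[str, Any]] = []
    # 'words' is nullable in the schema, so treat None like an empty list.
    for item in output.get("words") or []:
        speaker = item.get("speaker_id")  # nullable; None when not diarized
        if not turns or turns[-1]["speaker_id"] != speaker:
            turns.append(
                {"speaker_id": speaker, "text": "", "start": item.get("start"), "end": None}
            )
        turn = turns[-1]
        turn["text"] += (sep if turn["text"] else "") + item["text"]
        # 'start' and 'end' are nullable too; keep the first start and last end seen.
        if turn["start"] is None:
            turn["start"] = item.get("start")
        if item.get("end") is not None:
            turn["end"] = item["end"]
    return turns
```

Each returned turn carries the speaker label, the concatenated text, and the start and end times of the turn. With diarization off, every item has a null `speaker_id`, so the whole transcript comes back as a single turn.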