lucataco/vibevoice-asr:e00f1716

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

| Field | Type | Default value | Description |
| --- | --- | --- | --- |
| audio | string | | Audio file to transcribe. |
| prompt | string | | Optional context or hotwords to improve recognition. |
| max_new_tokens | integer | 1024 (Min: 64, Max: 8192) | Maximum generated text tokens. |
| tokenizer_chunk_size | integer | 1440000 (Min: 64000, Max: 1440000) | Audio tokenizer chunk size in samples. Use 64000 if VRAM is tight. |
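The defaults and ranges above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model's own code: `build_input` is a hypothetical helper name, and passing the audio as a URL string is an assumption (the schema only says `string`).

```python
from typing import Any, Dict, Optional

# Ranges copied from the input schema above.
MAX_NEW_TOKENS_RANGE = (64, 8192)
TOKENIZER_CHUNK_SIZE_RANGE = (64000, 1440000)


def build_input(
    audio: str,
    prompt: Optional[str] = None,
    max_new_tokens: int = 1024,
    tokenizer_chunk_size: int = 1440000,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble an input payload and enforce the documented ranges."""
    checks = (
        ("max_new_tokens", max_new_tokens, MAX_NEW_TOKENS_RANGE),
        ("tokenizer_chunk_size", tokenizer_chunk_size, TOKENIZER_CHUNK_SIZE_RANGE),
    )
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    payload: Dict[str, Any] = {
        "audio": audio,  # assumed here to be a URL string
        "max_new_tokens": max_new_tokens,
        "tokenizer_chunk_size": tokenizer_chunk_size,
    }
    # prompt is optional, so it is only sent when the caller provides one.
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


if __name__ == "__main__":
    # Smaller chunk size, as the schema suggests when VRAM is tight.
    print(build_input("https://example.com/meeting.wav", tokenizer_chunk_size=64000))
```

The resulting dictionary is the kind of value you would pass as the `input` argument when calling the model through an API client. The version identifier on this page is abbreviated, so the full model reference is deliberately left out of the sketch.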

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'additionalProperties': {'type': 'object'},
 'title': 'Output',
 'type': 'object'}
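
The schema only says that the output is an object whose values are themselves objects; it does not name any keys. A minimal sketch of a shape check under that reading follows. `check_output` is a hypothetical helper, and the key used in the example is a placeholder, not a documented field.

```python
from typing import Any, Dict


def check_output(output: Any) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: verify the response matches the documented shape."""
    if not isinstance(output, dict):
        raise TypeError(f"expected an object, got {type(output).__name__}")
    for key, value in output.items():
        # additionalProperties: {'type': 'object'} means every value is an object.
        if not isinstance(value, dict):
            raise TypeError(f"value for {key!r} is not an object")
    return output


if __name__ == "__main__":
    # Placeholder key and contents, for illustration only.
    print(check_output({"example_key": {"text": "hello"}}))
```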