
zsxkib/kimi-audio-7b-instruct:7500b323

Input

*file

Input audio file for processing. Can be used for speech-to-text (ASR) or audio-to-audio generation.

string

Optional text prompt to guide the model. For ASR, use prompts like 'Please convert this audio to text' or '请将音频内容转换为文字' (the same instruction in Chinese).

string

Type of output to generate: 'audio' for audio only, 'text' for transcription only, or 'both' for both audio and text responses.

Default: "both"

boolean

Return text results in JSON format instead of text file

Default: true

number

Temperature for audio generation. Higher values (0.8-1.0) increase creativity but may reduce coherence.

Default: 0.8

integer

Top-k for audio generation. Limits the token selection to the k most likely tokens.

Default: 10

number

Temperature for text generation. Lower values (0.0-0.5) increase factual accuracy.

Default: 0

integer

Top-k for text generation. Limits the token selection to the k most likely tokens.

Default: 5
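
The temperature and top-k settings above can be illustrated with a small sampler. This is a minimal sketch, not the model's actual decoding code: the function name `sample_top_k` is hypothetical, and treating a temperature of 0 as greedy (argmax) decoding is an assumption based on common practice.

```python
import math
import random


def sample_top_k(logits, temperature, top_k, rng=None):
    """Pick a token index from `logits` using temperature and top-k.

    Hypothetical helper for illustration only; assumes top_k >= 1.
    """
    if temperature <= 0:
        # Assumption: temperature 0 means greedy decoding (take the argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Keep only the k highest-scoring token indices.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top = ranked[:top_k]
    # Softmax over the survivors after dividing by the temperature.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]
```

With the text defaults listed above (temperature 0, top-k 5) this sketch always returns the highest-scoring token, while the audio defaults (temperature 0.8, top-k 10) draw at random among the ten most likely tokens.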

number

Repetition penalty for audio. Values > 1.0 discourage repetition in audio generation.

Default: 1

integer

Window size for audio repetition penalty calculation.

Default: 64
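
A windowed repetition penalty can be sketched as follows. The page does not describe how the model applies the penalty, so this follows a widely used convention (divide positive logits by the penalty, multiply negative ones) and should be read as an assumption. The name `apply_repetition_penalty` is hypothetical.

```python
def apply_repetition_penalty(logits, history, penalty, window_size):
    """Return a copy of `logits` with recently generated tokens penalized.

    Only the last `window_size` token ids in `history` are considered.
    Hypothetical helper; the hosted model may implement this differently.
    """
    # Guard window_size == 0, because history[-0:] would be the whole list.
    recent = set(history[-window_size:]) if window_size > 0 else set()
    out = list(logits)
    for tok in recent:
        if 0 <= tok < len(out):
            # Make the token less likely whatever the sign of its logit.
            if out[tok] > 0:
                out[tok] = out[tok] / penalty
            else:
                out[tok] = out[tok] * penalty
    return out
```

With the default penalty of 1 the logits are unchanged, so the window size only matters once the penalty is raised above 1.0.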

number

Repetition penalty for text. Values > 1.0 discourage repetition in text generation.

Default: 1

integer

Window size for text repetition penalty calculation.

Default: 16
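
Taken together, the inputs above could be assembled into a single payload. The listing does not show the input field names, so every key below (`audio`, `prompt`, `output_type`, and so on) is a hypothetical placeholder; only the default values and the three output types come from the descriptions above. The sketch builds and validates a dictionary and does not call any API.

```python
# Defaults as listed above; the key names are assumptions, not confirmed names.
DEFAULTS = {
    "output_type": "both",
    "return_json": True,
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
    "audio_repetition_penalty": 1.0,
    "audio_repetition_window_size": 64,
    "text_repetition_penalty": 1.0,
    "text_repetition_window_size": 16,
}

VALID_OUTPUT_TYPES = {"audio", "text", "both"}


def build_input(audio, prompt=None, **overrides):
    """Merge caller overrides with the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input(s): {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides, "audio": audio}
    if prompt is not None:
        payload["prompt"] = prompt
    if payload["output_type"] not in VALID_OUTPUT_TYPES:
        raise ValueError("output_type must be 'audio', 'text', or 'both'")
    return payload


# Example: a transcription-only request for a local file path.
asr_input = build_input(
    "clip.wav",
    prompt="Please convert this audio to text",
    output_type="text",
)
```

Keeping the defaults in one place makes it easy to see which settings a given request actually changes.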

Output

Open waits text to dialogue model. You get full control over scripts and voices.