openai/gpt-4o-mini-transcribe

A speech-to-text model that uses GPT-4o mini to transcribe audio

Official
945 runs
Commercial use

Input

*file

The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm

string

The language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.

string

Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

number
(minimum: 0, maximum: 1)

Sampling temperature between 0 and 1

Default: 0
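
The inputs above could be assembled and checked before a request is sent. The sketch below is a minimal illustration, not the official client usage: the input keys `audio_file`, `language`, `prompt`, and `temperature` are assumptions (the page does not show the full field names), `build_input` and `transcribe` are hypothetical helpers, and the two-letter check is only a rough test for an ISO-639-1 code.

```python
from pathlib import Path

# Formats listed in the file field description above.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def build_input(file_ref, language=None, prompt=None, temperature=0.0):
    """Check arguments against the documented constraints and return an input dict.

    The dictionary keys are assumed names, not confirmed by this page.
    """
    suffix = Path(file_ref).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {suffix!r}")

    # Rough ISO-639-1 shape check: two lowercase ASCII letters, e.g. "en".
    if language is not None:
        ok = len(language) == 2 and language.isascii() and language.isalpha() and language.islower()
        if not ok:
            raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")

    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    payload = {"audio_file": file_ref, "temperature": temperature}
    if language is not None:
        payload["language"] = language
    if prompt is not None:
        payload["prompt"] = prompt
    return payload


def transcribe(payload):
    """Send the payload with the Replicate Python client (needs REPLICATE_API_TOKEN)."""
    import replicate  # third-party package, imported lazily so validation works without it

    return replicate.run("openai/gpt-4o-mini-transcribe", input=payload)


if __name__ == "__main__":
    print(build_input("interview.mp3", language="en"))
```

Validating locally keeps an unsupported file type or an out-of-range temperature from costing a round trip. `transcribe` is left uncalled here because it needs network access and an API token.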

Output

So we just added GPT-4o Mini Transcribe to Replicate, and thought you'd want to know. It's basically a speech-to-text model that uses GPT-4o Mini to turn your audio into text. The cool thing is that it's noticeably better than the Whisper models we've been using. Fewer errors, better at recognizing different languages, and just more accurate overall. If you've ever been frustrated with transcripts that mess up technical terms or struggle with different accents, you'll probably appreciate this upgrade. It just works better. Some quick tech specs if you're curious. It has a 16,000 token context window, which means it can handle longer audio clips in one go. And it can output up to 2,000 tokens, so you'll get nice complete transcripts. The model's knowledge is current up to June 2024.
Input tokens: 870
Output tokens: 160
Tokens per second: 45.68

Pricing

Model pricing for openai/gpt-4o-mini-transcribe. Looking for volume pricing? Get in touch.

$5 per million output tokens (200,000 tokens for $1)

$3 per million input tokens (around 333,333 tokens for $1)
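
As a rough illustration of how these rates combine, the sketch below estimates the cost of a single run from its token counts. It assumes cost is simply tokens times the listed per-million rate, with no rounding, minimum charge, or volume discount; `estimate_cost` is a hypothetical helper, and the example counts are the ones shown for the sample output on this page.

```python
# Listed rates, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 3.0
OUTPUT_USD_PER_MILLION = 5.0


def estimate_cost(input_tokens, output_tokens):
    """Return an estimated cost in USD, assuming plain per-token billing."""
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


if __name__ == "__main__":
    # Token counts from the sample run: 870 input, 160 output.
    print(f"${estimate_cost(870, 160):.5f}")  # prints $0.00341
```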

Official models are always on, maintained, and have predictable pricing. Learn more.

Check out our docs for more information about how pricing works on Replicate.

Readme

GPT-4o mini Transcribe is a speech-to-text model that uses GPT-4o mini to transcribe audio. Compared to the original Whisper models, it has a lower word error rate and better language recognition and accuracy. Use it for more accurate transcripts.

  • 16,000-token context window
  • 2,000 max output tokens
  • Jun 01, 2024 knowledge cutoff