soykertje / whisper

Convert speech in audio to text

  • Public
  • 59.3K runs

Input

file (required)

Audio file

string

Choose a Whisper model.

Default: "large-v2"

string

Choose the format for the transcription.

Default: "plain text"

boolean

Translate the text to English when set to True.

Default: false

string

Language spoken in the audio; specify None to perform language detection.
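When no language is given, the model has to pick one itself. A minimal sketch of that last step, assuming per-language probabilities are already available; the probabilities and the `LANGUAGE_NAMES` excerpt below are hypothetical, and the mapping from a code to a full name is how a value like the `english` reported under detected_language can arise:

```python
# Hypothetical excerpt of a language-code -> name table, for illustration only.
LANGUAGE_NAMES = {"en": "english", "de": "german", "fr": "french"}


def pick_language(language_probs: dict[str, float]) -> str:
    """Return the full name of the most probable language code."""
    best_code = max(language_probs, key=language_probs.get)
    # Fall back to the raw code if it is not in the excerpt above.
    return LANGUAGE_NAMES.get(best_code, best_code)


print(pick_language({"en": 0.93, "de": 0.04, "fr": 0.03}))  # english
```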

number

Temperature to use for sampling.

Default: 0
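With the default of 0, decoding is deterministic. A minimal sketch of the generic technique, greedy argmax at temperature 0 and softmax sampling otherwise, over a hypothetical list of logits (an illustration, not Whisper's own decoder code):

```python
import math
import random


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index from raw logits.

    temperature == 0 means greedy decoding (argmax); otherwise the logits are
    divided by the temperature and one token is drawn from the softmax.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
print(sample_token([1.0, 3.0, 2.0], 0, rng))    # 1 (greedy)
print(sample_token([1.0, 3.0, 2.0], 1.0, rng))  # a random draw; varies with the seed
```

Higher temperatures flatten the distribution, so lower-scoring tokens are chosen more often; very small temperatures behave almost like greedy decoding.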

number

Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search.
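Patience only matters when beam search is used. A minimal sketch, assuming the rule from the linked paper as we understand the open-source Whisper decoder to apply it: search stops once `round(beam_size * patience)` finished hypotheses have been collected. `beam_size` is an assumption here; it is not an input listed on this page.

```python
from typing import Optional


def max_finished_candidates(beam_size: int, patience: Optional[float] = None) -> int:
    """How many finished hypotheses beam search collects before it stops.

    patience = 1.0 (or None) gives conventional beam search: stop once
    beam_size hypotheses have finished. Larger values keep searching longer.
    """
    if patience is None:
        patience = 1.0
    if patience <= 0:
        raise ValueError("patience must be positive")
    return round(beam_size * patience)


print(max_finished_candidates(5))       # 5
print(max_finished_candidates(5, 2.0))  # 10
```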

string

Comma-separated list of token IDs to suppress during sampling; '-1' suppresses most special characters except common punctuation.

Default: "-1"
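The special value -1 stands for a tokenizer-specific set of non-speech tokens. A minimal sketch of parsing this input; since this page does not say which IDs that set contains, it is passed in as a hypothetical argument:

```python
from typing import Iterable


def parse_suppress_tokens(spec: str, non_speech_tokens: Iterable[int]) -> list[int]:
    """Turn a spec such as "-1" or "-1,220" into a sorted list of token IDs.

    -1 is replaced by the caller-supplied non-speech token IDs.
    """
    ids = {int(part) for part in spec.split(",") if part.strip()}
    if -1 in ids:
        ids.discard(-1)
        ids.update(non_speech_tokens)
    return sorted(ids)


# Hypothetical non-speech token IDs, for illustration only.
print(parse_suppress_tokens("-1,7", non_speech_tokens=[1, 2, 3]))  # [1, 2, 3, 7]
print(parse_suppress_tokens("", non_speech_tokens=[1, 2, 3]))      # []
```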

string

Optional text to provide as a prompt for the first window.

boolean

If True, provide the previous output of the model as a prompt for the next window. Disabling this may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop.

Default: true
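The initial prompt and this flag interact: the prompt seeds the first window, and the flag decides whether later windows see earlier output. A minimal sketch of that bookkeeping, assuming token IDs are plain integers and that each prompt keeps only its most recent `max_prompt_tokens` tokens (a hypothetical cap; the real limit depends on the model's context size and is not given on this page):

```python
def prompts_per_window(
    initial_prompt: list[int],
    window_outputs: list[list[int]],
    condition_on_previous_text: bool,
    max_prompt_tokens: int,
) -> list[list[int]]:
    """Return the prompt tokens each window would be decoded with."""
    history = list(initial_prompt)
    prompts = []
    for output in window_outputs:
        # Only the most recent tokens fit into the prompt.
        prompts.append(history[-max_prompt_tokens:] if max_prompt_tokens > 0 else [])
        if condition_on_previous_text:
            history.extend(output)  # the next window sees this window's text
        else:
            history = []  # the initial prompt only affects the first window
    return prompts


print(prompts_per_window([1, 2], [[10, 11], [20], [30]], True, 3))
# [[1, 2], [2, 10, 11], [10, 11, 20]]
```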

number

Amount by which to increase the temperature when falling back because the decoding fails to meet either of the thresholds below.

Default: 0.2
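Together with the temperature input, this increment defines the temperatures tried for a window: decoding starts at the base temperature and, each time a result fails a threshold, retries one step higher. A minimal sketch of that schedule, assuming it stops at 1.0 as in the open-source Whisper command-line tool:

```python
def fallback_temperatures(start: float = 0.0, increment: float = 0.2, ceiling: float = 1.0) -> list[float]:
    """Temperatures to try in order: start, start + increment, ... up to ceiling."""
    if increment is None or increment <= 0:
        return [start]  # no fallback: only the base temperature is used
    temps = []
    step = 0
    # Multiply instead of accumulating to avoid drifting float error.
    while start + step * increment <= ceiling + 1e-6:
        temps.append(round(start + step * increment, 6))
        step += 1
    return temps or [start]


print(fallback_temperatures(0.0, 0.2))  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

With the defaults on this page (temperature 0, increment 0.2), a window that keeps failing would be decoded at most six times.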

number

If the gzip compression ratio is higher than this value, treat the decoding as failed.

Default: 2.4
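Repetitive output, a typical symptom of a decoding loop, compresses unusually well, which is what this threshold detects. A minimal sketch of the ratio, assuming it is computed as raw UTF-8 size divided by compressed size; it uses Python's `zlib` (the same DEFLATE algorithm that gzip wraps):

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by DEFLATE-compressed size."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


looping = "micro machine " * 40
print(compression_ratio(looping) > 2.4)  # True: flagged as a failed decoding
```

For comparison, every segment in the sample output reports a compression_ratio of about 2.02, just under the default threshold.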

number

If the average log probability is lower than this value, treat the decoding as failed.

Default: -1

number

If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, treat the segment as silence.

Default: 0.6
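The three thresholds combine as described above. A minimal sketch of those checks with this page's defaults (`WindowResult` is a hypothetical container), applied to the values that every segment in the sample output reports:

```python
from dataclasses import dataclass


@dataclass
class WindowResult:
    avg_logprob: float
    compression_ratio: float
    no_speech_prob: float


def needs_fallback(r: WindowResult, compression_ratio_threshold: float = 2.4,
                   logprob_threshold: float = -1.0) -> bool:
    """Decoding fails if the text is too repetitive or too unlikely."""
    too_repetitive = r.compression_ratio > compression_ratio_threshold
    too_unlikely = r.avg_logprob < logprob_threshold
    return too_repetitive or too_unlikely


def is_silence(r: WindowResult, logprob_threshold: float = -1.0,
               no_speech_threshold: float = 0.6) -> bool:
    """Silence needs BOTH a high no-speech probability and a low log probability."""
    return r.no_speech_prob > no_speech_threshold and r.avg_logprob < logprob_threshold


sample = WindowResult(avg_logprob=-0.29071213478265806,
                      compression_ratio=2.023206751054852,
                      no_speech_prob=0.5242696404457092)
print(needs_fallback(sample), is_silence(sample))  # False False
```

This is consistent with the sample output showing "temperature": 0 for every segment: the first attempt passed both thresholds, so no fallback was needed.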

boolean

Improves the accuracy of the timestamps by using word-level timestamps.

Default: true
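This page does not show the inputs' parameter names, so the sketch below collects the inputs under the keyword names used by the open-source openai-whisper package's `transcribe()`, assuming a recent release. How this particular Replicate model wires its inputs is an assumption, and the function name is hypothetical; defaults are copied from the list above.

```python
from typing import Any, Optional


def build_transcribe_options(
    translate: bool = False,
    language: Optional[str] = None,
    temperature: float = 0.0,
    temperature_increment_on_fallback: Optional[float] = 0.2,
    initial_prompt: Optional[str] = None,
    condition_on_previous_text: bool = True,
    compression_ratio_threshold: float = 2.4,
    logprob_threshold: float = -1.0,
    no_speech_threshold: float = 0.6,
    suppress_tokens: str = "-1",
    word_timestamps: bool = True,
) -> dict[str, Any]:
    """Map the form inputs onto openai-whisper transcribe() keyword arguments."""
    # Temperature schedule: base value, then one step per fallback, capped at 1.0.
    if temperature_increment_on_fallback:
        steps = int((1.0 - temperature) / temperature_increment_on_fallback + 1e-6)
        schedule = tuple(
            round(temperature + i * temperature_increment_on_fallback, 6)
            for i in range(max(steps, 0) + 1)
        )
    else:
        schedule = (temperature,)
    return {
        "task": "translate" if translate else "transcribe",
        "language": language,  # None lets the model detect the language
        "temperature": schedule,
        "initial_prompt": initial_prompt,
        "condition_on_previous_text": condition_on_previous_text,
        "compression_ratio_threshold": compression_ratio_threshold,
        "logprob_threshold": logprob_threshold,
        "no_speech_threshold": no_speech_threshold,
        "suppress_tokens": suppress_tokens,
        "word_timestamps": word_timestamps,
    }
```

With openai-whisper installed, the result could be passed as `whisper.load_model("large-v2").transcribe("audio.mp3", **build_transcribe_options())`, where `audio.mp3` is a placeholder path. Patience is left out because openai-whisper only accepts it together with a `beam_size`, which this page does not expose.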

Output

segments

[ { "id": 0, "end": 3.14, "seek": 0, "text": " This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines.", "start": 0.18, "words": [ { "end": 0.5, "word": " This", "start": 0.18, "probability": 0.5188008546829224 }, { "end": 0.66, "word": " is", "start": 0.5, "probability": 0.9036935567855835 }, { "end": 0.88, "word": " the", "start": 0.66, "probability": 0.7630036473274231 }, { "end": 0.9, "word": " Micro", "start": 0.88, "probability": 0.7599949836730957 }, { "end": 1, "word": " Machine", "start": 0.9, "probability": 0.43241745233535767 }, { "end": 1.18, "word": " Man", "start": 1, "probability": 0.8935106992721558 }, { "end": 1.4, "word": " presenting", "start": 1.18, "probability": 0.4879225194454193 }, { "end": 1.52, "word": " the", "start": 1.4, "probability": 0.8076907992362976 }, { "end": 1.68, "word": " most", "start": 1.52, "probability": 0.8195436596870422 }, { "end": 1.86, "word": " midget", "start": 1.68, "probability": 0.9426034390926361 }, { "end": 2.14, "word": " miniature", "start": 1.86, "probability": 0.6330108642578125 }, { "end": 2.46, "word": " motorcade", "start": 2.14, "probability": 0.7706209719181061 }, { "end": 2.7, "word": " of", "start": 2.46, "probability": 0.8996413946151733 }, { "end": 2.9, "word": " micro", "start": 2.7, "probability": 0.2997014820575714 }, { "end": 3.14, "word": " machines.", "start": 2.9, "probability": 0.5044888257980347 } ], "tokens": [ 50364, 639, 307, 264, 25642, 22155, 2458, 15578, 264, 881, 2062, 847, 34674, 5932, 30340, 295, 4532, 8379, 13, 50524 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 1, "end": 6.66, "seek": 0, "text": " Each one has dramatic details, terrific trim, precision paint jobs, plus incredible micro machine pocket play sets.", "start": 3.46, "words": [ { "end": 3.56, "word": " Each", "start": 3.46, "probability": 0.7921050786972046 }, { "end": 3.68, 
"word": " one", "start": 3.56, "probability": 0.8239385485649109 }, { "end": 3.8, "word": " has", "start": 3.68, "probability": 0.8931549787521362 }, { "end": 4.04, "word": " dramatic", "start": 3.8, "probability": 0.76500004529953 }, { "end": 4.24, "word": " details,", "start": 4.04, "probability": 0.6641080379486084 }, { "end": 4.5, "word": " terrific", "start": 4.48, "probability": 0.816464900970459 }, { "end": 4.66, "word": " trim,", "start": 4.5, "probability": 0.41670969128608704 }, { "end": 4.96, "word": " precision", "start": 4.78, "probability": 0.7823763489723206 }, { "end": 5.18, "word": " paint", "start": 4.96, "probability": 0.8373522162437439 }, { "end": 5.3, "word": " jobs,", "start": 5.18, "probability": 0.45608851313591003 }, { "end": 5.52, "word": " plus", "start": 5.34, "probability": 0.8570372462272644 }, { "end": 5.78, "word": " incredible", "start": 5.52, "probability": 0.7519615888595581 }, { "end": 5.98, "word": " micro", "start": 5.78, "probability": 0.6544991135597229 }, { "end": 6.2, "word": " machine", "start": 5.98, "probability": 0.659029483795166 }, { "end": 6.54, "word": " pocket", "start": 6.2, "probability": 0.6552930474281311 }, { "end": 6.62, "word": " play", "start": 6.54, "probability": 0.6885859370231628 }, { "end": 6.66, "word": " sets.", "start": 6.62, "probability": 0.633192241191864 } ], "tokens": [ 50524, 6947, 472, 575, 12023, 4365, 11, 20899, 10445, 11, 18356, 4225, 4782, 11, 1804, 4651, 4532, 3479, 8963, 862, 6352, 13, 50699 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 2, "end": 8.76, "seek": 0, "text": " There's a police station, fire station, restaurant, service station, and more.", "start": 6.8, "words": [ { "end": 6.9, "word": " There's", "start": 6.8, "probability": 0.8825059831142426 }, { "end": 6.96, "word": " a", "start": 6.9, "probability": 0.9860183000564575 }, { "end": 7.18, "word": " police", "start": 6.96, 
"probability": 0.8330947756767273 }, { "end": 7.3, "word": " station,", "start": 7.18, "probability": 0.8929913640022278 }, { "end": 7.58, "word": " fire", "start": 7.4, "probability": 0.7919226884841919 }, { "end": 7.7, "word": " station,", "start": 7.58, "probability": 0.8724629282951355 }, { "end": 8.02, "word": " restaurant,", "start": 7.8, "probability": 0.7704039216041565 }, { "end": 8.26, "word": " service", "start": 8.08, "probability": 0.8293709754943848 }, { "end": 8.44, "word": " station,", "start": 8.26, "probability": 0.9018292427062988 }, { "end": 8.64, "word": " and", "start": 8.44, "probability": 0.8904381990432739 }, { "end": 8.76, "word": " more.", "start": 8.64, "probability": 0.8884001970291138 } ], "tokens": [ 50699, 821, 311, 257, 3804, 5214, 11, 2610, 5214, 11, 6383, 11, 2643, 5214, 11, 293, 544, 13, 50804 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 3, "end": 10.28, "seek": 0, "text": " Perfect pocket portables to take anyplace.", "start": 9, "words": [ { "end": 9.16, "word": " Perfect", "start": 9, "probability": 0.8314006328582764 }, { "end": 9.38, "word": " pocket", "start": 9.16, "probability": 0.7699366211891174 }, { "end": 9.62, "word": " portables", "start": 9.38, "probability": 0.9437406063079834 }, { "end": 9.8, "word": " to", "start": 9.62, "probability": 0.8796511292457581 }, { "end": 9.9, "word": " take", "start": 9.8, "probability": 0.8167694807052612 }, { "end": 10.28, "word": " anyplace.", "start": 9.9, "probability": 0.6491216719150543 } ], "tokens": [ 50804, 10246, 8963, 2436, 2965, 281, 747, 604, 6742, 13, 50879 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 4, "end": 15.3, "seek": 0, "text": " And there are many miniature play sets to play with and each one comes with its own special edition micro machine vehicle and fun fantastic features that miraculously move.", "start": 10.4, "words": [ { "end": 10.54, "word": " And", "start": 10.4, "probability": 0.8727167844772339 }, { "end": 10.66, "word": " there", "start": 10.54, "probability": 0.8446416854858398 }, { "end": 10.68, "word": " are", "start": 10.66, "probability": 0.8793963193893433 }, { "end": 10.82, "word": " many", "start": 10.68, "probability": 0.8155453205108643 }, { "end": 11.02, "word": " miniature", "start": 10.82, "probability": 0.802977979183197 }, { "end": 11.24, "word": " play", "start": 11.02, "probability": 0.8290015459060669 }, { "end": 11.24, "word": " sets", "start": 11.24, "probability": 0.8442025780677795 }, { "end": 11.42, "word": " to", "start": 11.24, "probability": 0.895969808101654 }, { "end": 11.52, "word": " play", "start": 11.42, "probability": 0.8799439072608948 }, { "end": 11.6, "word": " with", "start": 11.52, "probability": 0.7744315266609192 }, { "end": 11.7, "word": " and", "start": 11.6, "probability": 0.40789568424224854 }, { "end": 11.78, "word": " each", "start": 11.7, "probability": 0.8371046185493469 }, { "end": 11.92, "word": " one", "start": 11.78, "probability": 0.8314981460571289 }, { "end": 12.04, "word": " comes", "start": 11.92, "probability": 0.8056870102882385 }, { "end": 12.14, "word": " with", "start": 12.04, "probability": 0.8167880773544312 }, { "end": 12.24, "word": " its", "start": 12.14, "probability": 0.6112188100814819 }, { "end": 12.36, "word": " own", "start": 12.24, "probability": 0.8195785880088806 }, { "end": 12.58, "word": " special", "start": 12.36, "probability": 0.9109472632408142 }, { "end": 12.88, "word": " edition", "start": 12.58, "probability": 0.7968374490737915 }, { "end": 13.1, "word": " micro", "start": 12.88, "probability": 0.7488399147987366 }, { "end": 13.28, "word": " machine", "start": 13.1, "probability": 0.71075838804245 }, { "end": 13.52, "word": " vehicle", "start": 13.28, "probability": 0.8960368037223816 }, { "end": 13.66, "word": " and", 
"start": 13.52, "probability": 0.8079777956008911 }, { "end": 13.84, "word": " fun", "start": 13.66, "probability": 0.8869229555130005 }, { "end": 14.1, "word": " fantastic", "start": 13.84, "probability": 0.6974602341651917 }, { "end": 14.34, "word": " features", "start": 14.1, "probability": 0.73177570104599 }, { "end": 14.54, "word": " that", "start": 14.34, "probability": 0.866608738899231 }, { "end": 14.84, "word": " miraculously", "start": 14.54, "probability": 0.9221479594707489 }, { "end": 15.3, "word": " move.", "start": 14.84, "probability": 0.8684508204460144 } ], "tokens": [ 50879, 400, 456, 366, 867, 34674, 862, 6352, 281, 862, 365, 293, 1184, 472, 1487, 365, 1080, 1065, 2121, 11377, 4532, 3479, 5864, 293, 1019, 5456, 4122, 300, 30686, 25038, 1286, 13, 51129 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 5, "end": 19.22, "seek": 0, "text": " Raise the boat lift at the airport, marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge.", "start": 15.64, "words": [ { "end": 15.66, "word": " Raise", "start": 15.64, "probability": 0.8107876181602478 }, { "end": 15.78, "word": " the", "start": 15.66, "probability": 0.8148850202560425 }, { "end": 15.9, "word": " boat", "start": 15.78, "probability": 0.8849641680717468 }, { "end": 15.98, "word": " lift", "start": 15.9, "probability": 0.45141515135765076 }, { "end": 16.1, "word": " at", "start": 15.98, "probability": 0.8778955340385437 }, { "end": 16.24, "word": " the", "start": 16.1, "probability": 0.8175328373908997 }, { "end": 16.36, "word": " airport,", "start": 16.24, "probability": 0.8736236691474915 }, { "end": 16.62, "word": " marina,", "start": 16.38, "probability": 0.6979465484619141 }, { "end": 16.84, "word": " man", "start": 16.68, "probability": 0.8808432221412659 }, { "end": 16.92, "word": " the", "start": 16.84, "probability": 0.7407904863357544 }, { "end": 
17.1, "word": " gun", "start": 16.92, "probability": 0.896804690361023 }, { "end": 17.22, "word": " turret", "start": 17.1, "probability": 0.9114507436752319 }, { "end": 17.32, "word": " at", "start": 17.22, "probability": 0.8769182562828064 }, { "end": 17.44, "word": " the", "start": 17.32, "probability": 0.8099021315574646 }, { "end": 17.54, "word": " army", "start": 17.44, "probability": 0.7624181509017944 }, { "end": 17.66, "word": " base,", "start": 17.54, "probability": 0.8227003216743469 }, { "end": 17.88, "word": " clean", "start": 17.74, "probability": 0.7740076780319214 }, { "end": 18, "word": " your", "start": 17.88, "probability": 0.7935739159584045 }, { "end": 18.12, "word": " car", "start": 18, "probability": 0.8899481892585754 }, { "end": 18.22, "word": " at", "start": 18.12, "probability": 0.8764775395393372 }, { "end": 18.26, "word": " the", "start": 18.22, "probability": 0.8148840069770813 }, { "end": 18.48, "word": " car", "start": 18.26, "probability": 0.9004565477371216 }, { "end": 18.52, "word": " wash,", "start": 18.48, "probability": 0.5889071226119995 }, { "end": 18.78, "word": " raise", "start": 18.62, "probability": 0.7670565843582153 }, { "end": 18.94, "word": " the", "start": 18.78, "probability": 0.8077996969223022 }, { "end": 19.06, "word": " toll", "start": 18.94, "probability": 0.9132710099220276 }, { "end": 19.22, "word": " bridge.", "start": 19.06, "probability": 0.822396993637085 } ], "tokens": [ 51129, 30062, 264, 6582, 5533, 412, 264, 10155, 11, 1849, 1426, 11, 587, 264, 3874, 34544, 412, 264, 7267, 3096, 11, 2541, 428, 1032, 412, 264, 1032, 5675, 11, 5300, 264, 16629, 7283, 13, 51329 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 6, "end": 21.22, "seek": 0, "text": " And these play sets fit together to form a micro machine world.", "start": 19.24, "words": [ { "end": 19.5, "word": " And", "start": 19.24, "probability": 
0.8680288195610046 }, { "end": 19.62, "word": " these", "start": 19.5, "probability": 0.6982218623161316 }, { "end": 19.8, "word": " play", "start": 19.62, "probability": 0.8423182368278503 }, { "end": 19.84, "word": " sets", "start": 19.8, "probability": 0.8482057452201843 }, { "end": 20.08, "word": " fit", "start": 19.84, "probability": 0.8614595532417297 }, { "end": 20.2, "word": " together", "start": 20.08, "probability": 0.7896531820297241 }, { "end": 20.42, "word": " to", "start": 20.2, "probability": 0.8943203091621399 }, { "end": 20.42, "word": " form", "start": 20.42, "probability": 0.8038238883018494 }, { "end": 20.58, "word": " a", "start": 20.42, "probability": 0.9939543604850769 }, { "end": 20.74, "word": " micro", "start": 20.58, "probability": 0.8134252429008484 }, { "end": 20.92, "word": " machine", "start": 20.74, "probability": 0.7036716341972351 }, { "end": 21.22, "word": " world.", "start": 20.92, "probability": 0.871777355670929 } ], "tokens": [ 51329, 400, 613, 862, 6352, 3318, 1214, 281, 1254, 257, 4532, 3479, 1002, 13, 51429 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 7, "end": 25.28, "seek": 0, "text": " Micro machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all.", "start": 21.48, "words": [ { "end": 21.6, "word": " Micro", "start": 21.48, "probability": 0.9181209206581116 }, { "end": 21.78, "word": " machine", "start": 21.6, "probability": 0.6455795764923096 }, { "end": 22, "word": " pocket", "start": 21.78, "probability": 0.7043859958648682 }, { "end": 22.16, "word": " play", "start": 22, "probability": 0.8732221722602844 }, { "end": 22.24, "word": " sets", "start": 22.16, "probability": 0.812091052532196 }, { "end": 22.42, "word": " so", "start": 22.24, "probability": 0.5316440463066101 }, { "end": 22.68, "word": " tremendously", "start": 22.42, "probability": 
0.744458794593811 }, { "end": 22.88, "word": " tiny,", "start": 22.68, "probability": 0.815040647983551 }, { "end": 23.04, "word": " so", "start": 22.96, "probability": 0.9097583889961243 }, { "end": 23.24, "word": " perfectly", "start": 23.04, "probability": 0.8714672923088074 }, { "end": 23.5, "word": " precise,", "start": 23.24, "probability": 0.8715913891792297 }, { "end": 23.78, "word": " so", "start": 23.64, "probability": 0.9078233242034912 }, { "end": 24.3, "word": " dazzlingly", "start": 23.78, "probability": 0.8693957328796387 }, { "end": 24.3, "word": " detailed,", "start": 24.3, "probability": 0.7534111738204956 }, { "end": 24.52, "word": " you'll", "start": 24.32, "probability": 0.8286176919937134 }, { "end": 24.62, "word": " want", "start": 24.52, "probability": 0.7137832045555115 }, { "end": 24.78, "word": " to", "start": 24.62, "probability": 0.8862699270248413 }, { "end": 24.88, "word": " pocket", "start": 24.78, "probability": 0.7723338007926941 }, { "end": 25.1, "word": " them", "start": 24.88, "probability": 0.7235183715820312 }, { "end": 25.28, "word": " all.", "start": 25.1, "probability": 0.8980650901794434 } ], "tokens": [ 51429, 25642, 3479, 8963, 862, 6352, 370, 27985, 5870, 11, 370, 6239, 13600, 11, 370, 44078, 1688, 356, 9942, 11, 291, 603, 528, 281, 8963, 552, 439, 13, 51629 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 8, "end": 27.66, "seek": 0, "text": " Micro machines and micro machine pocket play sets sold separately from Galoob.", "start": 25.5, "words": [ { "end": 25.68, "word": " Micro", "start": 25.5, "probability": 0.9197527766227722 }, { "end": 25.84, "word": " machines", "start": 25.68, "probability": 0.7449283599853516 }, { "end": 26, "word": " and", "start": 25.84, "probability": 0.6610913276672363 }, { "end": 26.12, "word": " micro", "start": 26, "probability": 0.8715086579322815 }, { "end": 26.32, "word": " machine", 
"start": 26.12, "probability": 0.6785076856613159 }, { "end": 26.5, "word": " pocket", "start": 26.32, "probability": 0.7522760033607483 }, { "end": 26.68, "word": " play", "start": 26.5, "probability": 0.8618249297142029 }, { "end": 26.78, "word": " sets", "start": 26.68, "probability": 0.8218684792518616 }, { "end": 26.96, "word": " sold", "start": 26.78, "probability": 0.8280050158500671 }, { "end": 27.16, "word": " separately", "start": 26.96, "probability": 0.6918163895606995 }, { "end": 27.4, "word": " from", "start": 27.16, "probability": 0.8134355545043945 }, { "end": 27.66, "word": " Galoob.", "start": 27.4, "probability": 0.7426764170328776 } ], "tokens": [ 51629, 25642, 8379, 293, 4532, 3479, 8963, 862, 6352, 3718, 14759, 490, 7336, 78, 996, 13, 51754 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 }, { "id": 9, "end": 29.5, "seek": 0, "text": " The smaller they are, the better they are.", "start": 27.78, "words": [ { "end": 28, "word": " The", "start": 27.78, "probability": 0.8174500465393066 }, { "end": 28.2, "word": " smaller", "start": 28, "probability": 0.7499929666519165 }, { "end": 28.42, "word": " they", "start": 28.2, "probability": 0.7381458878517151 }, { "end": 28.68, "word": " are,", "start": 28.42, "probability": 0.8937985897064209 }, { "end": 28.88, "word": " the", "start": 28.68, "probability": 0.813441812992096 }, { "end": 29.02, "word": " better", "start": 28.88, "probability": 0.8151369690895081 }, { "end": 29.22, "word": " they", "start": 29.02, "probability": 0.7374864816665649 }, { "end": 29.5, "word": " are.", "start": 29.22, "probability": 0.8918095827102661 } ], "tokens": [ 51754, 440, 4356, 436, 366, 11, 264, 1101, 436, 366, 13, 51854 ], "avg_logprob": -0.29071213478265806, "temperature": 0, "no_speech_prob": 0.5242696404457092, "compression_ratio": 2.023206751054852 } ]

transcription

This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines. Each one has dramatic details, terrific trim, precision paint jobs, plus incredible micro machine pocket play sets. There's a police station, fire station, restaurant, service station, and more. Perfect pocket portables to take anyplace. And there are many miniature play sets to play with and each one comes with its own special edition micro machine vehicle and fun fantastic features that miraculously move. Raise the boat lift at the airport, marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge. And these play sets fit together to form a micro machine world. Micro machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all. Micro machines and micro machine pocket play sets sold separately from Galoob. The smaller they are, the better they are.
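The transcription appears to be the segment texts joined with single spaces, and the per-word probabilities in segments make it easy to flag words worth double-checking. A minimal sketch of both; the sample data is copied from the output above, with the words lists trimmed, and the 0.5 cutoff is an arbitrary illustrative choice:

```python
def join_segments(segments: list[dict]) -> str:
    """Rebuild a plain transcript from segment texts (each starts with a space)."""
    return " ".join(seg["text"].strip() for seg in segments)


def low_confidence_words(segments: list[dict], threshold: float = 0.5) -> list[tuple[str, float, float]]:
    """List (word, start time, probability) for words below the threshold."""
    return [
        (word["word"].strip(), word["start"], word["probability"])
        for seg in segments
        for word in seg.get("words", [])
        if word["probability"] < threshold
    ]


segments = [
    {
        "text": " This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines.",
        "words": [  # trimmed to three entries
            {"word": " Micro", "start": 0.88, "end": 0.9, "probability": 0.7599949836730957},
            {"word": " Machine", "start": 0.9, "end": 1, "probability": 0.43241745233535767},
            {"word": " Man", "start": 1, "end": 1.18, "probability": 0.8935106992721558},
        ],
    },
    {"text": " The smaller they are, the better they are.", "words": []},  # words omitted
]
print(join_segments(segments))
print(low_confidence_words(segments))  # [('Machine', 0.9, 0.43241745233535767)]
```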

detected_language

english
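Each entry in segments carries its own start and end time in seconds, so the list can be rendered as subtitles directly. A minimal sketch that turns segments into SubRip (SRT) text, using two segments copied from the sample output; whether SRT is one of the formats offered by the format input is not stated on this page, so this only illustrates the data:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Number each segment and emit an SRT cue for it."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


sample = [
    {"start": 0.18, "end": 3.14, "text": " This is the Micro Machine Man presenting the most midget miniature motorcade of micro machines."},
    {"start": 27.78, "end": 29.5, "text": " The smaller they are, the better they are."},
]
print(segments_to_srt(sample))
```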

Run time and cost

This model costs approximately $0.043 to run on Replicate, or 23 runs per $1, but this varies depending on your inputs. It is also open source, and you can run it on your own computer with Docker.
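The two figures quoted above are consistent: at about $0.043 per run, one dollar covers 1 / 0.043 ≈ 23.3 runs, which rounds down to 23. A trivial sketch of the same arithmetic, using only the approximate per-run price from this page (actual cost depends on run time):

```python
import math

COST_PER_RUN_USD = 0.043  # approximate price quoted on this page

runs_per_dollar = math.floor(1 / COST_PER_RUN_USD)
print(runs_per_dollar)                  # 23
print(f"{100 * COST_PER_RUN_USD:.2f}")  # 4.30 (approximate USD for 100 runs)
```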

This model runs on NVIDIA T4 GPU hardware. Predictions typically complete within 4 minutes, but prediction time varies significantly with the inputs.

Readme

Whisper is a general-purpose speech transcription model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech transcription as well as speech translation and language identification.

This version uses the latest available Whisper release and adds a new input to perform the transcription.