collectiveai-team / speaker-diarization-3

Segments an audio recording based on who is speaking

  • Public
  • 3K runs
  • L40S
  • GitHub
  • License

Input

file

Audio file or url

Default: "https://replicate.delivery/pbxt/IZjTvet2ZGiyiYaMEEPrzn0xY1UDNsh0NfcO9qeTlpwCo7ig/lex-levin-4min.mp3"

integer
(minimum: 1)

Number of speakers to diarize. Default: inferred from the audio

integer
(minimum: 1)

Minimum number of speakers to diarize. Default: None

integer
(minimum: 1)

Maximum number of speakers to diarize. Default: None
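
The three speaker-count inputs above share the same lower bound of 1. A minimal sketch of assembling and checking an input payload before submitting a run might look like the following. The key names (`file`, `num_speakers`, `min_speakers`, `max_speakers`) and the helper `build_input` are hypothetical, since the listing above does not show the parameter names, and the check that the minimum does not exceed the maximum is an assumption rather than a documented rule.

```python
from typing import Optional

# Default audio URL taken from the input listing above.
DEFAULT_AUDIO = (
    "https://replicate.delivery/pbxt/"
    "IZjTvet2ZGiyiYaMEEPrzn0xY1UDNsh0NfcO9qeTlpwCo7ig/lex-levin-4min.mp3"
)


def build_input(
    file: str = DEFAULT_AUDIO,
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> dict:
    """Build an input payload; all key names here are hypothetical."""
    counts = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
    }
    # Each speaker count, when given, has a documented minimum of 1.
    for name, value in counts.items():
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1, got {value}")
    # Assumed sanity check: a minimum above the maximum cannot be satisfied.
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers must not exceed max_speakers")
    payload = {"file": file}
    # Omit unset counts so the model falls back to its own defaults.
    payload.update({k: v for k, v in counts.items() if v is not None})
    return payload
```

Leaving a count unset keeps it out of the payload, which matches the "Default: None" and "Default: infer" behaviour described above.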

Output

segments

start           stop            speaker
0:00:00.008489  0:00:06.629881  A
0:00:07.122241  0:00:08.276740  A
0:00:08.616299  0:00:09.787776  A
0:00:09.906621  0:00:13.658744  A
0:00:13.675722  0:00:18.582343  A
0:00:18.989813  0:00:26.052632  A
0:00:22.300509  0:00:22.555178  B
0:00:26.324278  0:00:32.555178  A
0:00:32.589134  0:00:32.691002  A
0:00:32.724958  0:00:44.100170  A
0:00:44.371817  0:00:47.292020  A
0:00:47.648557  0:00:59.261460  A
0:00:56.561969  0:00:56.884550  B
0:01:00.059423  0:01:08.497453  A
0:01:08.887946  0:01:13.539898  A
0:01:13.998302  0:01:49.023769  A
0:01:49.346350  0:02:12.640068  A
0:02:12.911715  0:02:45.679117  A
0:02:13.064516  0:02:13.302207  B
0:02:46.273345  0:03:12.351443  A
0:02:54.949066  0:02:55.424448  B
0:03:02.724958  0:03:02.911715  B
0:03:12.962649  0:03:34.881154  A
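
Each segment above carries a `start`, a `stop`, and a `speaker` label, with timestamps in `H:MM:SS.ffffff` form. As a rough sketch of how such output could be post-processed, the snippet below totals speaking time per speaker. The helper names `parse_timestamp` and `speaking_time` are illustrative, and the list-of-dicts shape is an assumption about how the segments are delivered; the three sample rows are copied from the listing.

```python
from collections import defaultdict
from datetime import timedelta


def parse_timestamp(text: str) -> timedelta:
    """Parse an 'H:MM:SS.ffffff' timestamp into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def speaking_time(segments: list) -> dict:
    """Sum segment durations per speaker label."""
    totals = defaultdict(timedelta)
    for seg in segments:
        duration = parse_timestamp(seg["stop"]) - parse_timestamp(seg["start"])
        totals[seg["speaker"]] += duration
    return dict(totals)


# Three rows taken from the segment listing above.
segments = [
    {"start": "0:00:18.989813", "stop": "0:00:26.052632", "speaker": "A"},
    {"start": "0:00:22.300509", "stop": "0:00:22.555178", "speaker": "B"},
    {"start": "0:00:26.324278", "stop": "0:00:32.555178", "speaker": "A"},
]

for label, total in sorted(speaking_time(segments).items()):
    print(label, round(total.total_seconds(), 3))
```

Segments from different speakers can overlap in time, as the B row here falls inside the first A row, so these per-speaker totals may add up to more than the elapsed duration.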

speakers

{ "count": 2, "labels": [ "A", "B" ], "embeddings": { "A": [ -0.2980277958241376, -0.19159927533625007, -0.2675081236989467, -0.25381432860702663, 0.09958059384257763, -0.09743198539529528, 0.4132504979898403, 0.2032775030314148, 0.01562528210607442, -0.6177756972127146, 0.07577934418502566, 0.077455000068944, -0.42971207220833024, 0.143487013489395, 0.04374769898494343, -0.1741128261987265, 0.052689338707691664, -0.10136796780801438, -0.02375249339678845, 0.26002338644746065, -0.3924751459778129, -0.41776453597205027, 0.42611784710512535, 0.03243526141461614, 0.3725283621967613, 0.12168989166036828, -0.5446009047619709, -0.06255066316187768, 0.16032569468408436, 0.24098822314824378, -0.19773885567805596, 0.15155758430528177, 0.08128198661974498, -0.022670983977906116, 0.48185896563839603, -0.29806817690660425, -0.05942703619018778, -0.07483277019935769, 0.38329678773880005, -0.08116064692375721, -0.16691496760233657, -0.42538210130357124, 0.411977027530794, 0.8507556992691833, 0.23823742723310148, 0.4496179528824695, 0.10730816622252588, -0.22765118604550114, -0.2253746965585591, 0.22255700260594294, -0.011031272639701892, -0.33252304869812804, 0.0882140866650099, -0.10681848276358147, -0.29219809426115706, 0.00936510863822776, -0.22848142219054235, 0.11640579674949313, -0.13048594787281442, -0.18411347891699958, 0.3619395898921149, -0.6014321190970284, 0.11484903825277631, 0.4214380840202431, 0.028998981858906033, -0.09855928494558706, -0.024741925069360765, -0.14838007895590424, 0.07313753858015135, 0.0616028198803013, -0.22130442420383553, -0.0486562324615268, 0.012167474557343242, 0.11412440937060814, -0.1500650424171578, -0.31762206128665377, 0.03374973084632452, 0.07254524845871832, -0.3653383216300568, 0.28723999974015474, -0.06534733992771476, 0.07936668623383943, 0.30377362358879734, -0.0313888629172723, -0.05317511940495921, -0.007353778214907491, -0.21232955731064468, 0.02324318266534186, -0.05557481255140397, -0.36463111813192245, -0.29444163447463667, 
-0.14647939109376498, 0.1298446642694535, 0.5466475808001184, 0.30658132815128797, -0.17523605652250251, -0.07153023406863213, 0.04778134583362511, -0.2236534280629901, -0.046524424686447366, 0.2998360505738816, -0.037915714774522685, -0.19151858370993044, 0.11666154106716056, 0.39011356350663423, -0.3191354412150073, 0.044048880831665034, 0.14075470517401573, -0.041643699558524344, -0.16565525249953006, 0.028705820949240166, -0.21070052218901647, 0.4992749884531095, 0.14883781959871192, -0.4119358577511527, -0.5384860921215702, -0.1578043360843674, 0.1715867846913926, -0.05601542578502135, -0.181090600014507, -0.004767911853389694, 0.36159427367247543, -0.6207205931861679, 0.16143732430872978, -0.2347321788986008, 0.02273632080427238, 0.055050118403001266, -0.24649299690862755, -0.21079601337770362, -0.09435739073079902, -0.08087513749427222, -0.12570423705430775, 0.054864239286292686, 0.2445069279570084, -0.3486694542618541, -0.4622815357787268, -0.18829677839364325, 0.10042502855664337, 0.256172057192821, 0.25681665462332887, 0.012307212748504305, 0.4981143075924415, 0.08744535314572321, 0.19451923716764946, -0.272460309328971, -0.12226968038488518, -0.1705902868083545, -0.2491248824766704, 0.11083746145104433, -0.04127803712466417, 0.26021204985581436, -0.11932596629606439, -0.24519814463791909, -0.02791117617933007, -0.28244354604900657, 0.15853182332856314, 0.4105076242189903, -0.06333677049774628, 0.3073548703611671, -0.000280045722792675, 0.14617977460676973, 0.12411557157318313, -0.05663811833924287, -0.2579154026779262, 0.2646378246075534, -0.08221379174040509, 0.19624638291341917, -0.0843930971361213, -0.1377280969898422, -0.032491969694564866, -0.13732083574808263, -0.008927843235614527, 0.19387406647785918, -0.156196792162471, -0.007981663303715842, -0.04979786847705965, -0.1279741834075033, 0.2477875669281204, 0.2247705768261637, 0.27278280325911264, 0.29017828840326954, -0.10455101024504605, -0.004279102259255075, -0.40712080025053643, 
-0.11401444538073106, -0.08654718788026215, -0.26493524334260393, 0.16893563661482427, -0.2326994313822164, 0.37047250897853407, 0.008457627486098896, -0.10205523319566598, -0.11571106793625015, 0.055834398471883366, 0.2878503557536509, 0.18265529146248644, -0.020800547686393386, 0.14096574524006286, -0.03788623276662517, -0.0743287546294076, 0.1259401597856701, -0.10667986877401686, 0.21194476094822606, -0.3093454746843933, 0.2447658060239507, -0.31693133305419574, 0.01592029882715894, 0.17525215579995088, 0.09279285385817676, -0.09228506655275047, 0.07441991091669574, 0.1557216060819564, 0.19363100664665947, -0.12881685082208028, -0.3358800041598159, 0.015616602990050594, -0.31054633313959296, 0.19975015927444806, 0.0874475831432002, 0.09344918660626009, 0.2395071971484206, 0.19913564876399256, 0.6006629726329407, 0.1309550589845552, 0.0729240306241849, -0.013779515424719104, 0.13759209768387018, -0.21896729806994464, -0.09213411970088234, -0.5560763033179493, -0.23635780221069014, -0.43267011487638796, -0.12215130321391217, 0.03636081589313297, 0.2627331919290803, 0.31121962735211695, -0.13332596469622154, -0.398363497350123, -0.21667043869565059, -0.5754971268115106, -0.10026832313167972, 0.010454692140028074, -0.17342204542516113, -0.1105685666832444, -0.2382642917044751, 0.1831290785513528, 0.19652475426336388, 0.050046242114350006, 0.15041765251329967, -0.18117927555478625, 0.050221169585144367, 0.08144111413653794, 0.21002425403370487, -0.5958981200472101, -0.20697989953415735, -0.03530745213101437 ], "B": [] } }
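
The `speakers` object pairs each label with an embedding vector, and a label can carry an empty list, as "B" does above. A common way to compare speaker embeddings is cosine similarity; the source does not say which metric the model uses, so the following is only a sketch under that assumption. The function name `cosine_similarity` and the short vectors in the example are illustrative stand-ins, not values from the output.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> Optional[float]:
    """Return the cosine similarity of two vectors, or None if not comparable."""
    # Empty embeddings (such as "B" above) or mismatched lengths cannot be compared.
    if not a or not b or len(a) != len(b):
        return None
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so the similarity is undefined.
    if norm_a == 0 or norm_b == 0:
        return None
    return dot / (norm_a * norm_b)


# Toy stand-ins for the "embeddings" mapping; real vectors are much longer.
embeddings = {"A": [1.0, 0.0], "B": []}
print(cosine_similarity(embeddings["A"], embeddings["B"]))  # None: "B" is empty
```

Returning `None` rather than raising keeps a loop over all label pairs simple when some speakers have no embedding.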

This output was created using a different version of the model, collectiveai-team/speaker-diarization-3:bb2f3320.

Run time and cost

This model costs approximately $0.22 to run on Replicate, or 4 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.

Readme

This model doesn't have a readme.