zeke / zisper

My own personal copy of daanelson/whisperx

  • Public
  • 311 runs
  • L40S
  • GitHub
  • Paper

Input

Video Player is loading.
Current Time 00:00:000
Duration 00:00:000
Loaded: 0%
Stream Type LIVE
Remaining Time 00:00:000
 
1x
*file

Audio file

integer

Parallelization of input audio transcription

Default: 32

boolean

Use if you need word-level timing and not just batched transcription. Only works for English atm

Default: false

boolean

Set if you only want to return text; otherwise, segment metadata will be returned as well.

Default: false

boolean

Print out memory usage information.

Default: false

Output

[{"text": " This is a test of YOLO.", "start": 1.038, "end": 3.282}]
Generated in

Run time and cost

This model costs approximately $0.0093 to run on Replicate, or 107 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 10 seconds. The predict time for this model varies significantly based on the inputs.