sidedwards / whisperx

WhisperX with accelerated transcription and advanced speaker diarization provides fast and accurate transcriptions with speaker segments.

  • Public
  • 192 runs
  • L40S
  • GitHub

Input

Install Replicate's Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.
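
A missing token usually surfaces only as an authentication error once a prediction is requested, so it can help to check for it up front. The following is a minimal sketch; `get_replicate_token` is a hypothetical helper, not part of the `replicate` client.

```python
import os


def get_replicate_token(env=os.environ):
    """Return the Replicate API token, or raise a clear error if it is missing.

    Hypothetical helper: the replicate client reads REPLICATE_API_TOKEN
    on its own; this only fails early with a readable message.
    """
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before calling replicate.run()."
        )
    return token
```

Calling `get_replicate_token()` once at program start-up is enough; the client itself still picks the token up from the environment.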

Import the client:
import replicate

Run sidedwards/whisperx using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "sidedwards/whisperx:e9625c116484a995c0f8c77f1744ea8c518ff68528f254721caa525a69ab3236",
    input={
        "debug": False,
        "vad_onset": 0.5,
        "batch_size": 64,
        "vad_offset": 0.363,
        "diarization": False,
        "temperature": 0,
        "align_output": False,
        "language_detection_min_prob": 0,
        "language_detection_max_tries": 5
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
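
The example call above lists only parameters that have defaults and does not name the audio input. The sketch below shows one way to keep those defaults in a single place and override a few per call. The `audio_file` key is an assumption (check the model's schema for the actual name), and `build_input` is a hypothetical helper.

```python
# Defaults copied from the example call above.
DEFAULT_INPUT = {
    "debug": False,
    "vad_onset": 0.5,
    "batch_size": 64,
    "vad_offset": 0.363,
    "diarization": False,
    "temperature": 0,
    "align_output": False,
    "language_detection_min_prob": 0,
    "language_detection_max_tries": 5,
}


def build_input(audio, **overrides):
    """Merge per-call overrides into the defaults and attach the audio source.

    `audio` may be a URL string or an open binary file handle.
    Unknown parameter names are rejected so that typos fail loudly.
    """
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise KeyError(f"unknown input parameter(s): {sorted(unknown)}")
    params = {**DEFAULT_INPUT, **overrides}
    params["audio_file"] = audio  # ASSUMPTION: key name not confirmed by this page
    return params
```

Under those assumptions, a call with diarization enabled would look like `replicate.run("sidedwards/whisperx:e9625c116484a995c0f8c77f1744ea8c518ff68528f254721caa525a69ab3236", input=build_input(open("meeting.wav", "rb"), diarization=True))`, where `meeting.wav` is a placeholder file name.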

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Based on: https://github.com/victor-upmeet/whisperx-replicate

Citation

@misc{bain2023whisperx,
      title={WhisperX: Time-Accurate Speech Transcription of Long-Form Audio}, 
      author={Max Bain and Jaesung Huh and Tengda Han and Andrew Zisserman},
      year={2023},
      eprint={2303.00747},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}

For more information, visit the WhisperX GitHub repository.