You're looking at a specific version of this model. Jump to the model overview.
victor-upmeet /whisperx:84d2ad2d
Input
Run this model in Node.js with one line of code:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
// Import the official Replicate JavaScript client (npm package "replicate").
import Replicate from "replicate";
// Create a client authenticated via the REPLICATE_API_TOKEN
// environment variable exported above.
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN,
});
Run victor-upmeet/whisperx using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
// Inputs for the WhisperX prediction. Values mirror the model's schema;
// vad_onset/vad_offset presumably tune the voice-activity detector — see the
// model's input schema for authoritative descriptions.
const input = {
  debug: false,
  vad_onset: 0.5,
  audio_file: "https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav",
  batch_size: 64,
  vad_offset: 0.363,
  diarization: false,
  temperature: 0,
  align_output: false,
  language_detection_min_prob: 0,
  language_detection_max_tries: 5,
};

// Run the pinned model version (owner/name:version-id) and wait for the result.
const output = await replicate.run(
  "victor-upmeet/whisperx:84d2ad2d6194fe98a17d2b60bef1c7f910c46b2f6fd38996ca457afd9c8abfcb",
  { input },
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
# Official Replicate Python client (pip package "replicate"); it reads
# the REPLICATE_API_TOKEN environment variable exported above.
import replicate
Run victor-upmeet/whisperx using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
# Run the WhisperX model on Replicate and block until the prediction finishes.
output = replicate.run(
# Pinned model version (owner/name:version-id).
"victor-upmeet/whisperx:84d2ad2d6194fe98a17d2b60bef1c7f910c46b2f6fd38996ca457afd9c8abfcb",
input={
"debug": False,
"vad_onset": 0.5,  # presumably voice-activity-detection onset threshold — see model schema
"audio_file": "https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav",
"batch_size": 64,
"vad_offset": 0.363,  # presumably VAD offset threshold — see model schema
"diarization": False,
"temperature": 0,
"align_output": False,
"language_detection_min_prob": 0,
"language_detection_max_tries": 5
}
)
# Per the example output below, the result contains "segments" and "detected_language".
print(output)
To learn more, take a look at the guide on getting started with Python.
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run victor-upmeet/whisperx using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
# Create a prediction via Replicate's HTTP API.
# "Prefer: wait" asks the server to hold the request open until the
# prediction completes instead of returning a pending prediction.
# $'…' is ANSI-C quoting (bash/zsh); the JSON body is sent verbatim.
curl -s -X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-H "Prefer: wait" \
-d $'{
"version": "84d2ad2d6194fe98a17d2b60bef1c7f910c46b2f6fd38996ca457afd9c8abfcb",
"input": {
"debug": false,
"vad_onset": 0.5,
"audio_file": "https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav",
"batch_size": 64,
"vad_offset": 0.363,
"diarization": false,
"temperature": 0,
"align_output": false,
"language_detection_min_prob": 0,
"language_detection_max_tries": 5
}
}' \
https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
# Download the pinned model version and run one prediction locally with Cog.
# Each -i flag sets one model input; string values carry their own double
# quotes inside the single-quoted shell argument.
cog predict r8.im/victor-upmeet/whisperx@sha256:84d2ad2d6194fe98a17d2b60bef1c7f910c46b2f6fd38996ca457afd9c8abfcb \
-i 'debug=false' \
-i 'vad_onset=0.5' \
-i 'audio_file="https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav"' \
-i 'batch_size=64' \
-i 'vad_offset=0.363' \
-i 'diarization=false' \
-i 'temperature=0' \
-i 'align_output=false' \
-i 'language_detection_min_prob=0' \
-i 'language_detection_max_tries=5'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
# Start the model as a detached local HTTP server on port 5000.
# --gpus=all exposes the host's GPUs to the container.
docker run -d -p 5000:5000 --gpus=all r8.im/victor-upmeet/whisperx@sha256:84d2ad2d6194fe98a17d2b60bef1c7f910c46b2f6fd38996ca457afd9c8abfcb

# Send a prediction request to the local server.
# FIX: the original one-line command kept the "\" line-continuation
# characters inline; mid-line, each "\ " is an escaped space that the shell
# passes to curl as a bogus single-space argument, breaking the command.
# Re-expanded as a proper multi-line command.
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "debug": false,
      "vad_onset": 0.5,
      "audio_file": "https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav",
      "batch_size": 64,
      "vad_offset": 0.363,
      "diarization": false,
      "temperature": 0,
      "align_output": false,
      "language_detection_min_prob": 0,
      "language_detection_max_tries": 5
    }
  }' \
  http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Add a payment method to run this model.
Each run costs approximately $0.022. Alternatively, try out our featured models for free.
By signing in, you agree to our
terms of service and privacy policy
Output
segments
detected_language
{
"completed_at": "2023-11-13T08:47:22.035804Z",
"created_at": "2023-11-13T08:47:18.460418Z",
"data_removed": false,
"error": null,
"id": "h2ovig3bxqz5wwgyexzumicsam",
"input": {
"debug": false,
"vad_onset": 0.5,
"audio_file": "https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav",
"batch_size": 64,
"vad_offset": 0.363,
"diarization": false,
"temperature": 0,
"align_output": false
},
"logs": "No language specified, language will be first be detected for each audio file (increases inference time).\nLightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../root/.cache/torch/whisperx-vad-segmentation.bin`\nModel was trained with pyannote.audio 0.0.1, yours is 3.0.1. Bad things might happen unless you revert pyannote.audio to 0.x.\nModel was trained with torch 1.10.0+cu102, yours is 2.1.0+cu121. Bad things might happen unless you revert torch to 1.x.\nDetected language: en (1.00) in first 30s of audio...",
"metrics": {
"predict_time": 3.596677,
"total_time": 3.575386
},
"output": {
"segments": [
{
"end": 30.811,
"text": " The little tales they tell are false. The door was barred, locked and bolted as well. Ripe pears are fit for a queen's table. A big wet stain was on the round carpet. The kite dipped and swayed but stayed aloft. The pleasant hours fly by much too soon. The room was crowded with a mild wob.",
"start": 2.585
},
{
"end": 48.592,
"text": " The room was crowded with a wild mob. This strong arm shall shield your honor. She blushed when he gave her a white orchid. The beetle droned in the hot June sun.",
"start": 33.029
}
],
"detected_language": "en"
},
"started_at": "2023-11-13T08:47:18.439127Z",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/predictions/h2ovig3bxqz5wwgyexzumicsam",
"cancel": "https://api.replicate.com/v1/predictions/h2ovig3bxqz5wwgyexzumicsam/cancel"
},
"version": "77505c700514deed62ab3891c0011e307f905ee527458afc15de7d9e2a3034e8"
}
No language specified, language will be first be detected for each audio file (increases inference time).
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../root/.cache/torch/whisperx-vad-segmentation.bin`
Model was trained with pyannote.audio 0.0.1, yours is 3.0.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.0+cu121. Bad things might happen unless you revert torch to 1.x.
Detected language: en (1.00) in first 30s of audio...
This example was created by a different version, victor-upmeet/whisperx:77505c70.