Input

audio: file (required)

script: string (required)

The whole city burned to the ground in a matter of hours!

Run this model in Node.js with one line of code:

npx create-replicate --model=quinten-kamphuis/forced-alignment

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run quinten-kamphuis/forced-alignment using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "quinten-kamphuis/forced-alignment:566a5a9530375ba0428344b66027520e83f832527bc04c5c4770cea1d3e6fcc7",
  {
    input: {
      audio: "https://replicate.delivery/pbxt/Lv9UNQISI5TvC443clsryKIEOD3LLgHqh8rsNcnKokVSZGV9/audio.mp3",
      script: "The whole city burned to the ground in a matter of hours!"
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run quinten-kamphuis/forced-alignment using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "quinten-kamphuis/forced-alignment:566a5a9530375ba0428344b66027520e83f832527bc04c5c4770cea1d3e6fcc7",
    input={
        "audio": "https://replicate.delivery/pbxt/Lv9UNQISI5TvC443clsryKIEOD3LLgHqh8rsNcnKokVSZGV9/audio.mp3",
        "script": "The whole city burned to the ground in a matter of hours!"
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run quinten-kamphuis/forced-alignment using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "quinten-kamphuis/forced-alignment:566a5a9530375ba0428344b66027520e83f832527bc04c5c4770cea1d3e6fcc7",
    "input": {
      "audio": "https://replicate.delivery/pbxt/Lv9UNQISI5TvC443clsryKIEOD3LLgHqh8rsNcnKokVSZGV9/audio.mp3",
      "script": "The whole city burned to the ground in a matter of hours!"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
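The same HTTP request can be sketched in Python using only the standard library, which may be handy where the replicate package is not installed. The endpoint, headers, and body mirror the curl command above. The helper names build_request and run_alignment are hypothetical, and reading the "output" key assumes the prediction finished within the wait window.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "quinten-kamphuis/forced-alignment:"
    "566a5a9530375ba0428344b66027520e83f832527bc04c5c4770cea1d3e6fcc7"
)


def build_request(token, audio_url, script):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps(
        {"version": VERSION, "input": {"audio": audio_url, "script": script}}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # ask the API to hold the connection until done
        },
    )


def run_alignment(audio_url, script):
    """Send the request and return the prediction's output, if present.

    Assumption: the response is a prediction object with an "output" key,
    as in the example response on this page. It may be None if the
    prediction has not completed yet.
    """
    token = os.environ["REPLICATE_API_TOKEN"]
    request = build_request(token, audio_url, script)
    with urllib.request.urlopen(request) as response:
        prediction = json.load(response)
    return prediction.get("output")
```

Calling run_alignment with the audio URL and script from the examples above should return a list of word timings like the one shown under Output, provided the token is set and the prediction completes in time.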

Output

{ "end": 0.16045197740112993, "word": "The", "start": 0.08022598870056497 }

{ "end": 0.4011299435028248, "word": "whole", "start": 0.2807909604519774 }

{ "end": 0.6418079096045197, "word": "city", "start": 0.501412429378531 }

{ "end": 0.8223163841807909, "word": "burned", "start": 0.7019774011299434 }

{ "end": 1.083050847457627, "word": "to", "start": 0.9025423728813559 }

{ "end": 1.243502824858757, "word": "the", "start": 1.1231638418079095 }

{ "end": 1.5644067796610168, "word": "ground", "start": 1.4440677966101694 }

{ "end": 1.6847457627118643, "word": "in", "start": 1.5644067796610168 }

{ "end": 1.9655367231638416, "word": "a", "start": 1.7248587570621468 }

{ "end": 2.166101694915254, "word": "matter", "start": 1.9855932203389828 }

{ "end": 2.3466101694915253, "word": "of", "start": 2.1861581920903954 }

{ "end": 2.527118644067796, "word": "hours!", "start": 2.406779661016949 }

{
  "completed_at": "2024-11-05T18:59:22.394078Z",
  "created_at": "2024-11-05T18:59:22.187000Z",
  "data_removed": false,
  "error": null,
  "id": "y7cedt8zsdrg80cjzpsag6e80w",
  "input": {
    "audio": "https://replicate.delivery/pbxt/Lv9UNQISI5TvC443clsryKIEOD3LLgHqh8rsNcnKokVSZGV9/audio.mp3",
    "script": "The whole city burned to the ground in a matter of hours!"
  },
  "logs": null,
  "metrics": {
    "predict_time": 0.191472425,
    "total_time": 0.207078
  },
  "output": [
    {
      "end": 0.16045197740112993,
      "word": "The",
      "start": 0.08022598870056497
    },
    {
      "end": 0.4011299435028248,
      "word": "whole",
      "start": 0.2807909604519774
    },
    {
      "end": 0.6418079096045197,
      "word": "city",
      "start": 0.501412429378531
    },
    {
      "end": 0.8223163841807909,
      "word": "burned",
      "start": 0.7019774011299434
    },
    {
      "end": 1.083050847457627,
      "word": "to",
      "start": 0.9025423728813559
    },
    {
      "end": 1.243502824858757,
      "word": "the",
      "start": 1.1231638418079095
    },
    {
      "end": 1.5644067796610168,
      "word": "ground",
      "start": 1.4440677966101694
    },
    {
      "end": 1.6847457627118643,
      "word": "in",
      "start": 1.5644067796610168
    },
    {
      "end": 1.9655367231638416,
      "word": "a",
      "start": 1.7248587570621468
    },
    {
      "end": 2.166101694915254,
      "word": "matter",
      "start": 1.9855932203389828
    },
    {
      "end": 2.3466101694915253,
      "word": "of",
      "start": 2.1861581920903954
    },
    {
      "end": 2.527118644067796,
      "word": "hours!",
      "start": 2.406779661016949
    }
  ],
  "started_at": "2024-11-05T18:59:22.202605Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/y7cedt8zsdrg80cjzpsag6e80w",
    "cancel": "https://api.replicate.com/v1/predictions/y7cedt8zsdrg80cjzpsag6e80w/cancel"
  },
  "version": "566a5a9530375ba0428344b66027520e83f832527bc04c5c4770cea1d3e6fcc7"
}
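One common use for this output is subtitles. The following is a minimal sketch that turns a list of word timings into SRT text with one cue per word. It assumes the start and end values are in seconds, which the page does not state explicitly. The function names srt_timestamp and to_srt are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(words):
    """Build SRT text with one numbered cue per aligned word."""
    cues = []
    for index, item in enumerate(words, start=1):
        start = srt_timestamp(item["start"])
        end = srt_timestamp(item["end"])
        cues.append(f"{index}\n{start} --> {end}\n{item['word']}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)


# First three entries of the example output above.
words = [
    {"end": 0.16045197740112993, "word": "The", "start": 0.08022598870056497},
    {"end": 0.4011299435028248, "word": "whole", "start": 0.2807909604519774},
    {"end": 0.6418079096045197, "word": "city", "start": 0.501412429378531},
]

print(to_srt(words))
```

For readable subtitles you would probably group several words into one cue. The per-word version is kept here because it maps directly onto the output structure.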

Generated in 0.2 seconds


Run time and cost

This model costs approximately $0.0017 to run on Replicate, or 588 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
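The runs-per-dollar figure follows directly from the per-run cost. As a rough sketch, the arithmetic and a simple batch estimate look like this. The per-run price is the approximate figure quoted above, and actual cost varies with inputs. The function names are hypothetical.

```python
import math

COST_PER_RUN_USD = 0.0017  # approximate figure quoted on this page


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in one dollar at the given price."""
    return math.floor(1 / cost_per_run)


def estimated_cost(num_runs, cost_per_run=COST_PER_RUN_USD):
    """Estimated total cost in USD for a batch of runs."""
    return num_runs * cost_per_run


print(runs_per_dollar())
print(round(estimated_cost(10_000), 2))
```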

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 8 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Forced Audio-Text Alignment

This model generates word-level timings from an audio file and its transcript. Given both inputs, it returns a start and end timestamp for each word in the transcript.

Example Output

[
    {
        "word": "The",
        "start": 0.0,
        "end": 0.16
    },
    {
        "word": "whole",
        "start": 0.16,
        "end": 0.32
    },
    {
        "word": "city",
        "start": 0.32,
        "end": 0.64
    }
]
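Because each entry carries a start and end time, the list can be searched to find which word is being spoken at a given playback position, for example to highlight words karaoke-style. The sketch below assumes the entries are sorted by start time, as in the example. The function name word_at is hypothetical.

```python
from bisect import bisect_right


def word_at(words, t):
    """Return the word whose [start, end) interval contains time t, else None.

    Assumes `words` is sorted by "start", as in the example output.
    """
    starts = [item["start"] for item in words]
    index = bisect_right(starts, t) - 1  # last word starting at or before t
    if index < 0:
        return None
    item = words[index]
    return item["word"] if item["start"] <= t < item["end"] else None


# The example output from the readme.
words = [
    {"word": "The", "start": 0.0, "end": 0.16},
    {"word": "whole", "start": 0.16, "end": 0.32},
    {"word": "city", "start": 0.32, "end": 0.64},
]

print(word_at(words, 0.2))
```

Times that fall in a gap between words, or outside the audio entirely, yield None, so the caller can decide how to handle silence.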

Built using torchaudio’s MMS model. Supports various audio formats and includes fallback mechanisms for robust production use.
