daanelson / whisper-train-preprocessor

Dataset Preprocessing code for Whisper Fine-Tuning

  • Public
  • 33 runs
  • T4

Input

  • file: tarball with list of audio files
  • file: tarball with list of transcriptions
  • file: jsonl file with lines of {"audio": <audio_url>, "sentence": <transcription>}
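The inputs above can be passed with the Replicate Python client. Below is a minimal sketch; the field names used in the input dictionary are assumptions for illustration (the listing above labels each input only as "file"), so check the model's API schema for the actual names.

    # Minimal sketch of calling the preprocessor with the Replicate Python client.
    # NOTE: "audio_tarball" and "transcription_tarball" are hypothetical field
    # names chosen for illustration; consult the model's API schema for the real
    # input names (and a version hash, if one is required).
    import replicate

    output = replicate.run(
        "daanelson/whisper-train-preprocessor",
        input={
            "audio_tarball": open("audio.tar.gz", "rb"),
            "transcription_tarball": open("transcripts.tar.gz", "rb"),
        },
    )
    print(output)  # location of the generated fine-tuning dataset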

Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

This runs preprocessing code to generate a dataset you can use to fine-tune Whisper. Specifically, it takes as input either:

  • two tarballs - one of audio files and one of text files. The transcription for a given audio file should have the same base name - i.e. audio1.mp3 corresponds to audio1.txt (see the preparation sketch after the examples below).

OR

  • A jsonl file (named <some_file.txt>), which contains lines like so:
...
{"audio": <URL of audio file>, "sentence": <URL of transcription>}
{"audio": <URL of audio file>, "sentence": <URL of transcription>}
...
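
For reference, here is a minimal sketch of producing both input formats. The directory layout, output file names, and the use of plain transcription text in the "sentence" field are assumptions made for illustration, not part of this model's spec.

    # Sketch of preparing the two input formats described above. File names,
    # directory layout, and the content of the "sentence" field are assumptions.
    import json
    import tarfile
    from pathlib import Path

    def make_tarballs(audio_dir, text_dir):
        # Bundle audio files and the matching .txt transcriptions into two tarballs;
        # audio1.mp3 in audio_dir is expected to pair with audio1.txt in text_dir.
        with tarfile.open("audio.tar.gz", "w:gz") as tar:
            for audio in sorted(Path(audio_dir).glob("*.mp3")):
                tar.add(audio, arcname=audio.name)
        with tarfile.open("transcripts.tar.gz", "w:gz") as tar:
            for txt in sorted(Path(text_dir).glob("*.txt")):
                tar.add(txt, arcname=txt.name)

    def make_jsonl(pairs, out_path="dataset.jsonl"):
        # pairs: list of (audio_url, sentence) tuples; writes one JSON object per line.
        with open(out_path, "w") as f:
            for audio_url, sentence in pairs:
                f.write(json.dumps({"audio": audio_url, "sentence": sentence}) + "\n")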