daanelson / whisper-train-preprocessor

Dataset Preprocessing code for Whisper Fine-Tuning


Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

This runs preprocessing code to generate a dataset you can use to fine-tune Whisper. Specifically, it takes as input either:

  • Two tarballs: one of audio files and one of text files. The transcription for a given audio file should have the same base name as that audio file, i.e., audio1.mp3 corresponds to audio1.txt.

OR

  • A JSONL file (named <some_file.txt>), which contains lines like so:
...
{"audio": <URL of audio file>, "sentence": <URL of transcription>}
{"audio": <URL of audio file>, "sentence": <URL of transcription>}
...
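
As a rough illustration of the two input forms described above, here is a minimal Python sketch that prepares each one locally. It is not taken from this model's code: the helper names (build_tarballs, write_jsonl), the output file names (audio.tar.gz, text.tar.gz), the gzip compression, and the flat layout inside each tarball are all assumptions made for the example, so check them against what the model actually accepts.

```python
import json
import tarfile
from pathlib import Path


def build_tarballs(audio_dir, text_dir, out_dir):
    """Pack audio files and transcripts into two tarballs (hypothetical helper).

    Only files whose base names match (audio1.mp3 <-> audio1.txt) are included.
    The output names and gzip compression are assumptions, not requirements
    stated by the model.
    """
    audio_dir, text_dir, out_dir = Path(audio_dir), Path(text_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Index files by base name (the file name without its extension).
    audio = {p.stem: p for p in audio_dir.iterdir() if p.is_file()}
    text = {p.stem: p for p in text_dir.iterdir() if p.suffix == ".txt"}

    # Keep only base names present on both sides, in a stable order.
    paired = sorted(audio.keys() & text.keys())

    audio_tar = out_dir / "audio.tar.gz"
    text_tar = out_dir / "text.tar.gz"
    with tarfile.open(audio_tar, "w:gz") as a, tarfile.open(text_tar, "w:gz") as t:
        for stem in paired:
            # arcname stores each file at the top level of its tarball.
            a.add(audio[stem], arcname=audio[stem].name)
            t.add(text[stem], arcname=text[stem].name)
    return audio_tar, text_tar, paired


def write_jsonl(pairs, out_path):
    """Write one {"audio": ..., "sentence": ...} object per line (hypothetical helper).

    `pairs` is an iterable of (audio_url, transcription_url) tuples.
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for audio_url, transcription_url in pairs:
            record = {"audio": audio_url, "sentence": transcription_url}
            f.write(json.dumps(record) + "\n")
```

For example, build_tarballs("clips", "transcripts", "out") would skip any audio file that has no matching .txt transcript, and write_jsonl([("https://example.com/audio1.mp3", "https://example.com/audio1.txt")], "data.jsonl") would produce a one-line file in the format shown above. The directory names and example.com URLs are placeholders.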