daanelson / whisper-train-preprocessor

Dataset Preprocessing code for Whisper Fine-Tuning

Run time and cost

This model runs on Nvidia T4 GPU hardware.

Readme

This runs preprocessing code to generate a dataset you can use to fine-tune Whisper. Specifically, it takes as input either:

  • Two tarballs: one of audio files and one of text files. The transcription for a given audio file must share its base name, e.g. audio1.mp3 corresponds to audio1.txt.

OR

  • A JSONL file (named <some_file.txt>), which contains one JSON object per line, like so:
...
{"audio": <URL of audio file>, "sentence": <URL of transcription>}
{"audio": <URL of audio file>, "sentence": <URL of transcription>}
...
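
As a rough illustration of the two input layouts above, the following Python sketch pairs audio files with transcripts by base name and renders the matching JSONL lines. It is not this model's own code: the helper names (pair_by_basename, to_jsonl), the file names, and the example.com base URL are all assumptions made for the example. Only the "audio" and "sentence" keys come from the format described above.

```python
import json
from pathlib import PurePosixPath


def pair_by_basename(audio_names, text_names):
    """Match each audio file to the transcript that shares its base name.

    Returns (pairs, missing), where missing lists audio files without a transcript.
    """
    texts = {PurePosixPath(name).stem: name for name in text_names}
    pairs, missing = [], []
    for audio in sorted(audio_names):
        stem = PurePosixPath(audio).stem
        if stem in texts:
            pairs.append((audio, texts[stem]))
        else:
            missing.append(audio)
    return pairs, missing


def to_jsonl(pairs, base_url):
    """Render one JSON object per line with "audio" and "sentence" keys."""
    lines = []
    for audio, text in pairs:
        record = {
            "audio": f"{base_url}/{audio}",
            "sentence": f"{base_url}/{text}",
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Hypothetical file names; audio3.mp3 has no transcript and is reported as missing.
pairs, missing = pair_by_basename(
    ["audio1.mp3", "audio2.mp3", "audio3.mp3"],
    ["audio1.txt", "audio2.txt"],
)
# Hypothetical host; replace with wherever the files are actually served from.
manifest = to_jsonl(pairs, "https://example.com/data")
```

Checking for unmatched files before uploading is a cheap way to catch naming mistakes, since the tarball layout relies entirely on base names lining up.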