ASR model, created by Nvidia, with word-level timestamps available. Supports .wav inputs, or m3u8 urls, with a start and end time (to only process a section of the m3u8).