Automated Audio Captioning
Cinematic Audio Source Separation
Automatic Speech Recognition with Word-Level Timestamps and Speaker Diarization