CTC forced alignment using Meta's MMS (Massively Multilingual Speech) model. Given an audio file and its known transcript, it aligns the transcript to the audio and returns word-level timestamps.
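To illustrate what the alignment step does, here is a minimal sketch of CTC forced alignment in plain Python. It is not this project's implementation and does not load MMS: a small hand-built matrix of per-frame log-probabilities stands in for the model's output. The function names `ctc_forced_align` and `word_timestamps`, the toy vocabulary, the one-token-per-character mapping, and the 20 ms frame duration are all assumptions made for the example.

```python
import math

NEG_INF = float("-inf")

def ctc_forced_align(log_probs, tokens, blank=0):
    """Viterbi-align `tokens` to frames of CTC log-probabilities.

    log_probs: list of T frames, each a list of V log-probabilities.
    tokens: target token ids without blanks.
    Returns, per frame, the index into `tokens` or None for a blank frame.
    """
    T = len(log_probs)
    # Expanded CTC state sequence: blank, tok0, blank, tok1, ..., blank.
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    S = len(states)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][blank]
    if S > 1:
        score[0][1] = log_probs[0][states[1]]
    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one state, or skip
            # the blank between two different tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and s % 2 == 1 and states[s] != states[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == NEG_INF:
                continue
            score[t][s] = score[t - 1][best] + log_probs[t][states[s]]
            back[t][s] = best
    # A valid path ends in the final blank or the final token.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG_INF:
        raise ValueError("not enough frames to align the transcript")
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = s
        s = back[t][s]
    # Odd states are tokens, even states are blanks.
    return [p // 2 if p % 2 == 1 else None for p in path]

def word_timestamps(frame_tokens, words, frame_sec=0.02):
    """Group aligned frames into per-word start/end times in seconds.

    Assumes one token per character and a fixed frame duration
    (frame_sec=0.02 is an assumption, not a value from this project).
    """
    out, first = [], 0
    for word in words:
        last = first + len(word) - 1
        frames = [t for t, pos in enumerate(frame_tokens)
                  if pos is not None and first <= pos <= last]
        out.append({"word": word,
                    "start": round(frames[0] * frame_sec, 3),
                    "end": round((frames[-1] + 1) * frame_sec, 3)})
        first = last + 1
    return out

# Toy example: vocabulary {0: blank, 1: "a", 2: "b", 3: "c"}, 6 frames.
vocab = {"a": 1, "b": 2, "c": 3}
words = ["ab", "c"]
tokens = [vocab[ch] for word in words for ch in word]
favoured = [0, 1, 1, 2, 0, 3]  # most likely symbol in each frame
log_probs = [[math.log(0.7 if v == k else 0.1) for v in range(4)]
             for k in favoured]
frame_tokens = ctc_forced_align(log_probs, tokens)
print(word_timestamps(frame_tokens, words))
```

In a real pipeline the `log_probs` matrix would come from the acoustic model's CTC head, and the frame duration would follow from the model's stride and the audio sample rate.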