This is a demo of MARS5, an English text-to-speech (TTS) model from CAMB.AI.
The model follows a two-stage pipeline, with an autoregressive (AR) stage followed by a novel non-autoregressive (NAR) stage (see the Architecture section for more information).
With just 5 seconds of reference audio and a snippet of text, MARS5 can generate speech even for prosodically difficult and diverse scenarios such as sports commentary, anime, and more.