A foundation model for isolating any sound in audio using text, visual, or temporal prompts