lucataco/zeta-editing

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Technion - Israel Institute of Technology

Abstract

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion on pre-trained diffusion models. The first, adopted from the image domain, allows text-based editing. The second is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody.
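To make the idea of DDPM inversion more concrete, here is a minimal sketch of one common formulation: noisy versions of the signal are drawn independently at every timestep, and the noise vectors z_t are then solved for so that the ordinary DDPM sampler, fed those vectors, reproduces the original signal. This is an illustration only and not necessarily the exact procedure used by this model. The noise predictor `toy_eps`, the linear beta schedule, the variance choice sigma_t^2 = beta_t, and all function names are assumptions introduced for the example; a real system would use a pre-trained audio diffusion network operating on latents or spectrograms.

```python
import math
import random


def toy_eps(x, t):
    """Hypothetical stand-in for a pre-trained noise-prediction network."""
    return math.tanh(x) * t / 10.0


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for t = 1..T."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def posterior_mean(x_t, t, betas, alpha_bar):
    """DDPM mean of x_{t-1} given x_t, per sample (t is 1-based)."""
    beta_t, abar_t = betas[t - 1], alpha_bar[t - 1]
    scale = beta_t / math.sqrt(1.0 - abar_t)
    return [(x - scale * toy_eps(x, t)) / math.sqrt(1.0 - beta_t) for x in x_t]


def ddpm_invert(x0, betas, rng):
    """Extract noise vectors z_T..z_1 that make the sampler reproduce x0."""
    alpha_bar = cumulative_alphas(betas)
    T = len(betas)
    # Noise x0 independently at every timestep (xs[0] is the clean signal).
    xs = [list(x0)]
    for t in range(1, T + 1):
        a = alpha_bar[t - 1]
        xs.append([math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
                   for x in x0])
    # Solve x_{t-1} = mu_t(x_t) + sigma_t * z_t for z_t.
    zs = {}
    for t in range(T, 0, -1):
        mu = posterior_mean(xs[t], t, betas, alpha_bar)
        sigma = math.sqrt(betas[t - 1])  # assumed variance choice
        zs[t] = [(prev - m) / sigma for prev, m in zip(xs[t - 1], mu)]
    return xs[T], zs


def ddpm_sample(x_T, zs, betas):
    """Run the DDPM sampler from x_T using the stored noise vectors."""
    alpha_bar = cumulative_alphas(betas)
    x = list(x_T)
    for t in range(len(betas), 0, -1):
        mu = posterior_mean(x, t, betas, alpha_bar)
        sigma = math.sqrt(betas[t - 1])
        x = [m + sigma * z for m, z in zip(mu, zs[t])]
    return x


betas = [0.01 + 0.002 * i for i in range(10)]    # toy 10-step schedule
signal = [math.sin(0.3 * n) for n in range(16)]  # toy "audio" signal
x_T, zs = ddpm_invert(signal, betas, random.Random(0))
reconstruction = ddpm_sample(x_T, zs, betas)
```

With an unchanged noise predictor, sampling with the stored noise vectors returns the input signal up to floating-point error. In this kind of scheme, an edit is obtained by reusing the same noise vectors while changing what the network is conditioned on during sampling, for example the text prompt.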

Note: For now, input audio must be provided as WAV files.
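Since only WAV input is accepted, it can help to confirm that a file is a readable WAV before submitting it. The sketch below uses Python's standard `wave` module, which handles PCM-encoded WAV files; the helper name `describe_wav` and the demo file are hypothetical and are not part of this model's interface.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Open a PCM WAV file and report its basic properties.

    wave.open raises wave.Error if the file is not a WAV file
    that the standard library can parse.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }


# Demo: write one second of 16-bit mono silence, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
```

Files in other formats (MP3, FLAC, and so on) would need to be converted to WAV with an external tool first.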
