haoheliu/audio-ldm | Run with an API on Replicate

Text-to-audio with latent diffusion

Model description

AudioLDM generates text-conditional sound effects, human speech, and music. It enables zero-shot text-guided audio style-transfer, inpainting, and super-resolution.

Links: GitHub demos and project page; GitHub repo for code.

Tricks for Enhancing the Quality of Your Generated Audio

Use more adjectives to describe your sound. For example, “A man is speaking clearly and slowly in a large room” is better than “A man is speaking”. The extra detail helps AudioLDM understand what you want.
Try different random seeds, since the seed can sometimes affect the generation quality.
Prefer general terms like ‘man’ or ‘woman’ over the names of specific individuals or abstract objects that people may not be familiar with.
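
The first two tips can be combined by pairing one descriptive prompt with several seeds and comparing the results. The following is a minimal sketch that only builds the input payloads and makes no network call. The field names `text` and `random_seed` and the helper `build_inputs` are assumptions for illustration, so check the model's API schema for the actual input names.

```python
from typing import Dict, List

# Assumed input field names (hypothetical); verify against the model's API schema.
TEXT_FIELD = "text"
SEED_FIELD = "random_seed"


def build_inputs(prompt: str, seeds: List[int]) -> List[Dict[str, object]]:
    """Return one input payload per seed, all sharing the same prompt."""
    return [{TEXT_FIELD: prompt, SEED_FIELD: seed} for seed in seeds]


# A descriptive prompt (taken from the tip above) tried with three seeds.
payloads = build_inputs(
    "A man is speaking clearly and slowly in a large room",
    seeds=[0, 1, 2],
)
for payload in payloads:
    print(payload)
```

Each payload could then be passed as the `input` argument of the Replicate Python client's `replicate.run`, together with the model identifier, and the resulting clips compared by ear.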

Model Authors

Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

Model created over 1 year ago
