MusicGen is a simple and controllable model for music generation. This deployment exposes two versions of MusicGen:
- Melody. A 1.5 billion parameter model that you can prompt with both text and audio
- Large. A 3.5 billion parameter model that you can prompt with text
You can specify the model you want to use via the `model_version` parameter, which defaults to `'melody'`.
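As a minimal sketch of how `model_version` selects between the two models, the helper below assembles a request payload. The function name `build_input` and the `prompt`/`input_audio` parameter names are illustrative assumptions, not this deployment's actual client code:

```python
# Hypothetical payload builder for this deployment; parameter names are
# assumptions for illustration, not the deployment's documented schema.

ALLOWED_VERSIONS = {"melody", "large"}

def build_input(prompt, model_version="melody", input_audio=None):
    """Assemble a request payload; model_version defaults to 'melody'."""
    if model_version not in ALLOWED_VERSIONS:
        raise ValueError(f"unknown model_version: {model_version!r}")
    if model_version == "large" and input_audio is not None:
        # Only the Melody model accepts an audio prompt.
        raise ValueError("'large' is text-only; it cannot take an audio prompt")
    payload = {"prompt": prompt, "model_version": model_version}
    if input_audio is not None:
        payload["input_audio"] = input_audio
    return payload

print(build_input("lo-fi beat")["model_version"])  # melody
```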
## Model Architecture and Development
MusicGen is a single-stage auto-regressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods such as MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show they can be predicted in parallel, requiring only 50 auto-regressive steps per second of audio. MusicGen was trained on 20K hours of licensed music: an internal dataset of 10K high-quality music tracks, plus the ShutterStock and Pond5 music collections.
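The codebook delay described above can be sketched as follows. This is an illustrative reconstruction of the interleaving idea, not Audiocraft's actual implementation: codebook *k* is shifted right by *k* steps, so one second of audio (50 EnCodec frames) takes 50 + 3 = 53 decoding steps, i.e. roughly 50 auto-regressive steps per second:

```python
K = 4            # number of EnCodec codebooks
FRAME_RATE = 50  # EnCodec frames per second at 32 kHz

def delay_pattern(num_frames, num_codebooks=K, pad=-1):
    """Return a (num_codebooks x steps) grid of frame indices, where
    codebook k is delayed by k steps and empty slots hold `pad`.
    At decoding step t the model emits frame t of codebook 0,
    frame t-1 of codebook 1, and so on -- all in a single pass."""
    steps = num_frames + num_codebooks - 1
    grid = [[pad] * steps for _ in range(num_codebooks)]
    for k in range(num_codebooks):
        for t in range(num_frames):
            grid[k][k + t] = t
    return grid

# One second of audio: 50 frames -> 53 decoding steps total.
grid = delay_pattern(FRAME_RATE)
print(len(grid[0]))  # 53
```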
- All code in this repository is licensed under the Apache License 2.0.
- The code in the Audiocraft repository is released under the MIT license as found in the LICENSE file.
- The weights in the Audiocraft repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.