meta / musicgen

Generate music from a prompt or melody

Model Description

MusicGen is a simple and controllable model for music generation. This deployment exposes two versions of MusicGen:

  • Melody. A 1.5 billion parameter model that you can prompt with both text and audio
  • Large. A 3.3 billion parameter model that you can prompt with text

You can specify the model you want to use via the model_version parameter, which is set to 'melody' by default.
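As a minimal sketch of how the choice between the two versions might be expressed in a request, the snippet below assembles an input payload as a plain dictionary. Only model_version and its 'melody' default come from the description above; the field names prompt and input_audio, the helper build_input, and the file name melody.wav are assumptions made for illustration.

```python
# Versions named in the description above; the deployment may accept others.
KNOWN_VERSIONS = {"melody", "large"}


def build_input(prompt, model_version="melody", input_audio=None):
    """Assemble a request payload (a plain dict) for the deployment.

    `prompt` and `input_audio` are assumed field names; only
    `model_version` is named in the description.
    """
    if model_version not in KNOWN_VERSIONS:
        raise ValueError(f"unknown model_version: {model_version!r}")
    if input_audio is not None and model_version != "melody":
        # Per the description, only the melody model is prompted with audio.
        raise ValueError("audio conditioning requires model_version='melody'")
    payload = {"prompt": prompt, "model_version": model_version}
    if input_audio is not None:
        payload["input_audio"] = input_audio
    return payload


# Text-only prompt for the large model.
text_only = build_input("warm lo-fi beat with soft piano", model_version="large")

# Text plus a reference melody, using the default 'melody' version.
with_melody = build_input("orchestral cover", input_audio="melody.wav")
```

The resulting dictionary could then be handed to whichever client you use to call the deployment. Check the deployment's input schema for the actual field names before relying on these.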

To learn more about this model, check out the repository and paper.

Model Architecture and Development

MusicGen is a single-stage auto-regressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods such as MusicLM, MusicGen doesn’t require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show they can predict them in parallel, which requires only 50 auto-regressive steps per second of audio. They trained MusicGen on 20K hours of licensed music: an internal dataset of 10K high-quality music tracks, plus the ShutterStock and Pond5 music data.
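To make the step count concrete, here is a rough sketch of a delayed codebook layout, not the authors' implementation. It assumes codebook k is shifted by exactly k steps (the text says only "a small delay"), and the names delay_pattern, decoding_steps, and PAD are hypothetical. It compares the number of decoding steps against a fully flattened scheme that predicts one codebook token per step.

```python
FRAME_RATE_HZ = 50   # EnCodec frames per second of audio (from the text)
NUM_CODEBOOKS = 4    # codebooks per frame (from the text)
PAD = -1             # hypothetical placeholder meaning "no token at this step"


def delay_pattern(num_frames, num_codebooks=NUM_CODEBOOKS):
    """Return grid[codebook][step] holding the frame index predicted there.

    Assumption: codebook k is delayed by k steps, so at each decoding step
    the model emits one token per codebook, each for a different frame.
    """
    num_steps = num_frames + num_codebooks - 1
    grid = [[PAD] * num_steps for _ in range(num_codebooks)]
    for k in range(num_codebooks):
        for t in range(num_frames):
            grid[k][t + k] = t
    return grid


def decoding_steps(seconds, num_codebooks=NUM_CODEBOOKS):
    """Compare step counts for the delayed layout and a flattened one."""
    frames = int(seconds * FRAME_RATE_HZ)
    return {
        # One step per frame, plus a constant tail to flush the delays.
        "delayed": frames + num_codebooks - 1,
        # One step per (frame, codebook) pair.
        "flattened": frames * num_codebooks,
    }


# Print the layout for 3 frames: each row is one codebook.
for row in delay_pattern(3):
    print(row)

print(decoding_steps(1))
```

Under these assumptions the delayed layout costs roughly one step per frame, about 50 per second, plus a small constant, while the flattened layout costs four times as many steps.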

Licenses

  • All code in this repository is licensed under the Apache License 2.0.
  • The code in the Audiocraft repository is released under the MIT license as found in the LICENSE file.
  • The weights in the Audiocraft repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.