meta / musicgen

Generate music from a prompt or melody

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 41 seconds, though predict time varies significantly with the inputs.

Readme

Model Description

MusicGen is a simple and controllable model for music generation. This deployment exposes two versions of MusicGen:

  • Melody. A 1.5-billion-parameter model that you can prompt with both text and audio
  • Large. A 3.5-billion-parameter model that you can prompt with text

You can specify the model you want to use via the model_version parameter, which is set to 'melody' by default.
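As a sketch, selecting a model version through Replicate's Python client might look like the following. The `model_version` parameter is documented above; the `prompt` input name is an assumption here, so check the model's input schema before relying on it:

```python
import os

# Input payload: model_version selects between the two deployed models.
# "melody" is the default; "large" is the text-only 3.5B model.
# The "prompt" field name is an assumption -- verify against the input schema.
payload = {
    "model_version": "large",
    "prompt": "upbeat lo-fi hip hop with warm electric piano chords",
}

# Only call the API when a token is configured, so this sketch runs dry safely.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate  # pip install replicate

    output = replicate.run("meta/musicgen", input=payload)
    print(output)  # reference to the generated audio
```

Without a token the script just builds the payload, which makes it easy to adapt into your own pipeline.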

To learn more about this model, check out the repository and paper.

Model Architecture and Development

MusicGen is a single-stage auto-regressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods such as MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show they can predict them in parallel, requiring only 50 auto-regressive steps per second of audio. MusicGen was trained on 20K hours of licensed music: an internal dataset of 10K high-quality music tracks, plus the ShutterStock and Pond5 music data.

Licenses

  • All code in this repository is licensed under the Apache 2.0 license.
  • The code in the Audiocraft repository is released under the MIT license as found in the LICENSE file.
  • The weights in the Audiocraft repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.