Run time and cost

This model costs approximately $0.086 to run on Replicate, or 11 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 62 seconds. The predict time for this model varies significantly based on the inputs.
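The figures above can be cross-checked with a little arithmetic. The sketch below is only a rough budgeting aid: it treats the quoted $0.086 and 62 seconds as typical values, and the per-second rate it derives is implied by those two numbers rather than being a published price. The helper name `estimate_cost` is hypothetical.

```python
import math

# Approximate figures quoted on this page; actual values vary with your inputs.
COST_PER_RUN_USD = 0.086
TYPICAL_SECONDS = 62

# Whole runs that fit in one dollar (1 / 0.086 is about 11.6, so 11 full runs).
runs_per_dollar = math.floor(1 / COST_PER_RUN_USD)

# Per-second rate implied by the two figures above (an inference, not a price list).
implied_rate_usd_per_second = COST_PER_RUN_USD / TYPICAL_SECONDS


def estimate_cost(n_runs, cost_per_run=COST_PER_RUN_USD):
    """Rough cost in USD for n_runs typical predictions (hypothetical helper)."""
    return n_runs * cost_per_run


print(runs_per_dollar)
print(round(implied_rate_usd_per_second, 5))
print(round(estimate_cost(100), 2))
```

Because predict time varies significantly with the inputs, an estimate like this is best treated as an order-of-magnitude guide.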

Readme

Model Description

MusicGen is a simple and controllable model for music generation. This deployment exposes two versions of MusicGen:

Melody. A 1.5 billion parameter model that you can prompt with both text and audio
Large. A 3.3 billion parameter model that you can prompt with text

You can specify the model you want to use via the model_version parameter, which is set to 'melody' by default.
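As a minimal sketch of selecting a version, the snippet below builds an input payload and passes it to the Replicate Python client. Only `model_version` and its default come from this page. The `prompt` input name, the helper names `build_input` and `run_musicgen`, and the `model_ref` argument are assumptions for illustration; copy the actual model reference and input names from the model's API tab.

```python
VALID_VERSIONS = {"melody", "large"}


def build_input(prompt, model_version="melody"):
    """Build the input payload; 'melody' mirrors the default described above."""
    if model_version not in VALID_VERSIONS:
        raise ValueError(
            f"model_version must be one of {sorted(VALID_VERSIONS)}, got {model_version!r}"
        )
    # 'prompt' is an assumed input name; check the model's schema before relying on it.
    return {"prompt": prompt, "model_version": model_version}


def run_musicgen(model_ref, prompt, model_version="melody"):
    """Run a prediction. model_ref is the model reference copied from the model page."""
    # Imported lazily so build_input works without the client installed.
    # Requires `pip install replicate` and a REPLICATE_API_TOKEN environment variable.
    import replicate

    return replicate.run(model_ref, input=build_input(prompt, model_version))
```

Validating `model_version` locally avoids paying for a request that the deployment would reject anyway.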

To learn more about this model, check out the repository and paper.

Model Architecture and Development

MusicGen is a single-stage auto-regressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods such as MusicLM, MusicGen doesn’t require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show that the codebooks can be predicted in parallel, which leaves only 50 auto-regressive steps per second of audio. MusicGen was trained on 20K hours of licensed music: an internal dataset of 10K high-quality music tracks, plus the ShutterStock and Pond5 music data.
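To make the delay idea concrete, here is an illustrative sketch of a delayed codebook schedule. It assumes each codebook lags the previous one by exactly one step, which is one way to realize the "small delay" described above; it is not taken from the Audiocraft code, and the function name `delay_pattern` is hypothetical.

```python
def delay_pattern(num_frames, num_codebooks=4):
    """For each decoding step, list the (codebook, frame) pairs predicted together.

    Codebook k is assumed to be shifted by k steps, so at step s the model
    emits frame s - k of codebook k, for every k where that frame exists.
    """
    steps = []
    for s in range(num_frames + num_codebooks - 1):
        step = [
            (k, s - k)
            for k in range(num_codebooks)
            if 0 <= s - k < num_frames
        ]
        steps.append(step)
    return steps


FRAME_RATE_HZ = 50  # EnCodec frame rate quoted above
one_second = delay_pattern(FRAME_RATE_HZ)

# One second of audio holds 4 * 50 = 200 tokens. Predicting them one at a
# time would take 200 steps; the delayed schedule needs 50 steps plus a
# fixed overhead of (num_codebooks - 1) steps per sequence.
print(len(one_second))
print(sum(len(step) for step in one_second))
```

Under this assumption the overhead is a constant 3 steps regardless of clip length, so the amortized cost approaches 50 steps per second of audio.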

Licenses

All code in this repository is licensed under the Apache License 2.0.
The code in the Audiocraft repository is released under the MIT license as found in the LICENSE file.
The weights in the Audiocraft repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.