sakemin / musicgen-chord

Generate music restricted to chord sequences and tempo

If you haven't trained a model on Replicate before, we recommend reading one of Replicate's training guides first.

Pricing

Trainings for this model run on 8x Nvidia L40S GPU hardware, which costs $0.0078 per second.
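At $0.0078 per second, one hour of training costs 3600 × 0.0078 = $28.08. The sketch below turns a run time into a cost estimate. It assumes billing is simply the per-second rate multiplied by wall-clock run time, and the durations it prints are hypothetical examples, not measured training times; the function name is illustrative.

```python
# Rough cost estimate for a training on the 8x Nvidia L40S hardware.
# Assumption: cost = per-second rate x run time, with no other charges.
PRICE_PER_SECOND_USD = 0.0078  # rate quoted in the Pricing section


def training_cost_usd(duration_seconds: float) -> float:
    """Return the estimated cost in USD of a training lasting duration_seconds."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds * PRICE_PER_SECOND_USD


if __name__ == "__main__":
    # Hypothetical run times; real durations depend on dataset size and settings.
    for minutes in (10, 30, 60):
        print(f"{minutes:>3} min -> ${training_cost_usd(minutes * 60):.2f}")
```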

Create a training

Install the Python library:

pip install replicate

Then, run this to create a training with sakemin/musicgen-chord:c940ab43 as the base model:

import replicate

# Replace with your Replicate username.
username = "your-username"

training = replicate.trainings.create(
  version="sakemin/musicgen-chord:c940ab4308578237484f90f010b2b3871bf64008e95f26f4d567529ad019a3d6",
  input={
    # ... training inputs (see Train Inputs below)
  },
  destination=f"{username}/<destination-model-name>"
)

print(training)

Alternatively, create the training with cURL:

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"destination": "{username}/<destination-model-name>", "input": {...}}' \
  https://api.replicate.com/v1/models/sakemin/musicgen-chord/versions/c940ab4308578237484f90f010b2b3871bf64008e95f26f4d567529ad019a3d6/trainings

The API response will look like this:

{
  "id": "zz4ibbonubfz7carwiefibzgga",
  "version": "c940ab4308578237484f90f010b2b3871bf64008e95f26f4d567529ad019a3d6",
  "status": "starting",
  "input": {
    "data": "..."
  },
  "output": null,
  "error": null,
  "logs": null,
  "started_at": null,
  "created_at": "2023-03-28T21:47:58.566434Z",
  "completed_at": null
}

Note that before you can create a training, you’ll need to create a model and use its name as the value for the destination field.
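The response above reports a status of "starting"; the training then moves through other states until it finishes. Below is a minimal polling sketch, assuming the terminal statuses are "succeeded", "failed", and "canceled". The helper name wait_for_status and its parameters are hypothetical; it accepts any zero-argument function that returns the current status, so the loop can be exercised without network access.

```python
import time
from typing import Callable

# Statuses after which a training will not change again (assumed set).
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_status(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 4 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status.

    Raises TimeoutError if the training is still running after roughly
    timeout_seconds of waiting. `sleep` is injectable for testing.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"training still {status!r} after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the Python client, one way to supply the status function is lambda: replicate.trainings.get(training.id).status, where training is the object returned by replicate.trainings.create.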

Fine-tuning MusicGen Chord

For instructions on fine-tuning MusicGen, see the blog post "Fine-tune MusicGen to generate music in any style".

Dataset

Audio

Datasets can be uploaded as compressed archives in .zip, .tar, .gz, or .tgz format.
Single audio files in .mp3, .wav, or .flac format can also be uploaded.
Each audio file in the dataset must be longer than 30 seconds.
Audio chunking: files longer than 30 seconds are split into multiple 30-second chunks.
Vocal removal: if drop_vocals is set to True, the vocal tracks in the audio files are separated out and removed (default: drop_vocals = True).
- For datasets whose audio contains no vocals, setting drop_vocals = False shortens preprocessing and preserves the original audio quality.
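The upload and chunking rules above can be checked locally before uploading. A minimal sketch, with hypothetical helper names: is_supported_upload tests a file name against the listed extensions, and chunk_boundaries computes 30-second windows. How the trainer handles a trailing remainder shorter than 30 seconds is not stated here; this sketch assumes it is dropped.

```python
from pathlib import Path
from typing import List, Tuple

ARCHIVE_EXTS = {".zip", ".tar", ".gz", ".tgz"}
AUDIO_EXTS = {".mp3", ".wav", ".flac"}
CHUNK_SECONDS = 30.0


def is_supported_upload(path: str) -> bool:
    """True if the file extension is one of the listed archive or audio formats."""
    # Path("x.tar.gz").suffix is ".gz", which is in ARCHIVE_EXTS.
    return Path(path).suffix.lower() in ARCHIVE_EXTS | AUDIO_EXTS


def chunk_boundaries(duration: float, chunk_len: float = CHUNK_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of consecutive chunk_len windows.

    Assumption: a trailing remainder shorter than chunk_len is discarded.
    """
    if duration <= chunk_len:
        raise ValueError(f"audio must be longer than {chunk_len} s, got {duration} s")
    n_full = int(duration // chunk_len)
    return [(i * chunk_len, (i + 1) * chunk_len) for i in range(n_full)]
```

For example, a 75-second track yields two chunks, (0, 30) and (30, 60), under this assumption.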

Text Description

If each audio file needs its own description, create a .txt file with the same base name containing a single-line description for each .mp3 or .wav file (e.g. 01_A_Man_Without_Love.mp3 and 01_A_Man_Without_Love.txt).
To use the same description for every audio file, set the one_same_description argument to that description (a string). In this case, no individual .txt files are needed.
Auto labeling: when auto_labeling is set to True, labels such as ‘genre’, ‘mood’, ‘theme’, ‘instrumentation’, ‘key’, and ‘bpm’ are generated and added to each audio file in the dataset (default: auto_labeling = True).
- Available Tags of Auto-Labeling
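Packaging per-file descriptions can be scripted with the standard library. A minimal sketch, assuming the audio files sit in one local directory and the descriptions are held in a dict keyed by audio file name; build_dataset_zip is a hypothetical helper, not part of this model. It pairs each .mp3 or .wav file with a same-named .txt file in a .zip archive.

```python
import zipfile
from pathlib import Path
from typing import Dict


def build_dataset_zip(audio_dir, descriptions: Dict[str, str], zip_path) -> Path:
    """Zip every .mp3/.wav file in audio_dir with a same-named .txt description.

    descriptions maps an audio file name (e.g. "01_A_Man_Without_Love.mp3")
    to its description. Files with other extensions are skipped.
    """
    audio_dir = Path(audio_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w") as zf:
        for audio in sorted(audio_dir.iterdir()):
            if audio.suffix.lower() not in {".mp3", ".wav"}:
                continue
            desc = descriptions.get(audio.name)
            if desc is None:
                raise ValueError(f"missing description for {audio.name}")
            zf.write(audio, arcname=audio.name)
            # Collapse whitespace and newlines so the .txt holds a single line.
            zf.writestr(audio.with_suffix(".txt").name, " ".join(desc.split()))
    return zip_path
```

The resulting archive can then be hosted somewhere reachable and passed as dataset_path.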

Train Parameters

Train Inputs

dataset_path: Path = Input(description="Path to dataset directory")
one_same_description: str = Input(description="A description for all of audio data", default=None)
auto_labeling: bool = Input(description="Creating label data like genre, mood, theme, instrumentation, key, bpm for each track. Using essentia-tensorflow for music information retrieval.", default=True)
drop_vocals: bool = Input(description="Dropping the vocal tracks from the audio files in dataset, by separating sources with Demucs.", default=True)
lr: float = Input(description="Learning rate", default=1)
epochs: int = Input(description="Number of epochs to train for", default=3)
# If updates_per_epoch is None, the number of iterations per epoch is derived from the dataset size and batch size; otherwise it is set to the given value.
updates_per_epoch: int = Input(description="Number of iterations for one epoch", default=100)
batch_size: int = Input(description="Batch size", default=16)
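With the defaults above (epochs = 3, updates_per_epoch = 100), a training run performs 3 × 100 = 300 updates. When updates_per_epoch is None, the count is derived from the dataset and batch size; the sketch below assumes this means one pass over the data rounded up, ceil(num_examples / batch_size), where an example is one 30-second chunk. That formula and the helper names are assumptions for illustration, not the trainer's confirmed code.

```python
import math
from typing import Optional


def resolve_updates_per_epoch(num_examples: int, batch_size: int,
                              updates_per_epoch: Optional[int] = None) -> int:
    """Return the number of optimizer updates in one epoch.

    An explicit updates_per_epoch wins; otherwise (assumption) use one pass
    over the data, rounding up so a final partial batch still counts.
    """
    if updates_per_epoch is not None:
        return updates_per_epoch
    if num_examples <= 0 or batch_size <= 0:
        raise ValueError("num_examples and batch_size must be positive")
    return math.ceil(num_examples / batch_size)


def total_updates(epochs: int, updates_per_epoch: int) -> int:
    """Total optimizer updates over the whole training run."""
    return epochs * updates_per_epoch


# Defaults from Train Inputs: 3 epochs x 100 updates per epoch.
print(total_updates(3, resolve_updates_per_epoch(40, 16, 100)))  # 300
# Hypothetical 40-chunk dataset with updates_per_epoch=None: ceil(40 / 16) = 3 per epoch.
print(total_updates(3, resolve_updates_per_epoch(40, 16)))  # 9
```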

Default Parameters

For 8-GPU multiprocessing, batch_size must be a multiple of 8; otherwise, it is automatically rounded down to the nearest multiple of 8.
For the chord model, the maximum batch_size is 16 on an 8x Nvidia A40 machine.
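The rounding rule above is integer floor division: a requested batch_size of 20 becomes 16, and 12 becomes 8. The sketch below reproduces that arithmetic. check_batch_size is a hypothetical client-side pre-flight check, not part of the trainer; it additionally rejects values that floor to 0 or that exceed the stated chord-model maximum of 16.

```python
GPUS = 8
MAX_CHORD_BATCH = 16  # stated maximum for the chord model on 8x A40


def effective_batch_size(batch_size: int, gpus: int = GPUS) -> int:
    """Round batch_size down to the nearest multiple of gpus."""
    return (batch_size // gpus) * gpus


def check_batch_size(batch_size: int) -> int:
    """Hypothetical pre-flight check run before submitting a training."""
    eff = effective_batch_size(batch_size)
    if eff == 0:
        raise ValueError(f"batch_size {batch_size} rounds down to 0 with {GPUS} GPUs")
    if eff > MAX_CHORD_BATCH:
        raise ValueError(f"batch_size {eff} exceeds the chord-model maximum of {MAX_CHORD_BATCH}")
    return eff
```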

Example Code

import replicate

training = replicate.trainings.create(
  version="sakemin/musicgen-chord:c940ab4308578237484f90f010b2b3871bf64008e95f26f4d567529ad019a3d6",
  input={
    "dataset_path": "https://your/data/path.zip",
    "one_same_description": "description for your dataset music",
    "epochs": 3,
    "updates_per_epoch": 100,
  },
  destination="my-name/my-model"
)

print(training)