MusicGen Chord
MusicGen Chord is a modified version of Meta’s MusicGen Melody model that can generate music conditioned on either audio-based or text-based chord inputs.
Text Based Chord Conditioning
Text Chord Condition Format
- `SPACE` is used as the split token. Each split chunk is assigned to a single bar.
  - eg. `C G E:min A:min`
- When multiple chords must be assigned to a single bar, append more chords with `,`.
  - eg. `C G,G:7 E:min,E:min7 A:min`
- Chord type can be specified after `:`.
  - A single uppercase letter on its own (eg. `C`, `E`) is considered a major chord.
  - `maj`, `min`, `dim`, `aug`, `min6`, `maj6`, `min7`, `minmaj7`, `maj7`, `7`, `dim7`, `hdim7`, `sus2` and `sus4` can be appended with `:`.
    - eg. `E:dim`, `B:sus2`
- ‘sharp’ and ‘flat’ can be specified with `#` and `b`.
  - eg. `E#:min`, `Db`
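The format rules above can be sketched as a small parser. This is only an illustration of the text format, not the model's own parsing code; the function name `parse_text_chords` is hypothetical, and restricting roots to the letters A to G is an assumption.

```python
import re

# Chord types listed in the format description above.
QUALITIES = {
    "maj", "min", "dim", "aug", "min6", "maj6", "min7",
    "minmaj7", "maj7", "7", "dim7", "hdim7", "sus2", "sus4",
}

# Root note: one uppercase letter, optionally followed by '#' or 'b'.
# Limiting the letter to A-G is an assumption, not stated in the source.
ROOT_RE = re.compile(r"^[A-G][#b]?$")


def parse_text_chords(text_chords):
    """Split a chord string into bars; each bar is a list of (root, quality)."""
    bars = []
    for chunk in text_chords.split():      # whitespace separates bars
        bar = []
        for chord in chunk.split(","):     # ',' separates chords within one bar
            root, sep, quality = chord.partition(":")
            if not sep:
                quality = "maj"            # a bare root means a major chord
            if not ROOT_RE.match(root):
                raise ValueError(f"invalid root note: {root!r}")
            if quality not in QUALITIES:
                raise ValueError(f"unknown chord type: {quality!r}")
            bar.append((root, quality))
        bars.append(bar)
    return bars


print(parse_text_chords("C G,G:7 E:min,E:min7 A:min"))
```

Validating a chord string this way before submitting it can catch typos such as a lowercase root or an unsupported chord type.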
BPM and Time Signature
- To create chord chroma, `bpm` and `time_sig` values must be specified.
  - `bpm` can be a float value. (eg. `132`, `60`)
  - The format of `time_sig` is `(int)/(int)`. (eg. `4/4`, `3/4`, `6/8`, `7/8`, `5/4`)
- `bpm` and `time_sig` values will be automatically concatenated after the `prompt` description value, so you don’t need to specify bpm or time signature information in the description for `prompt`.
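To see why both values are needed to place one chord chunk per bar, here is a minimal sketch that converts `bpm` and `time_sig` into a bar length in seconds. The helper names are hypothetical, and the beat interpretation (one beat equals the note value named by the denominator) is an assumption; the model's own chroma code may count beats differently.

```python
def parse_time_sig(time_sig):
    """Parse a '(int)/(int)' string such as '6/8' into (numerator, denominator)."""
    numerator, denominator = time_sig.split("/")
    return int(numerator), int(denominator)


def bar_seconds(bpm, time_sig):
    """Length of one bar in seconds.

    Assumption (not stated in the source): one beat is the note value named
    by the denominator, so a bar holds `numerator` beats.
    """
    beats_per_bar, _ = parse_time_sig(time_sig)
    return beats_per_bar * 60.0 / float(bpm)


print(bar_seconds(120, "4/4"))   # 4 beats of 0.5 s each
print(bar_seconds(90.0, "3/4"))  # 3 beats of 2/3 s each
```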
Audio Based Chord Conditioning
Audio Chord Conditioning Instruction
- You can also give chord condition with `audio_chords`.
- With `audio_start` and `audio_end` values, you can specify which part of the `audio_chords` file input will be used as chord condition.
- The chords will be recognized from the `audio_chords`, using the BTC model.
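As a rough sketch, the audio-chord inputs named above could be collected into one dictionary before being sent as the `input` of a prediction request. The helper `audio_chord_input` is hypothetical, and treating `audio_start` and `audio_end` as offsets in seconds is an assumption.

```python
def audio_chord_input(prompt, audio_path, audio_start=0, audio_end=None):
    """Collect the audio-chord inputs into one dict.

    audio_start / audio_end are assumed to be offsets in seconds.
    """
    if audio_start < 0:
        raise ValueError("audio_start must not be negative")
    if audio_end is not None and audio_end <= audio_start:
        raise ValueError("audio_end must be greater than audio_start")

    inputs = {
        "prompt": prompt,
        "audio_chords": audio_path,   # file whose chords will be recognized
        "audio_start": audio_start,
    }
    if audio_end is not None:
        inputs["audio_end"] = audio_end   # omit to leave the end uncropped
    return inputs


print(audio_chord_input("warm lo-fi piano", "song.wav", audio_start=10, audio_end=40))
```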
Additional Feature
Continuation
- If `continuation` is `True`, then the input audio file given at `audio_chords` will not be used as audio chord condition. The generated music output will be continued from the given file.
- You can also use `audio_start` and `audio_end` values to crop the input audio file.
Infinite Generation
- You can set `duration` longer than 30 seconds.
- Due to MusicGen’s limitation of generating a maximum 30-second audio in one iteration, if the specified duration exceeds 30 seconds, the model will create multiple sequences. It will utilize the latter portion of the output from the previous generation step as the audio prompt (following the same continuation method) for the subsequent generation step.
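The stepwise scheme described above can be illustrated by listing the time span each generation step would cover. This is a sketch only: the function name is hypothetical, and the 10-second `overlap` is an illustrative value, since the source does not say how much of the previous output is reused as the prompt.

```python
def generation_windows(duration, max_step=30.0, overlap=10.0):
    """Return (start, end) times in seconds covered by each generation step.

    Each step after the first reuses the last `overlap` seconds of the
    previous output as its audio prompt, so it adds at most
    `max_step - overlap` seconds of new audio. `overlap` is a hypothetical
    value chosen for illustration.
    """
    if not 0 <= overlap < max_step:
        raise ValueError("overlap must be in [0, max_step)")
    duration = float(duration)
    end = min(duration, max_step)
    windows = [(0.0, end)]
    while end < duration:
        start = end - overlap                  # prompt comes from the previous tail
        end = min(duration, start + max_step)
        windows.append((start, end))
    return windows


for window in generation_windows(70):
    print(window)
```

With these illustrative values, a 70-second request takes three steps, each overlapping the previous one by 10 seconds.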
Multi-Band Diffusion
- Multi-Band Diffusion (MBD) is used for decoding the EnCodec tokens.
- If the tokens are decoded with MBD, then the output audio quality is better.
- Using MBD takes more calculation time, since it has its own prediction sequence.
Fine-tuning MusicGen Chord
For instructions on fine-tuning MusicGen, please see the blog post: Fine-tune MusicGen to generate music in any style
Dataset
Audio
- Compressed files in formats like .zip, .tar, .gz, and .tgz are compatible for dataset uploads.
- Single audio files with .mp3, .wav, and .flac formats can also be uploaded.
- Audio files within the dataset must exceed 30 seconds in duration.
- Audio Chunking : Files surpassing 30 seconds will be divided into multiple 30-second chunks.
- Vocal Removal : If `drop_vocals` is set to `True`, the vocal tracks in the audio files will be isolated and removed. (Default : `drop_vocals = True`)
  - For datasets containing audio without vocals, setting `drop_vocals = False` reduces data preprocessing time and maintains audio file quality.
Text Description
- If each audio file requires a distinct description, create a .txt file with a single-line description corresponding to each .mp3 or .wav file. (eg. `01_A_Man_Without_Love.mp3` and `01_A_Man_Without_Love.txt`)
- For a uniform description across all audio files, set the `one_same_description` argument to your desired description (`str`). In this case, there’s no need for individual .txt files.
- Auto Labeling : When `auto_labeling` is set to `True`, labels such as ‘genre’, ‘mood’, ‘theme’, ‘instrumentation’, ‘key’, and ‘bpm’ will be generated and added to each audio file in the dataset. (Default : `auto_labeling = True`)
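When using per-file descriptions, it can help to check the pairing before uploading. The sketch below lists audio files that lack a same-named .txt file; the function name is hypothetical, a flat (non-nested) dataset directory is assumed, and .flac is included on the assumption that it is paired the same way as .mp3 and .wav.

```python
import tempfile
from pathlib import Path

# .flac is included as an assumption; the source names only .mp3 and .wav here.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


def missing_descriptions(dataset_dir):
    """Return names of audio files in dataset_dir with no same-named .txt file."""
    missing = []
    for audio in sorted(Path(dataset_dir).iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        if not audio.with_suffix(".txt").is_file():
            missing.append(audio.name)
    return missing


# Small demonstration on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "01_A_Man_Without_Love.mp3").touch()
    (root / "01_A_Man_Without_Love.txt").write_text("placeholder description\n")
    (root / "02_untitled.wav").touch()
    print(missing_descriptions(root))
```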
Train Parameters
Train Inputs
- `dataset_path`: Path = Input("Path to dataset directory",)
- `one_same_description`: str = Input(description="A description for all of audio data", default=None)
- `auto_labeling`: bool = Input(description="Creating label data like genre, mood, theme, instrumentation, key, bpm for each track. Using `essentia-tensorflow` for music information retrieval.", default=True)
- `drop_vocals`: bool = Input(description="Dropping the vocal tracks from the audio files in dataset, by separating sources with Demucs.", default=True)
- `lr`: float = Input(description="Learning rate", default=1)
- `epochs`: int = Input(description="Number of epochs to train for", default=3)
- `updates_per_epoch`: int = Input(description="Number of iterations for one epoch", default=100) If None, iterations per epoch will be set according to dataset/batch size. If there’s a value, then the number of iterations per epoch will be set as the value.
- `batch_size`: int = Input(description="Batch size", default=16)
Default Parameters
- For 8 gpu multiprocessing, `batch_size` must be a multiple of 8. If not, `batch_size` will be automatically floored to the nearest multiple of 8.
- For `chord` model, maximum `batch_size` is `16` with 8 x Nvidia A40 machine setting.
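The flooring rule above amounts to integer division by the GPU count. A minimal sketch, with a hypothetical function name:

```python
def effective_batch_size(batch_size, num_gpus=8):
    """Floor batch_size to the nearest multiple of num_gpus."""
    return (batch_size // num_gpus) * num_gpus


for requested in (16, 20, 9):
    print(requested, "->", effective_batch_size(requested))
```

Note that a requested value below 8 floors to 0 under this arithmetic; how the trainer handles that case is not stated here, so choosing a multiple of 8 explicitly is the safer option.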
Example Code
import replicate

training = replicate.trainings.create(
    version="sakemin/musicgen-chord:c940ab4308578237484f90f010b2b3871bf64008e95f26f4d567529ad019a3d6",
    input={
        "dataset_path": "https://your/data/path.zip",
        "one_same_description": "description for your dataset music",
        "epochs": 3,
        "updates_per_epoch": 100,
    },
    destination="my-name/my-model",
)
print(training)
References
- Chord recognition from audio file is performed using BTC model, by Jonggwon Park.
- The auto-labeling feature utilizes `effnet-discogs` from MTG’s `essentia`.
- ‘key’ and ‘bpm’ values are obtained using `librosa`.
- Vocal dropping is implemented using Meta’s `demucs`.
Licenses
- All code in this repository is licensed under the Apache License 2.0.
- The weights in this repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.
- The code in the Audiocraft repository is released under the MIT license as found in the LICENSE file.
- The weights in the Audiocraft repository are released under the CC-BY-NC 4.0 license as found in the LICENSE_weights file.