sakemin / musicgen-stereo-chord

Generate music in stereo, restricted to chord sequences and tempo

  • Public
  • 3.2K runs
  • L40S

Input

string

Model type. Select `fine-tuned` if you trained the model into your own repository.

Default: "stereo-chord-large"

string

A description of the music you want to generate.

string

A text-based chord progression condition. A single uppercase letter (e.g. `C`) is treated as a major chord. Chord attributes (`maj`, `min`, `dim`, `aug`, `min6`, `maj6`, `min7`, `minmaj7`, `maj7`, `7`, `dim7`, `hdim7`, `sus2` and `sus4`) can be appended to the root letter after `:` (e.g. `A:min7`). Each chord token separated by a `SPACE` is allocated to a single bar. If more than one chord must be allocated to a single bar, join the chords with `,` and no `SPACE` (e.g. `C,C:7 G,E:min A:min`). Choose only one of `text_chords` or `audio_chords` below.

number

BPM condition for the generated output. `text_chords` will be processed based on this value. This will be appended at the end of `prompt`.

string

Time signature value for the generated output. `text_chords` will be processed based on this value. This will be appended at the end of `prompt`.

Default: "4/4"

file

An audio file that will condition the chord progression. Choose only one of `audio_chords` or `text_chords` above.

integer
(minimum: 0)

Start time of the audio file to use for chord conditioning.

Default: 0

integer
(minimum: 0)

End time of the audio file to use for chord conditioning. If `None`, it defaults to the end of the audio clip.

integer

Duration of the generated audio in seconds.

Default: 8

boolean

If `True`, the generated music continues from `audio_chords`. In that case, chord conditioning is only possible when the chords are given with `text_chords`. If `False`, the generated music follows the chords of `audio_chords`.

Default: false

boolean

If `True`, the EnCodec tokens will be decoded with MultiBand Diffusion. Not compatible with stereo models.

Default: false

string

Strategy for normalizing audio.

Default: "loudness"

number
(minimum: 0.5, maximum: 2.5)

Coefficient by which the multi-hot chord chroma is multiplied.

Default: 1
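The chroma coefficient scales a multi-hot chord chroma. Here is a minimal sketch of what such a vector could look like. It assumes a 12-bin chroma indexed from C, where each chord tone is set to 1 before scaling. The interval table uses textbook chord spellings and may not match the model's internal table. The names `chord_chroma` and `root_pitch_class` are hypothetical.

```python
# Hypothetical sketch: 12-bin multi-hot chord chroma scaled by a coefficient.
# Bins are assumed to start at C; interval spellings are textbook, not the model's own table.
PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

CHORD_INTERVALS = {
    "maj": (0, 4, 7), "min": (0, 3, 7), "dim": (0, 3, 6), "aug": (0, 4, 8),
    "min6": (0, 3, 7, 9), "maj6": (0, 4, 7, 9), "min7": (0, 3, 7, 10),
    "minmaj7": (0, 3, 7, 11), "maj7": (0, 4, 7, 11), "7": (0, 4, 7, 10),
    "dim7": (0, 3, 6, 9), "hdim7": (0, 3, 6, 10), "sus2": (0, 2, 7), "sus4": (0, 5, 7),
}


def root_pitch_class(root: str) -> int:
    """Map a root such as 'C', 'F#' or 'Db' to a pitch class 0-11."""
    pc = PITCH_CLASSES[root[0]]
    for accidental in root[1:]:
        if accidental == "#":
            pc += 1
        elif accidental == "b":
            pc -= 1
        else:
            raise ValueError(f"unexpected accidental in {root!r}")
    return pc % 12


def chord_chroma(token: str, coefficient: float = 1.0) -> list[float]:
    """Return the multi-hot chroma of one chord token (e.g. 'A:min7'), scaled by coefficient."""
    root, _, quality = token.partition(":")
    intervals = CHORD_INTERVALS[quality or "maj"]  # a bare root means a major chord
    base = root_pitch_class(root)
    chroma = [0.0] * 12
    for interval in intervals:
        chroma[(base + interval) % 12] = coefficient
    return chroma


print(chord_chroma("A:min", coefficient=1.5))  # 1.5 at C (0), E (4) and A (9), 0.0 elsewhere
```

Raising the coefficient within its 0.5 to 2.5 range makes the chord tones stand out more strongly in the condition. Under this sketch's assumption, it leaves the set of active bins unchanged.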

integer

Reduces sampling to the k most likely tokens.

Default: 250

number

Reduces sampling to tokens with cumulative probability of p. When set to `0` (default), top_k sampling is used.

Default: 0

number

Controls the 'conservativeness' of the sampling process. Higher temperature means more diversity.

Default: 1
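The three sampling settings above (top-k, top-p and temperature) interact as shown in the following generic sketch. It is a plain-Python illustration of standard token sampling, not this model's code. It follows the description that top-k is used when top-p is `0`. The function name `sample_token` is hypothetical.

```python
import math
import random


def sample_token(logits, top_k=250, top_p=0.0, temperature=1.0, rng=random):
    """Pick one token index from raw logits (generic sketch, not the model's implementation)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: higher values flatten the distribution (more diversity).
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0:
        # Nucleus sampling: smallest set of top tokens whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    else:
        # top_p == 0: fall back to the k most likely tokens (all tokens if top_k <= 0).
        kept = order[:top_k] if top_k > 0 else order
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(sample_token([0.0, 5.0, 1.0], top_k=1))  # top_k=1 always picks the argmax, index 1
```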

integer

Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to inputs.

Default: 3
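The guidance value above is commonly applied through classifier-free guidance. This is a sketch of the standard formula as used in Meta's AudioCraft. Whether this model applies it exactly this way is an assumption. The helper name `apply_cfg` is hypothetical.

```python
def apply_cfg(cond_logits, uncond_logits, guidance=3.0):
    """Standard classifier-free guidance over one decoding step's logits (sketch)."""
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    # guidance == 1 returns the conditional logits unchanged; larger values
    # amplify the difference that the conditioning (text, chords, bpm) makes.
    return [u + guidance * (c - u) for c, u in zip(cond_logits, uncond_logits)]


print(apply_cfg([2.0, 0.0], [1.0, 0.0], guidance=3))  # [4.0, 0.0]
```

This formula explains why higher values give lower-variance outputs. It pushes the logits further toward whatever the inputs favor, so sampling concentrates on fewer tokens.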

string

Output format for generated audio.

Default: "wav"

integer

Seed for random number generator. If `None` or `-1`, a random seed will be used.
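The parameter descriptions above impose a few constraints on how chord inputs combine. The hypothetical helper below shows one way to check them before submitting a prediction. It uses only parameter names that appear on this page: `prompt`, `text_chords`, `audio_chords`, `bpm`, `time_sig`, `continuation`, `duration`, `audio_start` and `audio_end`. Its rules are a reading of the descriptions, not the model's own validation logic.

```python
def build_input(prompt, *, text_chords=None, audio_chords=None, bpm=None,
                time_sig="4/4", continuation=False, duration=8,
                audio_start=0, audio_end=None):
    """Assemble an input dict and enforce the documented constraints (hypothetical helper)."""
    if continuation:
        # With continuation, the audio file is material to continue from, not a chord source.
        if audio_chords is None:
            raise ValueError("continuation needs an audio file in audio_chords")
    elif text_chords is not None and audio_chords is not None:
        raise ValueError("choose only one of text_chords or audio_chords")
    if text_chords is not None and bpm is None:
        raise ValueError("text_chords needs bpm (and time_sig) to build the chord chroma")
    payload = {"prompt": prompt, "time_sig": time_sig,
               "continuation": continuation, "duration": duration}
    if text_chords is not None:
        payload["text_chords"] = text_chords
    if bpm is not None:
        payload["bpm"] = bpm
    if audio_chords is not None:
        if audio_start < 0 or (audio_end is not None and audio_end <= audio_start):
            raise ValueError("need 0 <= audio_start < audio_end")
        payload.update(audio_chords=audio_chords, audio_start=audio_start)
        if audio_end is not None:
            payload["audio_end"] = audio_end
    return payload


print(build_input("lofi hip hop", text_chords="C G E:min A:min", bpm=90))
```

A dict like this is the kind of object you would pass as `input` to Replicate's client (for example `replicate.run`). Choosing the model version is left to you.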


Run time and cost

This model costs approximately $0.18 to run on Replicate, or 5 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 4 minutes, but the prediction time varies significantly based on the inputs.

Readme

MusicGen Chord

MusicGen Chord is a modified version of Meta’s MusicGen Melody model that can generate music from either audio-based or text-based chord conditions.

Text Based Chord Conditioning

Text Chord Condition Format

  • SPACE is used as the split token. Each resulting chunk is assigned to a single bar.
    • C G E:min A:min
  • When multiple chords must be assigned to a single bar, append the extra chords with , (no SPACE).
    • C G,G:7 E:min,E:min7 A:min
  • Chord type can be specified after :.
    • A single uppercase letter on its own (eg. C, E) is treated as a major chord.
    • maj, min, dim, aug, min6, maj6, min7, minmaj7, maj7, 7, dim7, hdim7, sus2 and sus4 can be appended with :.
      • eg. E:dim, B:sus2
  • ‘sharp’ and ‘flat’ can be specified with # and b.
    • eg. E#:min Db
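The format rules above can be checked with a small parser. This is a minimal sketch that assumes a root is one letter A to G with an optional single # or b. The function name `parse_text_chords` is hypothetical and not part of the model's code.

```python
import re

QUALITIES = {"maj", "min", "dim", "aug", "min6", "maj6", "min7", "minmaj7",
             "maj7", "7", "dim7", "hdim7", "sus2", "sus4"}
# Root letter, optional sharp/flat, optional ':' followed by a chord quality.
CHORD_RE = re.compile(r"^([A-G])([#b]?)(?::(.+))?$")


def parse_text_chords(text: str) -> list[list[tuple[str, str]]]:
    """Split a text chord condition into bars of (root, quality) pairs (hypothetical helper)."""
    bars = []
    for bar_token in text.split():          # SPACE separates bars
        bar = []
        for chord in bar_token.split(","):  # ',' clusters several chords in one bar
            match = CHORD_RE.match(chord)
            if match is None:
                raise ValueError(f"cannot parse chord {chord!r}")
            letter, accidental, quality = match.groups()
            quality = quality or "maj"      # a bare letter is a major chord
            if quality not in QUALITIES:
                raise ValueError(f"unknown chord quality {quality!r}")
            bar.append((letter + accidental, quality))
        bars.append(bar)
    return bars


print(parse_text_chords("C G,G:7 E:min,E:min7 A:min"))
```

A stray space after a comma (for example `G, E:min`) splits the bar in two and leaves an empty chord, so this parser rejects it.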

BPM and Time Signature

  • To create chord chroma, bpm and time_sig values must be specified.
    • bpm can be a float value (eg. 132, 60).
    • The format of time_sig is (int)/(int). (eg. 4/4, 3/4, 6/8, 7/8, 5/4)
  • bpm and time_sig values are automatically appended to the end of the prompt description, so you don’t need to include bpm or time signature information in prompt.
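To see why bpm and time_sig are needed, here is a sketch that turns bars into a chord timeline. It assumes that bpm counts the beat unit written in the time signature, so one bar lasts numerator × 60 / bpm seconds. It also assumes that chords clustered in one bar split the bar evenly. The model may handle either case differently, for example compound meters like 6/8. `bar_seconds` and `chord_timeline` are hypothetical names.

```python
def bar_seconds(bpm: float, time_sig: str) -> float:
    """Bar length in seconds, assuming bpm counts the time signature's beat unit."""
    beats, unit = (int(part) for part in time_sig.split("/"))
    if beats <= 0 or unit <= 0 or bpm <= 0:
        raise ValueError("bpm and both time signature values must be positive")
    return beats * 60.0 / bpm


def chord_timeline(text_chords: str, bpm: float, time_sig: str = "4/4"):
    """Return (start, end, chord) triples; clustered chords share their bar evenly (assumption)."""
    length = bar_seconds(bpm, time_sig)
    timeline = []
    for index, bar_token in enumerate(text_chords.split()):
        chords = bar_token.split(",")
        step = length / len(chords)
        for position, chord in enumerate(chords):
            start = index * length + position * step
            timeline.append((start, start + step, chord))
    return timeline


print(chord_timeline("C G,G:7", bpm=120))
# [(0.0, 2.0, 'C'), (2.0, 3.0, 'G'), (3.0, 4.0, 'G:7')]
```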

Audio Based Chord Conditioning

Audio Chord Conditioning Instruction

  • You can also give chord condition with audio_chords.
  • With audio_start and audio_end values, you can specify which part of the audio_chords file input will be used as chord condition.
  • The chords will be recognized from audio_chords using the BTC model.

Additional Feature

Continuation

  • If continuation is True, the input audio file given in audio_chords is not used as an audio chord condition; instead, the generated music continues from that file.
  • You can also use audio_start and audio_end values to crop the input audio file.

Infinite Generation

  • You can set duration longer than 30 seconds.
  • MusicGen can generate at most 30 seconds of audio in one iteration, so if the specified duration exceeds 30 seconds, the model creates multiple sequences. Each subsequent generation step uses the latter portion of the previous step’s output as its audio prompt (the same method as continuation).
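The multi-step scheme can be sketched as a window schedule. The README only says that "the latter portion" of the previous output becomes the prompt, so the overlap length `prompt_seconds` below is a hypothetical parameter, not a documented value. The function name `generation_windows` is also hypothetical.

```python
def generation_windows(duration, max_chunk=30.0, prompt_seconds=10.0):
    """Plan (start, end) windows; each window after the first begins prompt_seconds
    before the previous window's end, reusing that tail as its audio prompt."""
    if not 0 < prompt_seconds < max_chunk:
        raise ValueError("prompt_seconds must lie strictly between 0 and max_chunk")
    duration = float(duration)
    windows = [(0.0, min(duration, max_chunk))]
    while windows[-1][1] < duration:
        start = windows[-1][1] - prompt_seconds  # overlap = audio prompt
        windows.append((start, min(duration, start + max_chunk)))
    return windows


print(generation_windows(60))  # [(0.0, 30.0), (20.0, 50.0), (40.0, 60.0)]
```

Each extra window adds at most `max_chunk - prompt_seconds` seconds of new audio. A longer prompt gives smoother transitions but needs more generation steps.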

Multi-Band Diffusion

  • Multi-Band Diffusion (MBD) can be used to decode the EnCodec tokens.
  • If the tokens are decoded with MBD, the output audio quality is better.
  • Using MBD takes more computation time, since it runs its own prediction sequence.

Fine-tuning MusicGen Chord

For instructions on fine-tuning MusicGen, see the blog post: Fine-tune MusicGen to generate music in any style

Dataset

Audio

  • Compressed files in formats like .zip, .tar, .gz, and .tgz can be uploaded as datasets.
  • Single audio files with .mp3, .wav, and .flac formats can also be uploaded.
  • Audio files within the dataset must exceed 30 seconds in duration.
  • Audio Chunking: Files longer than 30 seconds will be divided into multiple 30-second chunks.
  • Vocal Removal: If drop_vocals is set to True, the vocal tracks in the audio files will be isolated and removed (default: drop_vocals = True).
    • For datasets containing audio without vocals, setting drop_vocals = False reduces data preprocessing time and maintains audio file quality.
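The 30-second chunking step can be pictured with the sketch below. It computes chunk boundaries in seconds, which you would multiply by the sample rate to get sample indices. It assumes that a trailing remainder shorter than 30 seconds is dropped, but the README does not say how the remainder is handled. The name `chunk_bounds` is hypothetical.

```python
def chunk_bounds(total_seconds, chunk_seconds=30.0):
    """(start, end) spans of full 30-second chunks; the short remainder is dropped (assumption)."""
    if total_seconds <= chunk_seconds:
        raise ValueError("dataset audio must be longer than one chunk")
    count = int(total_seconds // chunk_seconds)
    return [(i * chunk_seconds, (i + 1) * chunk_seconds) for i in range(count)]


print(chunk_bounds(95))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0)]
```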

Text Description

  • If each audio file requires a distinct description, create a .txt file with a single-line description corresponding to each .mp3 or .wav file. (eg. 01_A_Man_Without_Love.mp3 and 01_A_Man_Without_Love.txt)
  • For a uniform description across all audio files, set the one_same_description argument to your desired description(str). In this case, there’s no need for individual .txt files.
  • Auto Labeling: When auto_labeling is set to True, labels such as ‘genre’, ‘mood’, ‘theme’, ‘instrumentation’, ‘key’, and ‘bpm’ will be generated and added to each audio file in the dataset (default: auto_labeling = True).
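The description rules above can be sketched as a pairing step. Each audio file gets the text of the .txt file with the same stem, unless one shared description is supplied. This is a hypothetical illustration of the layout, not the trainer's own loader. Only the `one_same_description` name comes from this page.

```python
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


def collect_descriptions(folder, one_same_description=None):
    """Map each audio file name in folder to its description (hypothetical helper)."""
    pairs = {}
    for audio in sorted(Path(folder).iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue  # skip .txt files and anything else
        if one_same_description is not None:
            pairs[audio.name] = one_same_description  # no per-file .txt needed
            continue
        text_file = audio.with_suffix(".txt")  # 01_A_Man_Without_Love.mp3 -> .txt
        if not text_file.exists():
            raise FileNotFoundError(f"missing description for {audio.name}")
        pairs[audio.name] = text_file.read_text(encoding="utf-8").strip()
    return pairs
```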
