sakemin / musicgen-remixer

Remix music into another style with MusicGen Chord

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 9 minutes, although prediction time varies significantly depending on the inputs.

Readme

MusicGen Remixer

MusicGen Remixer is an app based on MusicGen Chord. Users upload a music track with vocals and type in a text description prompt; the app then generates a new background track based on these inputs and produces a remixed version of the music.

Prediction Inputs

  • model_version: Model type. Computations take longer when using large or stereo models.
  • prompt: A description of the music you want to generate.
  • music_input: An audio file input for the remix.
  • multi_band_diffusion: If True, the EnCodec tokens will be decoded with MultiBand Diffusion. Not compatible with stereo models.
  • normalization_strategy: Strategy for normalizing audio.
  • beat_sync_threshold: When beat syncing, if the gap between a generated downbeat and the corresponding input-audio downbeat is larger than beat_sync_threshold, the two beats are considered not to correspond.
  • chroma_coefficient: Coefficient by which the multi-hot chord chroma is multiplied.
  • top_k: Reduces sampling to the k most likely tokens.
  • top_p: Reduces sampling to tokens with cumulative probability of p. When set to 0 (default), top_k sampling is used.
  • temperature: Controls the ‘conservativeness’ of the sampling process. Higher temperature means more diversity.
  • classifier_free_guidance: Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to the inputs.
  • output_format: Output format for the generated audio: "wav" or "mp3".
  • seed: Seed for random number generator. If None or -1, a random seed will be used.
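To make the beat_sync_threshold rule concrete, here is a minimal sketch that pairs each generated downbeat with the nearest input downbeat and marks the pair as non-corresponding when the gap exceeds the threshold. The helper name `match_downbeats` and the representation of downbeats as lists of times in seconds are assumptions for illustration; this is not taken from the MusicGen Remixer source.

```python
from bisect import bisect_left


def match_downbeats(generated, reference, threshold):
    """Hypothetical helper: pair generated downbeats with input-audio downbeats.

    All times are in seconds. Returns a list of (generated_time, reference_time)
    pairs, where reference_time is None if the nearest input downbeat is farther
    away than `threshold` (i.e. the beats are treated as not corresponding).
    """
    reference = sorted(reference)
    pairs = []
    for t in generated:
        i = bisect_left(reference, t)
        # Only the input downbeats just before and just after t can be nearest.
        candidates = reference[max(i - 1, 0):i + 1]
        if not candidates:
            pairs.append((t, None))
            continue
        nearest = min(candidates, key=lambda r: abs(r - t))
        pairs.append((t, nearest if abs(nearest - t) <= threshold else None))
    return pairs


print(match_downbeats([0.05, 2.1, 4.6], [0.0, 2.0, 4.0], threshold=0.2))
```

With a 0.2 s threshold, the first two generated downbeats match the input downbeats at 0.0 s and 2.0 s, while the one at 4.6 s is 0.6 s away from its nearest neighbour and is left unmatched.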
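A "multi-hot chord chroma" is a 12-dimensional pitch-class vector with a 1 for each note in the chord; chroma_coefficient scales that vector. The sketch below builds such a vector for a chord given its root and quality. The pitch-class ordering, the two interval templates, and the function name `chord_chroma` are illustrative assumptions; the chord vocabulary and the way MusicGen Chord consumes the chroma internally are not described on this page.

```python
# Hypothetical illustration; not the MusicGen Chord implementation.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Assumed interval templates (semitones above the root); only two qualities shown.
CHORD_INTERVALS = {"maj": (0, 4, 7), "min": (0, 3, 7)}


def chord_chroma(root, quality, chroma_coefficient=1.0):
    """Return a 12-bin multi-hot chroma for the chord, scaled by chroma_coefficient."""
    root_idx = PITCH_CLASSES.index(root)
    vec = [0.0] * 12
    for interval in CHORD_INTERVALS[quality]:
        # Wrap around the octave so, e.g., A + 3 semitones lands on C.
        vec[(root_idx + interval) % 12] = 1.0
    return [chroma_coefficient * v for v in vec]


print(chord_chroma("A", "min", chroma_coefficient=1.5))
```

For A minor the active bins are C, E and A, each set to 1.5; a larger coefficient simply makes the chord conditioning signal larger in magnitude.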
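The interaction of temperature, top_k and top_p can be illustrated with a generic token sampler over a list of logits. This is a textbook sketch of temperature scaling, top-k and nucleus (top-p) sampling written for clarity, not the audiocraft code that MusicGen actually runs; the function name `sample_token` is hypothetical. Following the description above, top_p takes precedence when it is greater than 0, and top_k is used otherwise. Temperature is assumed to be strictly positive.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=0.0, rng=random):
    """Hypothetical sampler: pick one token index from `logits`.

    Assumes temperature > 0. top_p > 0 enables nucleus sampling; otherwise
    top_k > 0 restricts sampling to the k most likely tokens.
    """
    # Temperature: higher values flatten the distribution (more diversity).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        # Keep the smallest set of most likely tokens whose cumulative prob reaches top_p.
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    elif top_k > 0:
        kept = order[:top_k]
    else:
        kept = order

    # random.choices renormalizes the weights of the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(sample_token([0.1, 2.0, 0.5], temperature=1.0, top_k=2, rng=random.Random(0)))
```

With top_k=1 (or a very small top_p) the sampler always returns the most likely token, which is the most conservative setting; raising temperature or widening top_k/top_p lets less likely tokens through.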
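Classifier-free guidance is commonly implemented by running the model twice, once with the conditioning (text and chords) and once without, and extrapolating from the unconditional logits toward the conditional ones. The sketch below shows that standard combination on plain lists; the function name `apply_cfg` is hypothetical, and this page does not specify the remixer's internals.

```python
def apply_cfg(cond_logits, uncond_logits, cfg_coef):
    """Hypothetical helper: standard classifier-free guidance combination.

    cfg_coef = 1 returns the conditional logits unchanged; larger values push
    the output further in the direction the conditioning points.
    """
    return [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]


print(apply_cfg([1.0, 2.0], [1.0, 0.0], cfg_coef=3.0))
```

Here the second logit is amplified from 2.0 to 6.0 while the first, on which the two passes agree, is unchanged. This is why higher classifier_free_guidance values make outputs follow the inputs more closely at the cost of variety.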

Warnings

If the input music has a passage with no chords (e.g., a long intro or a vocals-only break), the model may get confused and output quality may degrade.
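One way to screen a track before remixing is to look for long runs of "no chord" in a chord transcription. The sketch below assumes you already have chord segments as `(start_sec, end_sec, label)` tuples that use the common "N" label for no chord; the helper name `long_no_chord_spans` and the 8-second default are illustrative assumptions, not part of the app.

```python
def long_no_chord_spans(segments, min_duration=8.0, no_chord_label="N"):
    """Hypothetical pre-check: return (start, end) spans of consecutive
    no-chord segments lasting at least `min_duration` seconds."""
    spans = []
    run_start = run_end = None
    for start, end, label in sorted(segments):
        if label == no_chord_label:
            # Start a new no-chord run or extend the current one.
            if run_start is None:
                run_start = start
            run_end = end
        else:
            # A chord ends the run; keep it only if it is long enough.
            if run_start is not None and run_end - run_start >= min_duration:
                spans.append((run_start, run_end))
            run_start = run_end = None
    # Handle a no-chord run that lasts until the end of the track.
    if run_start is not None and run_end - run_start >= min_duration:
        spans.append((run_start, run_end))
    return spans


segments = [(0, 6, "N"), (6, 12, "N"), (12, 14, "C:maj"), (14, 16, "N"), (16, 20, "G:maj")]
print(long_no_chord_spans(segments))
```

In this example the 12-second chordless intro is flagged, while the 2-second gap is ignored; trimming or shortening such spans before uploading may help avoid the problem described above.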
