annahung31 / emopia

Emotion-conditioned music generation using a Transformer-based model.




Run time and cost

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 15 seconds. The predict time for this model varies significantly based on the inputs.


This is a demo accompanying the repository EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation. The paper was accepted at the International Society for Music Information Retrieval Conference (ISMIR) 2021.

The piano music is generated by a Transformer-based model and is output in MIDI format. During training, the model takes a sequence of music tokens as input and outputs a sequence of music tokens. We pre-train the model on a published piano dataset, AILabs1k7, and then fine-tune and condition it on our self-collected dataset, EMOPIA.

Motivated by CTRL, we prepend an emotion token to each music sequence to make the model aware of the emotion.
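As a rough illustration of this conditioning step, the sketch below prepends one emotion token to a tokenized music sequence. It is a minimal sketch, not the EMOPIA implementation: the emotion labels (`Q1` to `Q4`), the token IDs, the ID-shifting scheme, and the function name `prepend_emotion` are all hypothetical and chosen only for the example.

```python
from typing import Dict, List

# Hypothetical emotion vocabulary. The labels and IDs are illustrative
# assumptions, not the values used by the EMOPIA code.
EMOTION_TOKENS: Dict[str, int] = {"Q1": 0, "Q2": 1, "Q3": 2, "Q4": 3}
NUM_EMOTION_TOKENS = len(EMOTION_TOKENS)


def prepend_emotion(music_tokens: List[int], emotion: str) -> List[int]:
    """Return a new sequence whose first token encodes the target emotion."""
    if emotion not in EMOTION_TOKENS:
        raise ValueError(f"unknown emotion label: {emotion!r}")
    # Shift the music token IDs so they cannot collide with the emotion IDs
    # (an assumption of this sketch; a real vocabulary may reserve IDs differently).
    shifted = [token + NUM_EMOTION_TOKENS for token in music_tokens]
    # The emotion token comes first, so every later position can attend to it.
    return [EMOTION_TOKENS[emotion]] + shifted


if __name__ == "__main__":
    print(prepend_emotion([0, 5, 2], "Q3"))
```

Because the emotion token sits at the start of the sequence, the same token can be supplied alone at generation time as a prompt for the desired emotion.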

Objective and subjective evaluations show that generation quality becomes more stable when the model is pre-trained on a larger dataset, and that our Transformer-based model can generate music with a given target emotion to a certain degree.

There is still room for improvement in the conditioning ability, so take a look at EMOPIA and create some fantastic works with it!