Regression of musical arousal and valence values

Run time and cost

Predictions run on CPU hardware and typically complete within 11 seconds, although prediction time varies significantly with the inputs.

This demo runs a series of transfer learning regression models trained to predict musical arousal and valence values.
These models were trained on a mixture of public and in-house MTG datasets.

Source models

  • MusiCNN. A musically motivated CNN with two variants, one trained on the Million Song Dataset and one on MagnaTagATune.
  • VGGish. A large VGG variant trained on a preliminary version of the AudioSet Dataset.

Transfer learning models

Our models consist of single-hidden-layer MLPs trained on embeddings extracted from the source models listed above.
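
As a rough illustration of what a single-hidden-layer MLP regressor on top of embeddings looks like, here is a minimal NumPy sketch. It is not the released Essentia implementation: the function names, the hidden size of 64, the ReLU activation, the random weights, the 200-dimensional embedding size, the output order, and the averaging of per-frame predictions into one track-level estimate are all assumptions made for the example.

```python
import numpy as np


def init_mlp(embedding_dim, hidden_dim, output_dim=2, seed=0):
    """Create randomly initialised weights for a one-hidden-layer MLP.

    Hypothetical helper: real models would load trained weights instead.
    """
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (embedding_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, output_dim)),
        "b2": np.zeros(output_dim),
    }


def predict(params, embeddings):
    """Map a batch of embeddings (n_frames, embedding_dim) to two outputs.

    The output layer is linear because this is regression, not
    classification. ReLU in the hidden layer is an assumption.
    """
    hidden = np.maximum(0.0, embeddings @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


def predict_track(params, embeddings):
    """Average per-frame predictions into one pair of values per track
    (assumed aggregation strategy)."""
    return predict(params, embeddings).mean(axis=0)


# Example: 10 frames of 200-dimensional embeddings (assumed size).
params = init_mlp(embedding_dim=200, hidden_dim=64)
frames = np.random.default_rng(1).normal(size=(10, 200))
first, second = predict_track(params, frames)
print(f"output 1: {first:.3f}, output 2: {second:.3f}")
```

With trained weights in place of the random ones, the two outputs would correspond to the predicted arousal and valence values; which output maps to which quantity depends on how the model was trained.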


These models are part of Essentia Models, made by MTG-UPF, and are publicly available under the CC BY-NC-SA license and a commercial license.