Classification of music approachability and engagement

Run time and cost

Predictions run on CPU hardware. Predictions typically complete within 10 seconds.

This demo runs transfer learning models to estimate music approachability and engagement using effnet-discogs embeddings.

  • Approachability measures whether the music is likely to be accessible to the general public (e.g., belonging to common mainstream genres vs. niche and experimental genres).
  • Engagement measures whether the music evokes the listener's active attention (high-engagement "lean forward" active listening vs. low-engagement "lean back" background listening).

We include three model types, each with a different output format: binary classification, three-class classification, and regression with continuous values.
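To make the three output formats concrete, the sketch below turns hypothetical raw model scores into each format. The label names, the softmax output layer, and the example numbers are assumptions for illustration only; they are not taken from the released models.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the most probable label and the full label-to-probability map."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Hypothetical label names; the real models may name their classes differently.
BINARY_LABELS = ["low", "high"]
THREE_CLASS_LABELS = ["low", "mid", "high"]

# Made-up raw scores standing in for model outputs.
binary_label, binary_probs = classify([0.2, 1.4], BINARY_LABELS)
three_label, three_probs = classify([0.1, 2.0, 0.3], THREE_CLASS_LABELS)

# A regression model yields one continuous value instead of class probabilities.
regression_value = 0.73
```

The classification formats are convenient for filtering or tagging a catalogue, while the continuous value is better suited to ranking tracks against each other.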

These classifiers were trained on in-house MTG datasets.

Source models

effnet-discogs is an EfficientNet model trained to predict 400 of the most popular Discogs music styles.

Transfer learning models

Our models are single-hidden-layer MLPs trained on the effnet-discogs embeddings.
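As a rough illustration of what a single-hidden-layer MLP computes on top of an embedding, here is a minimal forward pass in plain Python. It is a sketch, not the released implementation: the embedding size (1280), hidden width (64), ReLU activation, and random weights are all assumptions, and a random vector stands in for a real effnet-discogs embedding.

```python
import math
import random


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases; a trained model would load these instead."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_forward(embedding, hidden_layer, output_layer):
    """Embedding -> hidden layer (ReLU) -> raw output scores."""
    hidden = relu(dense(embedding, *hidden_layer))
    return dense(hidden, *output_layer)


rng = random.Random(0)
EMBEDDING_SIZE = 1280  # assumed embedding dimensionality
HIDDEN_UNITS = 64      # hypothetical hidden-layer width
N_OUTPUTS = 3          # e.g. 2 or 3 for classification, 1 for regression

hidden_layer = init_layer(EMBEDDING_SIZE, HIDDEN_UNITS, rng)
output_layer = init_layer(HIDDEN_UNITS, N_OUTPUTS, rng)

# Stand-in for a real embedding extracted from audio.
embedding = [rng.random() for _ in range(EMBEDDING_SIZE)]
scores = mlp_forward(embedding, hidden_layer, output_layer)
```

Because only this small head is trained while the embedding extractor stays fixed, the same embeddings can be reused across the approachability and engagement models.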


These models are part of Essentia Models, made by MTG-UPF, and are publicly available under the CC BY-NC-SA license; a commercial license is also available.