What Makes a Hit Song?
Predicting Spotify's Popularity score using Machine Learning
Since the inception of the music business, people have tried to discover what makes a ‘hit record’. Machine Learning has been employed to answer this question, but to date, no consensus has been reached.
This notebook trains a machine learning model to predict a track's popularity score (a column in the dataset). If the model predicts well, it should indicate which elements of a song contribute most to its popularity, which would let us tailor those features in our own music.
To conduct our analysis we’ll use the Spotify dataset (v2). It contains information for 586,672 tracks across 20 columns, including track name, release date, and numerical values for various audio features.
We’ll split this dataset into training, validation, and test sets. Finally, I’ll input data for individual songs from the streaming service to see how close the model gets to their actual popularity scores.
Here are the steps I'll follow:
1. Classify the problem
2. Download, clean, and explore the data
3. Create new features
4. Create a training / validation / test split and prepare the data for training
5. Create quick & easy baseline models
6. Pick a strategy, train a model & tune hyperparameters
7. Make predictions on a single input
8. Summary & conclusions, with future work and references
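As a rough illustration of the split step in the list above, here is a minimal sketch using only the Python standard library. The function name `split_indices`, the 70 / 15 / 15 proportions, and the seed are assumptions made for this example, not values taken from the notebook; in practice a library helper such as scikit-learn's `train_test_split` would usually do this job.

```python
import random


def split_indices(n_rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle row indices and cut them into train / validation / test lists.

    The fractions and seed are illustrative assumptions, not the notebook's values.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")

    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)

    n_test = int(n_rows * test_frac)
    n_val = int(n_rows * val_frac)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# 586,672 is the number of tracks quoted for the dataset.
train_idx, val_idx, test_idx = split_indices(586_672)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three index lists never overlap, so no track is used both for fitting the model and for scoring it.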
1. Classify the problem
The target is a popularity score between 0 and 100. It is a continuous rather than categorical value, so regression is the right approach. All our data is labelled, so we’ll train a supervised learning model.
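One practical consequence of treating this as regression is that predictions are judged by how far they land from the true score, not by whether they match a class. The sketch below shows one common choice, root mean squared error (RMSE), applied to a trivial "always predict the mean" model. The metric choice, the `rmse` helper, and the scores are all assumptions made up for illustration; none of them come from the dataset.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up popularity scores on the 0-100 scale (illustrative only).
train_scores = [0, 12, 35, 48, 60, 71]
test_scores = [5, 40, 66]

# Simplest possible regressor: always predict the mean of the training labels.
mean_prediction = statistics.mean(train_scores)
baseline_error = rmse(test_scores, [mean_prediction] * len(test_scores))
print(f"Mean-baseline RMSE: {baseline_error:.2f}")
```

Because the error is in the same units as the popularity score, any trained model can be compared directly against this kind of baseline.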