banner, Spotify Logo

🎶 Spotify Recommendation Playlist Based on the Emotion of a User's Most Recently Played Songs - (DSI Capstone Project)

This project serves as the final capstone project for the MiSK Data Science immersive program.

Background

This project aims to recommend songs to a user based on the mood of their most recently played songs on Spotify. Two datasets are obtained: the user's 50 most recently played tracks, and a set of tracks collected from playlists created by Spotify and other Spotify users.
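As a rough illustration of how the user's half of the data could be collected with the spotipy client (the helper name `get_recent_tracks` and the column names are assumptions, not the project's actual code):

```python
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Spotify app credentials are assumed to be set via the SPOTIPY_CLIENT_ID,
# SPOTIPY_CLIENT_SECRET and SPOTIPY_REDIRECT_URI environment variables.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-read-recently-played"))

def get_recent_tracks(limit=50):
    """Return the user's most recently played tracks as a DataFrame (illustrative helper)."""
    items = sp.current_user_recently_played(limit=limit)["items"]
    rows = [{
        "track_id": item["track"]["id"],
        "track_name": item["track"]["name"],
        "artist": item["track"]["artists"][0]["name"],
        "played_at": item["played_at"],
    } for item in items]
    return pd.DataFrame(rows)

user_df = get_recent_tracks()
```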

To determine the mood of a song, two variables are observed for each track. The first is the valence of the track from the Spotify API audio features; the other is the lyrics of the track. These two variables are chosen because either one on its own is usually not enough to determine the mood of a song, which can be considered specific to the user, especially in cases where the lyrics do not match the audio mood (valence) of the track. A popular example of such a song is Take a Walk by Passion Pit, an upbeat, happy-sounding song with sad lyrics. On the flip side, a popular sad/mellow song with hopeful/happy lyrics is Don't Panic by Coldplay.

Spotify Audio Features:

Spotify uses a series of different features to classify tracks.
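A minimal sketch of pulling those audio features (including valence) for a batch of tracks, reusing the spotipy client `sp` and `user_df` from the sketch above (the batching and column selection are assumptions):

```python
import pandas as pd

def get_audio_features(sp, track_ids):
    """Fetch Spotify audio features (valence, energy, danceability, ...) for a list of track ids."""
    features = []
    for i in range(0, len(track_ids), 100):  # the endpoint accepts at most 100 ids per call
        features.extend(sp.audio_features(track_ids[i:i + 100]))
    return pd.DataFrame(features)

audio_df = get_audio_features(sp, user_df["track_id"].tolist())
print(audio_df[["valence", "energy", "danceability", "tempo", "loudness"]].head())
```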

Lyrics:

Lyrics are not available through the Spotify API, so multiple methods were used to obtain the lyrics of the tracks.
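One possible way to pull lyrics, sketched with the lyricsgenius wrapper around the Genius API (the token placeholder and the `fetch_lyrics` helper are assumptions; the project also used BeautifulSoup4 and lyrics_extractor as alternative methods):

```python
import lyricsgenius

# A Genius API access token is assumed to be available.
genius = lyricsgenius.Genius("YOUR_GENIUS_TOKEN", timeout=15, retries=3)

def fetch_lyrics(title, artist):
    """Look a track up on Genius and return its lyrics, or None if not found."""
    try:
        song = genius.search_song(title, artist)
        return song.lyrics if song else None
    except Exception:
        return None

user_df["lyrics"] = [fetch_lyrics(t, a)
                     for t, a in zip(user_df["track_name"], user_df["artist"])]
```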

Importing required libraries
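A representative set of imports for this kind of notebook (the exact list in the original notebook may differ):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import spotipy
from spotipy.oauth2 import SpotifyOAuth

from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

from sklearn.impute import KNNImputer
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
```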

Loading datasets

The dataset contains 12 columns that we can use for exploratory analysis. Since we have already cleaned the dataset, there are not a lot of observed NaN values. However, we will check to see:
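A quick check along those lines, assuming the user and Spotify tracks are concatenated into a single DataFrame `df`:

```python
# Count missing values per column and show only the columns that have any.
missing = df.isna().sum()
print(missing[missing > 0])

# The same information as a percentage of all rows.
print((missing[missing > 0] / len(df) * 100).round(2))
```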

We can see that there are only two columns with missing values (noting that the first 50 rows are the user's recently listened tracks). played_at is null for the Spotify dataset, as it does not exist there, so we will keep those NaN values. On the other hand, mood is null for the user's songs, as this is information we don't currently have and will be predicting.

Audio feature analysis

By observing the histograms, we can see that the audio features with the least variance between observations are speechiness and instrumentalness. This tells us that the majority of our observations have around the same speechiness and instrumentalness, so these are features we probably won't be able to get a lot of information out of. On the other hand, valence, tempo, energy, and danceability have the most variance between observations, so these are the features we should study to differentiate between the tracks and also find similarities.

We can see that the highest correlations are between energy and loudness and between valence and energy, while there is an anti-correlation between acousticness and energy and between acousticness and loudness. Both of these results are logical: energetic songs tend to be loud and have positive valence, while acoustic tracks are less likely to be loud and energetic.
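These correlations can be visualised with a simple heatmap over the numeric audio features (the column list mirrors the Spotify API field names):

```python
import matplotlib.pyplot as plt
import seaborn as sns

audio_cols = ["danceability", "energy", "loudness", "speechiness", "acousticness",
              "instrumentalness", "liveness", "valence", "tempo"]

plt.figure(figsize=(8, 6))
sns.heatmap(df[audio_cols].corr(), annot=True, fmt=".2f", cmap="coolwarm", center=0)
plt.title("Correlation between audio features")
plt.tight_layout()
plt.show()
```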

If we exclude Various Artists, SuicideboyS is the most frequently occurring artist in our dataset; however, City Morgue has the most energetic songs and appears 25 times in our dataset.

Even though we have a high correlation between loudness and energy, we can see that the artist with the loudest songs is Little Mix, who appear in the dataset 17 times. If we observe our outputs, we can also see that Katy Perry combines energy and loudness the most consistently, appearing second in both.

Here we compare the user's tendency to listen to songs with high energy and danceability against the Spotify dataset. The songs in the Spotify dataset generally have both high energy and high danceability, while the user (red cross) listens to songs that are somewhat higher in energy than in danceability. Overall, the Spotify dataset has tracks with higher danceability and energy, while the user's tracks sit at an intermediate level of both.

Noting that after encoding the mood labels:

True to our previous deduction, speechiness has the least correlation with the mood of a song. We can also note that no single audio feature stands out with the highest correlation with the mood of a song.
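A sketch of that encoding step and of checking each feature's correlation with the mood, reusing `audio_cols` from the heatmap sketch above (the mood labels come from this README, but the integer codes assigned to them are an assumption):

```python
# Map mood labels to integers so they can be used in numeric correlations
# (and later by the KNN imputer). The ordering chosen here is an assumption.
mood_map = {"Angry": 0, "Sad": 1, "Calm": 2, "Happy": 3, "Energy": 4}
df["mood_encoded"] = df["mood"].map(mood_map)

# Correlation of each audio feature with the encoded mood, on labelled rows only.
labelled = df.dropna(subset=["mood_encoded"])
print(labelled[audio_cols + ["mood_encoded"]].corr()["mood_encoded"].sort_values())
```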

NLP on lyrics

Surprisingly, both datasets share the same most common words. This could indicate that the songs the user is listening to are very similar to the data collected from Spotify. Another factor could be that these words are generally among the most common in song lyrics. We can see from the following article that most of the words generated in our wordcloud match.
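The word clouds can be reproduced roughly as follows (a cleaned lyrics column named `lyrics_clean` is assumed):

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

# Join all cleaned lyrics into one string and build a word cloud from it.
text = " ".join(df["lyrics_clean"].dropna())
wc = WordCloud(width=800, height=400, background_color="white",
               stopwords=STOPWORDS, max_words=100).generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```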

It's hard to conclude the sentiment of the lyrics overall by looking at their wordclouds. We will assign a sentiment to each song ranging from negative to neutral to positive. To do this we will perform sentiment analysis using popular lexicons.

These sentiments will help us identify the mood better:

We won't be looking at the subjectivity that TextBlob outputs alongside polarity, as it's not relevant to song lyrics.

Polarity is a float that lies in [-1, 1]: -1 indicates negative sentiment and +1 indicates positive sentiment. However, observing the results shows that TextBlob does not always give the expected output, so we'll also use VaderSentiment, which gives a more detailed breakdown, and compare the two.
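A side-by-side sketch of scoring the lyrics with both lexicons (the ±0.05 cut-offs used to bucket the scores into negative/neutral/positive, and the column names, are assumptions):

```python
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def textblob_sentiment(lyrics, neutral_band=0.05):
    """Bucket TextBlob's polarity score in [-1, 1] into a sentiment label."""
    polarity = TextBlob(lyrics).sentiment.polarity
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

def vader_sentiment(lyrics, neutral_band=0.05):
    """Bucket VADER's compound score in [-1, 1] into a sentiment label."""
    compound = analyzer.polarity_scores(lyrics)["compound"]
    if compound >= neutral_band:
        return "positive"
    if compound <= -neutral_band:
        return "negative"
    return "neutral"

df["tb_sentiment"] = df["lyrics_clean"].fillna("").apply(textblob_sentiment)
df["vader_sentiment"] = df["lyrics_clean"].fillna("").apply(vader_sentiment)
```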

Vader performs much better at predicting negative sentiment; however, it also overestimates neutral (and sometimes negative) lyrics as positive. After observing and comparing, we see that Vader is better at predicting Energy songs as positive and Angry songs as negative, which are our extremes at both ends, while TextBlob is better at predicting Calm songs as neutral. Both perform about the same for Happy and Sad songs. However, if we take into consideration that the mood does not always match the sentiment of the lyrics, it can sometimes make sense for the mood not to match the sentiment.

For example, Taylor Swift's 'Bad Blood' has a mood of Energy, but the lyrics have a negative sentiment for both Vader and TextBlob. This can be because the lyrics contain words such as 'bad', 'sad', and 'mad', which are recognized as negative sentiment words.

By observing and comparing the two lexicons, we can conclude that Vader performs better than TextBlob except on Calm songs, but that is because TextBlob tends to classify most songs as neutral, so it has a higher chance of classifying Calm songs as neutral.

So we will be looking at both the valence of the song and its sentiment to recommend songs to the user. Note that valence is defined by Spotify as a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track: tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

So, by looking at both the audio feature and sentiment we can better determine the mood of the user's most recently listened to songs. Using our labelled data from the Spotify dataset, we can better predict the mood of the user's recently played songs.

Predict value of moods for user

We will impute the NaN values in the mood column of the user's dataset by comparing the audio feature values and sentiment of each track with those of similar tracks that are already labelled in the Spotify dataset. We chose imputation instead of a machine learning classifier because imputers are powerful, easy to use, and served our purpose better.

The KNN imputer is a multivariate imputer, which essentially means it can take multiple features into account when imputing the missing values. However, the imputer only works on numeric variables, so we will first map our mood column to integer values as an identifier for each mood. The idea in kNN methods is to identify 'k' samples in the dataset that are similar or close in the feature space, and then use these 'k' samples to estimate the value of the missing data points. Each sample's missing values are imputed using the mean value of the 'k' neighbours found in the dataset. [1]

We notice that some of the imputed values are floats between two integers. This means that the particular track belongs to more than one mood. However, for our application we consider a track to have only one mood, so we will cast the result to int. Now we will decode the moods and give them back their labels.
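A minimal sketch of this imputation step with scikit-learn's KNNImputer, reusing `mood_map` from the encoding sketch above (the feature list, the sentiment encoding, and k=5 are assumptions):

```python
from sklearn.impute import KNNImputer

# Numeric features used to find the k nearest labelled tracks; the lyric sentiment
# is assumed to be encoded numerically (e.g. negative=-1, neutral=0, positive=1).
feature_cols = ["valence", "energy", "danceability", "tempo",
                "sentiment_encoded", "mood_encoded"]

imputer = KNNImputer(n_neighbors=5)
imputed = imputer.fit_transform(df[feature_cols])

# Imputed moods can land between two integer codes; keep a single mood per track
# by truncating to int, then decode back to the original labels.
df["mood_encoded"] = imputed[:, feature_cols.index("mood_encoded")].astype(int)
df["mood"] = df["mood_encoded"].map({v: k for k, v in mood_map.items()})
```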

Just by observing the results, we can conclude that the imputer performed very well. Take the song Rhinestone Eyes by Gorillaz as an example from the user's dataset: its lyrics were identified as positive (although I would personally categorize them as negative), but the audio features ultimately led the imputer to classify it as Angry, which is also where I would personally place this song.

Recommending Songs to the User

Now that we have labelled the 50 most recently listened-to songs, we can observe which mood occurs most often for the user and then recommend similar songs that fall under that mood. We will be using content-based filtering, as this method uses only information about the description and attributes of the items the user has previously consumed to model the user's preferences. In other words, these algorithms try to recommend items that are similar to those the user liked in the past (or is examining in the present). In particular, various candidate items are compared with items previously rated by the user and the best-matching items are recommended. [1]

How do we use a matrix for recommendation? The answer is similarity. We now need to calculate the similarity of one lyric to another, as well as one set of audio features to another. How do we do that? We can use different metrics such as cosine similarity or Euclidean distance. For our song recommendation system, we are going to use cosine similarity, and in particular its implementation from Scikit-learn.
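A compact sketch of that similarity computation, combining a TF-IDF representation of the lyrics with scaled audio features (the feature weights, column names, and the `recommend` helper are assumptions):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# TF-IDF representation of the lyrics.
tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
lyrics_matrix = tfidf.fit_transform(df["lyrics_clean"].fillna("")).toarray()

# Scale the audio features so tempo and loudness don't dominate the distances.
audio_matrix = MinMaxScaler().fit_transform(df[["valence", "energy", "danceability", "tempo"]])

# Concatenate both views and compute pairwise cosine similarity between tracks.
combined = np.hstack([lyrics_matrix, audio_matrix])
similarity = cosine_similarity(combined)

def recommend(track_index, n=10):
    """Return the indices of the n tracks most similar to the given track (illustrative helper)."""
    ranked = similarity[track_index].argsort()[::-1]
    return [i for i in ranked if i != track_index][:n]
```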

The recommendations made for each given user input are accurate, apart from a few tracks that do not match the criteria. In general, the model can recommend songs depending on the mood of the user's most recently listened-to artist, track, or the overall mood of all the recently played songs.

Conclusion

By using the Spotify API, we were able to scrape data from the user's recently played songs on Spotify, as well as playlists created by Spotify and other Spotify users. This resulted in our two main datasets. The Spotify API was also used to obtain the audio features of each track, which give us more information on the track and ultimately allow us to predict its mood. Additionally, we scraped lyrics for each track to further help us determine the moods of the tracks; this was done using the Genius API, the BeautifulSoup4 library, and the lyrics_extractor library. After we obtained our data, we did some data cleaning and preparation, which included removing rows with NaN values in the lyrics and normalizing the loudness in the audio features. This stage also included some text pre-processing on the lyrics to make them easier to apply NLP to.

We did some EDA to help us understand our datasets better and be able to move on to the next steps. NLP was performed on the lyrics of the songs to determine the sentiment of each song, which would help in determining its mood. Although the NLP libraries used (TextBlob and Vader) did not perform as well as expected, it was concluded that in some cases it is fine for there to be a disparity between the sentiment of the lyrics and the mood of the song.

With all our data ready and cleaned, there was one step left before we could give the user recommendations. Using KNN imputation, we were able to assign a mood to each of the user's songs. Since KNN imputation is multivariate, we were able to use our audio features and our lyrics sentiment to see which other tracks were closest across all these features and thus predict the mood of the songs. Before resorting to imputation, multiple machine learning models were tried, but none of them performed well enough (none reached an accuracy above 0.5). Some deep learning models were also tried, as seen in this article, but they did not perform any better than the machine learning algorithms. So, in the end, imputation was the best choice for its ease of use, and it also labelled the songs accurately.

Finally, we were able to determine which mood was the most prevalent amongst the user's most recently listened-to songs and recommend some similar songs from our dataset. We also observed their most listened-to artist and recommended some similar artists, and their most listened-to song and recommended some similar songs. The recommender system utilized content-based filtering, as we have the user's personal data and preferences. We used both the TF-IDF vectorizer and the count vectorizer, which are ways to convert a given set of strings into a frequency representation that makes it easier to find similarities. [1] Although it mostly gave similar songs, it also sometimes recommended songs that did not match the criteria.

In conclusion, the project delivers a recommender system that can impute the mood of a user's recently played songs and then recommend songs that sound similar and have a similar mood.