[Week 12 + 13 + 14] If you can’t be happy, do a data set about it instead :)
To carry on with my Billboard data set, I chose two other data sets related to it. The first contained the lyrics of the songs and the second had more on the origin of the artists and their genre. I loaded the data into Jupyter and started to clean it up. One of the first things I started on was converting the lyrics into a list of words, then using stop words to filter out words like ‘the’ and ‘and’ that would get in the way when analysing word repetitiveness.
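For the curious, the stop word filtering looks roughly like this. It is a minimal sketch rather than the exact notebook code: the tiny `STOP_WORDS` set, the `lyric_words` helper and the sample lyric are all made up for illustration, and a real run would use a fuller stop word list from a library.

```python
import re
from collections import Counter

# Tiny hand-picked stop word set (an assumption for this sketch);
# a fuller list from a library would be used in practice.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it", "on"}

def lyric_words(lyrics):
    """Lowercase the lyrics, split them into words, and drop stop words."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    return [w for w in words if w not in STOP_WORDS]

# Made-up lyric line, not taken from the Billboard data.
sample = "The night is young and the night is long"
words = lyric_words(sample)

# Count how often each remaining word repeats.
counts = Counter(words)
print(counts.most_common(3))
```

Once the filler words are gone, the counts give a quick feel for how repetitive a song is.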
I did, however, end up getting a bit stuck with NaN values, since they were making my code fail. After realising how many NaNs there actually were, and that I couldn’t filter them out (and even if I could, I didn’t know what to do once they were gone, since a value should be there), I decided to change my dataset yet again. This probably wasn’t the smartest thing I have done considering I had less than a week to complete the assignment but hey, why not.
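In hindsight, one way the NaN problem could have been handled in pandas is sketched below. The `song` and `lyrics` column names and the three rows are assumptions for illustration, not the real Billboard data. Dropping rows is only one option, and it does not answer the question of what value should have been there.

```python
import pandas as pd

# Made-up rows; None stands in for a song with no lyrics on record.
df = pd.DataFrame({
    "song": ["Song A", "Song B", "Song C"],
    "lyrics": ["la la la", None, "na na hey"],
})

# Count the missing lyrics before deciding what to do with them.
missing = df["lyrics"].isna().sum()
print(f"{missing} of {len(df)} songs have no lyrics")

# Keep only the rows that have lyrics, so string operations don't fail.
with_lyrics = df.dropna(subset=["lyrics"]).copy()
with_lyrics["words"] = with_lyrics["lyrics"].str.split()
```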
I ended up changing to the data used by the World Happiness Report, which was collected by the Gallup World Poll along with the World Health Organisation and World Development Indicators (WDI). Upon seeing this dataset a few questions came to me: how happy is the world? Has global happiness improved over the years? What is it that constitutes happiness? And so my data exploring journey began.
There was a bit to clean up with the data and I ended up doing the following things to make life easy:
1. Removed the column that wasn’t needed
2. Added a column for the rank of the country
3. Changed the titles of each column to one word for ease of use throughout the coding
4. Created a column which takes the country value and converts it to its continent using nltk (values not contained in the library were manually changed)
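The four steps above can be sketched in pandas roughly as follows. The column names and scores are hypothetical stand-ins rather than the real World Happiness Report columns, and the continent lookup uses a small hand-written dictionary (`CONTINENTS`, an assumption for this sketch) in place of a library call.

```python
import pandas as pd

# Hypothetical slice of the data; real column names and values differ.
df = pd.DataFrame({
    "Country name": ["Finland", "Japan", "Brazil"],
    "Happiness score": [7.8, 6.0, 6.3],
    "Standard error": [0.03, 0.04, 0.05],
})

# 1. Remove the column that isn't needed.
df = df.drop(columns=["Standard error"])

# 2. Add a rank column (1 = happiest).
df["Rank"] = df["Happiness score"].rank(ascending=False).astype(int)

# 3. Rename every column to a single word.
df = df.rename(columns={
    "Country name": "country",
    "Happiness score": "score",
    "Rank": "rank",
})

# 4. Map each country to its continent; anything missing from the
#    lookup is marked so it can be fixed by hand afterwards.
CONTINENTS = {"Finland": "Europe", "Japan": "Asia", "Brazil": "South America"}
df["continent"] = df["country"].map(CONTINENTS).fillna("Unknown")

print(df)
```

Marking unmatched countries as "Unknown" makes it easy to spot which ones still need changing manually.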
While completing the assignment I came across some cool plots you can make using things like Seaborn or Plotly. The creativity this allows is what kept me motivated to do the assignment tbh. This was probably my favourite graph of all since it looks good while also giving valuable information about the correlation of my data:
While I did struggle with time, I think I went alright in the end, ‘think’ being the key word. Oh well, assignment done! Feels good to be done!
Adios 1161.