Data Science is… fun

Bryan Santos
Mar 24, 2020
kdnuggets.com

Some people may think data science is boring, too technical, tedious, or hard… but let me tell you: it can be immensely fun too!

I’m writing this in what is probably one of the darkest eras of our lifetime. Countries are being locked down and people are being forced to stay indoors (as we should). I often see people complaining on social media that they have nothing to do and are bored out of their wits. Well, I see data science as a good pastime.

Don’t get me wrong: data science is indeed very important for society, especially at times like these. It is a game changer, cutting-edge enough to potentially change our lives for the better, and for good.

I’m not denying data science’s higher, truer purpose, but who’s to say we can’t have fun with it? Thankfully, data science is not limited to proprietary software and tools that can cost millions and millions of dollars.

I remember Chef Gusteau’s catchphrase in the movie Ratatouille: “Anyone can cook!” Well, in this case, anyone can do data science.

Here are some data science applications that I found very fun:

Game of Thrones

mirror.co.uk

I love Game of Thrones. On Kaggle, you can find tons of Game of Thrones datasets, like a battles dataset that people have used to analyze whether having more men in your army results in victory.
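To give a feel for what that kind of analysis looks like, here is a minimal sketch that compares the attacker’s win rate when it outnumbers the defender against when it doesn’t. The records are made-up toy data, and the field names (`attacker_size`, `defender_size`, `attacker_outcome`) are assumptions loosely modeled on the Kaggle battles file, so check them against the real CSV before reusing this.

```python
# Hypothetical toy records loosely modeled on the Kaggle battles data.
# Field names and values are illustrative assumptions, not the real file.
battles = [
    {"attacker_size": 20000, "defender_size": 10000, "attacker_outcome": "win"},
    {"attacker_size": 6000, "defender_size": 12000, "attacker_outcome": "loss"},
    {"attacker_size": 3000, "defender_size": 1000, "attacker_outcome": "win"},
    {"attacker_size": 1500, "defender_size": 4000, "attacker_outcome": "win"},
    {"attacker_size": 8000, "defender_size": 2000, "attacker_outcome": "loss"},
    {"attacker_size": None, "defender_size": 500, "attacker_outcome": "win"},
]


def win_rate_by_size_advantage(records):
    """Attacker win rate, split by whether the attacker fielded more men."""
    counts = {"more_men": [0, 0], "fewer_or_equal_men": [0, 0]}  # [wins, battles]
    for r in records:
        if r["attacker_size"] is None or r["defender_size"] is None:
            continue  # army sizes are often unknown; skip those battles
        key = "more_men" if r["attacker_size"] > r["defender_size"] else "fewer_or_equal_men"
        counts[key][1] += 1
        if r["attacker_outcome"] == "win":
            counts[key][0] += 1
    # None marks a group with no battles, instead of dividing by zero
    return {k: (wins / total if total else None) for k, (wins, total) in counts.items()}


rates = win_rate_by_size_advantage(battles)
print(rates)
```

With six toy rows the numbers mean nothing about Westeros; on the real data you would also want to look at how many battles have known army sizes before trusting either rate.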

But I was amazed when I saw this project from the Technical University of Munich (TUM).

“A Song of Ice and Data” began as a student project. Since then, it’s gained a lot of media attention, likely because it’s had some success predicting deaths. It scrapes data from the Game of Thrones Wiki and the Wiki of Ice and Fire. Then it uses statistical analysis and machine learning to find features that are common to characters who have already died and calculate the likelihood that other characters might die. These features include house, gender, and whether relatives have died. They call the resulting number the percentage likelihood of death (PLOD).

It did an amazing job of predicting some of the main characters’ deaths. In 2016, for example, it predicted a PLOD of 97% for Tommen Baratheon, 96% for Stannis Baratheon, and 91% for Petyr Baelish.

Machine learning is not perfect, though, so it also produced “false positives”, such as high PLODs for Daenerys Targaryen (95%) and Davos Seaworth (91%), among others.
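The TUM write-up only says “statistical analysis and machine learning”, so the following is not their model. It is a minimal sketch, assuming a plain logistic regression trained with gradient descent, of how features such as gender, house, and dead relatives can be turned into a percentage likelihood of death. The three binary features and the ten training rows are hypothetical toy data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # prediction error for one character: predicted probability minus label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def plod(w, b, features):
    """Percentage likelihood of death: the model's probability times 100."""
    return 100.0 * sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


# Hypothetical toy data. Features: [is_male, is_noble, has_dead_relatives]
# Label: 1 if the character has died, 0 otherwise.
X = [
    [1, 1, 1], [1, 1, 1], [0, 1, 1], [1, 0, 1], [0, 1, 1],
    [1, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0], [1, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

weights, bias = train_logistic_regression(X, y)
print(f"PLOD, noble man with dead relatives: {plod(weights, bias, [1, 1, 1]):.0f}%")
print(f"PLOD, noble man without: {plod(weights, bias, [1, 1, 0]):.0f}%")
```

Logistic regression is a natural first choice here because its output is already a probability, which maps directly onto a “percentage likelihood”. It also explains the false positives: a character who matches the profile of the dead gets a high score whether or not the author has decided to kill them.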

Another interesting Game of Thrones data science project is “Comparing George R.R. Martin to William Shakespeare and J.R. Tolkien: Decoding ‘Game of Thrones’ by way of data science (2019)”, by Peter Vesterberg.

The project found that George R.R. Martin has written A LOT: the book series totals nearly 1.8 million words. By comparison, the complete works of Shakespeare come to about 1 million words, and “The Lord of the Rings” (“LOTR”) to about 500,000. I guess we can’t make fun of Martin as much for his sluggishness in releasing books anymore.

Well, that’s another reason why George R.R. Martin might not finish the books…
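Counting words sounds trivial, but the totals depend on what you call a word. Here is a minimal sketch, assuming a simple regex rule (runs of letters, with internal apostrophes kept, so “don’t” counts once); Vesterberg’s exact tokenization isn’t stated, so treat the rule as an assumption. The ratios use the approximate totals quoted above.

```python
import re

# A "word" is a run of letters, optionally joined by apostrophes ("don't").
# This tokenization rule is an assumption, not the original project's.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def count_words(text):
    """Count the words in a piece of text."""
    return len(WORD_RE.findall(text))


# Approximate totals quoted in the article
TOTALS = {
    "asoiaf": 1_800_000,       # A Song of Ice and Fire, nearly 1.8 million
    "shakespeare": 1_000_000,  # complete works, about 1 million
    "lotr": 500_000,           # The Lord of the Rings, about 500,000
}


def ratio(a, b):
    """How many times longer work a is than work b."""
    return TOTALS[a] / TOTALS[b]


print(count_words("You know nothing, Jon Snow."))                  # 5
print(f"ASOIAF vs Shakespeare: {ratio('asoiaf', 'shakespeare'):.1f}x")  # 1.8x
print(f"ASOIAF vs LOTR: {ratio('asoiaf', 'lotr'):.1f}x")                # 3.6x
```

So the series is roughly 1.8 times the length of Shakespeare’s complete works and about 3.6 times the length of LOTR. Splitting hyphenated words, or treating curly apostrophes differently, would shift the raw counts a little but not those rough ratios.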

F.R.I.E.N.D.S.

Yashu Seth, a data scientist, completed a project to figure out who the “main character” really is among the six friends. Is it Gunther?

He worked out who the main character is by analyzing the transcripts of every episode from all 10 seasons, comparing the characters on four variables: total number of lines, total number of words, total number of scene appearances, and total number of mentions in episode titles. Seth found that there is indeed one main character above the rest.

So, apparently it’s not Gunther, but it’s Ross.

“It is really close between Ross and Rachel,” Seth wrote in his conclusion. “But, Ross beats Rachel by a significant margin in the individual scene appearances. Besides, there was very little difference between them in the other parameters. Hence, I will have to give it to Ross.”
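Those four variables are easy to compute once the transcripts are parsed. Here is a minimal sketch, not Seth’s code, assuming each spoken line looks like `Ross: ...` and each scene opens with a `[Scene: ...]` line (roughly how fan-made transcripts are laid out, but treat the format as an assumption). A “scene appearance” is approximated as a scene in which the character speaks at least once. The sample transcript is made up; the episode titles are real Friends titles.

```python
import re

SPEAKER_RE = re.compile(r"^([A-Z][a-z]+):\s*(.+)$")  # assumed "Name: speech" format
METRICS = ("lines", "words", "scenes", "title_mentions")


def character_stats(transcript, episode_titles, characters):
    """Count lines, words, scene appearances and title mentions per character."""
    stats = {c: {m: 0 for m in METRICS} for c in characters}
    scene_speakers = set()

    def close_scene():
        # credit one scene appearance to everyone who spoke in the scene
        for c in scene_speakers:
            stats[c]["scenes"] += 1
        scene_speakers.clear()

    for raw in transcript.splitlines():
        line = raw.strip()
        if line.startswith("[Scene"):
            close_scene()
            continue
        match = SPEAKER_RE.match(line)
        if match and match.group(1) in stats:
            name, speech = match.group(1), match.group(2)
            stats[name]["lines"] += 1
            stats[name]["words"] += len(speech.split())
            scene_speakers.add(name)
    close_scene()  # the last scene has no following marker

    for title in episode_titles:
        for c in characters:
            if re.search(rf"\b{re.escape(c)}\b", title, re.IGNORECASE):
                stats[c]["title_mentions"] += 1
    return stats


FRIENDS = ["Ross", "Rachel", "Monica", "Chandler", "Joey", "Phoebe"]
sample_transcript = "\n".join([
    "[Scene: Central Perk, the gang is there.]",
    "Monica: There's nothing to tell!",
    "Ross: Hi.",
    "Rachel: Hi Ross.",
    "[Scene: Monica's apartment.]",
    "Ross: I just feel like someone reached down my throat.",
    "Chandler: Cookie?",
    "[Scene: Ross's apartment.]",
    "Ross: Pivot!",
])
sample_titles = [
    "The One Where Monica Gets a Roommate",
    "The One with the Thumb",
    "The One Where Ross Finds Out",
    "The One with Ross's New Girlfriend",
]

stats = character_stats(sample_transcript, sample_titles, FRIENDS)
leaders = {m: max(FRIENDS, key=lambda c: stats[c][m]) for m in METRICS}
print(leaders)
```

Keeping the four metrics separate, rather than collapsing them into one score, mirrors the reasoning in the quote: Ross and Rachel are close on most measures, and scene appearances are what break the tie.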

Data science can be applied in countless ways; the only limit is our imagination.
