The man who thinks he can and the man who thinks he can’t are both right.

Darren Lim
7 min read · Apr 10, 2020

--

What a ride!

After more than ten vizzes, hundreds of Slack messages and plenty of time spent reading and thinking, the past three months have been an incredible experience!

In all honesty, I admit that at the start I had doubts about gaining value from a foundational class. I was rather sure of myself, having amassed experience from multiple classes: working extensively with Python, serving as a Tableau Ambassador and even taking an inquiry under Dr Charles — Developing Meaningful Indicators. I had significant experience working with data and drawing out insights and trends, and all of this made me think the class would be rather straightforward for me!

I couldn’t have been more wrong!

Our first dataset — World Happiness

I found my ability to work with datasets challenged from the first step, having to question the validity and reliability of the data. I learnt the value of questioning each and every variable: what does it mean, how was it collected, and how reliable is the collection process?

Fig 1. Dataset on World Happiness — tough!

From the start, we dived deep into a difficult dataset — the World Happiness Report. We examined the dataset, looked at how it was collected and what it meant. To be absolutely frank, this was one of the toughest datasets I have ever worked with. Here’s why:

  1. Regression coefficients — In a nutshell, each number is the estimated change in the happiness score when the corresponding independent variable changes by one unit. This was the first time I had the opportunity to work with a dataset entirely made up of regression coefficients.
  2. Dystopia Index — Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive (or zero, in six instances) width.
  3. Variables — Each variable meant something. Some were rather intuitive, such as the economy. However, things like ‘family’ and ‘trust’ were much more complex and subjective — one needed to dig deep to find out what they meant.
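To make the first point concrete, here is a minimal sketch of what a regression coefficient tells you. The numbers are entirely made up (not taken from the actual report), and the variables are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical data: x could stand in for GDP per capita, y for a
# happiness score. The values are invented so the true slope is 2.0.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 0.5 + 2.0 * x

# Fit y = intercept + slope * x by ordinary least squares.
slope, intercept = np.polyfit(x, y, 1)

# The coefficient reads: a one-unit increase in x is associated with
# a `slope`-unit change in y.
print(round(slope, 2))   # 2.0
```

That single number is all the report's sub-bars carry, which is exactly why the dataset felt so abstract at first glance.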

On the surface, it seemed like we had a really hard time with the dataset and didn’t manage to have much to show for it. However, I would argue that starting out hard was the right choice — it equipped us with a number of skills that we would not have if we had started with a simple (and clean) dataset. The skills I picked up were:

  1. Questioning the variables. Always think about what each column means and how the data was collected. For example: Who were the samples? How large was the sample size? How was the data collected? When we first looked at the dataset, many of us thought the values represented scores for each country.
  2. Diving deep! What does “Freedom” mean? The level of freedom of speech? An index developed by a world press organization? Research and diving deep are super important.
  3. Calling your shots. As students of USP, I am sure we all have hypotheses and opinions about the world, formed from news articles, books and even classes. Working on this dataset taught me to think critically about the data and make educated guesses about what it means before actually analyzing it. In the long run, doing this teaches us which of our assumptions are correct (or incorrect).

Round 2 — Joining datasets

Data Cleaning

“80% of a data scientist’s time is spent cleaning data”

Next up! On the eve of Chinese New Year, I spent some time working on the joined data (fertility and happiness).

The issue with the join was that many countries were named differently across the two datasets (e.g. North Korea, South Korea and a bunch of others), so I had to write some code to reconcile the names before joining. Data cleaning was never something I really had the opportunity to practice in Business Analytics, so this was a good experience to apply and grow my #technical skill!
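The fix can be sketched roughly like this. Note that the column names, the toy values and the alias table below are my assumptions for illustration, not the actual code or data:

```python
import pandas as pd

# Toy stand-ins for the happiness and fertility tables; the real
# datasets and column names differ.
happiness = pd.DataFrame({
    "country": ["South Korea", "North Korea", "Singapore"],
    "happiness": [5.9, 3.0, 6.3],
})
fertility = pd.DataFrame({
    "country": ["Korea, Rep.", "Korea, Dem. People's Rep.", "Singapore"],
    "fertility": [1.0, 1.9, 1.1],
})

# Map the mismatched names onto a single convention before joining.
aliases = {
    "Korea, Rep.": "South Korea",
    "Korea, Dem. People's Rep.": "North Korea",
}
fertility["country"] = fertility["country"].replace(aliases)

merged = happiness.merge(fertility, on="country", how="inner")
print(len(merged))  # 3: all three countries now match
```

Without the alias step, the inner join would silently drop both Koreas, which is exactly the kind of quiet data loss that makes cleaning worth the time.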

Code for data cleaning!

This process gave me the opportunity to build my #technical skills, growing my ability to clean data in Python! Specifically, I learnt about the melt function in pandas, and it works wonders!
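For anyone curious, this is the kind of reshape melt does. The table below is a toy example with made-up numbers, not the actual class data:

```python
import pandas as pd

# A "wide" table: one row per country, one column per year.
wide = pd.DataFrame({
    "country": ["Singapore", "Japan"],
    "2018": [6.3, 5.9],
    "2019": [6.4, 5.9],
})

# melt unpivots the year columns into (year, score) rows, giving a
# "long" table that is far easier to plot, filter and join on.
long = wide.melt(id_vars="country", var_name="year", value_name="score")
print(long.shape)  # (4, 3)
```

One row per (country, year) pair is the shape most charting tools and joins actually want.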

My attempt at building a product — Titanic Dashboard

A Mistake

My first official product! After Chinese New Year, I had the opportunity to exercise my #analytical capability, and ended up building a dashboard after looking at a Medium article in class. At first, I felt pretty satisfied: I had the opportunity to work my Tableau skills and grow the piece into an interactive viz!

However, upon closer inspection, it was clear I had made a huge mistake!

I had applied a bunch of charts to the dataset, going through a bunch of steps I had picked up, without thinking about the purpose behind each of them. I just applied my technical capability, a code monkey if you will, without actually thinking through the rationale behind why I did what I did.

Titanic Dashboard pic 1

Let’s briefly critique this particular product:

The titanic dashboard. (https://public.tableau.com/profile/darren.lim3739#!/vizhome/TypesofVisuals/Dashboard1)

  1. What is the point of the data? There are no clear hypotheses or problem statements.
  2. The insights seem fairly scattered; there was no specific focus.
  3. Certain data plots didn’t make sense. I should have excluded those that did not make sense, lest they confuse the reader.

Looking back — it was clear that I was overly fixated on the product. I was focused on applying a list of technical skills to the dataset (i.e. bar plot, scatter plot, line chart and histogram) as opposed to actually questioning the data and trying to understand it.

Chatting with Dr Charles

After that, I had an opportunity to chat with Dr Charles and gain feedback on my work and what I could do to add value and improve myself.

I decided to focus exclusively on three things from then on:

  1. Asking questions — the visualization shouldn’t be the end. Instead, it should be the start: question the viz and the trend (why did it spike? why did it decline?) and investigate with our best friend, Google.
  2. Iteration — instead of building one huge polished project, where I can only get one round of feedback, focus on launching multiple small ideas, iterating upon them and constantly improving.
  3. Technical — take the opportunity to build a technical skill I’ve never had the chance to work on. In this case, spending time on theory concepts on datacamp.com!

A viz a day

This change in mindset resulted in “A viz a day”, which was my way of pushing myself and building consistency.

There were two key reasons for me to attempt this, even though it was somewhat taxing amidst the panic as multiple classes switched to home-based learning.

  • Quantity. As Dr Charles mentioned, quantity is very important in building one’s cognitive capability. By forcing myself to dive into many different datasets from a wide spectrum, I would have to adapt and absorb knowledge from a variety of areas.
  • Consistency. I wanted to develop my ability to stay committed to something. Amidst the trouble and workload of my other modules, could I grind it out? Could I force myself to stay committed?

Looking back, I think I grew significantly, not just in my ability to explore a dataset but in my ability to question a dataset AFTER it has been explored.

I started out ending my analysis process at the visualization. By the end of #viz-a-day, however, finishing the visualization was just the end of the beginning.

I wish there was a way for me to capture the ideas and learnings I had beyond the visuals. With that said, I decided to reattempt what I’d done!

Last but not least — Redoing the Titanic

Armed with my experience from A Viz A Day, I took another crack at the Titanic. Take a look here. I don’t think you’ll be disappointed!

PS. There’s an awesome tidbit of information at the end!

Thank you for reading!

--

Darren Lim

Writing about data and business — Business Analytics in NUS Computing | University Scholars Programme.