-101 Reasons to Visualize Data-

Yash Gupta
Data Science Simplified
8 min read · Oct 3, 2020

When was the last time you saw some digital art? Digital art was a major breakthrough in the way artists put their creativity out to the world. It involves clarity, depth, details, patterns and aesthetics. Now combine the idea of art with data. This is precisely what Data Visualization gives its audience. Graphs and charts have long been a part of presenting analytical reports to an audience.

This doesn’t only apply to people involved in Data Analytics/Science. Economists, experts from various fields, companies, presenters, students and academicians all plot data to show us what they want us to see.

For example, when was the last time someone showed you data like this and asked you to understand what they see in it:

*Image for representational purposes only.

While these numbers are very specific, they are hard to compare and don’t convey much meaning on their own. And this is just the top 13 of the 1000 entries that you’re supposed to make sense of.

Consider that this is imaginary data from a bank in the UK that operates in multiple regions and records details of its customers and their balances. Suppose you are asked: What is the balance of the top 10% of the customers that we serve? Or a simpler question: do we have more Male or Female customers?

The hard-worker might run through all 1000 entries and keep a count of the Male and Female entries. While this is possible, it leaves room for human error and takes a long time to produce an answer. Now, we get to what a smart-worker might do.

Here’s where Data Visualization comes into the picture. What if you were to put all this in a simple plot like this? The plot doesn’t just tell us the number of entries that exist; it shows us that there are more Male entries than Female ones, and by what proportion.
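The two questions above can also be answered in a few lines of code. The sketch below uses only the Python standard library and a small, made-up list of customers (the values are hypothetical, not the bank data shown in this article). It counts the entries per gender, picks out the top 10% of balances and prints a crude text bar chart.

```python
from collections import Counter

# Hypothetical stand-in for the bank data: (gender, balance) per customer.
customers = [
    ("Male", 12000), ("Female", 48000), ("Male", 7500), ("Female", 3100),
    ("Male", 91000), ("Male", 26000), ("Female", 15500), ("Male", 64000),
    ("Female", 2200), ("Male", 39000),
]

# Question 1: do we have more Male or Female customers?
gender_counts = Counter(gender for gender, _ in customers)

# Question 2: what balances do the top 10% of customers hold?
balances = sorted((balance for _, balance in customers), reverse=True)
top_n = max(1, len(balances) // 10)  # always keep at least one customer
top_balances = balances[:top_n]

# A text "bar chart": one '#' per customer, largest group first.
for gender, count in gender_counts.most_common():
    print(f"{gender:<7}{'#' * count} {count}")
print("Top 10% balances:", top_balances)
```

With a data library such as pandas, the same count is typically a single `value_counts()` call on the gender column; the version here is kept dependency-free on purpose.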

But you already knew this, right? This easy plot is something you’d have done yourself. Let’s take this one level further. Let’s say you want to find out how the balances of Female customers in the bank are distributed at intervals of 10K. Whilst it is possible to use software and enter formulas to get that chunk of data, it will still take you quite a while. What if we visualize it?

Female Customers’ Balance Classification

The data is now easier to understand, and we see that a majority of Female customers have a balance within 50K. But with a couple of filters here and there in software like Excel, this too would’ve been possible. Let’s go another step forward (all the way, in fact). What if we wanted to know how the customers in Scotland who have Blue Collar jobs and are in the age range of 30 to 40 years fare in terms of balance, distributed over intervals of 5K?

Before we get to the visuals, think of how long it would take to filter this data directly on a spreadsheet, whereas it takes just about 5 minutes if you’re well versed with visualization software. Here’s how it looks:

You can see the data as specifically as you wanted to through this. If you observe, the color coding works as a gradient for continuous variables, whereas categorical variables are distinctly colored to avoid confusion.
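For readers who prefer code to a visualization tool, the same question (Scotland, Blue Collar, ages 30 to 40, balance in 5K intervals) can be sketched as a filter followed by a count per interval. The records and field names below are invented for illustration and are not the article's dataset.

```python
from collections import Counter

# Hypothetical records; field names and values are made up for illustration.
customers = [
    {"region": "Scotland", "job": "Blue Collar", "age": 34, "balance": 12400},
    {"region": "Scotland", "job": "Blue Collar", "age": 38, "balance": 3100},
    {"region": "Scotland", "job": "White Collar", "age": 33, "balance": 45000},
    {"region": "England", "job": "Blue Collar", "age": 31, "balance": 8000},
    {"region": "Scotland", "job": "Blue Collar", "age": 52, "balance": 27000},
    {"region": "Scotland", "job": "Blue Collar", "age": 30, "balance": 14999},
    {"region": "Scotland", "job": "Blue Collar", "age": 40, "balance": 4999},
]

BIN_WIDTH = 5_000

# Keep only the customers that match all three conditions.
selected = [
    c for c in customers
    if c["region"] == "Scotland"
    and c["job"] == "Blue Collar"
    and 30 <= c["age"] <= 40
]

# Floor each balance to the lower edge of its 5K interval and count per interval.
bin_counts = Counter(c["balance"] // BIN_WIDTH * BIN_WIDTH for c in selected)

# Print one row per interval, lowest balance first.
for lower in sorted(bin_counts):
    upper = lower + BIN_WIDTH - 1
    print(f"{lower:>6}-{upper:<6} {'#' * bin_counts[lower]}")
```

Each extra condition is just one more line in the filter, which is the same idea as stacking filters in a visualization tool.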

As seen above, Regional entries are better seen on a map and Jobs are visualized in a tree-map to show the relative proportions better. Categorical variables can be visualized in anything from a donut chart, stacked bar chart, waterfall etc. to sunburst charts. For a continuous variable, anything from a line chart to a scatter plot can be used depending on the number of variables involved. They can then be given different hues (color coding) based on an additional variable.

This process helps you visualize data, identify patterns, spot outliers and make an aesthetic report to present to your peers. It also greatly reduces the room for human error (unless the error exists in the data that was entered).

Until this point, we’ve seen some simple bar charts, histograms and maps that are used to visualize data. But in the real world, there are no bounds to how creative someone can be with their visualizations. They can be as beautiful as any other form of Digital Art, and they use a lot of design elements and details to explain the data to us as easily and aesthetically as possible.

Some creative examples of visualizations are shown below:

Courtesy : Datavizproject.com

Sometimes, you will have difficulty finding a visual to suit your needs in a given piece of software. In such cases, leverage the power of custom visuals created by people around the world, as is possible in Power BI by Microsoft. It also allows you to create your own custom visuals using JavaScript. Such applications are listed later in this article.

Another important aspect of making visuals is choosing the right combination of variables and the right size of the intervals. Choosing the size of the intervals into which the data is distributed is called “Binning”. Binning a continuous variable like Age (10, 19, 45, 77) into a categorical variable like Age Group (Child, Teenager, Adult, Senior) makes it easier to understand the distribution.
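As a minimal sketch of binning, the snippet below maps the four example ages to age groups. The cut-off ages (13, 20 and 60) are assumptions chosen for illustration; the article does not define where one group ends and the next begins.

```python
from bisect import bisect_right

# Assumed cut-offs: under 13 Child, 13-19 Teenager, 20-59 Adult, 60+ Senior.
UPPER_BOUNDS = [13, 20, 60]  # exclusive upper edge of the first three groups
LABELS = ["Child", "Teenager", "Adult", "Senior"]

def age_group(age: int) -> str:
    """Return the label of the bin that the given age falls into."""
    # bisect_right counts how many bounds are <= age, which is the bin index.
    return LABELS[bisect_right(UPPER_BOUNDS, age)]

ages = [10, 19, 45, 77]
groups = [age_group(a) for a in ages]
print(dict(zip(ages, groups)))
```

In pandas, `pandas.cut` does the same job on a whole column at once; the standard-library version above just makes the mechanics visible.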

The size of intervals can also make a difference in how you understand data. Let us understand how the size can impact your understanding through a series of visuals about the balance of the customers in the bank data we just plotted:

Distribution of Balance in bins (25K, 10K, 5K)

Here you see how the data is split into intervals (bins) of 25K, 10K and 5K respectively. (Note: The data is in the specified bin size, but since there is a lack of space on the X-axis, only some bins have been labelled. You can gauge the increase in the level of detail from the number of bars.) The data in 25K bins shows that the number of customers falls as the balance in the account increases. The same pattern is observed with the 10K bins, where we see how significant the drop in the number of customers is with every 10K increase in the balance.

But then, as we move on to the 5K intervals, we encounter something really interesting. The number of customers does not drop continually as the balance increases. In fact, there are 10 more customers in the 20K bin than in the 15K bin, which contradicts our previous assumption that the number of customers drops as the balance increases.

Hence, take your time to find the right size of the bins in order to understand your data better. As the famous quote goes, “Don’t judge a book by its cover.” Each additional dive into the finer details can help you discover something surprising about the same data.
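The effect of bin size can be reproduced with a few lines of code. The customer counts below are made up so that they mimic the pattern described above (a bump of 10 customers in one 5K bin); they are not the article's actual data, and the helper names are hypothetical.

```python
from collections import Counter

# Made-up counts per 5K bin: lower edge of the bin -> number of customers.
counts_5k = {
    0: 300, 5_000: 220, 10_000: 150, 15_000: 60, 20_000: 70,
    25_000: 50, 30_000: 40, 35_000: 30, 40_000: 20, 45_000: 10,
}

# Expand the counts into individual balances (midpoint of each 5K bin).
balances = [lower + 2_500 for lower, n in counts_5k.items() for _ in range(n)]

def histogram(values, width):
    """Count values per bin of the given width, ordered by bin edge."""
    counts = Counter(v // width * width for v in values)
    return [counts[edge] for edge in sorted(counts)]

def is_steadily_falling(counts):
    """True if every bin holds fewer customers than the bin before it."""
    return all(a > b for a, b in zip(counts, counts[1:]))

for width in (25_000, 10_000, 5_000):
    counts = histogram(balances, width)
    print(f"{width:>6}: {counts} falling={is_steadily_falling(counts)}")
```

With these numbers the 25K and 10K histograms fall steadily, while the 5K histogram does not: the wider bins average the bump away.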

To sum it all up:

What’s the easier method of getting an overview of and understanding Data? Visualizations.

What’s the go-to to find abnormal entries/outliers or patterns in Data? Visualizations.

What is needed to give your presentations an aesthetic edge over others? Visualizations.

Visualization Fact!

5% of the time, things are retained in your brain when they’re heard in a speech, 10% of the time, when they’re read and 63% of the time, when they’re seen in pictures or visuals.

Some software available to everyone for making beautiful visualizations of data (P.S. They’re pretty simple to use too!):

  1. Tableau / Tableau Public (as used for visualizations in this article)
  2. Power BI Desktop by Microsoft
  3. Python/R Programming Languages
  4. JavaScript (for custom Visualizations)
  5. Traceis (open source math software)
  6. Desmos (to visualize mathematical equations and formulas)
  7. Coggle (for tree-maps and mind-mapping tools)

In Data Science and Data Analytics, datasets always go through a process called Exploratory Data Analysis (EDA) before any descriptive or predictive analytics. During EDA, multiple visualizations of the same data reveal the patterns and insights present in it. These can be unknown relationships, surprising extremes, the presence of outliers and so on, all of which help in the overall analysis of the data.
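One common EDA check that pairs well with a box plot is flagging outliers numerically. The sketch below uses the 1.5 × IQR rule of thumb, which is a widely used convention and not something this article prescribes; the balances are invented for illustration.

```python
from statistics import quantiles

# Invented account balances with one suspiciously large entry.
balances = [1200, 3400, 2800, 4100, 3900, 2500, 3100, 98000, 3600, 2900]

# Quartiles of the data (n=4 splits it into four equal-probability groups).
q1, _median, q3 = quantiles(balances, n=4)
iqr = q3 - q1

# Rule of thumb: anything beyond 1.5 * IQR from the quartiles is an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [b for b in balances if b < lower_fence or b > upper_fence]

print("Outliers:", outliers)
```

A flagged value is only a candidate for a closer look; whether it is a data-entry error or a genuine extreme is a judgment the analyst still has to make.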

Go ahead and master the visualization of data, and you’ll have a strong grasp of datasets and an edge over everyone else! Once you’re well versed with Data Visualization and want to take it one step further, try Data Storytelling, which we will demystify in our upcoming articles.

Stay tuned with us as we chart out paths on how you can get into coding and demystify other concepts related to Data Science and Coding. Thank you for reading this all the way to the end.

For more resources on this topic, comment down below and we’ll respond!

Yash Gupta
Data Science Simplified

Lead Analyst at Lognormal Analytics and self-taught Data Scientist! Connect with me at - https://www.linkedin.com/in/yash-gupta-dss