Data visualization — A Picture Indeed Speaks a Thousand Words!

Digitate
10 min read · Apr 19, 2024

By Surabhi Gore

Photo by Anika Huizinga on Unsplash

They say, “a picture is worth a thousand words”. And rightly so! A single image can convey a complex idea, and sometimes several complex ideas, easily and effectively. The world of data analytics is no different. In today’s age of information overload, data visualization bridges the gap between data-driven insights and human inference and adoption.

Data visualization is broadly used for 3 main objectives — to explore, to monitor, and to explain.

  • To explore: It helps users to understand their data better, interact with it, and find patterns and relationships.
  • To monitor: It is used to monitor data in real time and look for indicators of areas that need attention.
  • To explain: It is used to bring “explainability” to analytics. It provides an effective tool to explain “why the analytics is saying what it is saying”.

With increasing focus on observability and AI-powered insights, data visualization has now become an important step in the data science process. Visualization has penetrated so deep into our day-to-day routine that it is almost impossible to imagine these activities without their visual experience. Google Maps, with its great geographic visualization, has become our go-to application for the everyday commute. Netflix has created a personalized system of engagement for our everyday entertainment. Amazon has conquered much of e-commerce with effective visualization of categories of items such as groceries, shoes, you name it. These applications work on a simple formula: give the user a simple interface where most of the job is done through search. That is what really makes an application a go-to app.

Data visualization is a creative space. When it comes to visualizing data and data-driven insights, there is no single right answer. But each new visualization experience teaches some cool lessons. Below I present some of my experiences from this data visualization journey.

Good data visualization can often substitute for complex algorithms

Data visualization plays an essential role in making analysts familiar with data and its features. It can help understand data distribution, data quality, and even insights such as trends, patterns, outliers, clusters, among others. Data visualization can be so powerful that a well-rendered chart might derive insights without even the need for sophisticated algorithms!

Consider a case of analyzing the data of transaction spends. Such data contains a record of transactions where each transaction is associated with quantity, price, vendor, buyer, item, location, date of purchase, and other attributes.

Figure 1: Visualization of transaction spend

Figure 1(a) shows a scatter plot on the axes of quantity and price. Each bubble in this plot represents a transaction, and its position in the plot shows its price and quantity. This rendering by itself starts to show 4 clusters of transactions. Cluster A represents transactions with low quantity and high price, and Cluster B represents bulk purchases with high quantity and low price. Both these clusters represent normal behavior: prices are high for small-quantity purchases and low for bulk purchases. But Cluster C represents anomalous transactions that show high quantity yet high prices. This insight did not require any algorithm for clustering or anomaly detection. A visualization was enough to make this inference.

Next, let’s add some more elements to this visualization. Figure 1(a) plots each transaction with a bubble of the same size. Figure 1(b) plots this data such that the size of each bubble is based on the dollar value of that transaction. This visualization completely changes the focus. Cluster C looked like a small cluster because it contains a small number of transactions. However, this same cluster now becomes a lot more relevant because of its associated dollar value.

Let’s take this visualization one level deeper. Figure 1(a) and Figure 1(b) present each transaction with the same color. Figure 1(c) color-codes each bubble based on the vendor associated with the transaction and reveals some interesting insights. It shows that almost all of the transactions in the anomalous cluster, Cluster C, are associated with only two vendors. This gives a concrete lead for further investigation into the transaction anomalies. This insight, which would otherwise need a classification algorithm, is derived just by creative rendering of the data.
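
The three renderings described above boil down to three visual channels: position, bubble size, and color. The following is a minimal sketch of how those channels could be derived from raw transactions before handing them to any charting library. The records, the vendor names, the `to_bubbles` helper, and the thresholds are all hypothetical and are not taken from the data behind Figure 1.

```python
from collections import Counter

# Hypothetical records: (quantity, unit_price, vendor). Not data from Figure 1.
TRANSACTIONS = [
    (5, 90.0, "V1"),
    (8, 85.0, "V2"),
    (400, 12.0, "V3"),
    (450, 10.0, "V1"),
    (420, 88.0, "V4"),
    (380, 92.0, "V5"),
    (410, 95.0, "V4"),
]


def to_bubbles(records, max_area=1000.0):
    """Map each transaction to position, size, and color channels."""
    largest = max(q * p for q, p, _ in records)
    vendors = sorted({v for _, _, v in records})
    color_index = {v: i for i, v in enumerate(vendors)}
    return [
        {
            "x": q,                                # as in Figure 1(a): quantity
            "y": p,                                # as in Figure 1(a): price
            "area": max_area * (q * p) / largest,  # as in Figure 1(b): dollar value
            "color": color_index[v],               # as in Figure 1(c): vendor
            "vendor": v,
        }
        for q, p, v in records
    ]


bubbles = to_bubbles(TRANSACTIONS)

# Count vendors in the high-quantity, high-price corner of the plot.
# The thresholds are picked by eye for this toy data, as a viewer would do.
suspects = Counter(b["vendor"] for b in bubbles if b["x"] > 200 and b["y"] > 50)
print(suspects.most_common())
```

Scaling the bubble area (rather than the radius) by dollar value is a deliberate choice here, since viewers tend to compare bubbles by area.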

Note that algorithms are surely required to derive insights from data at a large scale. However, intelligent visualization can itself be a very powerful tool to uncover many insights during early data exploration.

So far, we demonstrated that effective visualization could derive insights without needing complex algorithms. Furthermore, visualization may be able to detect insights which otherwise may go unnoticed by an algorithm, such as unusual data distributions, local patterns, implicit patterns, etc.

Let’s look at one more example. The chart in Figure 2 shows the revenue distribution across years for different music formats such as CDs, album downloads, cassettes, etc. It is very hard to determine the trend in music formats by looking at the raw data. But a basic column chart makes it evident how the trends vary across the years. Look at how the revenue from cassettes was at its peak during the 90’s and then declined drastically with the emerging trends of CDs and DVDs (in the second image).

Figure 2: Recorded music revenues by format (Source : https://www.riaa.com/u-s-sales-database/)
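
To make the step from raw rows to a column chart concrete, here is a small sketch that pivots flat records into one series per format and renders a crude text column chart. The numbers are purely illustrative and are not RIAA figures; `pivot_by_format`, `peak_year`, and `text_columns` are hypothetical helper names.

```python
from collections import defaultdict

# Illustrative rows only: (year, format, revenue). These are NOT RIAA figures.
ROWS = [
    (1985, "Cassette", 2.0), (1985, "CD", 0.5),
    (1990, "Cassette", 3.5), (1990, "CD", 3.0),
    (1995, "Cassette", 2.0), (1995, "CD", 9.0),
    (2000, "Cassette", 0.5), (2000, "CD", 13.0),
]


def pivot_by_format(rows):
    """Turn flat rows into {format: {year: revenue}}, one series per format."""
    series = defaultdict(dict)
    for year, fmt, revenue in rows:
        series[fmt][year] = series[fmt].get(year, 0.0) + revenue
    return dict(series)


def peak_year(series, fmt):
    """Year in which a format's revenue was highest."""
    by_year = series[fmt]
    return max(by_year, key=by_year.get)


def text_columns(series, fmt, width=20):
    """Crude text column chart, scaled to the largest value across all series."""
    top = max(v for s in series.values() for v in s.values())
    return [
        f"{year} {'#' * round(width * value / top)}"
        for year, value in sorted(series[fmt].items())
    ]


series = pivot_by_format(ROWS)
for line in text_columns(series, "Cassette"):
    print(line)
```

Even this rough rendering shows the rise and fall of a format at a glance, which is hard to see in the flat rows.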

When it comes to data exploration, a good data visualization can do wonders. We just need to select the right data, narrow down the right scope, and pick the best metaphor of visualization and voila!

Data visualization makes AI-driven insights explainable

Of course, we do need AI/ML models to bring out complex insights from the data. However, when using AI for business-critical decisions, users need some way to understand why the AI is saying what it is saying. Without that explainability, users are not able to build trust in the underlying AI engine, which has a direct consequence on AI adoption. Explainability brings another significant benefit. As users start understanding the rationale behind AI-driven insights, they are able to validate and even augment those decisions with their intuition built over years of experience. Note that a business user might not need an explanation of every parameter or every step of an algorithm. They most often focus only on the justification behind the decision made by an AI system.

Humans, in general, can process visual information much faster and more easily than textual information. The same holds true for data-driven insights. Visual aids make for a great tool to explain algorithm output. Here are some examples:

  • Trees make for one of the best representations to explain the reasoning based on classification. Figure 3(a) shows a simple decision tree explaining the root-cause analysis of high response time. It shows that the response time is observed to be high only when the thread pool utilization is high, and the active process count is also high.
  • Scatter plots make it easy to explain regression equations capturing linear or non-linear relationships between 2 variables. Figure 3(b) shows how the response time of an application changes with an increase in CPU utilization. It shows that for small values of CPU utilization, the response time increases gradually with CPU utilization. But at higher values of CPU utilization, the response time increases exponentially.
  • Creative line-charts make it easy to understand time-series forecasting. Figure 3(c) shows the CPU utilization of a server in the form of a line-chart. It shows its past behavior and intuitively explains change points in the past behavior. It also shows an increasing trend and a day-of-week pattern in the recent past, and how that intuitively leads to a forecast and a potential CPU saturation in the near future.
Figure 3: (a) Classification using trees, (b) Regression using scatter plots, (c) Forecasting using line-charts
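
As a rough illustration of why trees explain classification so well, the sketch below encodes the rule described for Figure 3(a) as a nested structure, renders it as an indented text tree, and returns the path that justifies a given outcome. The dictionary layout and the `render` and `explain` helpers are assumptions made for this example and do not represent the engine behind the figure.

```python
# Hypothetical encoding of the rule described for Figure 3(a).
TREE = {
    "question": "thread pool utilization high?",
    "yes": {
        "question": "active process count high?",
        "yes": "response time HIGH",
        "no": "response time normal",
    },
    "no": "response time normal",
}


def render(node, indent=0):
    """Return the tree as indented text lines, one question or leaf per line."""
    pad = "  " * indent
    if isinstance(node, str):
        return [pad + "-> " + node]
    lines = [pad + node["question"]]
    for answer in ("yes", "no"):
        lines.append(pad + "  " + answer + ":")
        lines.extend(render(node[answer], indent + 2))
    return lines


def explain(node, facts):
    """Walk the tree with observed facts; return (path taken, outcome)."""
    path = []
    while not isinstance(node, str):
        answer = "yes" if facts[node["question"]] else "no"
        path.append(f"{node['question']} {answer}")
        node = node[answer]
    return path, node


print("\n".join(render(TREE)))
path, outcome = explain(
    TREE,
    {"thread pool utilization high?": True, "active process count high?": True},
)
print(" / ".join(path), "=>", outcome)
```

The returned path is the justification a business user typically wants: which conditions held, and what they led to.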

Visual tools provide an effective medium to explain the workings of many deep learning algorithms.

  • Figure 4(a) shows how feature maps help visualize the evolution of an image through the successive layers of a neural network.
  • Figure 4(b) shows how an image evolves over several iterations of a GAN.
Figure 4: (a) feature maps — Source: https://towardsdatascience.com/advanced-topics-in-deep-convolutional-neural-networks-71ef1190522d, (b) image evolution through GANs — Source: https://medium.com/@igniobydigitate/gans-a-creative-adversarial-relationship-bc791f5f8225

In addition to the individual building block algorithms, visualization simplifies the explanation of even more complex product features that are built through a pipeline of several algorithms. Predicting a business process is one such example which includes algorithms such as regression, pattern mining, forecasting, anomaly detection, and fault impact propagation. An enhanced Gantt chart forms a very effective metaphor here.

Figure 5 shows the prediction of a chain of business process executions. It represents the interdependencies between the tasks through a Gantt chart. It embellishes the Gantt bars with colors, labels, and icons. It shows that Process 1 finished on time and met its deadline. Process 2 observed a failure in the first task, which is likely to lead to an hour’s delay in its completion. Process 3, on the other hand, seems on track and is predicted to finish on time.

This visualization shows the inter-process dependencies, predicted behavior, anomalies, impact of anomaly, and the outcome on timely completion or delays in process executions.

Figure 5: Visualization of business process execution
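
The delay shown on such a Gantt chart comes from propagating task durations through their dependencies. Below is a minimal sketch of that propagation, assuming tasks are listed in dependency order; the task names, durations, and the `predict_finish` and `status` helpers are hypothetical, and a real pipeline would combine this with the forecasting and anomaly detection steps mentioned above.

```python
def predict_finish(tasks, start=0.0):
    """Return {task: predicted finish hour}.

    tasks maps a name to (duration_hours, [dependencies]) and is assumed to be
    listed in dependency order, so every dependency is computed before its use.
    """
    finish = {}
    for name, (duration, deps) in tasks.items():
        begin = max((finish[d] for d in deps), default=start)
        finish[name] = begin + duration
    return finish


def status(tasks, deadline):
    """Summarize a process for its Gantt bar: predicted end, delay, and label."""
    end = max(predict_finish(tasks).values())
    delay = max(0.0, end - deadline)
    return {"end": end, "delay": delay, "label": "delayed" if delay else "on time"}


# Hypothetical process with a 4-hour deadline.
planned = {
    "extract": (1.0, []),
    "transform": (2.0, ["extract"]),
    "load": (1.0, ["transform"]),
}
# The first task fails and its retry costs one extra hour.
observed = dict(planned, extract=(2.0, []))

print(status(planned, deadline=4.0))
print(status(observed, deadline=4.0))
```

The label and delay computed here are the kind of values that would drive the colors, labels, and icons on the Gantt bars.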

Data visualization effectively tells a story with the data

Traditionally, data analysis is associated with spreadsheets, algorithms, equations, and models! However, these skills need to be complemented with “soft skills” to present the analysis in a compelling manner. Data storytelling provides this skill to effectively communicate data-driven insights using well-defined personas, use case-centric narratives, and visualizations.

Data storytelling is like human storytelling but with the added aspect of data-driven deep insights and visual evidence. It is now changing how we consume data and analytics.

Persona-driven narrative

Different users look at the data and insights from different perspectives. Hence, the data and insights need to be presented with a contextual narrative.

Consider an example of Uber here.

  • Figure 6(a) is a visual that most of us are familiar with as an Uber rider. It focusses on showing all the information that a rider is interested in after booking a cab ride, such as, where is the cab? What time will it arrive? How long will it take to reach the destination? etc.
  • Figure 6(b) shows a different visual of Uber rides. This visual is designed for the persona of an operations manager who analyzes the usage of Uber rides to better manage and plan the entire ecosystem. It is the same data: what rides were taken, the from and to locations of these rides, the time taken by these rides, etc. This data is collected across multiple rides and is now presented using very different metaphors. The geographical maps overlaid with column charts help analysts derive insights such as peak hours, high-demand spots, bottlenecks, etc.
Figure 6: (a) Customer view — Source: https://www.google.com/, (b) Operations view — Source: https://eng.uber.com/keplergl/
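
The idea that the same data serves two personas can be sketched in a few lines. In the example below, one ride log feeds both a rider view (a single ride) and an operations view (aggregates across rides). The ride log, zone names, and the `rider_view` and `operations_view` helpers are hypothetical and do not reflect how Uber structures its data.

```python
from collections import Counter

# Hypothetical ride log: (ride_id, pickup_zone, pickup_hour, minutes).
RIDES = [
    ("r1", "Airport", 8, 35),
    ("r2", "Downtown", 8, 12),
    ("r3", "Downtown", 9, 15),
    ("r4", "Airport", 18, 40),
    ("r5", "Downtown", 18, 20),
    ("r6", "Downtown", 18, 18),
]


def rider_view(rides, ride_id):
    """One ride, the way a rider sees it."""
    for rid, zone, hour, minutes in rides:
        if rid == ride_id:
            return {"pickup": zone, "hour": hour, "minutes": minutes}
    raise KeyError(ride_id)


def operations_view(rides):
    """All rides aggregated, the way an operations manager sees them."""
    by_hour = Counter(hour for _, _, hour, _ in rides)
    by_zone = Counter(zone for _, zone, _, _ in rides)
    return {
        "peak_hour": by_hour.most_common(1)[0][0],
        "busiest_zone": by_zone.most_common(1)[0][0],
    }


print(rider_view(RIDES, "r4"))
print(operations_view(RIDES))
```

The rider view answers questions about one trip, while the operations view answers questions about peak hours and high-demand spots, even though both read the same rows.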

Connecting the dots

With increasing focus on observability, a wide variety of data is being collected, and an equally wide variety of insights is being generated. However, the impact of these insights increases manyfold when we can connect the dots across these silos of observations and create a system of engagement.

Google Maps (Figure 7) is a great example of these points. It not only helps navigate from A to B, but also connects the dots across a variety of related information to offer a one-stop destination to plan your ride. It shows information about places or things to do nearby. From the same experience, a user can book a hotel room or a restaurant table, view photographs, read user reviews, and so on. The net result is so much more effective than a humble geographical map.

Figure 7: Google Maps search — Source: https://www.google.com/maps

Addressing insight fatigue

The increasing focus on data analytics has a side effect: insight fatigue. AI engines now generate so many diverse and complex insights that end users are overwhelmed when consuming them to drive business decisions. One objective of data storytelling is to eliminate this insight fatigue by presenting the insights in the form of a compelling story.

Consider the example of Paytm Money shown in Figure 8. It distills the insights to give recommendations on stocks or funds based on categories such as top gainers or losers, yearly highs and lows, equity funds, hybrid funds, etc. The representation is simple, in the form of grids or tables. However, it is made intuitive with the use of color coding, icons, and line charts, such that users are able to easily make their selections from the list of recommendations.

Figure 8: Paytm Money visualization

Closing notes

Data visualization is an exciting space, and it is gaining importance as data-driven insights penetrate further into business-critical activities. This blog is the first in a series of data visualization blogs that will cover various aspects such as how to go about visualization, the key elements of data visualization, the ongoing trends, and the science behind data visualization. Stay tuned!

About the author

Surabhi Gore is the User Experience Lead at Digitate. She has been working in the space of designing and developing visual experience for cognitive solutions. Her areas of interest include visual analytics, explainable intelligence, and big-data visualizations.

Digitate

Digitate is a leading provider of SaaS-based, autonomous enterprise software, bringing agility, assurance, and resiliency to IT and business operations.