Event-Driven Data Visualization

Victor Nitu
Published in Phi Skills
9 min read · Apr 24, 2020
Photo by Chris Grafton on Unsplash

Data visualization is the crystal ball every industry desperately needs. Companies should not be afraid of fortune-telling, as long as it’s built on solid foundations. In our over-competitive world, being ahead in your industry is vital. Making a well-informed decision is a tedious task, but data visualization is a tool that helps you foresee trends and predict which path your business should take to keep on track.

In my previous article, Event-Driven Data Collection, we built an event-driven infrastructure that collects data through events. In this article, we are going to use this infrastructure to intercept and visualize user interactions. You can run this infrastructure by cloning the GitHub repository victornitu/phi-architecture-example and launching it with Docker. The example project for this article is available on GitHub in the repository victornitu/phi-visualization-example.

Overview

This article consists of three main parts. First, we are going to take an online dummy shop and plug it into our event-driven infrastructure to fire some events and generate data. Then, we analyze this data to explore user interaction with the help of pandas¹ and matplotlib² inside a Jupyter notebook³. And finally, we are going to build an online dashboard with Svelte⁴ and Chart.js⁵ that implements the same charts we’ve built in our notebook.

The focus of this article is to show how to build an end-to-end pipeline from the customer to the decision-makers. This article does not elaborate on how to make great charts. On the contrary, the charts in this article are intentionally simplistic.

Integration

For our case study, we have built an online dummy shop where you can watch and buy three products: Sugar, Salt, and Pepper. The two interactions that interest us are a user clicking on a product to see more details, and a user buying the content of their shopping cart.

We are going to start with the first use case. You can see here a short video of a user clicking on the product Salt to see its details.

To handle this use case, we need to add a call to the publisher to publish an event product_watched. Each product is represented by a Svelte component Product that has a function openProductDetails. This function opens a detailed view of the selected product. We are going to add our call to the publisher at the end of this function.

We add the product name to the URL and add a body with a property visitor set to the value of username, if present. Then, we use our client to call the publisher and ask it to publish the event product_watched with these parameters.
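
As a minimal sketch of the call described above (not the code from the example repository): the publisher address, the route layout, and the function names below are assumptions made for illustration, and the client is assumed to be any HTTP client exposing post(url, body), such as axios.

```javascript
// Hypothetical publisher address; adjust it to wherever your publisher runs.
const PUBLISHER_URL = "http://localhost:4000";

// Build the request for a product_watched event.
// Assumed route layout: /product_watched/<product name>.
function productWatchedRequest(productName, username) {
  const url = `${PUBLISHER_URL}/product_watched/${encodeURIComponent(productName)}`;
  // The visitor property is only set when a username is available.
  const body = username ? { visitor: username } : {};
  return { url, body };
}

// Send the request through an HTTP client exposing post(url, body).
function publishProductWatched(client, productName, username) {
  const { url, body } = productWatchedRequest(productName, username);
  return client.post(url, body);
}
```

Called at the end of openProductDetails, publishProductWatched(client, "Salt", "Victor") would post a body with visitor set to "Victor" to the route for Salt.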

Now we are going to continue with the second use case. You can see here a short video of a user adding three Salt items and two Pepper items, going to the checkout, and pressing the Buy button to complete their purchase.

For this use case, we need to call the publisher to publish one event product_bought per item. The checkout is represented by a Svelte component Checkout that has a function buyCart. This function is in charge of placing an order for the shopping cart and emptying it once the order is sent. As in the previous use case, we are going to add our call to the publisher at the end of this function.

We add the customer name to the URL and add a body with a property products set to a list of all the ordered products. The name of the customer is inside a Svelte store named username, and the list of products from the shopping cart is stored in another Svelte store called order. Then, we use our client to call the publisher and ask it to publish one event product_bought for each item in the list products.
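
The request for this second use case could be built along these lines. This is a sketch under assumptions: the publisher address, the route layout, and the shape of the order (a list of entries with name and quantity) are hypothetical and may differ from the example repository.

```javascript
// Hypothetical publisher address; adjust it to wherever your publisher runs.
const PUBLISHER_URL = "http://localhost:4000";

// Build the request for the product_bought events of one order.
// Assumed order shape: [{ name: "Salt", quantity: 3 }, ...].
// Assumed route layout: /product_bought/<customer name>.
function productBoughtRequest(customerName, order) {
  // One list entry per purchased item, so one event can be published per item.
  const products = order.flatMap(({ name, quantity }) =>
    Array(quantity).fill(name)
  );
  const url = `${PUBLISHER_URL}/product_bought/${encodeURIComponent(customerName)}`;
  return { url, body: { products } };
}
```

For the cart from the video (three Salt and two Pepper), the products list would contain five entries, and the request can be sent with any client exposing post(url, body).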

Our online shop is now correctly connected to our infrastructure, which collects data by listening to the events the shop sends. We can start analyzing our generated data!

Exploration

Data can be represented with a myriad of different charts. Exploring your data inside a notebook helps you test different options and find the most appropriate representation. In this second part, we use a notebook with two well-known libraries: pandas and matplotlib.

To inspect our data, we are going to call the inspector. The first thing we do is save the address of our inspector inside a variable api.

We start by analyzing the data for our products. We use pandas to read the product API and save it inside a DataFrame.

If we read the data without any formatting, each product shows up as raw JSON, which is not very easy to work with. We are going to reshape our data to get a better layout.

If we set the parameter orient of the method to_json to records, we get the data as a list of records, which is the right JSON layout to build a cleaner DataFrame.

Perfect! Now we have a DataFrame with three columns:

  1. name : the product name
  2. watched : the number of times a product has been watched
  3. bought : the number of times a product has been bought

We start with a bar chart to visualize the conversion rate of a product. In our example, the conversion rate is the ratio of the number of times a product is bought to the number of times it is watched.

With this bar chart, we can see that the product Sugar has a weak conversion rate. Although it is the most-watched product, it’s also the least bought. On the other hand, the product Pepper is bought more often than it is watched, which means the customers probably buy this product regularly and are familiar with it.

We can try a pie chart to have a better comparison between products.

The product Sugar represents half of the watched products. However, it represents less than one-third of the bought products. The products Salt and Pepper sell almost equally.

We can move on to the customers. We call our inspector with the same code as for our products; we only need to change the URL to call the customer API.

We have a DataFrame with five columns:

  1. name : the name of the customer
  2. Salt : the amount of product Salt the customer bought
  3. Pepper : the amount of product Pepper the customer bought
  4. Sugar : the amount of product Sugar the customer bought
  5. products : the total amount of products the customer bought

We can use a stacked bar chart to find out which customers buy the most products. This chart also shows how each customer's purchases are split between the products.

This chart highlights that the two biggest customers are Victor and Morgane. We can also see that Morgane is by far the biggest consumer of the product Sugar. Being a big customer, she would be an excellent target for a campaign associated with the product Sugar.

The advantage of consuming data from the inspector is that those charts are dynamic. Every time the notebook is executed, the charts are updated with the latest data. It might look challenging to draw conclusions from a dynamic chart, but it helps keep a decision from going the wrong way: because data can evolve very quickly, a hypothesis that seemed obvious a few hours ago might be completely wrong in light of the latest updates.

Visualization

A notebook is an exceptional tool for experimentation, but it can quickly become tedious to share the latest discoveries. An online dashboard with all the best charts is a great asset to communicate and help the decision-makers to make the best decisions.

First, we need to load our products and customers inside the dashboard to be able to create charts with our data.

When the dashboard starts, we use our HTTP client to call the product API and the customer API. Because it’s an asynchronous operation, we save the result inside a Promise. Then we can await the resulting data and inject it in each Svelte component containing our charts. In this example, we inject our products inside the two components ProductBar and ProductPie, and our customers inside the component CustomerBar.
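
The loading step can be sketched as follows. The inspector address, the two routes, and the function name are assumptions for illustration; the client is assumed to expose get(url) and to resolve directly to the parsed data.

```javascript
// Hypothetical inspector address and routes; the real ones depend on your setup.
const INSPECTOR_URL = "http://localhost:9000";

// Start both requests at once and return a single Promise for the results.
// The client is assumed to expose get(url) and resolve to the parsed data.
function loadDashboardData(client) {
  const productsRequest = client.get(`${INSPECTOR_URL}/products`);
  const customersRequest = client.get(`${INSPECTOR_URL}/customers`);
  return Promise.all([productsRequest, customersRequest]).then(
    ([products, customers]) => ({ products, customers })
  );
}
```

In a Svelte component, the returned Promise can be consumed with an {#await} block, passing products and customers as props to the chart components once it resolves.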

The first chart we replicate is our bar chart for the products.

We create a Chart.js chart of type bar and parse our products to extract the right data and put it in the right place. For the property labels, we only provide the names of the products, and for each entry of the property datasets, we extract the matching column (watched or bought).
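
A configuration of this kind might look like the sketch below. The function name is an assumption, and each product is assumed to carry the three columns seen in the notebook (name, watched, bought); styling options such as colors are left out.

```javascript
// Build a Chart.js configuration for the product bar chart.
// Each product is assumed to look like { name, watched, bought }.
function productBarConfig(products) {
  return {
    type: "bar",
    data: {
      // One label per product on the category axis.
      labels: products.map((product) => product.name),
      // One dataset per column we want to compare.
      datasets: [
        { label: "watched", data: products.map((product) => product.watched) },
        { label: "bought", data: products.map((product) => product.bought) },
      ],
    },
  };
}
```

The resulting object is what gets passed as the second argument of new Chart(context, config), where context is the canvas of the Svelte component.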

The result looks very similar to the notebook chart.

The second chart we replicate is our pie chart for the products.

We create a Chart.js chart of type pie. On top of extracting the correct data out of our products, we also make a few calculations. We calculate the sum, the percentages, and the ratios between the different products.
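
The sum and percentage calculations can be sketched like this. The function names and the label format are assumptions, and the sketch only covers the share of each product in one column, not every ratio the dashboard may compute.

```javascript
// Percentage of each product in one column ("watched" or "bought"),
// rounded to one decimal place.
function productShares(products, column) {
  const total = products.reduce((sum, product) => sum + product[column], 0);
  return products.map((product) =>
    total === 0 ? 0 : Math.round((product[column] * 1000) / total) / 10
  );
}

// Build a Chart.js pie configuration whose labels include the percentages.
function productPieConfig(products, column) {
  const shares = productShares(products, column);
  return {
    type: "pie",
    data: {
      labels: products.map((product, i) => `${product.name} (${shares[i]}%)`),
      datasets: [{ data: products.map((product) => product[column]) }],
    },
  };
}
```

Putting the percentage in the label is one way to make the pie readable without extra text around it; a dashboard aimed at a different audience might show the raw counts instead.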

Here, we can see that we don’t have to reproduce exactly the same chart. Depending on the audience that will use the dashboard, it can be wise to make some slight adjustments to make a chart more visual with less text, or the opposite. It’s essential to keep in mind that the audience of a dashboard is not the same as that of a notebook.

And finally, we can replicate the stacked bar chart for the customers.

We create a Chart.js chart of type horizontalBar. This chart is very similar to the product bar chart; we just change a few parameters and the options.
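
A possible configuration for this chart is sketched below, assuming Chart.js 2.x, where the type horizontalBar and the xAxes/yAxes option arrays exist (Chart.js 3 replaced that type with a bar chart using indexAxis set to "y"). The function name and the fixed list of product columns are assumptions based on the DataFrame columns from the notebook.

```javascript
// Product columns expected on each customer record.
const PRODUCT_NAMES = ["Salt", "Pepper", "Sugar"];

// Build a Chart.js 2.x configuration for a stacked horizontal bar chart.
// Each customer is assumed to look like { name, Salt, Pepper, Sugar, products }.
function customerBarConfig(customers) {
  return {
    type: "horizontalBar",
    data: {
      // One bar per customer.
      labels: customers.map((customer) => customer.name),
      // One stacked segment per product.
      datasets: PRODUCT_NAMES.map((productName) => ({
        label: productName,
        data: customers.map((customer) => customer[productName]),
      })),
    },
    options: {
      // Stack the product segments on both axes (Chart.js 2.x syntax).
      scales: { xAxes: [{ stacked: true }], yAxes: [{ stacked: true }] },
    },
  };
}
```

Compared with the product bar chart, only the chart type, the way datasets are derived, and the stacking options change.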

The result is very similar to the chart from our notebook.

Those charts are directly linked to your event-driven infrastructure, which means that every time the charts are loaded or refreshed, you get the latest status. Such a dashboard can be accessed anywhere, anytime: in your office, at home, on the tube. It can be an indispensable tool to provide real-time KPIs, for example. Having access to a visual representation of your data at any time can be a real asset for a business.

Additional Note

As you might have noticed, we kept it very simple so far. There are many ways to improve this example and boost outcomes. For example, because we are plugged into an event-driven infrastructure, we could directly inspect our events and visualize them. Doing so would require some more advanced techniques, which we will come back to later. Another improvement to our dashboard would be to use D3.js⁶. D3.js is a powerful visualization tool. Unfortunately, it is also complicated, but I would highly recommend having a look at it. Fortunately, this subject is far from being closed!

Conclusion

Knowledge is Power⁷. One secret to increasing your knowledge: Event-Driven Data Collection and Visualization. They help you grasp all the information your data is hiding. There are many more ways to take advantage of your data, and we are going to explore them in upcoming articles. But so far, the potential is already very high!

Scientia potentia est

[1] The pandas development team (2014). pandas documentation — https://pandas.pydata.org/pandas-docs/stable/

[2] The Matplotlib development team (2018). Matplotlib: Visualization with Python — https://matplotlib.org/

[3] Project Jupyter (2020). Installing the Jupyter Software — https://jupyter.org/install.html

[4] Svelte Contributors (2020). Svelte • Cybernetically enhanced web apps — https://svelte.dev/

[5] Chart.js Contributors (2020). Chart.js | Open source HTML5 Charts for your website — https://www.chartjs.org/

[6] Mike Bostock (2019). D3.js — Data-Driven Documents — https://d3js.org/

[7] Wikipedia, the free encyclopedia (2020). Scientia potentia est — https://en.wikipedia.org/wiki/Scientia_potentia_est
