Netezza on Cloud (NPSaaS) integration with Tableau

Brajesh Pandey
Modern data warehouse
5 min readMay 18, 2024
Netezza + Tableau

Tableau

A visual analytics platform to help people see and understand data and to explore and manage data, and faster to discover and share insights that can change businesses and the world. For more details — refer to Tableau documentation.

NPSaaS

A high performance petabyte scale fully managed pay-as-you-go cloud data warehouse service on AWS and Azure. IBM Netezza Performance Server provides a high-performance, cloud-native data warehouse designed for scalable analytics and insights accessible in a single platform bringing decades of innovation in data governance and security, in-database analytics and ML and hybrid-columnar processing to your data in the cloud. For more details — refer to Netezza documentation.

Time Travel Feature

NPSaaS time travel feature enables database users to retrieve and analyze historical data, and reconstruct deleted data. It supports users to run query using system time with AS OF, BEFORE, BETWEEN and FROM time expression to travel the temporal table as per the use case.

Use-case: Anomaly detection in retail warehouse

In the following section, we will go over a “retail” use case where we will use Tableau to visually analyze warehouse inventory, find anomaly in ETL batch jobs and then reconstruct the data to the previous valid state using NPSaaS’s time travel feature.

To connect “Tableau” to “NPSaaS”, one would need to navigate to “Netezza connector” in Tableau desktop and input details to make connection as given below.

IBM Netezza Connector

Let’s start with the demo.

As you can see below, once connected “Retails” database, there are 2 tables listed under it.

“items” — A table to store information about items. This table don’t get updated frequently (considered a regular Netezza table).

“warehouse” — A time travel table to store items inventory. This table gets updated periodically through a batch job every day.

Data exploration: Tableau provides a great visual tool to explore the data. For example, I was able to simply drag / drop both the tables and it automatically detected a relation between the tables for join, very neat. I was also able to view the result and understand the data.

Data exploration

To make the demo simpler and easier to understand, I limited warehouse item to only “Bike” and “Helmet”. ETL batch job runs every day and updates inventory for “Bike” and “Helmet” proportionally (1:1) in the warehouse.

Day 1 — ETL first batch

Let’s look at the view of the data (in Tableau sheet) when the first batch was inserted to “warehouse” table below. As you can see, bike and helmet counts were updated proportionally. For example, both “Bike-kids” and associated “Helmet-kids” had the same quantity i.e 25.

First Batch

Day 2 — ETL second batch

After a successful first batch, inventory in the warehouse was updated with second batch the next day.

Let’s look at the data below and see if bike and helmet counts were updated proportionally for respective associated pairs.

Data looked correct. For ex, 100 items were added to “Bike-girls” since the last batch and the same quantity was added to associated “Helmet-girls” as well, it was proportional.

Second Batch

Day 3 — ETL third batch

Let’s took a look at the data after 3rd batch update to the warehouse. As you see quantity for most of the bike-helmet pairs got updated proportionally, for example, 25 quantities were added to “Bike-men” and the same for “Helmet-men” since the last batch. However, there was a discrepancy between “Bike-kids” and “Helmet-kids” pair where “Bike-kids” product had total 50 items and “Helmet-kids” had 75. That pair was not updated proportionally as analyzed by the charts below.

Third Batch — anomaly detected

After further batch analysis, it was found that there was no items received for “Bike-kids” and “Helmet-kids” pair in the warehouse for third batch, But by mistake total count for “helmet-kids” was updated incorrectly.

To correct the anomaly for “helmet-kids” item, NPSaaS time travel feature syntax “FOR SYSTEM_TIME AS OF <day 2 batch timestamp>” was used to find the previous valid value. Resolution was to update warehouse table with that previous value for “Helmet-kids”.

After correcting “Helmet-kids“, in the chart below, quantities for respective bike and helmet pairs were updated correctly and proportionally.

Third Batch — Corrected

Tableau provides a neat option to create a dashboard where you can bring all the sheets together to visualize whole story in a single view as shown below.

To Summarize

In this blog, we talked about how Tableau can easily be integrated with NPSaaS data warehouse to analyze and visualize data together. It further extended with a real life ETL use-case and how NPSaaS time travel feature can help solve the anomaly problem easily.

References

https://www.tableau.com

--

--