The future of Operational Analytics

Lavanya Dillibabu
Better Data Platforms
7 min read · May 7, 2021

The Challenges with Visualization Tools

In the last few years, many enterprises have gotten exceptionally good at building large-scale data platforms that ingest and process huge volumes of structured and unstructured data and deliver information to data consumers. The business value of all this information comes from using the data to a) generate insights for better decision-making, and b) automate decisions by embedding models into operational systems.

We have witnessed many architectural shifts in building data platforms, one of which is the rise of self-service tools such as Tableau and Looker. These have enabled non-technical users to undertake analysis on their own, by connecting to any data source and doing data preparation on-the-fly. They have resulted in the creation of pixel-perfect dashboards with beautiful presentation of charts and tables.

As these self-service visualization tools become widely available, we’re seeing an interesting paradox emerge: this ocean of readily available information does not lead to higher usage or better decisions.

There could be many reasons for this.

First is the sheer number of dashboards, causing information overload. We typically see enterprises having thousands of dashboards, many of them unused. Many of these have hidden data pipelines and non-standard metrics & attributes. This reduces user trust in the information that these dashboards contain.

Another important reason could be that there’s too much information fluency. Research has shown that very high levels of information fluency do not always lead to better actions, or to any action at all. The easier the data is to read (well-designed tables and charts), the less you’ll end up acting on the information. People only understand information if they must work to get it. The secret to getting things done with data is to play with it. There needs to be a little bit of information disfluency (not too much) to get people to act on information.

This is somewhat subtle, and it doesn’t mean that you need to make it challenging for users to get the data that they want. However, you must provide an environment for users to build their own insights on top of a clean, integrated, and harmonized information foundation. Today’s visualization tools are great for executive storytelling, but they are not the answer for operational users. As these tools add more features, non-technical users are finding them increasingly complex to handle.

A Simple Tool for Operational Analytics

At LatentView, we developed such a platform and have successfully deployed it with our clients. Rather than provide access to hundreds of dashboards, we delivered a tool that helps users specify the segments of data they need, using the business semantics that they understand. It is ridiculously simple to use, and the visualizations are all tabular, so users can export them to any other tool of their choice for further analysis.

This ad hoc analysis platform has been a smashing hit — it has seen viral adoption at all levels, resulting in better usage of data, with anecdotal evidence suggesting improved decision-making and better access to information. There’s tremendous internal demand from users to make it available to other parts of the business, especially those that are drowning in too many dashboards and/or have SQL access to data lakes.

Within 6 months of deployment in a Fortune 100 company, the platform has helped them save millions of dollars every year by delivering the promise of data democratization at scale.

Basic Design Principles

We designed the platform with the following design principles in mind:

· Easy to use for the non-technical user — a simple, browser-based interface with logically organized tabs and intelligent defaults customized to different groups of users

· Minimalistic interface for operational users — no fancy charts, graphs, or other visualizations, but simple tables delivered through a browser interface

· Sophisticated, but minimalist interface — Extremely user friendly, with advanced features that put the power of analytics in the hands of users so they can arrive at valuable, relevant insights rather than relying on pre-defined metrics dashboards

· Transparent data model — Single, logical view of all the necessary data for each bounded context (e.g. customer relationship management, revenue management, category management, etc.). No messy joins, partitions, etc.

· Scalable — ability to support a large and growing number of concurrent users within the enterprise ecosystem (tens of thousands), since this is built to leverage the cloud

· Fast — Low latency for bulk analysis over terabytes of data, compared with traditional reporting platforms

· Cost-effective — No complicated licensing costs, low cost of operations and ongoing upgrades. Our plan is to keep the tool simple, with a minimal feature set

· Rapid deployment — Once the use case is finalized, the product prototype can be customized and deployed in 2–3 weeks

A Quick Tour

Let’s take a quick tour of what the operational analytics tool looks like. The functionality in the analysis tool is grouped into various tabs for ease of navigation, and a top panel shows the selections across tabs. Each of these tabs is configurable by the administrator (through an admin user interface), based on user roles and the domain context that the user has access to.

Dimensions

The Dimensions tab allows the user to select the key dimensions to be presented in the final report. Dimensions are non-numerical data that can be viewed, broken down, and compared (such as people, places, and products). These are configurable (like everything else in the platform).

Metrics

Metrics are numerical values that can be displayed in the report. These values can be viewed, broken down, and used in calculations (such as Sales $, Quantity Cases, GP%, Cost). If you can perform math on it, it is probably a metric.
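For illustration only, the dimensions and metrics exposed for a bounded context could be captured in a declarative configuration along these lines; the names, columns, and roles below are hypothetical, not the platform’s actual schema:

```python
# Hypothetical configuration for one bounded context; table, column, and role
# names are illustrative only, not the actual product schema.
REVENUE_MANAGEMENT_CONTEXT = {
    "table": "revenue_management_denormalized",   # single denormalized view
    "dimensions": {
        "Customer": "customer_name",
        "Region": "region",
        "Product Category": "product_category",
        "Fiscal Month": "fiscal_month",
    },
    "metrics": {
        "Sales $": {"column": "sales_amount", "aggregation": "SUM"},
        "Quantity Cases": {"column": "quantity_cases", "aggregation": "SUM"},
        "Cost": {"column": "cost_amount", "aggregation": "SUM"},
    },
    # Tabs and defaults can be switched on or off per user role.
    "roles": {
        "sales": {"default_dimensions": ["Customer", "Region"]},
        "finance": {"default_dimensions": ["Fiscal Month", "Product Category"]},
    },
}
```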

Calculations

Use Basic Calculations to apply math operators (such as add or multiply) to build a customized metric of your choice. Calculations and SQL operations can be applied to any metric field.
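To give a flavour of what such a calculation might look like behind the scenes, here is a minimal sketch of turning a basic calculation into a SQL select-list fragment. The request shape, operator set, and function names are assumptions for illustration, not the platform’s actual code:

```python
# Minimal sketch: turn a user-defined calculation into a SQL expression.
# The operator set and function signature are hypothetical.
OPERATORS = {"add": "+", "subtract": "-", "multiply": "*", "divide": "/"}

def build_calculated_metric(name: str, left: str, op: str, right: str) -> str:
    """Return a SQL select-list fragment for a custom metric."""
    sql_op = OPERATORS[op]
    # NULLIF guards against division by zero for ratio-style metrics such as GP%.
    if sql_op == "/":
        return f'SUM({left}) / NULLIF(SUM({right}), 0) AS "{name}"'
    return f'SUM({left}) {sql_op} SUM({right}) AS "{name}"'

# Example: a gross-profit-percentage metric built from two base columns.
print(build_calculated_metric("GP%", "gross_profit", "divide", "sales_amount"))
# SUM(gross_profit) / NULLIF(SUM(sales_amount), 0) AS "GP%"
```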

Report Sharing

The Report sharing feature allows users to create a report and share it instantly with a specific user or a user group. Strict authentication and authorization principles are applied to ensure data privacy and security as the reports are shared.

Glossary

The Glossary gives end users a single view of all data definitions for dimensions and metrics.

Other Features

There are a few other features, especially for administrators, who can view usage and validate data quality based on daily loads. Below are a few of the other key options available:

Save Reports: Once the report is built and output is generated, it can be saved and reloaded for later use

Filters/Thresholds: Add filters to narrow down the analysis based on the requirement or thresholds to limit the results

Download Reports: Once the query is built and submitted, the results can be exported to Excel or CSV for further analysis. Users can also export the SQL for the built query.

Schedule Reports: Saved reports can also be scheduled to run at a chosen frequency (daily, weekly, or monthly). The reports are emailed to users based on the schedule.

How we built it

Behind the simple design of the platform lies a sophisticated architecture. It combines the strengths of data lakes and data warehouses to create a clean, integrated, and scalable data foundation. The data model is denormalized and can be optimized for common consumption patterns. Moreover, the application is partitioned by bounded contexts, and everything is customized for each context.
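For illustration, the single logical view behind one bounded context might be defined roughly as below, with all joins resolved once, upstream of the tool; the table and column names are hypothetical:

```python
# Hypothetical DDL for the denormalized view that backs one bounded context.
# Users query this one object; they never see the underlying joins.
CATEGORY_MANAGEMENT_VIEW = """
CREATE VIEW category_management_denormalized AS
SELECT c.customer_name,
       p.product_category,
       d.fiscal_month,
       f.sales_amount,
       f.quantity_cases,
       f.cost_amount
FROM   fact_sales   f
JOIN   dim_customer c ON f.customer_key = c.customer_key
JOIN   dim_product  p ON f.product_key  = p.product_key
JOIN   dim_date     d ON f.date_key     = d.date_key;
"""
```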

In this way, the data platform serves as DaaS (Data as a Service) in a secure and scalable fashion:

· Designed to serve different bounded contexts within the enterprise’s domains. This enables us to deliver data as a service into the hands of business leaders, functional analysts, salespeople, and (no surprise) data scientists

· Built as a serverless interactive querying stack using AWS components such as Lambda, S3, Cognito, API Gateway, Simple Queue Service (SQS), DynamoDB, and Redshift

· Delivers this using a dual-mode architecture (simple queries and complex queries). If the expected run time of a query (calculated based on historical data and query parsing) is under 30 seconds, it kicks off the simple stack; otherwise, the complex stack (a sketch of this routing follows this list)

· The client’s query request, expressed as JSON (JavaScript Object Notation), is sent over secure HTTP to AWS Lambda, with authentication provided by AWS Cognito and HTTP handling by AWS API Gateway

· An ANSI SQL-compliant query is constructed from the request JSON and is ready to hit the database. The query is submitted to a highly available FIFO (first-in, first-out) queue, with SQS (Simple Queue Service) decoupling query execution from the client’s HTTP connection

· There are intelligent consumers of the query message, which identify the right time to hit the database (Redshift). The right time is defined by the availability of Redshift and of IPs in the VPC (virtual private cloud). This is where a lot of engineering magic happens (a sketch of such a consumer also follows this list)

· The dispatched queries are executed in a Redshift cluster, and the results are retrieved by the client from S3 using a pre-signed URL

· While currently built on AWS, it can also be deployed, with some modification, on any cloud platform of choice, such as Azure or GCP
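To make the request side of this flow concrete, here is a minimal sketch of what the Lambda behind API Gateway could look like: it builds the SQL from the request JSON, estimates the run time, and routes the query to the simple or complex FIFO queue. The queue URLs, payload shape, SQL builder, and estimator are assumptions for illustration; only the boto3 SQS call is a real API.

```python
import json
import uuid

import boto3

sqs = boto3.client("sqs")

# Hypothetical FIFO queue URLs for the two stacks.
SIMPLE_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/simple-queries.fifo"
COMPLEX_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/complex-queries.fifo"
RUNTIME_THRESHOLD_SECONDS = 30


def build_sql(request: dict) -> str:
    """Tiny stand-in for the ANSI SQL builder; real query construction is richer."""
    dims = ", ".join(request["dimensions"])
    mets = ", ".join(f"SUM({m}) AS {m}" for m in request["metrics"])
    return f"SELECT {dims}, {mets} FROM {request['table']} GROUP BY {dims}"


def estimate_runtime_seconds(sql: str) -> float:
    """Placeholder for the estimator based on query parsing and historical run times."""
    return 5.0  # assumption: pretend every query looks simple


def handler(event, context):
    """Lambda entry point behind API Gateway; Cognito has already authenticated the caller."""
    request = json.loads(event["body"])          # the client's query request JSON
    sql = build_sql(request)
    query_id = str(uuid.uuid4())

    # Route to the simple or complex stack based on the estimated run time.
    queue_url = (
        SIMPLE_QUEUE_URL
        if estimate_runtime_seconds(sql) < RUNTIME_THRESHOLD_SECONDS
        else COMPLEX_QUEUE_URL
    )
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps({"query_id": query_id, "sql": sql}),
        MessageGroupId=request["user_id"],       # FIFO ordering per user
        MessageDeduplicationId=query_id,
    )
    # Respond immediately so the HTTP connection is not held open for the query.
    return {"statusCode": 202, "body": json.dumps({"query_id": query_id})}
```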
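On the other side of the queue, a consumer could look roughly like the sketch below: it waits until the cluster has capacity, runs the query via the Redshift Data API, unloads the results to S3, and publishes a pre-signed URL. The capacity check, cluster name, bucket, and status store are assumptions; the boto3 calls to SQS, the Redshift Data API, and S3 are real.

```python
import json
import time

import boto3

sqs = boto3.client("sqs")
redshift_data = boto3.client("redshift-data")
s3 = boto3.client("s3")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/simple-queries.fifo"
RESULTS_BUCKET = "analytics-query-results"       # hypothetical results bucket


def cluster_has_capacity() -> bool:
    """Placeholder for the 'right time' check: Redshift load and free IPs in the VPC."""
    return True


def record_result_url(query_id: str, url: str) -> None:
    """Placeholder for writing the result URL to a status store (e.g. DynamoDB)."""
    print(query_id, url)


def consume_once():
    """Pull one query message, run it on Redshift, and publish a pre-signed result URL."""
    if not cluster_has_capacity():
        return                                   # try again on the next poll
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=10
    )
    for message in response.get("Messages", []):
        job = json.loads(message["Body"])
        # UNLOAD writes the result set straight to S3 (quotes inside job["sql"]
        # would need escaping in a real implementation).
        unload_sql = (
            f"UNLOAD ('{job['sql']}') "
            f"TO 's3://{RESULTS_BUCKET}/{job['query_id']}/' "
            f"IAM_ROLE DEFAULT FORMAT CSV PARALLEL OFF"
        )
        stmt = redshift_data.execute_statement(
            ClusterIdentifier="analytics-cluster",   # hypothetical cluster name
            Database="analytics",
            DbUser="app_user",
            Sql=unload_sql,
        )
        # Wait for the UNLOAD to finish before publishing the result URL.
        while redshift_data.describe_statement(Id=stmt["Id"])["Status"] not in (
            "FINISHED", "FAILED", "ABORTED"
        ):
            time.sleep(1)
        # The client downloads the results via a time-limited pre-signed URL.
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": RESULTS_BUCKET, "Key": f"{job['query_id']}/000"},
            ExpiresIn=3600,
        )
        record_result_url(job["query_id"], url)
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```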

Architecture on AWS Cloud

An interesting part of the architecture is how it handles the Lambda and API Gateway timeout limits efficiently, using a constant polling mechanism that makes it possible to execute long-running queries.
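As an illustration of that pattern (the endpoint paths and payloads are assumptions, not the product’s actual API), a client can submit a query and then poll with short requests, so no single call ever approaches the API Gateway integration timeout (about 29 seconds by default) or the Lambda timeout:

```python
import time

import requests

API_BASE = "https://example.execute-api.us-east-1.amazonaws.com/prod"  # hypothetical


def run_query(request_json: dict, token: str) -> bytes:
    """Submit a query, then poll until the pre-signed result URL is ready."""
    headers = {"Authorization": token}
    submit = requests.post(f"{API_BASE}/queries", json=request_json, headers=headers)
    query_id = submit.json()["query_id"]

    # Each poll is a short request, so long-running queries never hit a timeout.
    while True:
        status = requests.get(f"{API_BASE}/queries/{query_id}", headers=headers).json()
        if status["state"] == "COMPLETE":
            return requests.get(status["result_url"]).content   # pre-signed S3 URL
        if status["state"] == "FAILED":
            raise RuntimeError(status.get("error", "query failed"))
        time.sleep(2)
```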

Simple and Complex Stack Architecture

What do you think of this architecture? Leave your thoughts and comments below.
