Earlier, we saw how connected data is beneficial in a corporate environment. Let's look at an example of how we can enable an organisation to be data-driven.
Let us consider an organisation that has a SaaS application to which its customers subscribe; this subscription data determines the pricing. At the end of each month, the organisation sends an invoice to each customer using an application like NetSuite. The product is supported by a customer support team, who pick up tickets raised by subscribers and work on them; these work items are logged in a tracking application. The sales team tracks its work items in a CRM application like Salesforce for each lead it works on, and the CRM provides master data about the customers.
Linking up all this data and generating a utilisation or profit report is often time-consuming. Hence, we'll build a data repository that pulls in data from these different sources, and build a dashboard on top of it using a data analytics tool like Power BI.
To build a data platform, we need an application that brings all the sources and sinks together. Traditionally, a lot of ETL tools have been used for this, but they are complex and time-consuming. In the new era, we use data flow tools to automate the process, which forms the crux of DataOps. They can work on streaming or batch data, cleanse and filter it, take care of data security, and provide scalability. Some of the tools we use at HorizonX are Apache NiFi, StreamSets and Google Dataflow (Apache Beam). Why and how we choose between these tools depends on the nature of the project; I will explain that in a future post.
Now that the data platform is set up, we can connect to all the above data sources. This could mean connecting to the HTTP interfaces of the SaaS applications, ingesting a CSV export, or connecting directly to a database.
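To make the ingestion step concrete, here is a minimal sketch in plain Python (outside of any data flow tool) of parsing a CSV export as it might arrive from one of the sources. The column names are hypothetical, for illustration only:

```python
import csv
import io

# A CSV export as it might arrive from one of the SaaS sources.
# The column names here are made up for illustration.
sample_export = """account_id,plan,monthly_fee
A-100,standard,49.00
A-101,premium,99.00
"""

def parse_csv_export(raw_text):
    """Parse a CSV export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

rows = parse_csv_export(sample_export)
print(rows[0]["plan"])  # the first account's subscription plan
```

In a real flow, a NiFi processor would fetch this export over HTTP or from a file drop and hand it to the next step as a flow file.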
A sample flow would look like this:
The flow transforms and cleanses the data and delivers it into a data sink. It also allows the flow to be extended (for example, by adding a new sink or source) in a live environment without stopping it.
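A typical cleansing step drops incomplete records and normalises values before they reach the sink. A minimal sketch of such a transform, with made-up field names:

```python
def cleanse(rows, required_fields):
    """Drop rows missing any required field and strip stray
    whitespace from string values (a typical cleansing step)."""
    cleaned = []
    for row in rows:
        if any(not row.get(f) for f in required_fields):
            continue  # filter out incomplete records
        cleaned.append({k: v.strip() if isinstance(v, str) else v
                        for k, v in row.items()})
    return cleaned

raw = [
    {"account_id": "A-100", "plan": " standard "},
    {"account_id": "", "plan": "premium"},  # incomplete: dropped
]
print(cleanse(raw, ["account_id", "plan"]))
```

In NiFi, the equivalent logic would live in processors chained into the flow rather than in hand-written code.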
We can choose from a wide variety of data sinks, depending on the business case: any public cloud and any type of data lake storage. For this use case, we chose Azure Blob Storage, and the NiFi data flow writes the CSVs into blob storage.
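For illustration, this sketch shows how the cleansed rows might be serialised into the CSV payload that the flow hands to the blob-storage sink; in NiFi itself, an Azure blob "Put" processor performs the actual upload, so no hand-written code is needed:

```python
import csv
import io

def to_csv_payload(rows, fieldnames):
    """Serialise row dictionaries into a CSV payload, as the flow
    would before handing it to the blob-storage sink."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

payload = to_csv_payload(
    [{"account_id": "A-100", "amount": "49.00"}],
    ["account_id", "amount"],
)
print(payload)
```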
Now that we have a data platform transferring data into cloud storage, we can start consuming the data. Use any BI analytics tool, such as Power BI, Google Data Studio, Looker, Tableau or Qlik, to turn the data into insights and dashboards.
A sample query view:
The Account query represents the data from the CRM.
Invoicing represents the invoicing data from NetSuite.
Ticketing and tracking data come from the ticketing and tracking applications.
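The join behind such a report can be sketched in a few lines of Python. The extracts and column names below are hypothetical, and the hourly support cost is an assumed parameter; the real query would be built inside the BI tool over the three sources:

```python
# Hypothetical extracts from the three sources; real column names
# would come from the CRM, NetSuite and the ticketing tool.
accounts = [{"account_id": "A-100", "name": "Acme"}]
invoices = [{"account_id": "A-100", "amount": 99.0},
            {"account_id": "A-100", "amount": 49.0}]
tickets = [{"account_id": "A-100", "hours": 3.5}]

def profit_report(accounts, invoices, tickets, hourly_cost=80.0):
    """Join the three extracts on account_id and compute a simple
    profit figure: invoiced revenue minus support cost."""
    report = []
    for acc in accounts:
        aid = acc["account_id"]
        revenue = sum(i["amount"] for i in invoices
                      if i["account_id"] == aid)
        hours = sum(t["hours"] for t in tickets
                    if t["account_id"] == aid)
        report.append({"name": acc["name"],
                       "revenue": revenue,
                       "support_cost": hours * hourly_cost,
                       "profit": revenue - hours * hourly_cost})
    return report

print(profit_report(accounts, invoices, tickets))
```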
Report generated from the above query:
As the automated data flow ensures a real-time data feed into Azure, the report can be refreshed at regular intervals to provide near-real-time insights.
A connected data platform enables quicker insights, more relevant reports, data-driven dashboards and secure data sharing across organisations.
At HorizonX, we transform enterprises into connected enterprises. Whether automating a business workflow, adding value for a customer, or realising the potential of an engineer, the use cases are plenty.
Originally published at medium.com on September 12, 2018.