(What) is Data Analytics

rorodata
Feb 1, 2017

Opinion

At a recent data analytics conference that we attended, a senior manager from a well-known corporation asked a question: ‘All my data is in my Oracle ERP, should I move it to a No-SQL database, so that we can do data analytics?’ This question bothered us. There is so much confusion and hype around data analytics that business users are confused, and data analytics projects suffer as a consequence. Here is a layman’s introduction to the land of Data Analytics.

Data

The basis for data analytics is, well, Data. We can think of three types of data:

  • Enterprise Data, which comes from transactions in the different IT systems, e.g. ERP, CRM, POS, and eCommerce systems
  • Web 2.0 Data, which comes from humans interacting on the web by writing product reviews, answering questions, posting images, videos, and sound bites, and using social media like Twitter
  • Machine Data, which comes from web logs, industrial sensors, security cameras, TV set-top boxes, etc.

Evidently, data can differ in size, format, speed of arrival, and uncertainty. In data science lingo, these are respectively called the 4 V’s of data: Volume, Variety, Velocity, and Veracity. The idea is that all this data, not just the neat order lines from ERPs but also the customer complaints and kudos on your website, the videos posted on social media, or the pressure readings from a chemical reactor’s sensors, can be ‘analysed’ using algorithms to understand and improve business performance.

Data needs to be stored somewhere, and that’s what databases do. Depending on the kind of data being collected, different types of databases may be required. When data is highly structured (think well-formed tables), e.g. transactions like orders and shipments, it can be handled well by SQL databases. When data is less structured, e.g. records whose mix of text, images, and other fields varies from one row to the next, a NoSQL database is usually a better fit. There are also specialized databases for specialized data sets, e.g. graph databases and time-series databases. If the data is completely unstructured, then we gain little by storing it in any one type of database.
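To make the structured versus semi-structured distinction concrete, here is a minimal sketch in Python. It assumes Python's built-in sqlite3 module as a stand-in for a SQL database and plain JSON documents as a stand-in for a document-style NoSQL store; the orders, reviews, and field names are all hypothetical.

```python
import json
import sqlite3

# Structured data (hypothetical orders): every row has the same columns,
# so a fixed SQL schema fits. sqlite3 stands in for a SQL database here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Globex", 75.5), (3, "Acme", 40.0)],
)
# Each result row is a (customer, total) pair, so dict() builds a lookup table.
total_by_customer = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

# Semi-structured data (hypothetical reviews): each record carries different fields.
# JSON strings stand in for documents in a document-oriented NoSQL store.
documents = [
    json.dumps({"review_id": 1, "text": "Great product", "rating": 5}),
    json.dumps({"review_id": 2, "text": "Arrived late", "images": ["box.jpg"]}),
    json.dumps({"review_id": 3, "rating": 2}),
]
parsed = [json.loads(doc) for doc in documents]
# There is no fixed schema, so queries must check which fields a document has.
rated_ids = [doc["review_id"] for doc in parsed if "rating" in doc]

print(total_by_customer)
print(rated_ids)
```

The SQL side can aggregate with a single query because every row is guaranteed to have the same columns; the document side trades that guarantee for the freedom to store records of different shapes.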

Big Data

Large, complex data sets, sometimes running into several terabytes, must be processed in reasonable time and with repeatability if actionable insights are to be obtained and turned into business value. All approaches and technologies for accomplishing this come under the umbrella term Big Data. Cleaning, structuring, storing, and retrieving information at scale have traditionally been the focus areas of Big Data.
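A common way to process data at this scale is to split it into partitions, process each partition independently (often on different machines), and then combine the partial results; this is the idea behind the MapReduce model. The Python sketch below shows only the pattern, on a few hypothetical web-log lines held in memory. It assumes a simple "METHOD /path" log format, and it does not distribute anything: in a real Big Data system, a cluster framework would run the map step on many machines in parallel.

```python
from collections import Counter
from functools import reduce

# Hypothetical web-log lines, already split into partitions
# as they might be spread across several machines.
partitions = [
    ["GET /home", "GET /cart", "POST /checkout"],
    ["GET /home", "GET /home"],
    ["POST /checkout", "GET /cart"],
]

def map_partition(lines):
    """Map step: count requests per URL path within one partition, independently."""
    return Counter(line.split()[1] for line in lines)

def merge_counts(a, b):
    """Reduce step: combine the partial counts from two partitions."""
    return a + b

# The map calls share no state, so they could run in parallel.
partial_counts = [map_partition(p) for p in partitions]
totals = reduce(merge_counts, partial_counts, Counter())
print(totals.most_common())
```

Because each map call depends only on its own partition, adding machines lets the same job handle more data, which is what makes the approach attractive for terabyte-scale workloads.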

Extract, Transform, Load (ETL) Pipeline

So you have data. But data, like ore, is usually neither clean nor in a useful form to begin with. A lot has to happen before the data gets into its database. ETL is the collective name for all the activities involved in pulling data from ERP, CRM, POS, or any other IT systems or databases, cleaning it, normalizing it, resizing it, changing it where needed, and pushing it into the intended database. Once ETL is done, you have clean data in a database.
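The three ETL stages can be sketched as three small functions. The Python example below assumes a hypothetical CSV export (standing in for an extract from an ERP, CRM, or POS system), applies a few typical cleaning rules (trimming and normalizing names, upper-casing country codes, dropping incomplete rows and duplicates), and loads the result into an in-memory SQLite database. The column names and cleaning rules are illustrative assumptions, not a standard.

```python
import csv
import io
import sqlite3

# Extract source: a hypothetical CSV export with messy values,
# a missing amount, and a duplicated order.
RAW_EXPORT = """order_id,customer,amount,country
1, acme corp ,120.00,IN
2,Globex,,US
3,ACME CORP,40.5,in
1, acme corp ,120.00,IN
"""

def extract(text):
    """Extract: parse the raw export into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows and duplicate order IDs,
    then normalize names, amounts, and country codes."""
    seen = set()
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # incomplete record; a real pipeline might log or quarantine it
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # the same order extracted twice
        seen.add(order_id)
        clean.append((
            order_id,
            row["customer"].strip().title(),
            float(row["amount"]),
            row["country"].strip().upper(),
        ))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_EXPORT)), conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Keeping the three stages as separate functions makes it easy to swap the source or the target database without touching the cleaning rules.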

Analytics

Now we are ready to extract value from the data. The diagram below, from Gartner Research, summarizes the different types of analytics that can be employed to get value from data.

Source: Gartner

Descriptive Analytics usually involves creating reports and charts to describe what happened, e.g. what sales were over the last 4 quarters, or how much manufacturing cost varied over the last 3 months. Diagnostic Analytics takes this a step further by attempting to answer the Why question, e.g. why sales trended upwards over the last 4 quarters, or why manufacturing cost varied so much in the last 3 months. Together, these two types of analytics are usually referred to as Business Intelligence. Often, to get fast results when pulling business reports, data is stored in a special database built for fast reporting, called a Data Warehouse.
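The difference between the two can be illustrated with a small Python sketch over hypothetical sales records. The descriptive part totals sales per quarter (what happened); the diagnostic part breaks the Q1-to-Q4 change down by region as a first step towards why it happened. Drilling down by one dimension is only one diagnostic technique, and finding a real root cause usually takes more investigation; the quarters, regions, and amounts here are made up.

```python
from collections import defaultdict

# Hypothetical sales records: (quarter, region, amount).
sales = [
    ("Q1", "North", 100), ("Q1", "South", 80),
    ("Q2", "North", 105), ("Q2", "South", 95),
    ("Q3", "North", 110), ("Q3", "South", 120),
    ("Q4", "North", 108), ("Q4", "South", 150),
]

# Descriptive: what happened? Total sales per quarter.
by_quarter = defaultdict(int)
for quarter, _, amount in sales:
    by_quarter[quarter] += amount

# Diagnostic (first step): why? Split the Q1 -> Q4 change by region
# to see which region drove the upward trend.
def region_total(quarter, region):
    return sum(a for q, r, a in sales if q == quarter and r == region)

change_by_region = {
    region: region_total("Q4", region) - region_total("Q1", region)
    for region in ("North", "South")
}

print(dict(by_quarter))
print(change_by_region)
```

Here the descriptive report shows steady growth, and the drill-down suggests the South region accounts for most of it, which points the next question at what changed there.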

A somewhat recent addition to business intelligence is Process Intelligence, or Process Mining. This involves automatically discovering business processes from transaction data and identifying process issues such as delays and capacity-constrained resources.
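Many process mining approaches start by building a "directly-follows" view from an event log: for each case (e.g. a purchase order), which activity came right after which, and how long the wait was in between. The Python sketch below computes just that step, assuming a hypothetical event log of (case ID, activity, timestamp) records; full process mining tools go further and discover complete process models, which this sketch does not attempt.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log, e.g. reconstructed from ERP transactions:
# (case_id, activity, timestamp).
event_log = [
    ("PO-1", "Create Order", "2017-01-02 09:00"),
    ("PO-1", "Approve Order", "2017-01-02 11:00"),
    ("PO-1", "Ship Goods", "2017-01-04 11:00"),
    ("PO-2", "Create Order", "2017-01-03 10:00"),
    ("PO-2", "Approve Order", "2017-01-05 10:00"),
    ("PO-2", "Ship Goods", "2017-01-06 10:00"),
]

def directly_follows(log):
    """Find 'A is directly followed by B' transitions within each case and
    return the average waiting time (in hours) for each transition."""
    cases = defaultdict(list)
    for case_id, activity, ts in log:
        cases[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))
    waits = defaultdict(list)
    for events in cases.values():
        events.sort()  # order each case's events by timestamp
        for (t1, a1), (t2, a2) in zip(events, events[1:]):
            waits[(a1, a2)].append((t2 - t1).total_seconds() / 3600)
    return {edge: sum(hours) / len(hours) for edge, hours in waits.items()}

for (a, b), hours in directly_follows(event_log).items():
    print(f"{a} -> {b}: average wait {hours:.1f} h")
```

Long average waits on a transition are natural candidates for the delays and constrained resources mentioned above.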

Predictive Analytics involves detecting patterns in data and using them to predict what will happen in the future. For example, using web traffic data from a website, we can predict whether a customer is likely to click on a particular type of ad; given the trend in a customer’s loan repayments, how likely that customer is to default in the next 6 months; or, given historical sales data, how much we expect to sell in the next 18 months. Other interesting applications that arise from the same basic idea of pattern recognition include face recognition and video tagging. Most predictive analytics applications leverage Machine Learning techniques.
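One of the simplest predictive models is a straight-line trend fitted to past data by ordinary least squares and extended into the future. The Python sketch below does this for a hypothetical monthly sales history. It is only an illustration of "detect a pattern, then extrapolate": it ignores seasonality and uncertainty, and real forecasting or Machine Learning models are considerably richer.

```python
# Hypothetical monthly sales history (units), oldest month first.
history = [100, 104, 110, 113, 118, 124]

def fit_linear_trend(y):
    """Ordinary least squares fit of y = a + b * t for t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    # Slope: covariance of (t, y) divided by the variance of t.
    b = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    a = y_mean - b * t_mean  # the fitted line passes through the means
    return a, b

def forecast(y, horizon):
    """Extend the fitted trend `horizon` periods past the end of the history."""
    a, b = fit_linear_trend(y)
    return [a + b * t for t in range(len(y), len(y) + horizon)]

print([round(v, 1) for v in forecast(history, 3)])
```

The fitted slope (about 4.7 units per month here) is the "pattern"; the forecast simply assumes that pattern continues.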

Prescriptive Analytics is about developing a course of action to move towards a desired state or goal; it answers the How question. For example: how can we maximize product yield from this chemical reactor, how can we increase the yield on a financial portfolio, or how can we meet all deliveries with the existing trucks? Most prescriptive analytics applications leverage optimization and simulation techniques.
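As a toy illustration of the optimization side, the Python sketch below answers a hypothetical "how many units of each product should we make?" question: maximize profit for two products subject to limited machine hours and raw material. All numbers are invented. It finds the optimum by exhaustively checking every integer plan, which only works for tiny problems; real prescriptive tools typically hand such models to linear or mixed-integer programming solvers.

```python
from itertools import product

# Hypothetical planning data for two products, A and B.
PROFIT = {"A": 40, "B": 30}         # profit per unit
MACHINE_HOURS = {"A": 2, "B": 1}    # machine hours needed per unit
MATERIAL_KG = {"A": 1, "B": 2}      # raw material (kg) needed per unit
HOURS_AVAILABLE = 100
MATERIAL_AVAILABLE = 80

def best_plan():
    """Check every integer production plan within simple bounds and
    return (profit, units_a, units_b) for the most profitable feasible one."""
    best = (0, 0, 0)
    max_a = HOURS_AVAILABLE // MACHINE_HOURS["A"]
    max_b = MATERIAL_AVAILABLE // MATERIAL_KG["B"]
    for a, b in product(range(max_a + 1), range(max_b + 1)):
        hours = a * MACHINE_HOURS["A"] + b * MACHINE_HOURS["B"]
        material = a * MATERIAL_KG["A"] + b * MATERIAL_KG["B"]
        if hours <= HOURS_AVAILABLE and material <= MATERIAL_AVAILABLE:
            best = max(best, (a * PROFIT["A"] + b * PROFIT["B"], a, b))
    return best

profit, units_a, units_b = best_plan()
print(f"Make {units_a} of A and {units_b} of B for a profit of {profit}")
```

The output is a recommended action (a production plan), not just a prediction, which is what distinguishes prescriptive from predictive analytics.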

The End to End View

The following figure shows a high-level view of going from raw data to analytics. While numerous technologies may be involved, it is important to remember that the fundamental flow and approach remain the same.

What’s the Cloud?

While all the technology components needed to support analytical solutions can be purchased and deployed within your four walls, they can also be leased from technology providers as easily as using any on-demand service. The vendors that offer such Cloud Services (e.g. Amazon AWS and Microsoft Azure) run massive technology farms, so they can offer these components to their customers at significantly lower cost, without requiring high investments or extensive upfront commitments. This gives you useful capabilities like paying by the hour and only for what you use, scaling up capacity several times over within minutes, and shutting down all services when they are not in use. Analytics and the Cloud are a winning combination; watch for a blog post from us on this topic.

Our 2 Cents

Analytics will transform businesses in the days to come. This post was merely meant to give you an introduction: a vocabulary to converse with in this new land. We believe that businesses will benefit if they ask the following three questions, in this order:

1. What business problem are we solving? What is our solution?

2. Connect the dots — how will we extract value from data, to build and support a business solution?

3. What pieces are needed? How will they be connected together? How well will they work?

Going back to the original question that motivated us to write this post, “All my data is in my Oracle ERP, should I move it to a No-SQL database, so that we can do data analytics?”, we hope this post gives business people the background to sort everything related to analytics into neat mental boxes and to start asking the right questions.
