A very colourful data centre at Google

We now live in a world where Target probably knows you’re pregnant before you’ve even told your Mum & Dad. But how exactly? Aaron Sempf gives you a breakdown of Big Data and what it could mean for your business.

The term ‘big data’ gets thrown around a lot these days. Walk into any marketing conference anywhere in the world and you’re likely to hear it used synonymously with phrases like “understand your customer better” and “know your audience better than they know themselves”. So what does big data actually mean and how can you start using it to gain an edge over your competitors?

Before we can analyse Big Data, we need to know what it is and how the analysis of Big Data today differs from the business intelligence of yesterday.

Big Data is the term used to describe any collection of data, both structured and unstructured, that is so large and complex, and growing so quickly, that it becomes too difficult to process using traditional data processing applications and methods.

OK, so now we know that Big Data is, well, data that is big… but we need to break this description down further in order to understand what it’s made up of and where it comes from. To do this we should look at the definition of Big Data in terms of the three Vs (volume, velocity and variety), as articulated by industry analyst Doug Laney (of Gartner Research), who is credited with laying down the mainstream definition of Big Data in 2001.

With an understanding of what Big Data is and where it comes from, we are better equipped to understand how it is used in order to “know your audience”.

With decreasing storage costs and offsite/cloud-based storage, excessive data volume is no longer an issue. Now we face the questions of how to determine the relevance of data and how to analyse large volumes of relevant data to create value.

This is where things get interesting. Above I mentioned that Big Data analysis is different to business intelligence. Business intelligence is the analysis of organisational data through data mining, querying and reporting. In essence this is also what we want to do with Big Data, but the type and complexity of the analysis warrant a separation of the two.

Traditionally, business intelligence uses descriptive statistics: the discipline of quantitatively describing the main features of a collection of data. It works with data of high information density to measure objectives, detect trends and so on.

Descriptive statistics aim to summarise a sample, rather than use the data to learn about the wider population that the sample is thought to represent.

Big Data analysis, on the other hand, uses inductive statistics: the process of inferring properties of an underlying distribution from a sample. It also applies concepts from nonlinear system identification, a method of building a mathematical model of a system from measurements of it. Applied to large data sets with low information density, these techniques reveal relationships and dependencies that allow the prediction of outcomes and behaviours.

Where business intelligence analysis describes a sample, Big Data analysis infers predictions about a larger collection than the sample itself.
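To make the distinction concrete, here is a minimal sketch in Python. The figures are invented for illustration, and the straight-line fit is only a stand-in for the far richer models real Big Data analysis would use. The first half describes the sample we hold; the second half fits a model and uses it to predict an outcome for a shopper who is not in the sample.

```python
from statistics import mean, stdev

# Hypothetical sample: weekly store visits and weekly spend for six shoppers.
visits = [1, 2, 3, 4, 5, 6]
spend = [12.0, 19.0, 33.0, 41.0, 48.0, 63.0]

# Descriptive statistics: summarise the sample we already hold.
summary = {"mean_spend": mean(spend), "stdev_spend": stdev(spend)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Inductive statistics: learn a relationship from the sample...
slope, intercept = fit_line(visits, spend)


def predict_spend(weekly_visits):
    """...and use it to predict spend for a shopper outside the sample."""
    return slope * weekly_visits + intercept


print(summary)
print(predict_spend(8))
```

The summary only tells us about the six shoppers we measured. The fitted line makes a claim about shoppers we have never seen, which is both the power and the risk of the inductive approach.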

Wow, this all sounds very complex and confusing. Well, actually it is when you look at it at the raw data level, but that’s why we design simple activities to work out what we want to find, and applications to do the analysis for us.

Take, as a simple example, the well-known theory of “six degrees of separation”. The theory holds that any given person can reach any other given person within six hops, where each hop is a direct acquaintance.

It is simple to build a tool that will tell you how many hops you are away from another person.
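As a rough sketch of such a tool, the Python below counts hops with a breadth-first search. The `network` dictionary and the names in it are made up for illustration; a real tool would read connections from a social graph rather than a hard-coded dictionary.

```python
from collections import deque


def hops_between(network, start, target):
    """Return the fewest hops from start to target, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in network.get(person, ()):
            if friend == target:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


# Hypothetical network: each person maps to the people they know directly.
network = {
    "Ana": ["Ben", "Cy"],
    "Ben": ["Ana", "Dee"],
    "Cy": ["Ana"],
    "Dee": ["Ben", "Eli"],
    "Eli": ["Dee"],
    "Flo": [],
}

print(hops_between(network, "Ana", "Eli"))
print(hops_between(network, "Ana", "Flo"))
```

Breadth-first search visits everyone one hop away, then everyone two hops away, and so on, so the first time it meets the target it has found the shortest chain.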

But that’s not where the true beauty of Big Data is.

The fun comes in deriving relationships through the masses of low information density data: the unstructured data.

Take for example Target’s objective of direct-marketing to pregnant women in their second trimester. It’s not something that all pregnant women publicise. Usually, due to the public accessibility of birth records, when a baby arrives, the mother is inundated with “new child” targeted-marketing from all sorts of companies.

Target knows you’re pregnant

Target found a trend of women in their second trimester changing their buying habits and buying all sorts of new things. So it wanted to capitalise on this finding by sending specially designed ads, encouraging purchases at its stores, before other retailers even knew a baby was on its way.

Fortunately for Target, it has for decades collected vast amounts of data on every person who walks into one of its stores. Whenever possible, Target assigns each shopper a unique code, known internally as the Guest ID. The Guest ID is used to keep tabs on everything that shopper buys.

If a shopper uses a credit card or a coupon, fills out a survey, mails in a refund, calls the customer helpline, opens an e-mail from Target or visits the website, Target records the activity against the Guest ID where possible, through clever relational analysis.
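The idea of recording activity against a single ID can be sketched in a few lines of Python. To be clear, this is not Target’s system: the `GuestLedger` class, the identifier strings and the events are all hypothetical, and the matching here is a simple lookup rather than the “clever relational analysis” a retailer would really need.

```python
from collections import defaultdict
from itertools import count


class GuestLedger:
    """Toy ledger that files each activity under a single guest ID."""

    def __init__(self):
        self._next_id = count(1)
        self._guest_by_identifier = {}  # e.g. "card:1234" -> guest ID
        self.activity = defaultdict(list)  # guest ID -> list of events

    def record(self, identifiers, event):
        """File an event under the guest who owns any of the identifiers."""
        known = [
            self._guest_by_identifier[i]
            for i in identifiers
            if i in self._guest_by_identifier
        ]
        # Reuse an existing guest ID if one matches, otherwise issue a new one.
        # Simplification: two existing guests are never merged.
        guest_id = known[0] if known else next(self._next_id)
        for i in identifiers:
            self._guest_by_identifier.setdefault(i, guest_id)
        self.activity[guest_id].append(event)
        return guest_id


ledger = GuestLedger()
a = ledger.record(["card:1234"], "bought lotion")
b = ledger.record(["card:1234", "email:jo@example.com"], "used coupon")
c = ledger.record(["email:jo@example.com"], "opened e-mail")
d = ledger.record(["card:9999"], "bought nappies")

print(ledger.activity[a])
```

The second event links the e-mail address to the card, so the later e-mail open lands on the same guest even though no card was involved. That chaining of identifiers is what turns scattered activity into one profile.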

On top of the data Target has gathered itself, Target and many other retailers can purchase different types of information. These range from demographic information, such as age, ethnicity, marital status, family size or locality, to personalised information, such as estimated income, credit accounts, job history, your brand of car, types of media you read or watch, where you attended school, the type of coffee you prefer, social profile, and what you talk about online.

But all this personal data is meaningless and unrelated until someone spends the time to analyse and derive the relationships between the many different data sources, as Target did to find their targeted customers.

The same principles can be applied to online publishing. In many cases, due to social web browsing, we already have access to the types of information that retailers purchase about us, through social profiles, purchase/subscription histories, discussion topics, likes and comments. We can derive habits and interests by linking the different sources and finding not only the context but also the related time and location data.

At this point, it really just becomes a matter of knowing which specific demographic you want to target, or taking it one step further and personalising the experience of your brand through individualised interactions, such as offering purchase bundles tailored to the individual, or providing personalised information “above the fold” when they browse your website.

Aaron Sempf lives and breathes all things technical. He goes by a few titles: Technical Analyst, Mechatronics Expert, Technologist, Futurist… You can read his other articles here.

Design by Deepend’s Digital Designer Elliot Midson. Hit him up on Dribbble or his website for more of his stuff.