Big Data: Correlation

Casey Lim
Published in CISS AL Big Data
3 min read · Oct 13, 2020

From Chapter 4 of Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier

Photo from https://www.information-age.com/causation-and-correlation-big-data-headache-123460611/

Humans often form causal links to make sense of the phenomena around them, but these links are frequently mistaken.

When we skip slow, methodical thinking, we look for causes that confirm our existing beliefs and end up seeing causality where none exists.

Big data, however, has brought an important shift in how individuals understand and explore the world.

Causality → Correlation

As such, causal relationships between variables are no longer the main focus: correlation increasingly seems to replace causality as the more important aspect of data analysis.

Correlation

Mutual relationship between two or more variables
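
To make the definition concrete, here is a minimal sketch of the Pearson correlation coefficient, one common way to measure such a relationship between two variables. The numbers are made up for illustration and do not come from the book.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: weekly searches for a flu-related term vs. clinic visits.
searches = [120, 150, 170, 210, 260, 300]
visits = [30, 34, 41, 50, 61, 70]

r = pearson(searches, visits)
print(f"r = {r:.3f}")
```

A value near 1 means the two series rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.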

A deeper understanding of the relationships between various factors has a notable impact on people’s lives: it provides up-to-date information on current trends as well as more accurate predictions about the future.

In fact, the concept of correlation emerged long before big data.

Shift

In the past, once the available data had been collected, correlation analyses were run on theory-driven hypotheses to check whether each chosen proxy was suitable.

Given the scarcity of data and the high cost of collecting it, these analyses were mostly limited to linear relationships.

Clearly, knowledge progressed slowly through trial and error.

Yet a phenomenon can now be analyzed simply by finding a useful proxy for it, one that offers probability rather than certainty.

The optimal proxy can be easily identified, given the abundance and size of datasets along with the strength of computing power.

The development of sophisticated computational analysis removed the need for the laborious process of selecting and examining proxies one by one.
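
As a rough illustration of letting the computer screen proxies instead of testing them by hand, the sketch below ranks several candidate series by how strongly each correlates with a target. The series names and values are hypothetical, and Pearson correlation is just one possible scoring choice.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_proxies(target, candidates):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scored = [(name, pearson(series, target)) for name, series in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical target and candidate proxies (made-up numbers).
target = [10, 12, 15, 19, 24, 30]
candidates = {
    "signal_a": [1, 2, 3, 4, 5, 6],        # rises with the target
    "signal_b": [5, 3, 6, 2, 7, 4],        # mostly noise
    "signal_c": [60, 55, 47, 40, 28, 15],  # falls as the target rises
}

ranked = rank_proxies(target, candidates)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

With many candidates, some will correlate with the target purely by chance, so a ranking like this is a shortlist to check rather than a final answer.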

Complex relationships, specifically non-linear ones, can also be tracked among data, boosting the capacity of current data analytics.
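
A small sketch of why a purely linear analysis can miss such relationships: in the made-up data below, y is completely determined by x, yet the linear (Pearson) correlation between them is zero. Transforming x first, here by squaring it (a choice that only fits this toy example), brings the relationship out.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy U-shaped relationship: y depends entirely on x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

linear_r = pearson(x, y)                            # linear view of x vs. y
transformed_r = pearson([v * v for v in x], y)      # x squared vs. y

print(f"linear r      = {linear_r:.3f}")
print(f"transformed r = {transformed_r:.3f}")
```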

Companies such as FICO, Aviva, and Experian took advantage of this shift to analyze a wider range of variables, including ones that may seem irrelevant.

This brings us to one of the key components of big data analytics.

Predictive Analytics

Use of advanced analytic techniques that leverage historical data to uncover real-time insights and to predict future events

This means anticipating events before they occur in real life, such as spotting a hit song or preventing mechanical or structural failures.

A prominent example is UPS, the shipping company, which saved millions of dollars through preventive maintenance.

Its fleet of 60,000 vehicles in the US was monitored and individual parts were measured, so that each part was replaced only when necessary.

Hospitals are also able to keep the “human machine” from breaking down by using software that captures and processes patient data in real time.

With it, subtle changes in a premature baby’s condition that may signal the onset of infection can be detected 24 hours before overt symptoms appear.
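
The general idea of watching a data stream for early warning signs can be sketched as follows. This is not the method UPS or any hospital actually uses; it is a simple rolling-baseline check, and the window size, threshold, and readings are all assumptions chosen for illustration.

```python
from collections import deque
import statistics


def flag_deviations(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling baseline."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = statistics.mean(recent)
            spread = statistics.pstdev(recent)
            # Flag a reading more than `threshold` standard deviations
            # away from the mean of the previous `window` readings.
            if spread > 0 and abs(value - baseline) > threshold * spread:
                yield i, value
        recent.append(value)


# Hypothetical sensor stream (for example a temperature or vibration reading).
stream = [70, 71, 69, 70, 72, 71, 70, 85, 71, 70]

alerts = list(flag_deviations(stream))
print(alerts)
```

Real systems combine many signals and far more data, but the principle is the same: act on a pattern that tends to precede a failure, without needing to know why it does.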

Application

Correlations derived from predictive analytics are confirmed by mathematical and statistical methods, revealing potential connections between variables.

One of the exemplary real-life applications of such methods can be seen in the regular inspections and maintenance of manholes.

Con Edison, the public utility that provides New York City’s electricity, decided to use all the data available, which first meant formatting the messy data it had collected.

Starting from 106 predictors of a major manhole disaster, the analysts eventually condensed the list into a few of the strongest signals for predicting problem spots.

As a result, the top 10% of manholes on the ranked list contained 44% of the manholes that ended up being involved in severe incidents.
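
One way to read a figure like this is as the share of all severe incidents that falls within the top slice of a risk-ranked list. The sketch below computes that share for made-up scores and outcomes; the function name and data are hypothetical and are not taken from Con Edison’s analysis.

```python
def capture_rate(risk_scores, had_incident, top_fraction=0.10):
    """Share of all incidents found in the top `top_fraction` of items by risk score."""
    if len(risk_scores) != len(had_incident):
        raise ValueError("scores and outcomes must have the same length")
    total_incidents = sum(had_incident)
    if total_incidents == 0:
        raise ValueError("no incidents to capture")
    # Rank items from highest to lowest predicted risk.
    ranked = sorted(zip(risk_scores, had_incident), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    captured = sum(flag for _, flag in ranked[:cutoff])
    return captured / total_incidents


# Hypothetical example: 20 manholes, 5 of which later had an incident (1 = incident).
scores = [0.95, 0.90, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45,
          0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02]
incidents = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
             0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

rate = capture_rate(scores, incidents, top_fraction=0.10)
print(f"Top 10% of the list captures {rate:.0%} of incidents")
```

If the ranking were no better than chance, the top 10% of the list would be expected to capture only about 10% of incidents, which is what makes a much higher share meaningful.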

End of Theory?

Nonetheless, big data still rests on conceptual models, drawing on statistical, mathematical, and sometimes even computer science theories.

Big data is, in fact, itself founded on theory.
