Predictive Analytics

I recently completed Udacity’s Business Analyst Nanodegree and, I must say, I thoroughly enjoyed it. I would recommend it to anyone looking to understand how data can drive business strategy in today’s economy. Predictive business analytics is an interesting intersection of strategy and statistics. It has been a distant interest of mine since my days as an engineering student, and I finally had a peek into this data-driven world.

For those who may be unaware, predictive analytics is the use of existing data as a “training set” to build models/algorithms that can be used to approximate an end result for new data. For example, if company X wants to predict the buying behaviours of newer or potential clients, they can build a model based on the buying behaviours of their existing, loyal customers and the factors they find to be good determinants of such behaviours. Another example would be the use of employee data to assess new employees. Even outside of the workplace, we use similar techniques in prejudging strangers and acquaintances we come across in life. We use our experiences with the friends and family we know very well to predict the behaviour we expect from strangers and acquaintances.
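To make the idea of a training set concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the numbers are invented, the single predictor (store visits per month) is hypothetical, and a straight-line fit is only one of many possible models.

```python
# Hypothetical training set for existing customers:
# (store visits per month, monthly spend). The values are invented.
training_set = [(1, 20.0), (2, 40.0), (3, 60.0), (4, 80.0)]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new, unseen value of the predictor."""
    slope, intercept = model
    return slope * x + intercept


# Build the model from existing customers, then approximate the
# spend of a new customer who visits five times a month.
model = fit_line(training_set)
estimated_spend = predict(model, 5)
```

The model is built only from the existing customers, and the new customer is someone the model has never seen. That is the whole idea in miniature.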

Predictions are not always 100% accurate; however, there are many ways to continuously improve the accuracy of a model. Some methods used include: adding more data points to the training set, using more significant predictor variables in building the model, and dealing with* outliers and duplicate, missing, and untrustworthy data points. Using the predictions made by their models, companies can then serve new clients better, target and attract new clients more efficiently, and save money that would have otherwise been spent non-strategically.
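As a rough illustration of the clean-up steps mentioned above, the sketch below removes duplicates, drops an obvious outlier, and fills in a missing value. The records, the `clean` function, and the "more than ten times the median" outlier rule are all assumptions chosen for simplicity, not a recommended standard.

```python
from statistics import median

# Hypothetical raw records: (customer_id, monthly_spend).
# None marks a missing value; the data is invented.
raw = [
    ("a", 40.0),
    ("b", 55.0),
    ("b", 55.0),    # exact duplicate
    ("c", None),    # missing value
    ("d", 50.0),
    ("e", 5000.0),  # suspicious outlier
    ("f", 45.0),
]


def clean(records, max_ratio=10.0):
    # 1. Remove exact duplicates, keeping the first occurrence.
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)

    # 2. Remove outliers: drop values above max_ratio times the median
    #    of the known values (a crude rule, assumed for this sketch).
    known = [value for _, value in unique if value is not None]
    threshold = max_ratio * median(known)
    kept = [(cid, v) for cid, v in unique if v is None or v <= threshold]

    # 3. Impute: replace missing values with the median of what is left.
    fill = median([v for _, v in kept if v is not None])
    return [(cid, fill if v is None else v) for cid, v in kept]


cleaned = clean(raw)
```

Whether to remove or impute a given data point is a judgement call that depends on the business question; the sketch simply shows both options side by side.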

There is no doubt that predictive analytics offers numerous benefits to businesses; however, there is a blind spot that could easily be overlooked in the process. Depending on the business decision to be made, analysts may use publicly available or accessible data, data available for sale, static or live databases, company records, or any combination of these to build their training set or deploy their predictive model. The danger, lurking in the blind spots of companies deploying this very useful tool, is the tendency to neglect the loyal customers and/or employees whose data was used to create the training set and the model behind the predictions.

* “dealing with” may refer to removal or imputation