Saying hello to Predictive Analytics

Mahima Kaushiva
Nov 4 · 4 min read

Disclaimer: Not recommended for crystal ball gazers

Someone famous once said, ‘It is hard to make predictions, especially if they are about the future.’ With machine learning, however, it is possible not only to predict likely outcomes but also to adapt one’s current behaviour to expected future trends. Commonly referred to as predictive analytics, this approach uses a variety of statistical techniques to detect patterns in historical and present data and determine a likely outcome, with the objective of making the future more efficient and less uncertain. From fraud detection and calculating insurance premia to studying customer behaviour and identifying future cross-selling opportunities, the commercial applications of predictive analytics are many.

So how does predictive analytics work?

Predictive analytics is essentially based on two kinds of models: parametric and non-parametric. There is a third kind, semi-parametric models, but these are less commonly used. For the purpose of this post, I have chosen to focus on the first two.

Parametric models:

These algorithms make a finite set of assumptions about the form of the population and build those assumptions into the model. Once the model is fitted, its predictions are determined only by its parameters.

A parametric model can be defined as P(x|θ,D) = P(x|θ)

where x = future predictions, θ = parameters and D = data

In other words, θ captures everything there is to know about the data: once θ is known, the predictions no longer depend on the data itself. Hence the complexity of the model is bounded even though the amount of data may be unbounded.

One of the simplest parametric models is linear regression, which treats the function as a straight line.

y = b0 + b1x1 + e

where b0 and b1 are the coefficients of the line (the intercept and the slope respectively), x1 is the input variable and e is the error term

This model is considered simple because once b0 and b1 (the parameters that need to be estimated) have been determined, one can plug them into the line and the model is ready.
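As a rough illustration of estimating b0 and b1, here is a minimal sketch using ordinary least squares in plain Python. The function names `fit_line` and `predict` and the sample numbers are made up for this example; a real project would more likely use a statistics library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Estimate intercept b0 and slope b1 by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    """Plug the two estimated parameters into the line."""
    return b0 + b1 * x


# Hypothetical, noise-free data generated from y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = fit_line(xs, ys)
print(b0, b1, predict(b0, b1, 6))
```

Note that once `b0` and `b1` are known, `predict` never touches the training data again. That is the parametric idea in practice: the two numbers stand in for the whole dataset.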

Some more examples of parametric models include:

  1. Logistic regression
  2. Linear discriminant analysis
  3. Perceptron
  4. Naive Bayes
  5. Simple neural networks

Non-parametric models:

These algorithms do not make strong assumptions about the form of the mapping function and are essentially free to learn any functional form from the training data.

They seek to best fit the training data while keeping some ability to generalise to unseen data, and are hence able to fit a large number of functional forms. In other words, a non-parametric model uses a flexible number of parameters, and the number of parameters often grows as the model learns from more data.

Non-parametric models assume that the data distribution cannot be defined in terms of a fixed set of parameters; instead, it is described by an infinite-dimensional θ. The amount of information that θ can capture about the data (D) can grow as the amount of data grows, hence these models are much more flexible.

One of the most common forms of non-parametric modelling is the k-nearest neighbours algorithm, which makes a prediction for a new data point based on the k training points closest to it. The right number of neighbours is chosen by trying several values of k and picking the one that works best on the data. This method does not assume anything about the form of the mapping function other than that points close to each other are likely to have a similar output. The more data it has observed, the better the predictions it can make.
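The idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions of my own: the customer data, the labels and the helper names `knn_predict` and `loo_accuracy` are invented, distance is plain Euclidean, and k is compared using leave-one-out accuracy, which is just one of several ways to pick it.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every stored training point.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in dists[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


# Hypothetical customers described by two made-up features.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6), (7, 7)]
y = ["stay"] * 4 + ["churn"] * 4

# Try several values of k and keep the one that scores best.
scores = {k: loo_accuracy(X, y, k) for k in (1, 3, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
print(knn_predict(X, y, (6.5, 6.5), best_k))
```

Unlike the fitted line in a parametric model, `knn_predict` needs the full training set at prediction time, so the "model" grows with the data.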

Popular methods of non-parametric modelling include:

  1. Decision trees like CART and C4.5
  2. Support vector machines

Advantages and Disadvantages

Parametric models are computationally faster, but because they have a finite number of parameters, the scope of the model stays rigid. Non-parametric models are significantly slower, especially as the size of the dataset increases. However, their ability to adapt and take on additional parameters as the data becomes more complex makes them much more flexible.

In parametric models, the form of the model is decided up front and then fitted to the data, whereas in non-parametric cases, the data determines which model would be the best fit. Hence parametric models are used for cases that are relatively straightforward in nature, whereas non-parametric models can help predict outcomes for more complex problem sets.

Predictive Analytics in use

Most organizations use non-parametric models as these can be applied to a wide range of use cases and are not confined in scope. However, depending on the industry and type of outcome required, the model may vary. Some of the commonly used ones are:

  1. Classification model: uses historical data to provide yes or no answers

     a. For a retailer: ‘Is this customer about to churn?’

     b. For a loan provider: ‘Will this customer default?’

  2. Clustering model: sorts data into groups using similar attributes

     Most e-com companies create clusters of customers based on similar buying behaviour, product preferences or even demographic characteristics and devise strategies to target each group

  3. Outliers model: uses anomalous entries within a dataset to determine cause and effect

     a. Recording a spike in calls to determine if a product has deficiencies and needs to be recalled

     b. Finding outlier data in insurance claims to detect fraud

  4. Time-series model: uses time as the input parameter to determine future occurrences based on various metrics such as seasonality, specific events etc.

     a. Number of footfalls on a website in the last month >> determine frequency of future visits

     b. Number of patients visiting a surgery in the last six months >> estimate number of patients in future
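To make one of these concrete, here is a minimal sketch of the outliers idea applied to the call-spike example. Everything in it is an assumption for illustration: the daily call counts are invented, `flag_outliers` is a hypothetical helper, and the rule used (flag anything more than two standard deviations from the mean) is only one simple way to define an outlier.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical number of support calls per day; day 6 has a spike.
daily_calls = [20, 22, 19, 21, 23, 20, 95, 22]

spikes = flag_outliers(daily_calls)
print(spikes)
```

A flagged day is only a prompt to investigate. Whether the spike points to a product defect or to something harmless still has to be established separately.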

Written by Mahima Kaushiva

Brand Strategist | Data Science student. Curious about data, brands and human behaviour.
