The Role of Mathematics in Data Science

Editor — Ishmael Njie

What is Data Science?

“Data Science combines different fields of work in statistics and computation in order to interpret data for the purpose of decision making.” [1]

The term “science” implies that the field relies on systematic processes to achieve results that can be tested. It draws on concepts from Mathematics and Computer Science, and the results of these processes can be applied to problems such as:

  • Recommending a movie for you to watch on Netflix.
  • Forecasting company profits.
  • Predicting the price of a house from features such as the number of rooms and square footage.
  • Suggesting a song to add to your Spotify playlist.

So how does Mathematics fit into this?

Mathematics is very important in data science because mathematical concepts help to identify patterns and underpin the design of algorithms. An understanding of Statistics and Probability Theory is key to implementing such algorithms. Relevant notions include Regression, Maximum Likelihood Estimation, probability distributions (Bernoulli, Binomial, Gaussian (Normal)) and Bayes’ Theorem.
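To make two of these notions concrete, here is a minimal sketch of Bayes’ Theorem and of Maximum Likelihood Estimation for a Bernoulli distribution. The function names and all numbers are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' Theorem for a binary hypothesis H and evidence E."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


def bernoulli_mle(outcomes):
    """Maximum likelihood estimate of p for 0/1 outcomes, which is the sample mean."""
    return sum(outcomes) / len(outcomes)


# Hypothetical numbers: 20% of emails are spam, and a given word appears
# in 60% of spam emails and in 5% of non-spam emails.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, false_positive_rate=0.05)

# Hypothetical coin flips (1 = heads): the estimate of P(heads) is 3/4.
p_heads = bernoulli_mle([1, 0, 1, 1])

print(round(p_spam_given_word, 3), p_heads)
```

With these made-up inputs, seeing the word raises the probability of spam from the 20% prior to 75%.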

Machine Learning is a field that focuses on giving computers the ability to learn from data without being explicitly programmed for each task. The mathematical concepts noted above are key to understanding and implementing the following Machine Learning techniques:

1. Regression:

Regression is a branch of Statistics that can be used to perform predictions for a given dataset. The types of regression include: Simple Linear, Multiple Linear, Polynomial and Logistic.

I may want to find out the relationship between how long I teach a student in a day and their test scores. I also may want to find out how much my expenditure is affected by my income. We can answer these with regression.

Let us look at an example of simple linear regression. Linear regression is a statistical technique that predicts a response variable by fitting the line that best represents the relationship between the dependent and independent variables. Assume you are given a data set (training set) that records the sales of ice cream, y, against the average temperature on a given day, x, across a certain time period. Regression learns the weights w that best fit the training data; the fitted line can then be used to predict y for a new value of x.

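In the process of learning weights for the regression line, the aim is to minimise an error function E(w), which measures how far the line's predictions are from the training data (typically the sum of squared differences).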

To minimise E(w), the closed-form solution can be employed: take the derivative of E(w) with respect to the weights, set it to zero and solve. This gives the weights that minimise the distance between the regression line and the training data.
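As a worked sketch of that closed-form step, assume E(w) is the sum of squared residuals for a line with slope w_1 and intercept w_0. This particular form of the error, and the symbols w_0, w_1, \bar{x} and \bar{y} (the sample means), are assumptions made for illustration rather than taken from the original post.

```latex
\begin{align*}
E(w) &= \sum_{n=1}^{N} \bigl( y_n - (w_1 x_n + w_0) \bigr)^2 \\
\frac{\partial E}{\partial w_0} &= -2 \sum_{n=1}^{N} \bigl( y_n - w_1 x_n - w_0 \bigr) = 0 \\
\frac{\partial E}{\partial w_1} &= -2 \sum_{n=1}^{N} x_n \bigl( y_n - w_1 x_n - w_0 \bigr) = 0 \\
w_1 &= \frac{\sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})}{\sum_{n=1}^{N} (x_n - \bar{x})^2},
\qquad w_0 = \bar{y} - w_1 \bar{x}
\end{align*}
```

The first condition says the residuals sum to zero, which gives the intercept; substituting it into the second gives the slope.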

As you can see from the graphic, there is a positive correlation between the average temperature and the ice cream sales on a given day, so a higher average temperature predicts higher unit sales of ice cream.

Here, the learnt weights for the regression equation are a slope of 13.818 and an intercept of 0.2262, forming the equation y = 13.818x + 0.2262. This can now be used to predict unit sales for a given day's average temperature.
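The fit-then-predict workflow can be sketched in a few lines of Python. The function names and the temperature/sales data below are hypothetical (the post's own training data is not reproduced here); only the coefficients 13.818 and 0.2262 come from the equation above. The fit assumes a sum-of-squares error.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the regression line y = slope * x + intercept."""
    return slope * x + intercept


# Hypothetical training set: average temperature and unit sales per day.
temps = [10.0, 15.0, 20.0, 25.0, 30.0]
sales = [140.0, 205.0, 280.0, 345.0, 415.0]
slope, intercept = fit_simple_linear_regression(temps, sales)

# Prediction using the coefficients quoted in the text, for a 20-degree day.
article_prediction = predict(20.0, 13.818, 0.2262)
print(slope, intercept, article_prediction)
```

Using the quoted equation, a day averaging 20 degrees gives 13.818 × 20 + 0.2262 ≈ 276.59 predicted unit sales.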

2. Classification:

Classification is a technique for assigning categories to data in order to aid accurate prediction and analysis. Classification algorithms are given an existing dataset in which the class of each instance is known; from this, a predictive model is built to answer the following question: for each new instance, which class does it belong to?

Types of classification algorithms include Max Entropy, K-Nearest Neighbour and Naïve Bayes.

Max Entropy (Logistic Regression): In contrast to the regression concept described above, where weights are learnt to predict continuous values, here weights are learnt to predict categorical values.
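A minimal sketch of this idea for a single feature and two classes follows. It assumes plain batch gradient descent on the log loss, which is one common way to learn the weights and not necessarily the one the author had in mind. The data, function names, learning rate and epoch count are all hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, labels, learning_rate=0.1, epochs=2000):
    """Learn w and b for P(y = 1 | x) = sigmoid(w * x + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, labels):
            # Gradient of the log loss for one example is (prediction - label).
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_class(x, w, b, threshold=0.5):
    """Turn the predicted probability into a categorical label (0 or 1)."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical data: hours of teaching per day and whether the student passed.
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic_regression(hours, passed)
print(predict_class(1.0, w, b), predict_class(4.0, w, b))
```

The only structural change from linear regression is the sigmoid, which squashes the weighted sum into a probability that is then thresholded into a class.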

K-Nearest Neighbour: New instances are compared with historical data points and classified according to the classes of the k historical points closest to them.
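A minimal sketch of K-Nearest Neighbour, assuming Euclidean distance and a simple majority vote among the k closest points. The function name, the choice of k = 3 and the data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(new_point, training_points, training_labels, k=3):
    """Label new_point by majority vote among its k nearest training points."""
    # Pair each historical point's distance to the new point with its label.
    distances = sorted(
        (
            (math.dist(new_point, point), label)
            for point, label in zip(training_points, training_labels)
        ),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature data points with known classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 7.0), (6.5, 8.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_classify((1.2, 1.4), points, labels))
print(knn_classify((6.8, 7.2), points, labels))
```

Note that there is no training step: the "model" is the stored historical data, and all the work happens at prediction time.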

Naïve Bayes: Bayes’ Theorem is the backbone of the Naïve Bayes algorithm, a classification algorithm that assumes all features are independent of each other given the class, regardless of any actual relationship between them (hence “naïve”). A great example explaining the Naïve Bayes algorithm can be found here.
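The independence assumption means the score for a class is just the prior multiplied by one likelihood per feature. The sketch below assumes a word-count (multinomial) variant with add-one smoothing, a common choice that the post does not specify. The function names and the tiny data set are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in zip(documents, labels):
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def classify(words, model):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior P(class)
        total_words = sum(word_counts[label].values())
        for word in words:
            # Add-one smoothing keeps unseen words from zeroing the product.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            # Independence assumption: per-word log likelihoods simply add up.
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled emails, already split into words.
emails = [
    ["win", "money", "now"],
    ["cheap", "money", "offer"],
    ["meeting", "at", "noon"],
    ["project", "meeting", "notes"],
]
email_labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(emails, email_labels)

print(classify(["money", "offer"], model))
print(classify(["meeting", "notes"], model))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow for long documents.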

Applications of classification include:

  • Determining whether an email is spam or not.
  • Determining whether a given image portrays a cat or a dog.
  • Categorising videos on YouTube.

In a nutshell, Data Science is used to identify patterns. With an understanding of various Mathematical notions (some of which are mentioned in this post), those patterns can be represented in a form that can be analysed, which is paramount for creating the statistical models, algorithms and processes needed to make accurate decisions.