Mastering the Essentials of Machine Learning with a Vanilla Example

Yatshun Lee
The Modern Scientist
Oct 9, 2023

There are many models (SVC, SVM, OLS, …) and lots of ways to model data and make predictions. But after years of digesting them, I can boil ML down to these 3 things you must know.

First of All, a Meme About ML / AI

original comic by sandserif

A Little Background About Me

Photo generated by BlueWillow

I have been developing end-to-end ML/AI products to tackle complex real-world challenges. For example, I built a library from scratch that models value-at-risk with quantile regression, based on the well-regarded CAViaR research by Simone Manganelli and Robert Engle (2004). It is a solid example of building a model: https://github.com/yatshunlee/CAViaR-Project.

Overview

In the expansive universe of data modelling, a multitude of methods exists, each with its own parameters and fitting algorithms. At their core, however, these models share common principles. This article gives a high-level overview of machine learning through these 3 core items:

  1. Understanding Machine Learning: What does a machine learning model consist of?
  2. Define a Loss Function: What target is the model trying to achieve?
  3. Optimization: How is the model going to achieve that target?

Understanding Machine Learning

There are lots of ML applications for different types of data: human face detection, email spam detection, speech recognition with Siri on the iPhone, … What they have in common is that each application has (i) an input, (ii) an output, and (iii) a model. They share a common framework:

Famous Black Box Model — Image from Author

Essentially, computer programming is about taking some input and creating some output — thus solving a problem. What happens in between the input and output, what we could call a “black box”. — CS50x

ML Examples — Image from Author

What you are really looking for is a function that represents this black box: it takes some input and generates an output. However, the function has unknown parameters. Let me give a vanilla example:

Vanilla Example

Problem statement: find a model of the jackpot region (a simplified version of Toss Coin Rainbow)

Image from Author

Assume that we don’t know the dimensions of the jackpot area. However, by observation we can collect a dataset of coin centre points (x, y), each labelled with whether the coin hit the jackpot.

From domain knowledge, we know that

  • the jackpot region is a rectangle

Therefore, as our very first step, we can model the jackpot area with 4 unknown parameters (x1, y1, x2, y2) and guess whether a coin is in the area. This is a binary classification problem: in the region (Yes) or not (No).
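As a minimal sketch of this first step, the model can be written as a Python function of a coin's centre and the four unknown parameters. It assumes the rectangle is axis-aligned, with (x1, y1) as the lower-left corner and (x2, y2) as the upper-right corner, and that a coin exactly on the border counts as inside; the name in_jackpot is hypothetical.

```python
def in_jackpot(x, y, params):
    """Predict 1 (Yes, jackpot) or 0 (No) for a coin centred at (x, y).

    Assumption: params = (x1, y1, x2, y2) are the lower-left and upper-right
    corners of an axis-aligned rectangle; points on the border count as inside.
    """
    x1, y1, x2, y2 = params
    return int(x1 <= x <= x2 and y1 <= y <= y2)


# Usage with a guessed set of parameters
guess = (0.2, 0.2, 0.8, 0.8)
print(in_jackpot(0.5, 0.5, guess))  # 1: inside the guessed rectangle
print(in_jackpot(0.9, 0.5, guess))  # 0: outside
```

Different guesses for the four parameters give different models; the next step is deciding which guess is better.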

Define a Loss Function

Once we have a function with a set of unknown parameters, we need to think about what the model is trying to achieve. In other words, we want to know how good a set of estimated parameters is.

Examples of portfolio optimization: blog post.

Follow up on the vanilla example

Simple question: is the following proposed model a good one? Does it match the observed data?

Image from Author

Clearly not! None of the data points agree with the model.

Another question: how about this?

Image from Author

Ummm… the first 4 rows all agree with the model. So, maybe!?

But this is still quite ambiguous. Is there a way to measure it? Yes, there is, and that’s why we define a loss function for this situation.

Set the loss function

Image from Author

Intuitively, if every guess is right, the difference in each row is 0, so the loss is 0. The model with that set of proposed parameters is then possibly a correct model.

In contrast, if every guess is wrong, the absolute difference in each row is 1, so the loss equals the number of rows, which is much greater than 0.

So the loss function measures how far off the model is (lower is better), and we want to minimize it through optimization.
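The loss described above can be sketched in a few lines, assuming it is the sum of the absolute differences between each observed label (1 for jackpot, 0 otherwise) and the model's 0/1 guess, i.e. the number of misclassified coins. The exact formula from the original figure is not reproduced here; the function names and the tiny dataset are hypothetical.

```python
def in_jackpot(x, y, params):
    # Axis-aligned rectangle (assumption): (x1, y1) lower-left, (x2, y2) upper-right
    x1, y1, x2, y2 = params
    return int(x1 <= x <= x2 and y1 <= y <= y2)


def loss(params, data):
    # Sum over rows of |label - prediction|: each wrong guess adds 1, each right guess adds 0
    return sum(abs(label - in_jackpot(x, y, params)) for x, y, label in data)


# Hypothetical observations: (x, y, label), label 1 = jackpot, 0 = try again
data = [(0.5, 0.5, 1), (0.9, 0.9, 0)]
print(loss((0.2, 0.2, 0.8, 0.8), data))    # 0: every guess is right
print(loss((0.85, 0.85, 1.0, 1.0), data))  # 2: every guess is wrong, loss = number of rows
```

The two calls mirror the two extreme cases above: all guesses right gives 0, all guesses wrong gives the number of rows.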

Optimization

It’s all about…

We have our target now… How do we achieve that? In other words, how can we minimize the loss?

In a nutshell, optimization means finding the set of parameters that minimizes the loss. It normally involves these things:

  1. initialization
  2. update the parameters
  3. terminal condition
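These three ingredients can be sketched as a generic loop. This is a hypothetical helper for illustration, not code from the original article: init returns the starting parameters, update proposes the next parameters, and the loop stops once the loss reaches a tolerance or an iteration budget runs out.

```python
def optimize(loss, init, update, tol=0.0, max_iters=1000):
    """Generic optimization loop (hypothetical helper for illustration)."""
    params = init()                   # 1. initialization
    for _ in range(max_iters):
        if loss(params) <= tol:       # 3. terminal condition: loss is small enough
            break
        params = update(params)       # 2. update the parameters
    return params


# Toy usage: find the integer t minimizing |t - 3|, starting at 0 and stepping by +1
print(optimize(lambda t: abs(t - 3), lambda: 0, lambda t: t + 1))  # 3
```

Real optimizers differ mainly in how update chooses the next parameters; the surrounding loop keeps this shape.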

Follow up on the problem

The feasible ranges of x and y are both [0, 1]; any (x, y) pair outside these ranges is infeasible.

Assumptions:

  1. The data is clean, i.e., the labels are correct.
  2. The two regions can be perfectly separated by a rectangle => Minimum loss = 0 (terminal condition)

After considering these assumptions, I designed the following optimization algorithm for this specific problem.

Image from Author

Apart from the terminal condition, the parameters, and the update process, the algorithm also has some settings of its own. These are the hyperparameters:

  • Step size: how much to update the parameters in each iteration
  • Depreciation factor: shrinks the step size when it is too large
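The author’s own algorithm is shown only as an image, so the following is a different, hypothetical sketch of how such a search could look under the assumptions above, plus one more: the rectangle is axis-aligned, with (x1, y1) as the lower-left and (x2, y2) as the upper-right corner. It starts from the whole feasible square [0, 1] × [0, 1], pushes each edge inward by the step size, and accepts a move only if every jackpot coin stays inside, so only ‘Try Again’ coins are ever excluded and the loss never increases. When no edge can move, the step size is multiplied by the depreciation factor. It stops when the loss reaches 0 or the step becomes negligible. All names and the dataset are illustrative.

```python
def in_jackpot(x, y, params):
    # Axis-aligned rectangle (assumption): (x1, y1) lower-left, (x2, y2) upper-right
    x1, y1, x2, y2 = params
    return int(x1 <= x <= x2 and y1 <= y <= y2)


def loss(params, data):
    # Number of misclassified coins
    return sum(abs(label - in_jackpot(x, y, params)) for x, y, label in data)


def fit_rectangle(data, step=0.1, decay=0.5, min_step=1e-9):
    """Shrink a rectangle from the feasible square until the loss is 0 (hypothetical sketch)."""
    params = [0.0, 0.0, 1.0, 1.0]  # 1. initialization: cover all of [0, 1] x [0, 1]
    jackpots = [(x, y) for x, y, label in data if label == 1]
    updates = 0
    while loss(params, data) > 0 and step >= min_step:  # 3. terminal condition
        moved = False
        for i in range(4):  # 2. update: try moving each edge inward
            candidate = params.copy()
            if i < 2:  # left/bottom edge (x1 or y1) moves inward, but not past x2 or y2
                candidate[i] = min(params[i] + step, params[i + 2])
            else:      # right/top edge (x2 or y2) moves inward, but not past x1 or y1
                candidate[i] = max(params[i] - step, params[i - 2])
            # Accept only if the edge moved and every jackpot coin is still inside
            if candidate != params and all(in_jackpot(x, y, candidate) for x, y in jackpots):
                params = candidate
                updates += 1
                moved = True
        if not moved:
            step *= decay  # depreciation factor: retry with a smaller step
    return tuple(params), loss(params, data), updates


# Hypothetical observations: (x, y, label), label 1 = jackpot, 0 = try again
data = [(0.5, 0.5, 1), (0.4, 0.6, 1),
        (0.1, 0.1, 0), (0.9, 0.9, 0), (0.9, 0.1, 0), (0.1, 0.9, 0)]
params, final_loss, updates = fit_rectangle(data)
print(final_loss, updates)  # 0 8
```

This design leans on the assumptions listed above: with clean labels and regions separable by a rectangle, shrinking toward the jackpot coins tends toward the smallest rectangle around them, which contains no ‘Try Again’ coin, so the loss can reach 0.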

Visualization of the Optimization Process for the Vanilla Problem

The minimum loss is reached after 79 updates. Annotation: Orange — coins in the jackpot region and blue — coins in the ‘Try Again’ region. The red rectangle is the jackpot region with the estimated parameters. — Image from Author

In the end, the estimated parameters are the ones at which the loss is minimized. Although the model works well on the sample data, it might still overfit. (I won’t cover overfitting here, as this example is deliberately simple and only meant to give you a sense of what ML is.)

You can see that ML is really about finding a function whose well-estimated parameters fit the data. We can now use these estimated parameters to model the jackpot region. :D

Similarly, let’s talk about Linear Regression…

1. Function with unknown parameters

We assume a linear relationship between the output and the set of inputs, and that no feature is an exact linear combination of the others (no perfect multicollinearity).

Image from Author

2. Loss function

Typically, we use the mean squared error as the loss function, and we minimize it to get the estimated parameters.

Image from Author
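The model and its loss can be sketched in NumPy. This assumes the common convention of a design matrix X whose first column is all ones, so that beta[0] is the intercept and the predictions are X @ beta; the helper names are hypothetical.

```python
import numpy as np


def add_intercept(X):
    # Prepend a column of ones so that beta[0] acts as the intercept
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single feature given as a flat list
    return np.column_stack([np.ones(len(X)), X])


def mse(beta, X, y):
    # Mean squared error between the observations y and the predictions X @ beta
    residuals = np.asarray(y, dtype=float) - X @ beta
    return np.mean(residuals ** 2)


X = add_intercept([0.0, 1.0, 2.0])
y = [1.0, 3.0, 5.0]                      # generated from y = 1 + 2x
print(mse(np.array([1.0, 2.0]), X, y))   # 0.0: the true parameters
print(mse(np.array([0.0, 2.0]), X, y))   # 1.0: every prediction is off by 1
```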

3. Optimization

We can obtain the optimal set of parameters directly by differentiating the loss function with respect to the parameters, setting the derivatives to zero, and solving the resulting equations.

Image from Author
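Setting the derivatives of the squared-error loss to zero leads to the normal equations (XᵀX)β = Xᵀy. A minimal NumPy sketch, again assuming a leading column of ones for the intercept (the original derivation is shown only as an image, so this notation is an assumption), solves them directly. Dividing the loss by n does not move its minimum, so the MSE and the plain sum of squared errors give the same β. np.linalg.solve is used instead of forming the inverse explicitly, which is the usual numerically safer choice, and it requires XᵀX to be invertible (no perfectly collinear features).

```python
import numpy as np


def fit_ols(X, y):
    """Closed-form least squares via the normal equations (X^T X) beta = X^T y."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                         # single feature given as a flat list
    X = np.column_stack([np.ones(len(X)), X])  # leading column of ones: intercept
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)


beta = fit_ols([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
print(beta)  # approximately [1. 2.], i.e. y = 1 + 2x
```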

Alternatively, you can use gradient descent or other, fancier algorithms.
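Gradient descent can be sketched the same way, again assuming a leading column of ones for the intercept. The gradient of the MSE with respect to β is −(2/n)Xᵀ(y − Xβ), and each update steps against it. The learning rate and the fixed iteration count are illustrative hyperparameters chosen for this toy data, not recommendations.

```python
import numpy as np


def fit_gd(X, y, lr=0.05, n_iters=5000):
    """Minimize the MSE of a linear model by plain gradient descent (sketch)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                         # single feature given as a flat list
    X = np.column_stack([np.ones(len(X)), X])  # leading column of ones: intercept
    y = np.asarray(y, dtype=float)
    n = len(y)
    beta = np.zeros(X.shape[1])                # 1. initialization
    for _ in range(n_iters):                   # 3. terminal condition: fixed budget
        grad = -2.0 / n * X.T @ (y - X @ beta)  # gradient of the MSE w.r.t. beta
        beta = beta - lr * grad                # 2. update: step against the gradient
    return beta


beta = fit_gd([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
print(beta)  # close to [1. 2.], matching the closed-form solution
```

The same three ingredients from the jackpot example appear here: initialization, an update rule controlled by a step size (the learning rate), and a terminal condition.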

Wrap up a bit…

Mastering the essentials of machine learning boils down to three key components:

  1. Understanding Machine Learning: Regardless of the specific application, all machine learning models share a common framework of taking input, processing it through a “black box” model, and generating an output. The challenge lies in finding the right function with unknown parameters to represent this black box.
  2. Define a Loss Function: Once you have a function with parameters, you need to define a loss function to measure how well the model’s estimated parameters align with the observed data. A lower loss indicates a more accurate model.
  3. Optimization: The final step is optimization, where you aim to find the optimal set of parameters that minimizes the loss. This involves initializing the parameters, updating them iteratively, and setting terminal conditions. You may also want to set the hyperparameters wisely.

By grasping these fundamental concepts, you can navigate the complex landscape of machine learning models and apply them effectively and adaptively to various real-world challenges. I hope you enjoyed reading it :D
