Intro to Machine Learning Math

Edward Wang · Published in Analytics Vidhya · 5 min read · Sep 19, 2020

It’s way easier than you would think.

Much of the content below is based on the Intro to Deep Learning with PyTorch course by Facebook AI. If you want to learn more, take the course, or just take a look here.

Below is a graph showing whether or not students were accepted into a university. Two pieces of data are used: test scores and grades, each on a scale of 0–10. Applicants that were accepted are in blue, with those rejected in red.

Let’s say a student with a test score of 7 and a grade of 6 wants to know if he would be accepted. Could you determine it from the graph?

The answer is YES.

Now how do you know this? Well, when you looked at the graph, you likely found the point (7, 6). Seeing that many blue dots surround it, you could assume that this student would be accepted.

You may not know it, but what you just did is exactly what many Machine Learning algorithms strive to do: look at data points and, based on previous trends and patterns, determine the label for a new one.

But what about that murky region around the center? It would definitely be harder to tell whether a student with a test score and grade of 6 gets accepted or rejected. So let’s set a clear boundary.

Finding the equation

That’s much better. Although this line isn’t perfect, it gives us a very good idea of whether a student makes the cut. We can therefore say with confidence that students on the right side of the line will be accepted, and those on the left will not be. But perhaps what’s more valuable is that the line gives us an equation.

The equation of this line, in this case, is 2x₁ + x₂ - 18 = 0.

What’s exciting is that, using this linear equation, we can extract a mathematical formula to determine whether a student gets accepted. The equation used to find out whether a student makes it is simply 2*test + grades - 18. If the resulting number is equal to or over 0, congrats on your acceptance! If not, better luck next time.
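To make this concrete, here is a minimal Python sketch of the rule. The function names are made up for this article; the weights come straight from our line:

    def acceptance_score(test, grades):
        # The left-hand side of the line's equation: 2*test + grades - 18
        return 2 * test + grades - 18

    def is_accepted(test, grades):
        # A score of 0 or above means the student makes the cut
        return acceptance_score(test, grades) >= 0

    print(acceptance_score(7, 6))  # 2*7 + 6 - 18 = 2
    print(is_accepted(7, 6))       # True: the student from earlier is accepted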

We call this line the decision boundary. A good way to remember this is that the boundary separates the algorithm’s two different decisions.

So let’s revisit our earlier question: does a student with a test score and grade of 6 get accepted or rejected? Well, 2*6 + 6 - 18 = 0, and 0 is exactly the minimum score for acceptance, meaning that this student is indeed accepted.
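Plugging the numbers into the sketch from above confirms this borderline case:

    print(acceptance_score(6, 6))  # 2*6 + 6 - 18 = 0
    print(is_accepted(6, 6))       # True: a score of 0 meets the minimum exactly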

All linear equations like this follow the format W₁x₁ + W₂x₂ + b = 0. Complicated? Not at all. Let’s refer back to the equation 2x₁ + x₂ - 18 = 0.

In this equation, W₁ is 2: the number multiplying x₁, which in this case is the test score. W is called the weight, and all you need to do with it is multiply the corresponding x by it. x is the input; in our example, the students’ scores on their tests and their grades.

The same goes for W₂. There was no number before x₂ in the previous equation because its weight is 1, meaning we can simply write W₂x₂ as x₂.

Last but not least, we have b, which stands for the bias. b can be either a positive or a negative number; in our case, it was -18.
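Putting the three pieces together, the same rule can be written with the weights and bias as explicit parameters. This generic version is my own sketch, not anything from the course:

    def linear_score(x1, x2, w1, w2, b):
        # The general format: W1*x1 + W2*x2 + b
        return w1 * x1 + w2 * x2 + b

    # Our acceptance rule is just one particular choice of weights and bias:
    print(linear_score(7, 6, w1=2, w2=1, b=-18))  # 2, the same score as before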

Congrats on making it this far!

The Dimension Problem

Getting back on track, what if we had one more piece of data, such as class rank? Well, instead of visualizing the data points in 2 dimensions, we would use 3 dimensions. Therefore, the equation would no longer be W₁x₁ + W₂x₂ + b; instead, it would be W₁x₁ + W₂x₂ + W₃x₃ + b.

Remember, this is because we have one more input (x₃), or, in our example, class rank. This doesn’t seem like a big problem right now, but what if we had 4 different pieces of data, or 5, 20, or even 1,000? What do we do then?

The three different data categories (grades, test, and class rank)
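Before answering, it helps to see why writing the equation out by hand gets tedious. Here is the three-input version as a Python sketch (the function and parameter names are placeholders of my own, not from the course):

    def linear_score_3d(x1, x2, x3, w1, w2, w3, b):
        # W1*x1 + W2*x2 + W3*x3 + b: one extra weight and input for class rank
        return w1 * x1 + w2 * x2 + w3 * x3 + b

Every new piece of data means yet another weight and yet another term written out by hand.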

Well, one common solution is to simplify the equation W₁x₁ + W₂x₂ + … + Wₙxₙ + b to the vector equation Wx + b. In this equation, W represents W₁, …, Wₙ₋₁, Wₙ, and x represents x₁, …, xₙ₋₁, xₙ, where n is the number of dimensions. What this means is that W and x stand in for every single weight and input in the full equation W₁x₁ + W₂x₂ + … + Wₙxₙ + b.

This is extremely convenient, as Wx + b can express any number of dimensions just as easily.
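In code, the vector form is just a dot product, which is why it handles any number of inputs with the same one line. Here is a minimal sketch using NumPy (my choice of library here, not something the article prescribes):

    import numpy as np

    def linear_score_nd(W, x, b):
        # The vector equation Wx + b, for any number of dimensions
        return np.dot(W, x) + b

    W = np.array([2.0, 1.0])   # our original weights
    x = np.array([7.0, 6.0])   # test = 7, grades = 6
    print(linear_score_nd(W, x, -18.0))  # 2.0, the same score in vector form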

Another thing to note with multiple dimensions is the decision boundary. In the 2D example, we used a one-dimensional decision boundary (a line). In the 3D example above, we used a two-dimensional one (a plane). Notice the pattern? The decision boundary always has one dimension fewer than the space it resides in. We call such a boundary a hyperplane.

The applications of Machine Learning are practically limitless. Although understanding the math behind it takes some work, it is definitely worth it. Hopefully, this article provided some insight into Machine Learning math and taught you something new!


Key Takeaways

  • The line separating the two categories is called a decision boundary
  • The decision boundary gives us a linear equation we can use to classify new points
  • The equation for a 2D decision boundary follows the format W₁x₁ + W₂x₂ + b = 0
  • The equation for an n-dimensional decision boundary follows the format W₁x₁ + W₂x₂ + … + Wₙxₙ + b = 0
  • W₁x₁ + W₂x₂ + … + Wₙxₙ + b can be expressed as Wx + b
  • Decision boundaries are hyperplanes
