Zero To Hero Machine Learning — Day 3

This article focuses on Gradient Descent.

ExcaliBear
3 min read · Nov 28, 2023


Disclaimer: If you haven’t read/watched day 2, I recommend you do that before reading this article.

For those of you who prefer video, I cover this part of my journey on my YouTube channel (links at the end of this article).

So we have our hypothesis (hθ(x) = θ₀ + θ₁x) and we have a way of measuring how well it fits the data (the cost function). Our goal is to minimize the cost function; that way we know our hypothesis makes good guesses. That's where gradient descent comes in.

Gradient Descent

Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function, or, in simple terms, it helps us improve our hypothesis by minimizing the cost function.

A simple way to see the value of the cost function for each pair of θ₀, θ₁ would be putting θ₀ on the x-axis, θ₁ on the y-axis, and the cost function on the vertical z-axis.

Gradient Descent Graph Clear

Remember, our goal is to minimize the cost function, that is, to get to the very bottom of the pits in our graph.

Starting from arbitrary values of θ₀ and θ₁, the way to do so is by checking the slope of the landscape at your current location and using that to walk downhill from your starting point until you reach the bottom.

Gradient Descent Graph With Steps

Your step size is affected by 2 things:

  1. The slope of the landscape — the steeper the landscape is, the bigger your step.
  2. The learning rate (α) — Too big and you might overshoot the minimum; too small, and it’ll take forever.

*α is the parameter by which we multiply the partial derivative of the cost function when updating both θ₀ and θ₁.
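The update step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the helper name and the tiny dataset are my own for the example, and the partial derivatives follow the usual mean-squared-error cost for hθ(x) = θ₀ + θ₁x.

```python
def gradient_descent_step(theta0, theta1, xs, ys, alpha):
    """One simultaneous gradient descent update for linear regression.

    Hypothesis: h(x) = theta0 + theta1 * x
    Cost:       J(theta0, theta1) = (1 / 2m) * sum((h(x) - y)^2)
    """
    m = len(xs)
    # Partial derivative of J with respect to theta0
    d_theta0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m
    # Partial derivative of J with respect to theta1
    d_theta1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
    # Both parameters step downhill at once, scaled by the learning rate alpha
    return theta0 - alpha * d_theta0, theta1 - alpha * d_theta1


# Made-up example: points lying exactly on y = 2x + 1
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = 0.0, 0.0
for _ in range(1000):
    theta0, theta1 = gradient_descent_step(theta0, theta1, xs, ys, alpha=0.1)
```

After enough steps, θ₀ and θ₁ approach 1 and 2, the line that actually generated the points. Note that both partial derivatives are computed from the *old* parameter values before either is updated; that's the "simultaneous update" gradient descent requires.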

The Bad News

You might have noticed that our starting point can change the final θ₀ and θ₁ we reach at the end of the gradient descent algorithm and that’s true.

Starting from this position:

Gradient Descent Graph Different Start

We might end up here:

Gradient Descent Graph Different End

Which is obviously not: “the very bottom of the pits in our graph”.

The Good News

For linear regression, the cost function is bowl-shaped (convex), so there's only one lowest point (the global minimum) on this landscape.

Gradient Descent Graph Linear Regression

Gradient Descent will get you there no matter your starting point, assuming you don't take steps that are too big.
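You can check this claim numerically with a small self-contained sketch (the data and starting points are made up for illustration): run gradient descent on the same made-up dataset from two very different starting positions and see that both land on the same minimum.

```python
def run(theta0, theta1, alpha=0.1, steps=2000):
    """Run gradient descent on a fixed toy dataset and return final thetas."""
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # points on y = 2x + 1
    m = len(xs)
    for _ in range(steps):
        d0 = sum(theta0 + theta1 * x - y for x, y in zip(xs, ys)) / m
        d1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
    return theta0, theta1


# Two very different starting points...
a = run(-10.0, 10.0)
b = run(5.0, -5.0)
# ...both converge to roughly theta0 = 1, theta1 = 2
```

With a non-convex cost those two runs could end in different pits; for linear regression's single-bowl landscape they always meet at the bottom (as long as α is small enough).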

Hope you enjoyed!!

I don’t make money out of my articles, I just love to share my knowledge.
Feel free to clap and/or support me on my socials below:

Link to my YouTube channel: https://www.youtube.com/@ExcaliBearCodes

Link to a video regarding this subject on my YouTube channel: https://youtu.be/I0imzb7fwqg

Link to my blog: https://excali-blog.vercel.app

Link to a post regarding this subject on my blog: https://excali-blog.vercel.app/posts/software-engineer-learning-machine-learning/3

ExcaliBear

I'm excited to share my journey and insights as a software engineer with you. Whether you're a fellow developer or not, I aim to bring you valuable content.