Gradient descent: linear & logistic regression

Practicing DatScy
3 min read · Aug 10, 2022


Recently I have been coding ML and DL algorithms by hand to really understand how they work. An oldie but an essential algorithm is gradient descent: it is used in training all neural networks, and many machine learning algorithms, such as SVMs, use it as well.

Below is gradient descent on a toy problem. The following steps are used to execute gradient descent (a minimal NumPy sketch follows the list):

0. Initialize w and b (pick starting values).
1. Compute y_hat = w*x + b.
2. Compute the cost J to see how close w and b are to ideal.
3. Compute the partial derivatives dJ/dw and dJ/db; they tell us how much to change w and b with respect to J (the size of the 'steps down the hill').
4. Update w and b:
w = w - (dJ/dw)*learning_rate
b = b - (dJ/db)*learning_rate
5. Repeat the steps until w and b stop changing, which means we are at the minimum of J.
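
As a concrete illustration of these steps, here is a minimal NumPy sketch of batch gradient descent for linear regression. The names w, b, and learning_rate follow the steps above; using mean squared error for J and the rest of the scaffolding are my own assumptions:

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.01, n_iters=1000):
    w, b = 0.0, 0.0                                # step 0: initialize w and b
    m = len(x)
    for _ in range(n_iters):
        y_hat = w * x + b                          # step 1: predictions
        J = np.mean((y_hat - y) ** 2)              # step 2: cost (mean squared error)
        dJ_dw = (2 / m) * np.sum((y_hat - y) * x)  # step 3: partial derivatives
        dJ_db = (2 / m) * np.sum(y_hat - y)
        w = w - learning_rate * dJ_dw              # step 4: update w and b
        b = b - learning_rate * dJ_db
    return w, b                                    # step 5: fixed loop stands in for "repeat until converged"
```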

Load data

This dataset gives the size, weight, and species of fish. I use both linear and logistic regression to predict fish Weight from Height and Width. I obtained this data from Kaggle: https://www.kaggle.com/datasets/aungpyaeap/fish-market .
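
A minimal loading sketch, assuming the Kaggle CSV has been downloaded locally as Fish.csv (treat the filename and path as assumptions):

```python
import pandas as pd

# Assumes the Kaggle fish-market CSV sits next to this script
df = pd.read_csv("Fish.csv")
print(df.head())  # columns include Species, Weight, Height, Width
```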

Subfunctions

https://gist.github.com/j622amilah/96b1a586742e0deb02a390d65ed3a9f6

Prepare the data
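
One plausible preparation step, assuming we predict Weight from Height and Width as described above; the standardization choice is my assumption, not necessarily what the original notebook did:

```python
# Select the features (Height, Width) and the target (Weight)
X = df[["Height", "Width"]].to_numpy()
y = df["Weight"].to_numpy()

# Standardize the features; gradient descent converges much faster on scaled inputs
X = (X - X.mean(axis=0)) / X.std(axis=0)
```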

Plotting
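
The original plots are not reproduced here, so this is just one reasonable way to inspect the raw relationship with matplotlib:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(df["Height"], df["Weight"], alpha=0.6)
axes[0].set(xlabel="Height", ylabel="Weight")
axes[1].scatter(df["Width"], df["Weight"], alpha=0.6)
axes[1].set(xlabel="Width", ylabel="Weight")
plt.tight_layout()
plt.show()
```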

scikit-learn: Linear Regression
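
The scikit-learn version is nearly a one-liner; a sketch assuming the X and y arrays prepared above:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

lin_reg = LinearRegression().fit(X, y)
print("MSE:", mean_squared_error(y, lin_reg.predict(X)))
```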

TensorFlow: Linear Regression
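
In Keras, a single Dense unit with no activation is exactly y_hat = w*x + b; the optimizer, learning rate, and epoch count below are assumptions:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
model.fit(X, y, epochs=200, verbose=0)
print("MSE:", model.evaluate(X, y, verbose=0))
```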

By-hand: Linear Regression

scikit-learn: Logistic Regression

In the scikit-learn formulation, y has to be binary or multi-class. So we cannot estimate each Weight (y) point as a continuous value, as we can with the raw sigmoid formulation used in the by-hand version below.
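
One way to make the classifier applicable is to discretize Weight into classes first; binning at the median into a binary label is my assumption, not necessarily what the original code did:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Turn the continuous Weight into a binary label: above / below the median
y_class = (y > np.median(y)).astype(int)

log_reg = LogisticRegression().fit(X, y_class)
print("accuracy:", log_reg.score(X, y_class))
```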

TensorFlow: Logistic Regression
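
A sketch of the continuous-target flavor in Keras: a single sigmoid unit trained against Weight rescaled into [0, 1]. The rescaling and the loss choice are my assumptions:

```python
import tensorflow as tf

# Rescale Weight into [0, 1] so a sigmoid output can reach every target value
y_scaled = (y - y.min()) / (y.max() - y.min())

model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="binary_crossentropy")
model.fit(X, y_scaled, epochs=200, verbose=0)
```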

By-hand: Logistic Regression

Interestingly, my hand-made version has a cost that increases no matter how I tune the parameters. To prevent an infinite cost when y_hat = 1, I added a clipping value to the sigmoid output inside the compute_loss function. If I keep tinkering with the parameters, the data, or the construction, the cost should eventually converge; the same code worked on a different dataset (the one from the Coursera Supervised Learning online class). Perhaps I need to scale the data…
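
The clipping trick mentioned above looks roughly like this; the clip bound eps is an assumed value, and compute_loss is the function named in the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_loss(y, y_hat, eps=1e-7):
    # Clip predictions away from exactly 0 and 1 so log() never returns -inf
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```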

RESULT

If we compare the mean squared error of all the linear and logistic regression versions, the scikit-learn linear regression has the lowest mean squared error, and thus the best fit for this dataset. The hand-made linear regression version is second!

Happy practicing!

References

https://www.tensorflow.org/guide/keras/train_and_evaluate
