Implementing Convolution2D, Linear Regression and K-means Clustering from Scratch

2D Convolution

Simple example of algorithm logic

To run a convolution we need two things: an input matrix with shape (Batch, Channels, Height, Width) and the convolution kernels with shape (Out Channels, In Channels, Height, Width); for each input channel a different kernel is used.

Iteratively, we take a part of the input matrix the size of the kernel and perform element-wise multiplication with each kernel. The sum of the element-wise products is our output value. The formula for this is A*B = a1*b1 + a2*b2 + …

Since we also use strides and input padding, we need a formula for the output matrix shape:
output_w_h = int(((w_h - kernel_w_h + 2 * padding_w_h) / stride_w_h) + 1)
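The shape formula above can be written as a small helper for a quick sanity check (a sketch; the `conv_output_size` name is my own):

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    # output spatial size along one dimension (height or width)
    return (size - kernel + 2 * padding) // stride + 1

# e.g. a 3x3 kernel with padding 1 and stride 1 keeps a 28x28 input at 28x28
conv_output_size(28, 3, stride=1, padding=1)
```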

For simplicity, we will use square kernels in our example.

Of course, running this many iterations in real-life calculations is not a good idea, so this is just an example to convey the logic behind convolution.
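The loop-based logic described above can be sketched in plain NumPy (a naive illustration, not an efficient implementation; the `conv2d` name is my own):

```python
import numpy as np

def conv2d(x, kernels, stride=1, padding=0):
    """Naive 2D convolution.
    x: (batch, in_channels, h, w); kernels: (out_channels, in_channels, kh, kw)."""
    batch, in_c, h, w = x.shape
    out_c, _, kh, kw = kernels.shape
    x = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    out_h = (h - kh + 2 * padding) // stride + 1
    out_w = (w - kw + 2 * padding) // stride + 1
    out = np.zeros((batch, out_c, out_h, out_w))
    for b in range(batch):
        for oc in range(out_c):
            for i in range(out_h):
                for j in range(out_w):
                    # element-wise multiply the window with the kernel, then sum
                    window = x[b, :, i*stride:i*stride+kh, j*stride:j*stride+kw]
                    out[b, oc, i, j] = np.sum(window * kernels[oc])
    return out
```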

Linear Regression, Ridge Regression and Lasso Regression

The only difference between them is the regularization. Linear regression uses no regularization, Ridge uses L2 regularization (lambda*Sum(Weights²)), and Lasso uses L1 regularization (lambda*Sum(|Weights|)).

Optimization is usually done either by the Least Squares method or by Gradient Descent. I will show the second one, as it can be used in many other cases.

If you don’t know about Gradient Descent optimization, I recommend reading my previous paper:

So the idea of these methods is simple: find the parameters of the line that passes through the set of points in a way that minimizes our loss function. (For N-dimensional points our line becomes an N-dimensional hyperplane.)

Formula for the line: F(x) = A*X + b
And the MSE loss function: loss(y_true, y_pred) = Sum((y_true - y_pred)²)/n

So our optimization function for linear regression is:
E(x) = loss(y_true, y_pred) = loss(y_true, A*X + b) = Sum((y_true - A*X - b)²)/n

Now we need to calculate the derivative of E(x) with respect to A and b:

dE/dA = d(Sum((y_true - A*X - b)²)/n)/dA
Using the chain rule, we set j = y_true - A*X - b, so dE/dA = (dE/dj)*(dj/dA)

dE/dj = d(Sum(j²)/n)/dj = 2/n * Sum(y_true - A*X - b)
dj/dA = d(y_true - A*X - b)/dA = -x

dE/dA = -2/n * Sum(x * (y_true - A*X - b))

In the same way we get dE/db = -2/n * Sum(y_true - A*X - b)

And now we can use them to compute the updates of our weights A and b.
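Putting the two gradients together, a minimal gradient-descent fit might look like this (a sketch; the `fit_linear` name and the learning-rate/epoch defaults are my own choices):

```python
import numpy as np

def fit_linear(x, y, lr=0.01, epochs=1000):
    """Fit y ≈ A*x + b by gradient descent, using the gradients derived above."""
    n = len(x)
    A, b = 0.0, 0.0
    for _ in range(epochs):
        err = y - (A * x + b)              # y_true - A*X - b
        dA = -2.0 / n * np.sum(x * err)    # dE/dA
        db = -2.0 / n * np.sum(err)        # dE/db
        A -= lr * dA                       # step against the gradient
        b -= lr * db
    return A, b
```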

For Ridge and Lasso regression we do the same, except we add the L2 and L1 regularization terms, respectively, to our optimization formula.
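Only the gradient with respect to A changes; a sketch of the extra penalty terms (the helper name and the `lam` parameter are my own assumptions):

```python
import numpy as np

def regularized_grad_A(x, err, A, n, lam=0.1, kind="ridge"):
    """Gradient dE/dA with an added regularization term.
    Ridge (L2) adds d(lam*A^2)/dA = 2*lam*A;
    Lasso (L1) adds d(lam*|A|)/dA = lam*sign(A)."""
    dA = -2.0 / n * np.sum(x * err)
    if kind == "ridge":
        dA += 2 * lam * A
    elif kind == "lasso":
        dA += lam * np.sign(A)
    return dA
```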


K-means Clustering

This algorithm is also pretty simple.
We set the number of clusters to N, place N random centroids (the centers of the clusters), and assign the nearest points to them, forming the initial clusters. After this, on each iteration:
1) for each cluster, find a new centroid
2) for each centroid, assign the nearest points, forming a new cluster
Iterations stop when the SSE error is no longer decreasing.

SSE = Sum(euclid(centroid, cluster_points)²), i.e. for each cluster we take the sum of squared distances between the centroid and the cluster's points.
For dimension ≤ 3 we can picture this as the ordinary Euclidean distance (the same formula works in any dimension).
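The loop above can be sketched as follows (a simple NumPy version; the `kmeans` name and initializing centroids from random data points are my own choices):

```python
import numpy as np

def kmeans(points, n_clusters, max_iter=100, seed=0):
    """Assign points to the nearest centroid, recompute centroids,
    and stop when the SSE stops decreasing."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # initial centroids: n_clusters random points from the data
    centroids = points[rng.choice(len(points), n_clusters, replace=False)]
    prev_sse = np.inf
    for _ in range(max_iter):
        # squared Euclidean distance of every point to every centroid
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        sse = d2[np.arange(len(points)), labels].sum()
        if sse >= prev_sse:           # SSE no longer decreasing -> stop
            break
        prev_sse = sse
        for k in range(n_clusters):
            if np.any(labels == k):   # skip empty clusters
                centroids[k] = points[labels == k].mean(axis=0)
    return centroids, labels
```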

You can run this code in Google Colab



