# Machine Learning Basics

Linear Regression — a linear function that predicts the response variable from one or more predictor variables. The function is fitted by minimizing the error, usually the sum of squared vertical distances between the predictions and the observed responses (least squares).
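
As a minimal sketch of the least-squares fit for a single predictor, the closed-form solution can be written in a few lines. The function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```

With noisy data the same function returns the line with the smallest total squared error rather than an exact fit.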

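Gradient Descent — an iterative method for minimizing an error function. At each step, compute the gradient of the error (for example, the least-squares error) with respect to the parameters of the function, then move the parameters a small step in the opposite direction. Repeat until the error stops decreasing.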
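
The update rule can be sketched on a least-squares line fit. This is one possible setup, not the only one: the learning rate, step count, and sample data are assumptions chosen so that this small example converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = slope * x + intercept by descending the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Signed prediction error for every sample.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Illustrative data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = gradient_descent(xs, ys)
```

A learning rate that is too large makes the updates diverge, and one that is too small makes convergence slow, so the value usually has to be tuned per problem.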

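Logistic Regression — a linear function that splits the samples into two categories. The linear output is passed through a sigmoid to give a class probability, and the function is found by minimizing the log loss, which penalizes confident predictions on the wrong side of the line. As in linear regression, gradient descent can be used for the minimization.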
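
A minimal sketch for one predictor, assuming labels of 0 and 1 and plain gradient descent on the log loss. The names `fit_logistic`, `w`, and `b`, the hyperparameters, and the data are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, labels):
            p = sigmoid(w * x + b)
            # Gradient of the log loss for one sample is (p - y) times the input.
            grad_w += (p - y) * x
            grad_b += p - y
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Illustrative data: small values belong to class 0, large values to class 1.
xs = [0.5, 1.0, 1.5, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, labels)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
```

The decision boundary is the point where `w * x + b` equals zero, which is where the predicted probability crosses 0.5.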

Support Vector Machine — finds the line that separates the samples with the largest margin, where the margin is the distance from the line to the closest sample on each side. Only those closest samples (the support vectors) determine the line. The optimization is usually solved as a quadratic program, although gradient-based methods can also be used.
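
One gradient-based way to approximate a linear SVM is subgradient descent on the regularized hinge loss. The sketch below assumes labels of -1 and +1; the function name `fit_linear_svm`, the hyperparameters, and the data are illustrative, and this is not how production solvers typically work.

```python
def fit_linear_svm(points, labels, learning_rate=0.01, reg=0.01, epochs=2000):
    """Approximate a linear SVM by subgradient descent on the hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            # Shrinking w corresponds to widening the geometric margin.
            w[0] -= learning_rate * reg * w[0]
            w[1] -= learning_rate * reg * w[1]
            # Only samples inside the margin or misclassified pull on the line.
            if margin < 1:
                w[0] += learning_rate * y * x1
                w[1] += learning_rate * y * x2
                b += learning_rate * y
    return w, b


# Illustrative, linearly separable data with labels -1 and +1.
points = [(1.0, 1.0), (1.5, 0.5), (0.5, 1.5), (4.0, 4.0), (4.5, 3.5), (3.5, 4.5)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = fit_linear_svm(points, labels)
predictions = [1 if w[0] * x1 + w[1] * x2 + b >= 0 else -1 for x1, x2 in points]
```

Samples that already sit outside the margin contribute nothing to the update, which mirrors the idea that only the closest samples shape the separating line.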

Neural Network — the decision is broken down into multiple smaller decisions. Each node in the middle (hidden) layer makes its own decision from the inputs and passes it to the output layer, which combines them into the final decision.
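
To illustrate how an output layer combines hidden-layer decisions, here is a tiny network that computes XOR. The weights are hand-picked for the example rather than learned, and the step activation is a simplification of what trained networks normally use.

```python
def step(z):
    """Threshold activation: the node fires (1) only for positive input."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_network(a, b):
    # Hidden layer: two independent decisions about the same inputs.
    h_or = neuron([a, b], [1.0, 1.0], -0.5)      # fires if a or b is 1
    h_nand = neuron([a, b], [-1.0, -1.0], 1.5)   # fires unless both are 1
    # Output layer: fires only if both hidden nodes fired.
    return neuron([h_or, h_nand], [1.0, 1.0], -1.5)


outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# outputs == [0, 1, 1, 0]
```

No single node of this kind can compute XOR on its own, so the example also shows why the middle layer is needed.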

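Kernel Method — when the samples can’t be split by a simple function, we can look for a function z=f(x,y) that adds a dimension, so that a simple plane splits the samples by their z values. A kernel function lets us work in that higher-dimensional space without computing the mapping explicitly.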
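
The lifting idea can be sketched with an explicit mapping. The example assumes z = x² + y² as the function f and uses made-up points: an inner group near the origin and an outer group around it, which no straight line in the (x, y) plane can separate.

```python
def lift(x, y):
    """Map a 2D point to its extra coordinate z = x^2 + y^2."""
    return x * x + y * y


# Illustrative data: one class near the origin, the other surrounding it.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]

inner_z = [lift(x, y) for x, y in inner]
outer_z = [lift(x, y) for x, y in outer]

# Hand-picked plane z = 1.0: every inner point lies below it, every outer point above.
threshold = 1.0
separable = max(inner_z) < threshold < min(outer_z)
```

This version computes z directly to keep the geometry visible; kernel-based learners generally avoid building the lifted coordinates and evaluate a kernel function on pairs of samples instead.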

K-means clustering — for each of the k clusters we start with a candidate center (centroid) at a random point. We assign each sample to its closest candidate, then move each candidate to the mean of the samples assigned to it. Repeat until the assignments stop changing; the result is a local optimum that depends on the starting points.
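
The assign-then-move loop can be sketched as follows. Picking the starting candidates from the samples themselves, the fixed `seed`, and the sample data are assumptions made for the example.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    # Start each candidate at a randomly chosen sample.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iters):
        # Assign every sample to its closest candidate.
        new_assignment = [
            min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed, so the clustering has converged
        assignment = new_assignment
        # Move each candidate to the mean of its assigned samples.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous candidate
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Illustrative data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, k=2)
```

On less clearly separated data, different seeds can give different clusterings, which is why the algorithm is often run several times and the best result kept.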

Hierarchical clustering — start with every sample as its own cluster and iteratively merge the two closest clusters. Repeat until the distance between the two closest clusters is larger than a predefined maximum.
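
A minimal sketch of the merge loop on one-dimensional data. It assumes single linkage (the distance between two clusters is the distance between their closest members), which is only one of several common choices; the function names and data are illustrative.

```python
def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: the gap between their closest members."""
    return min(abs(a - b) for a in cluster_a for b in cluster_b)


def agglomerative(values, max_distance):
    # Every sample starts as its own cluster.
    clusters = [[v] for v in values]
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # the closest clusters are too far apart to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


clusters = agglomerative([1.0, 1.2, 1.5, 5.0, 5.1, 9.0], max_distance=1.0)
# clusters == [[1.0, 1.2, 1.5], [5.0, 5.1], [9.0]]
```

Raising `max_distance` produces fewer, larger clusters, and lowering it produces more, smaller ones.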