A machine learning algorithm defines a hypothesis set, chosen before looking at the training data, from which the optimal model is then selected. Machine learning algorithms fall into three broad categories:
- Supervised learning: both the input features and the output labels are defined.
- Unsupervised learning: the dataset is unlabeled and the goal is to discover hidden relationships.
- Reinforcement learning: a feedback loop is present, and the goal is to optimize some quantity (typically a reward) based on that feedback.
In this post, we will take a high-level look at some of the most common and popular machine learning algorithms. I will take up a more in-depth analysis of these algorithms in future posts. Please note that this post builds on my earlier post on common machine learning terms, so please take a look at that post before reading this one.
Ordinary Least Squares Linear Regression
- With linear regression, the objective is to fit a line through the data that is as close as possible to the points in the training set.
- In simple linear regression, the regression line minimizes the sum of the squared vertical distances between the individual points and the line, that is, the sum of the squared residuals. Hence, this method is also called “Ordinary Least Squares”.
- Linear regression also works for multidimensional data, i.e. datasets that have multiple features. In this case, the ‘line’ is a hyperplane of dimension ‘N-1’, N being the dimension of the dataset (the features plus the target).
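To make the least-squares idea concrete, here is a minimal sketch in plain Python for the one-feature case. The data points and the function names (`fit_simple_ols`, `sum_squared_residuals`) are made up for illustration; the fit uses the standard closed-form solution for the slope and intercept.

```python
def fit_simple_ols(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single feature.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Illustrative (made-up) training data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_ols(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)
# Any other line, for example one with a slightly different slope, has a larger sum.
worse = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(slope, intercept, best, worse)
```

For datasets with several features, the same idea carries over, but the solution is usually computed with a linear algebra library rather than by hand.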
Logistic Regression
- Logistic regression, although termed regression, is a classification technique.
- Contrary to linear regression, logistic regression does not assume a linear relationship between the dependent and independent variables. Instead, it assumes that the logit (the log-odds) of the outcome probability is a linear function of the independent variables.
- In other words, the decision surface is linear.
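As a rough illustration, the sketch below fits a one-feature logistic regression with batch gradient descent in plain Python. The data (hours studied vs. pass/fail), the learning rate, and the number of epochs are all assumptions chosen for the example, not tuned values. It also checks that the logit of the predicted probability is exactly the linear expression `w * x + b`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(xs, labels, w, b):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in zip(xs, labels):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def fit_logistic(xs, labels, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Made-up data: hours studied and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)

p = sigmoid(w * 3.0 + b)            # predicted probability of passing after 3 hours
logit = math.log(p / (1 - p))       # log-odds; equals w * 3.0 + b up to rounding
boundary = -b / w                   # the single point where the prediction flips
print(w, b, p, logit, boundary)
```

With one feature the decision "surface" is just a single threshold on the feature; with more features it becomes a line or a hyperplane.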
Support Vector Machines
- A support vector machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems.
- In SVM, we plot the data points in an N-dimensional space, where N is the number of features, and find the hyperplane that separates the classes with the widest possible margin.
- This is a good algorithm when the number of dimensions is high relative to the number of data points.
- Because it works in high-dimensional spaces, this algorithm can be computationally expensive to train, especially on large datasets.
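To give a feel for what "finding a hyperplane" means, here is a small sketch of a linear SVM trained with batch sub-gradient descent on the regularized hinge loss. This is one simple way to train a linear SVM, not the solver a production library would use, and the data, the function names, and the hyperparameters (`lr`, `lam`, `epochs`) are assumptions made for the example.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss with batch sub-gradient descent.

    Labels must be +1 or -1. Returns the weight vector w and the bias b
    of the separating hyperplane w.x + b = 0.
    """
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]   # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify a point by the side of the hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up, linearly separable 2-D data.
points = [(2, 2), (3, 3), (3, 1.5), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print(w, b, [predict(w, b, x) for x in points])
```

Only the points with a margin below 1 contribute to each update, which mirrors the idea that the hyperplane is determined by the points closest to it.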
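K-Means Clustering
- K-means is an unsupervised algorithm that attempts to split the data into K groups, where each group consists of the points closest to one of K centroids.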
- This can be thought of as creating stereotypes among groups of people.
The algorithm to implement K-means clustering is quite simple.
- Randomly pick K centroids.
- Assign each data point to the centroid closest to it.
- Recompute each centroid as the average position of the points assigned to it.
- Iterate until the points stop changing their assignments to centroids.
To predict the cluster of a new data point, you just find the centroid it is closest to.
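The steps above can be sketched in a few lines of plain Python. The function names, the fixed random seed, and the sample points are assumptions for illustration; picking the initial centroids from the data points themselves is one common choice among several.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest_centroid(point, centroids):
    """Index of the centroid nearest to the point (also used for prediction)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # 1. randomly pick K centroids
    assignments = None
    for _ in range(max_iters):
        # 2. assign each data point to its closest centroid
        new_assignments = [closest_centroid(p, centroids) for p in points]
        if new_assignments == assignments:       # 4. stop when nothing changes
            break
        assignments = new_assignments
        # 3. recompute each centroid as the mean of its assigned points
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:                          # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Made-up 2-D data with two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, assignments = k_means(points, k=2)

new_point = (0, 0)
print(centroids, assignments, closest_centroid(new_point, centroids))
```

The result of K-means can depend on the initial centroids, which is why practical implementations often run it several times with different starting points.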
Decision Trees
- A decision tree is a classifier in the form of a tree structure.
- Decision trees classify instances or examples by starting at the root of the tree and moving down through it until they reach a leaf node, which holds the target value.
- Decision trees are useful because they mimic human decision-making, and thus the models are easy to understand.
- Smaller trees are generally preferable: as a tree grows larger, it tends to overfit the training data and becomes less accurate on unseen data.
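The root-to-leaf traversal can be illustrated with a tiny hand-built tree. Everything here is hypothetical: the features (`humidity`, `wind_speed`), the thresholds, and the labels are made up, and the tree is written by hand rather than learned from data.

```python
# Hypothetical hand-built tree: internal nodes test one feature against a
# threshold, leaf nodes hold the predicted label.
tree = {
    "feature": "humidity", "threshold": 75,
    "left": {"label": "play"},                 # humidity <= 75
    "right": {                                 # humidity > 75
        "feature": "wind_speed", "threshold": 20,
        "left": {"label": "play"},             # wind_speed <= 20
        "right": {"label": "stay home"},       # wind_speed > 20
    },
}


def classify(node, example):
    """Start at the root and follow the tests until a leaf is reached."""
    while "label" not in node:
        go_left = example[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def depth(node):
    """Number of tests on the longest path from this node to a leaf."""
    if "label" in node:
        return 0
    return 1 + max(depth(node["left"]), depth(node["right"]))


print(classify(tree, {"humidity": 60, "wind_speed": 30}))
print(classify(tree, {"humidity": 90, "wind_speed": 30}))
print(depth(tree))
```

Because each prediction is just a short chain of readable tests, it is easy to explain why the tree gave a particular answer, and limiting the depth is one simple way to keep a tree small.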
These are some key machine learning algorithms that I think are important and worth looking into for anyone who is a machine learning beginner. Machine learning algorithms are like forks, knives, saws, etc. They have various advantages and disadvantages and are applicable in different scenarios.
If this post has piqued your interest, then I would highly encourage you to go ahead and get a deeper understanding of these algorithms. Also, take a look at this awesome post on SO about how to inspect the drawbacks and assumptions of any statistical method. In case you have come across other common algorithms that you think are important but are not included here, do write about them in the comments below.
Thanks for reading. If you are interested in talking more about this, just drop me a message @alt227Joydeep. I would be glad to discuss this further. Also, please hit the claps and help this article reach a wider audience.