Machine Learning Must-Know

Published in Analytics Vidhya, Oct 7, 2020

Machine Learning is the ability of a computer to learn to perform a human-level task without being explicitly programmed (paraphrasing Arthur Samuel, 1959).

Data Mining is the act of digging into large datasets to discover inherent patterns.

ML solves the following kinds of problems well:

Problems that would otherwise need long lists of hand-written rules (if-else statements).
Complex problems with no good traditional solution.
Problems that require quick and continuous adaptation to new data.
Problems that require extracting insights from large datasets.

Types of ML Systems

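Machine Learning systems are grouped by the following criteria:

Whether or not they are trained with human supervision

supervised learning
unsupervised learning
reinforcement learning

Whether or not they can learn incrementally, on the fly

online learning
batch learning

Whether they compare new data points to known data points, or detect patterns in the training data and build a model

instance-based learning
model-based learning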
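
The difference between batch and online learning can be sketched with the simplest possible "model", an estimate of the mean of the data. This is only an illustration under that assumption; the names `batch_mean`, `OnlineMean`, and `stream` are made up for the example and do not come from any library.

```python
def batch_mean(values):
    """Batch learning: the estimate is computed from the whole dataset at once."""
    return sum(values) / len(values)


class OnlineMean:
    """Online learning: the estimate is updated one observation at a time."""

    def __init__(self):
        self.count = 0
        self.estimate = 0.0

    def update(self, value):
        self.count += 1
        # Move the current estimate toward the new value by a shrinking step.
        self.estimate += (value - self.estimate) / self.count
        return self.estimate


# Hypothetical stream of observations arriving one by one.
stream = [4.0, 8.0, 6.0, 2.0]

learner = OnlineMean()
for value in stream:
    learner.update(value)

print(batch_mean(stream), learner.estimate)  # prints: 5.0 5.0
```

Both approaches end at the same estimate here, but the online learner never needs the full dataset in memory and can keep adapting as new values arrive.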

Supervised Learning

Supervised learning involves feeding the algorithm both the input data and the desired outcomes (labels).

Types of Supervised Learning algorithms

Classification

Classification predicts which discrete class an input belongs to (for example, whether a picture shows a cat or a dog). Many classifiers also report a probability for each class, such as an 80% chance that the picture is a cat.

Classification Algorithms

K-nearest Neighbors Classification
Logistic Regression
Support Vector Machines
Naive Bayes Classification
Decision Tree Classification
Random Forest Classification (Ensemble methods)
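
As a rough illustration of the first algorithm in the list, here is a minimal k-nearest neighbors sketch in plain Python. The feature values, the labels, and the function name `knn_predict` are hypothetical, and the share of neighbor votes is only a crude stand-in for a class probability.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Return (predicted_label, vote_shares) for a single query point."""
    # Rank the training points by Euclidean distance to the query.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Count the labels of the k closest points.
    votes = Counter(train_labels[i] for i in order[:k])
    shares = {label: count / k for label, count in votes.items()}
    predicted = max(shares, key=shares.get)
    return predicted, shares


# Made-up features: (weight in kg, ear length in cm).
points = [(4.0, 6.5), (3.5, 7.0), (5.0, 6.0), (4.5, 7.5),
          (20.0, 12.0), (25.0, 11.0), (6.0, 8.0)]
labels = ["cat", "cat", "cat", "cat", "dog", "dog", "dog"]

label, shares = knn_predict(points, labels, query=(5.0, 7.0), k=5)
print(label, shares)  # prints: cat {'cat': 0.8, 'dog': 0.2}
```

Four of the five nearest neighbors are cats, so the sketch predicts "cat" with a vote share of 0.8.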

Regression

Regression models are trained to predict a continuous numeric value from the input features. Examples include the prediction of house prices and stock prices.

Regression Algorithms

K-nearest Neighbors Regression
Linear Regression (Simple and Multiple Linear Regression)
Polynomial Regression
Support Vector Regression
Naive Bayes Regression
Decision Tree Regression
Random Forest Regression (Ensemble methods)
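
Simple linear regression is small enough to write out by hand. The sketch below fits a line by ordinary least squares; the function name `fit_line` and the house-price numbers are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: house size vs. price, both in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 5.0 + intercept  # predicted price for a house of size 5
print(slope, intercept, predicted)
```

Unlike a classifier, the output is a number on a continuous scale rather than one of a fixed set of classes.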

Unsupervised Learning

Unsupervised learning involves finding inherent structure in unlabeled data rather than predicting a known outcome. In short, unsupervised learning does not involve labels.

Unsupervised Learning Algorithms

1. Clustering

k-Means
Hierarchical Cluster Analysis (HCA)
Expectation Maximization

2. Dimensionality reduction

Principal Component Analysis (PCA)
Kernel PCA
Locally-Linear Embedding (LLE)
t-distributed Stochastic Neighbor Embedding (t-SNE)

3. Anomaly detection

4. Association rule mining

Apriori
Eclat
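
To make the clustering family above more concrete, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. Note that no labels are passed in anywhere. The starting centroids are chosen by hand to keep the example deterministic, which is an assumption of this sketch; practical implementations usually pick them randomly or with a smarter seeding scheme.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means starting from the given centroids."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Made-up 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

centroids, assignments = k_means(points, centroids=[points[0], points[3]])
print(centroids, assignments)
```

The algorithm groups the first three points into one cluster and the last three into another, using only the distances between points.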

Challenges of ML

Insufficient quantity of training data
Non-representative training data
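Sampling noise (when the sample is too small, so it is non-representative by chance) and sampling bias (when the sampling method is flawed, which can happen even with very large samples)
Overfitting (the model performs well on the training data but generalizes poorly to unseen data)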
Irrelevant features
Poor-quality data (errors, outliers, and noise)
Feature engineering (the difficulty of selecting and extracting useful features)
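
Overfitting is easy to reproduce on synthetic data. In the sketch below, a 1-nearest-neighbor classifier memorizes a noisy training set, so it scores perfectly on the data it has seen and noticeably worse on fresh data. The data generator, the noise level, and all function names are assumptions made up for this illustration.

```python
import random


def nearest_label(train, x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train, data):
    """Fraction of points in `data` whose label the model reproduces."""
    hits = sum(nearest_label(train, x) == y for x, y in data)
    return hits / len(data)


def make_data(rng, n, noise=0.25):
    """True rule: label is 1 when x > 0.5; a fraction of labels is flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise the model should not learn
        data.append((x, y))
    return data


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(rng, 100)
test = make_data(rng, 200)

train_acc = accuracy(train, train)  # each point is its own nearest neighbor
test_acc = accuracy(train, test)
print(train_acc, test_acc)
```

The gap between the two scores is the symptom to look for: the model has learned the noise in the training labels rather than the underlying rule.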