Machine Learning Must-Know
Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed (attributed to Arthur Samuel, 1959).
Data Mining is the act of digging into large datasets to discover inherent patterns.
ML solves the following kinds of problems well:
- Problems that would otherwise require long lists of hand-written rules (if-else statements).
- Complex problems without traditional solutions.
- Problems requiring quick and continuous adaptation.
- Problems requiring getting insights from large datasets.
Types of ML Systems
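Machine learning systems are grouped by the following criteria:
- Whether they are trained with human supervision
  - supervised learning
  - unsupervised learning
  - reinforcement learning
- Whether they can learn incrementally, on the fly
  - online learning
  - batch learning
- Whether they compare new data points to known data points, or detect patterns in the training data and build a predictive model
  - instance-based learning
  - model-based learning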
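To make the batch versus online distinction concrete, here is a minimal sketch that uses a running mean as a toy stand-in for a model. The names `batch_mean` and `OnlineMean` are hypothetical and exist only for this illustration; a real system would update model parameters rather than a single average.

```python
def batch_mean(values):
    """Batch learning: the estimate is computed from the whole dataset at once."""
    return sum(values) / len(values)


class OnlineMean:
    """Online learning: the estimate is updated one sample at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Move the running estimate toward the new sample.
        self.mean += (value - self.mean) / self.count
        return self.mean


data = [2.0, 4.0, 6.0, 8.0]

learner = OnlineMean()
for sample in data:
    learner.update(sample)  # samples could arrive from a stream instead

print(batch_mean(data), learner.mean)
```

Both approaches reach the same estimate here, but the online version never needs the full dataset in memory and can keep adapting as new samples arrive.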
Supervised Learning
Supervised learning involves feeding the algorithm both the training data and the desired outcomes (labels).
Types of Supervised Learning algorithms
Classification
Classification predicts which discrete class an input belongs to, often together with a probability for each class (for example, an 80% chance that a picture shows a cat rather than a dog).
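As a rough illustration of class probabilities, the sketch below implements a tiny k-nearest neighbors classifier using only the standard library. The function name `knn_predict_proba` and the toy cat/dog measurements are assumptions made up for this example, not data from any real source.

```python
from collections import Counter
import math


def knn_predict_proba(train_points, train_labels, query, k=5):
    """Return {label: fraction of the k nearest neighbors with that label}."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return {label: count / k for label, count in votes.items()}


# Hypothetical toy data: (weight in kg, ear length in cm) per animal.
points = [(4.0, 6.5), (4.5, 7.0), (3.8, 6.0), (5.0, 7.2),
          (20.0, 10.0), (25.0, 12.0), (5.5, 8.0)]
labels = ["cat", "cat", "cat", "cat", "dog", "dog", "dog"]

proba = knn_predict_proba(points, labels, query=(4.6, 7.1), k=5)
prediction = max(proba, key=proba.get)
print(proba, prediction)
```

With this toy data, four of the five nearest neighbors are cats, so the query is classified as a cat with an estimated probability of 0.8.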
Classification Algorithms
- K-nearest Neighbors Classification
- Logistic Regression
- Support Vector Machines
- Naive Bayes Classification
- Decision Tree Classification
- Random Forest Classification (Ensemble methods)
Regression
Regression models are trained on labeled inputs to predict continuous numeric values. Examples include predicting house prices and stock prices.
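A minimal sketch of simple linear regression, fitted with the closed-form least-squares solution, shows what predicting a continuous value looks like. The function name and the area/price figures are hypothetical and chosen so that the arithmetic is easy to follow.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: floor area in square meters -> price in thousands.
areas = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]

slope, intercept = fit_simple_linear_regression(areas, prices)
predicted = slope * 100.0 + intercept  # price estimate for a 100 m^2 home
print(slope, intercept, predicted)
```

Because the toy prices lie exactly on a line, the fit recovers a slope of 3 and an intercept of 0. Real data would leave residual error.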
Regression Algorithms
- K-nearest Neighbors Regression
- Linear Regression (simple and Multiple Linear Regression)
- Polynomial Regression
- Support Vector Regression
- Naive Bayes Regression (rarely used; Naive Bayes is primarily a classification method)
- Decision Tree Regression
- Random Forest Regression (Ensemble methods)
Unsupervised Learning
Unsupervised learning looks for inherent structure in unlabeled data rather than predicting a known outcome. In short, unsupervised learning does not involve labels.
Unsupervised Learning Algorithms
1. Clustering
   - k-Means
   - Hierarchical Cluster Analysis (HCA)
   - Expectation Maximization
2. Dimensionality reduction
   - Principal Component Analysis (PCA)
   - Kernel PCA
   - Locally-Linear Embedding (LLE)
   - t-distributed Stochastic Neighbor Embedding (t-SNE)
3. Anomaly detection
4. Association rule mining
   - Apriori
   - Eclat
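As one example from the list above, k-Means can be sketched in a few lines. This version takes the initial centroids as an explicit argument to keep the result deterministic, which is an assumption of this sketch; practical implementations usually pick them randomly or with a smarter seeding scheme. The function name `k_means` and the sample points are hypothetical.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, assignments = k_means(points, [points[0], points[3]])
print(centroids, assignments)
```

No labels are supplied at any point. The grouping comes entirely from the distances between the points.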
Challenges of ML
- Insufficient quantity of training data
- Non-representative training data
- Sampling noise (a small sample can be non-representative by chance) and sampling bias (even a very large sample can be non-representative if the sampling method is flawed)
- Overfitting (the model performs well on the training data but generalizes poorly to unseen data)
- Irrelevant features
- Poor quality data
- Feature engineering (selecting and constructing useful features is difficult)
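Overfitting is easiest to see by comparing accuracy on the training data with accuracy on held-out data. The sketch below is a toy setup under stated assumptions: a one-dimensional dataset where the true rule is "label 1 when x >= 5", two deliberately mislabeled training points, and a hand-written k-nearest neighbors classifier. All names and values are hypothetical.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, samples, k):
    """Fraction of samples whose predicted label matches the given label."""
    hits = sum(knn_predict(train, x, k) == label for x, label in samples)
    return hits / len(samples)


# Training data with two noisy labels (x = 3.0 and x = 7.0).
train = [(1.0, 0), (2.1, 0), (3.0, 1), (4.0, 0),
         (6.2, 1), (7.0, 0), (8.1, 1), (9.0, 1)]
# Held-out data labeled by the true rule.
test = [(1.2, 0), (2.9, 0), (3.3, 0), (6.8, 1), (7.3, 1), (8.7, 1)]

for k in (1, 3):
    print(k, accuracy(train, train, k), accuracy(train, test, k))
```

With k = 1 the model memorizes every training point, noise included, so it scores perfectly on the training set but worse on the held-out set. With k = 3 it makes some training errors on the noisy points yet generalizes better in this toy case.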