The road to Machine Learning Engineer

Evelina Vrabie · Published in Jumpstart · Apr 29, 2019
Replace TensorFlow with scikit-learn…🤣

I have successfully completed the first part of the Machine Learning Engineer Nanodegree with Udacity. This first part, Machine Learning Foundation, covers Supervised and Unsupervised Learning.

This is the TL;DR version of what I have learnt so far.

What is Machine Learning: an introductory chapter with some examples of ML in practice.

Introductory Practice Project: Titanic Survival Exploration — discovering which passengers were more likely to have survived the tragedy.

Intro to NumPy and Pandas
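
Here's a tiny taste of the kind of thing this chapter covers (a toy example of my own, with made-up values, not from the course notebooks):

```python
import numpy as np
import pandas as pd

# A small DataFrame built from NumPy arrays (illustrative values only)
ages = np.array([22, 38, 26, 35])
fares = np.array([7.25, 71.28, 7.93, 53.10])
df = pd.DataFrame({"Age": ages, "Fare": fares})

print(df.describe())        # summary statistics per column
print(df[df["Age"] > 30])   # boolean indexing to filter rows
```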

Training and Testing Models: tuning parameters manually and automatically
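
This term is scikit-learn territory (see the caption above). A minimal sketch of the manual side, using the built-in Iris dataset and a hyperparameter value I picked by hand (the details are my own, not the course exercise):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set so the model is scored on data it has never seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# "Manual" tuning: pick a hyperparameter value and check the test score
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```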

Evaluation Metrics: the Confusion Matrix, Accuracy, Precision, Recall, F1-score, F-beta score, regression metrics
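
All of these live in sklearn.metrics. A quick sketch with made-up labels, just to show the calls:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             fbeta_score, precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up predictions

print(confusion_matrix(y_true, y_pred))       # rows: actual, columns: predicted
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))        # TP / (TP + FP)
print(recall_score(y_true, y_pred))           # TP / (TP + FN)
print(f1_score(y_true, y_pred))               # harmonic mean of precision and recall
print(fbeta_score(y_true, y_pred, beta=0.5))  # beta < 1 weights precision more
```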

Model Selection: types of errors, Model Complexity Graph, Cross Validation, K-Fold Cross Validation, Learning Curves, Overfitting and Underfitting, Grid Search
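
Grid Search plus k-fold cross-validation is the "automatic" end of model selection. A minimal sketch, with a parameter grid of my own choosing on the Iris dataset again:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Every combination in the grid is scored with 5-fold cross-validation
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_split": [2, 5, 10]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_, grid.best_score_)
```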

Project 1: Predicting Boston House Prices

Linear Regression: Absolute and Square Trick, Gradient Descent, Mean Absolute and Squared Errors, minimising error functions, mini-batch gradient descent, multiple linear regression, polynomial regression, L1 and L2 regularization
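
In scikit-learn terms, L1 and L2 regularization show up as Lasso and Ridge. A toy sketch on noisy synthetic data (my own numbers, not a course dataset):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=1.0, size=100)  # noisy line y = 3x + 2

print(LinearRegression().fit(X, y).coef_)  # plain least squares
print(Lasso(alpha=0.1).fit(X, y).coef_)    # L1 regularization (can zero out weights)
print(Ridge(alpha=0.1).fit(X, y).coef_)    # L2 regularization (shrinks weights)
```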

The Perceptron Algorithm: classification problems, Perceptrons and logical operations, the Perceptron algorithm
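
The "logical operations" part is the bit that clicked for me: a perceptron can learn AND because the classes are linearly separable. A minimal sketch with scikit-learn's Perceptron (my own toy setup):

```python
import numpy as np
from sklearn.linear_model import Perceptron

# The AND operation as a tiny classification problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

clf = Perceptron(max_iter=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X))  # should recover [0, 0, 0, 1], since AND is linearly separable
```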

Decision Trees: recommender apps, Entropy, Multiclass Entropy, Random Forests, Hyperparameters
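
Entropy is the impurity measure behind the splits. A small sketch of the formula (my own helper function, not course code):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a list of labels: -sum(p * log2(p))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([1, 1, 1, 1]))     # pure node -> entropy 0
print(entropy([0, 1, 0, 1]))     # 50/50 split -> entropy 1 bit
print(entropy(["a", "b", "c"]))  # multiclass entropy, still log base 2
```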

Naive Bayes: a really cool explanation of the Bayes Theorem, Bayesian Learning, building a spam classifier
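
The spam classifier boils down to bag-of-words counts plus MultinomialNB. A toy sketch with made-up messages:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at noon tomorrow",
            "free entry claim your prize", "lunch with the team"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham (made-up data)

vec = CountVectorizer()
X = vec.fit_transform(messages)       # word counts per message
clf = MultinomialNB().fit(X, labels)

print(clf.predict(vec.transform(["claim your free prize"])))  # expect spam (1)
```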

Support Vector Machines: margin error calculations, error functions, the C parameter, polynomial and RBF kernels
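
In scikit-learn the C parameter and the kernel are just arguments to SVC. A quick sketch on the toy two-moons dataset (my own choice of data and values):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(noise=0.2, random_state=0)

# C trades margin width against classification errors;
# the kernel decides the shape of the decision boundary
for kernel in ("poly", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, clf.score(X, y))
```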

Ensemble Methods: bagging, boosting, AdaBoost, Gradient Boosting
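
All three flavours have ready-made estimators in scikit-learn. A minimal comparison sketch of my own, on the built-in breast cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Each ensemble is scored with 5-fold cross-validation
for clf in (BaggingClassifier(), AdaBoostClassifier(), GradientBoostingClassifier()):
    scores = cross_val_score(clf, X, y, cv=5)
    print(type(clf).__name__, scores.mean())
```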

Project 2: Finding donors for a fictitious charity called…CharityML

Clustering: K-means, movie recommendation system mini-project
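
K-means in a few lines, on two synthetic blobs (illustrative data of my own, nothing to do with the movie mini-project):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),   # blob around (0, 0)
               rng.normal(5, 1, size=(50, 2))])  # blob around (5, 5)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # one centroid per cluster
print(kmeans.labels_[:5])       # cluster assignment for the first few points
```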

Hierarchical and Density-Based Clusters: single-link, average-link, complete-link, Ward, HC applications, DBSCAN and applications
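
Both families are in sklearn.cluster. A sketch on the two-moons toy data (parameters are my own guesses, especially the DBSCAN eps):

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(noise=0.05, random_state=0)

# Hierarchical clustering with the different linkage criteria
for linkage in ("single", "average", "complete", "ward"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    print(linkage, labels[:10])

# DBSCAN grows clusters from dense regions and marks sparse points as noise (-1)
print(DBSCAN(eps=0.2, min_samples=5).fit_predict(X)[:10])
```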

Gaussian Mixture Models and Clustering Validation: GMM in one dimension, Gaussian Distribution in 2D, Expectation Maximisation, cluster analysis process, external validation indices, Adjusted Rand Index, Silhouette Coefficient
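
A sketch tying GMM clustering to the two validation indices, on synthetic blobs of my own making:

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.mixture import GaussianMixture

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)

# Expectation Maximisation fits the mixture; each point gets its most likely component
labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

print(adjusted_rand_score(y_true, labels))  # external index: needs the true labels
print(silhouette_score(X, labels))          # internal index: uses only the data
```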

Feature Scaling: min/max rescaler
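
Practically a one-liner in scikit-learn, shown here on a tiny made-up matrix:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Rescales each column to [0, 1]: (x - min) / (max - min)
print(MinMaxScaler().fit_transform(X))
```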

Principal Component Analysis: data dimensionality, measurable vs latent features, composite features, maximal variance, information loss and Principal Components, PCA for feature transformation, PCA for facial recognition
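
A minimal PCA sketch on the Iris measurements (my own example, not the facial recognition exercise):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the 4 measurable features onto the 2 directions of maximal variance
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(pca.explained_variance_ratio_)  # how much variance each component keeps
print(X_reduced.shape)                # (150, 2)
```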

Random Projection and ICA: Independent Component Analysis, retrieving original signals from audio tracks, applications in EEG and financial data (stock analysis)
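
A rough sketch of both ideas, with synthetic signals standing in for the audio tracks (the mixing matrix and data are my own toy choices):

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # two independent "tracks"
mixed = sources @ np.array([[1.0, 0.5], [0.5, 1.0]])    # what the microphones "hear"

recovered = FastICA(n_components=2, random_state=0).fit_transform(mixed)
print(recovered.shape)  # ICA estimates the originals, up to scale and ordering

# Random projection: reduce dimensionality with a random matrix
X = rng.normal(size=(100, 1000))
print(GaussianRandomProjection(n_components=50, random_state=0).fit_transform(X).shape)
```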

Project 3: Cluster Customer Segments to discover the profile of retail customers based on their annual spending

So far, I am very happy with both the content and the quality of the feedback. Maybe I’ll write a more detailed blog post about that at some point. But for now…onwards to Term 2 🎉, Advanced Machine Learning, with some cool stuff like Convolutional Neural Nets and a capstone project.
