TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Andrew Ng’s Machine Learning Course in Python (Anomaly Detection)

Benjamin Lau
Published in TDS Archive
8 min read · Jan 12, 2019


Machine Learning — Andrew Ng

This is the last part of my Python implementation of Andrew Ng’s Machine Learning Course, and I am very excited to finally complete the series. To give you some perspective, it took me a month to convert the code to Python and write an article for each assignment. If you have been hesitating to do your own implementation, be it in Python, R or Java, I strongly recommend that you go for it. Coding these algorithms from scratch not only reinforces the concepts taught but also lets you practice your data science programming skills in the language you are comfortable with.

With that said, let’s dive into the last programming assignment.

In this part of the assignment, we will implement an anomaly detection algorithm that uses the Gaussian model to detect anomalous behavior, first in a 2D dataset and then in a high-dimensional dataset.
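For reference, the Gaussian model in the course treats the features as independent. The density of an example x with n features is the product of per-feature Gaussian densities, and an example is flagged when that density falls below a threshold. The threshold symbol ε is not named in the text above. It is assumed here as the course's usual notation.

```latex
p(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_j^{2}}}
       \exp\!\left(-\frac{(x_j - \mu_j)^{2}}{2\sigma_j^{2}}\right),
\qquad
x \text{ is flagged as anomalous if } p(x) < \varepsilon
```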

Loading relevant libraries and the dataset

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat
mat = loadmat("ex8data1.mat")
X = mat["X"]
Xval = mat["Xval"]
yval = mat["yval"]

Visualizing the data

plt.scatter(X[:,0],X[:,1],marker="x")
plt.xlim(0,30)
plt.ylim(0,30)
plt.xlabel("Latency (ms)")
plt.ylabel("Throughput (mb/s)")

Estimating the parameters (mean and variance) of the Gaussian model

def estimateGaussian(X):
    """
    This function estimates the parameters of a Gaussian distribution using the data in X
    """

    m = X.shape[0]

    # compute mean
    sum_ = np.sum(X,axis=0)
    mu = 1/m *sum_

    # compute variance
    var = 1/m * np.sum((X - mu)**2,axis=0)

    return mu,var

mu, sigma2 = estimateGaussian(X)
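As a quick sanity check, the estimates above should agree with NumPy's built-in per-column mean and population variance (`ddof=0`, i.e. dividing by m rather than m - 1). The sketch below re-implements the function under the hypothetical name `estimate_gaussian` and runs it on synthetic data, since `ex8data1.mat` may not be at hand. The array `X_demo` and its parameters are made up for illustration.

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and population variance (divides by m, not m - 1)."""
    m = X.shape[0]
    mu = np.sum(X, axis=0) / m
    var = np.sum((X - mu) ** 2, axis=0) / m
    return mu, var

# Synthetic stand-in for the 2D latency/throughput data (values are assumptions).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[14.0, 15.0], scale=[1.5, 1.2], size=(300, 2))

mu_demo, var_demo = estimate_gaussian(X_demo)

# The hand-written estimates match NumPy's column-wise mean and variance.
matches_mean = np.allclose(mu_demo, X_demo.mean(axis=0))
matches_var = np.allclose(var_demo, X_demo.var(axis=0))  # np.var defaults to ddof=0
print(matches_mean, matches_var)
```

Dividing by m gives the maximum-likelihood estimate. It is slightly biased for small samples, but the difference from the m - 1 version is negligible for datasets of this size.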

The multivariate Gaussian distribution is covered in an optional lecture in the course, and the code to compute the probability density is given to us in the original assignment. However, to proceed with the assignment in Python, I need to write the multivariateGaussian function from scratch.

def multivariateGaussian(X, mu…
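The function definition above is cut off in this archived copy, and the sketch below does not reconstruct it. It is a minimal, independent illustration of a multivariate Gaussian density, written under the hypothetical name `multivariate_gaussian_sketch`. It assumes that a 1-D vector of variances should be treated as a diagonal covariance matrix. The author's actual signature and implementation may differ.

```python
import numpy as np

def multivariate_gaussian_sketch(X, mu, sigma2):
    """Density of N(mu, Sigma) evaluated at each row of X.

    sigma2 may be a 1-D vector of per-feature variances (used as a
    diagonal covariance, an assumption of this sketch) or a full
    (k, k) covariance matrix.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    k = mu.size

    # Promote a variance vector to a diagonal covariance matrix.
    Sigma = np.diag(sigma2) if sigma2.ndim == 1 else sigma2

    diff = X - mu
    # Solve Sigma * z = diff^T rather than forming an explicit inverse.
    solved = np.linalg.solve(Sigma, diff.T).T
    mahalanobis_sq = np.sum(diff * solved, axis=1)

    norm_const = (2 * np.pi) ** (-k / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * mahalanobis_sq)

# At the mean of a standard 2D Gaussian the density is 1 / (2 * pi).
p_at_mean = multivariate_gaussian_sketch([[0.0, 0.0]], [0.0, 0.0], [1.0, 1.0])
print(p_at_mean)
```

With a diagonal covariance this reduces to the product of independent per-feature Gaussian densities, which is the model the parameter estimates above feed into.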
