Andrew Ng’s Machine Learning Course in Python (Anomaly Detection)
This is the last part of my Python implementation of Andrew Ng’s Machine Learning Course, and I am very excited to finally complete the series. To give you some perspective, it took me a month to convert the code to Python and write an article for each assignment. If any of you are hesitating to do your own implementation, be it in Python, R or Java, I strongly recommend that you go for it. Coding these algorithms from scratch not only reinforces the concepts taught, it also lets you practice your data science programming skills in the language you are comfortable with.
With that said, let’s dive into the last programming assignment.
In this part of the assignment, we will implement an anomaly detection algorithm using the Gaussian model to detect anomalous behavior in a 2D dataset first and then a high-dimensional dataset.
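For reference, one common way to write the Gaussian model for anomaly detection treats each feature as independent, with its own mean and variance. The symbols here are assumptions for illustration: n is the number of features, and ε is a threshold chosen on a validation set.

```latex
p(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma_j^2}}
       \exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right),
\qquad \text{flag } x \text{ as anomalous if } p(x) < \varepsilon
```

A point that sits far from the bulk of the data gets a very small density, which is what makes thresholding on p(x) work.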
Loading relevant libraries and the dataset
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

mat = loadmat("ex8data1.mat")
X = mat["X"]
Xval = mat["Xval"]
yval = mat["yval"]
Visualizing the data
plt.scatter(X[:,0],X[:,1],marker="x")
plt.xlim(0,30)
plt.ylim(0,30)
plt.xlabel("Latency (ms)")
plt.ylabel("Throughput (mb/s)")
Estimating the parameters (mean and variance) of the Gaussian model
def estimateGaussian(X):
    """
    This function estimates the parameters of a Gaussian distribution using the data in X
    """
    m = X.shape[0]

    # compute mean
    sum_ = np.sum(X, axis=0)
    mu = 1/m * sum_

    # compute variance
    var = 1/m * np.sum((X - mu)**2, axis=0)

    return mu, var

mu, sigma2 = estimateGaussian(X)
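As a quick sanity check, the per-feature estimates above should agree with NumPy's built-in mean and variance. The sketch below uses synthetic data as a stand-in for the exercise file, so the names X_demo, mu_demo and var_demo, and the numbers used to generate the data, are assumptions for illustration only.

```python
import numpy as np

# Synthetic stand-in for the exercise data (assumption: NOT the real ex8data1.mat)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[14.0, 15.0], scale=[1.5, 1.2], size=(300, 2))

# Same per-feature formulas as in estimateGaussian
m = X_demo.shape[0]
mu_demo = np.sum(X_demo, axis=0) / m
var_demo = np.sum((X_demo - mu_demo) ** 2, axis=0) / m

# Dividing by m (not m - 1) gives the population variance, i.e. np.var with ddof=0
mean_matches = np.allclose(mu_demo, X_demo.mean(axis=0))
var_matches = np.allclose(var_demo, X_demo.var(axis=0, ddof=0))
print(mean_matches, var_matches)
```

Note that the estimate divides by m rather than m - 1, so it corresponds to ddof=0 in np.var. With a few hundred examples the difference between the two is small.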
The multivariate Gaussian distribution is an optional lecture in the course, and the code to compute the probability density is given to us. However, to proceed with the assignment, I need to write the multivariateGaussian function from scratch.
def multivariateGaussian(X, mu…