Dive into Unsupervised Machine Learning
Unsupervised learning is the training of an algorithm on information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. The main idea behind unsupervised learning is to expose the machine to large volumes of varied data and allow it to learn and infer from that data. However, the machine must first be programmed to learn from data. Unsupervised learning problems can be further grouped into clustering and association problems.
Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behaviour.
Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as "people who buy X also tend to buy Y".
Applications of Unsupervised Machine Learning
1. Human Behaviour Analysis
2. Social Network Analysis to define groups of friends.
3. Market Segmentation of companies by location, industry, or vertical.
4. Organizing computing clusters based on similar event patterns and processes.
Algorithms for Unsupervised Machine Learning:
1. K-means Algorithm
2. Apriori Algorithm
3. Expectation–Maximization (EM) Algorithm
4. Principal Component Analysis (PCA)
K-Means Algorithm
K-means clustering is a type of unsupervised learning that is used when you have unlabeled data; the goal of the algorithm is to find groups in the data.
Steps to use this algorithm:
1. Decide on k, the predefined number of groups into which the data will be clustered.
2. Select k points at random as cluster centers.
3. Assign each object to its closest cluster center according to the Euclidean distance function.
4. Calculate the centroid, or mean, of all objects in each cluster and make it the new cluster center.
5. Repeat steps 3 and 4 until the assignments stop changing (a minimal from-scratch sketch of these steps follows below).
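For intuition, here is a minimal from-scratch sketch of these steps in R. The function name simple_kmeans, the input matrix X, and the convergence check are illustrative assumptions, not part of any library; the built-in kmeans() used further below remains the practical choice.

simple_kmeans <- function(X, k, iters = 100) {
  centers <- X[sample(nrow(X), k), , drop = FALSE]                       # step 2: k random rows as initial centers
  for (i in seq_len(iters)) {
    d <- as.matrix(dist(rbind(centers, X)))[-(1:k), 1:k, drop = FALSE]   # Euclidean distance of every row to every center
    assignment <- max.col(-d)                                            # step 3: nearest center for each row
    new_centers <- t(sapply(seq_len(k), function(j)                      # step 4: recompute centers as cluster means
      colMeans(X[assignment == j, , drop = FALSE])))
    if (isTRUE(all(abs(new_centers - centers) < 1e-9))) break            # step 5: stop once centers no longer move
    centers <- new_centers                                               # (no handling of empty clusters in this sketch)
  }
  list(cluster = assignment, centers = centers)
}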
Example: Behavioral segmentation, such as segmenting customers by purchase history or by activity on an application, website, or platform, and separating valid activity groups from bots.
Code for K-Means Algorithm
# fit k-means with k = 3 on a numeric matrix X
model <- kmeans(X, centers = 3)
# cluster assignment for each observation (base R's kmeans has no predict() method)
clusters <- model$cluster
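As a concrete, purely illustrative run, one might treat the four numeric iris measurements that ship with R as a stand-in for customer features:

X <- as.matrix(iris[, 1:4])                     # built-in iris measurements as example features
model <- kmeans(X, centers = 3, nstart = 25)    # nstart = 25 restarts to avoid a poor local optimum
table(model$cluster)                            # size of each segment
model$centers                                   # mean feature values ("profile") of each segment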
Apriori Algorithm
It is an association rule learning algorithm that operates on database records, particularly transactional records, i.e. records containing a certain number of fields or items. It is mainly used to mine frequent itemsets from large amounts of data and to derive association rules from them.
Example: Analysing transaction data for frequent if/then patterns and using the support and confidence criteria to identify the most important relationships.
Code for Apriori Algorithm
library(arules)
data <- as(data, "transactions")      # coerce the raw data to arules' transactions format
tl <- as(data, "tidLists")            # transaction-ID lists for each item
rules <- apriori(data, parameter = list(supp = 0.001, conf = 0.80))   # mine rules with minimum support 0.001 and minimum confidence 0.80
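For a self-contained run, roughly the same call can be tried on the Groceries transaction data set that ships with the arules package (the support and confidence thresholds are the ones used above):

library(arules)
data("Groceries")                                  # example transaction data bundled with arules
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.80))
inspect(head(sort(rules, by = "confidence"), 5))   # the five highest-confidence rules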
Expectation Maximization (EM)
It is an algorithm for maximum likelihood estimation when your data is incomplete or has missing data points. More elaborate versions of the EM algorithm can find model parameters even when data is missing. It works by choosing random values for the missing data points and using those guesses to estimate a second set of data. The new values are used to create a better guess for the first set, and the process continues until the algorithm converges on a fixed point. Its main limitation is that it can be very slow, even on the fastest computer.
Example: Estimating the bias of each coin in a coin-toss experiment when it is not known which coin produced each toss.
Code for Expectation–Maximization (EM) Algorithm
library(mixtools)   # provides normalmixEM, used at the end for comparison

mysum <- function(x) { sum(x[is.finite(x)]) }                             # sum over finite values only
logdnorm <- function(x, mu, sigma) { dnorm(x, mu, sigma, log = TRUE) }    # log of the normal density

x <- dat                                      # observed data (a numeric vector)
pi1 <- 0.5; pi2 <- 0.5                        # initial mixing proportions
mu1 <- -0.01; mu2 <- 0.01                     # initial component means
sigma1 <- sqrt(0.01); sigma2 <- sqrt(0.02)    # initial standard deviations

loglik <- rep(NA, 1000)
loglik[1] <- 0
loglik[2] <- mysum(pi1 * (log(pi1) + logdnorm(x, mu1, sigma1))) +
  mysum(pi2 * (log(pi2) + logdnorm(x, mu2, sigma2)))
k <- 2

while (abs(loglik[k] - loglik[k - 1]) >= 0.00001) {
  # E-step: posterior probability (responsibility) of each component for each point
  tau1 <- pi1 * dnorm(x, mean = mu1, sd = sigma1) /
    (pi1 * dnorm(x, mean = mu1, sd = sigma1) + pi2 * dnorm(x, mean = mu2, sd = sigma2))
  tau2 <- 1 - tau1
  tau1[is.na(tau1)] <- 0.5
  tau2[is.na(tau2)] <- 0.5
  # M-step: re-estimate mixing proportions, means, and standard deviations
  pi1 <- mysum(tau1) / length(x)
  pi2 <- mysum(tau2) / length(x)
  mu1 <- mysum(tau1 * x) / mysum(tau1)
  mu2 <- mysum(tau2 * x) / mysum(tau2)
  sigma1 <- sqrt(mysum(tau1 * (x - mu1)^2) / mysum(tau1))
  sigma2 <- sqrt(mysum(tau2 * (x - mu2)^2) / mysum(tau2))
  # update the log-likelihood and iterate until it stops changing
  loglik[k + 1] <- mysum(tau1 * (log(pi1) + logdnorm(x, mu1, sigma1))) +
    mysum(tau2 * (log(pi2) + logdnorm(x, mu2, sigma2)))
  k <- k + 1
}

# the same mixture fitted with mixtools' normalmixEM, for comparison
gm <- normalmixEM(x, k = 2, lambda = c(0.5, 0.5), mu = c(-0.01, 0.01), sigma = c(0.01, 0.02))
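The snippet above assumes dat already holds the observed values. A minimal way to try it, assuming simulated data from a two-component Gaussian mixture (the sample sizes, means, and standard deviations below are arbitrary illustrations, and the starting values in the loop would need to match the scale of the data):

library(mixtools)
set.seed(1)
dat <- c(rnorm(600, mean = -2, sd = 1), rnorm(400, mean = 3, sd = 1.5))  # simulated mixture data
fit <- normalmixEM(dat, k = 2)   # fit the two-component mixture directly as a cross-check
fit$lambda                       # estimated mixing proportions
fit$mu                           # estimated component means
fit$sigma                        # estimated component standard deviations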
Principal Component Analysis (PCA)
It is an important method for dimensionality reduction. It extracts a low-dimensional set of features from a high-dimensional data set with the aim of capturing as much information as possible. This makes it easier to visualise high-dimensional data; it also reduces noise and helps other algorithms work better, because we feed them fewer inputs.
Example: When we need to bring out strong patterns in a data set or make the data easy to explore and visualize.
Code for Principal Component Analysis (PCA)
library(rpart)
# prin_comp is assumed to be an existing PCA fit on the training predictors,
# e.g. prin_comp <- prcomp(pca.train, scale. = TRUE)
train.data <- data.frame(Item_Outlet_Sales = train$Item_Outlet_Sales, prin_comp$x)
train.data <- train.data[, 1:31]                        # keep the target plus the first 30 components
rpart.model <- rpart(Item_Outlet_Sales ~ ., data = train.data, method = "anova")
test.data <- predict(prin_comp, newdata = pca.test)     # project the test set onto the same components
test.data <- as.data.frame(test.data)
test.data <- test.data[, 1:30]
rpart.prediction <- predict(rpart.model, test.data)     # predict sales for the test data
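For a self-contained illustration that does not depend on the train, pca.test, and prin_comp objects assumed above, PCA can be run directly on one of R's built-in data sets:

pca <- prcomp(USArrests, scale. = TRUE)   # principal components of the built-in USArrests data
summary(pca)                              # proportion of variance explained by each component
head(pca$x[, 1:2])                        # observations projected onto the first two components
biplot(pca)                               # quick visual check of the dominant patterns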