Unsupervised Learning

azam sayeed · Published in Analytics Vidhya · Dec 23, 2019 · 5 min read

Clustering

  • Clustering is the process of dividing a dataset into groups of similar data points, so that points in the same group are as similar as possible and as dissimilar as possible from points in other groups.
  • Used, for example, in recommendation engines.

Types:

  1. Exclusive Clustering: each data point can lie in only one cluster. Ex: K-Means clustering.
  2. Overlapping Clustering: allows a data object to belong to 2 or more clusters. Ex: Fuzzy C-Means clustering.

(Figure: A = Exclusive, B = Overlapping)

3. Hierarchical Clustering: builds a hierarchy (tree) of nested clusters, so groupings can be read off at any level of granularity; see the sketch below.

(Figure: grouping according to hierarchical clustering)
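As a rough sketch (my addition, not from the original post), SciPy's hierarchy module can build and draw this tree of nested groupings on a few toy points:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# toy 2-D points (hypothetical, for illustration only)
X = np.array([[1, 2], [1, 3], [8, 8], [9, 8], [4, 5]])

# Ward linkage merges, at each step, the pair of clusters
# that least increases total within-cluster variance
Z = linkage(X, method='ward')
dendrogram(Z)   # draw the tree of nested merges
plt.show()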

K-Means Clustering

  • Groups similar elements into clusters.
  • Applications: behavior segmentation, detecting bots.

Step 1: Select the number of clusters to be identified, say K = 3.

Step 2: Randomly select 3 distinct data points as initial cluster centers.

Step 3: Measure the distance between the 1st point and the 3 selected cluster centers.

Step 4: Assign the 1st point to the nearest cluster.

Step 5: Recalculate the mean of the first cluster, now including the new point.

Step 6: Repeat steps 3–5 for the remaining points.

The model repeats these iterations for multiple variations of step 2 (different random initial points) until the total variation is minimal: reselect random cluster points (step 2), redo steps 3–5, calculate the variance, and continue iterating. Once an initialization gives the lowest variance among the runs, the iteration can be stopped.
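Below is a minimal sketch of these steps using scikit-learn's KMeans (an assumption on my part; the scipy version is in the previous post, noted below). The toy data is made up, and n_init re-runs the random initialization of step 2, keeping the run with the lowest total variation:

import numpy as np
from sklearn.cluster import KMeans

# toy 2-D data (hypothetical values, for illustration only)
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0],
              [5, 9], [6, 8], [5, 8]])

# n_init=10 repeats the random initialization of step 2 ten times
# and keeps the run with the lowest total within-cluster variation
km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)           # steps 3-6: assign points, update means
print(labels)                        # cluster assignment of each point
print(km.cluster_centers_)           # final cluster means
print(km.inertia_)                   # total within-cluster variation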

K-Means clustering using the scipy module is given in my previous post.

To find the optimal value of K, we use an elbow plot of total within-cluster variation vs. the number of clusters K.
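A minimal sketch of the elbow method on the same toy data as above: plot the total within-cluster variation (inertia) for a range of K and look for the bend where the curve flattens.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0],
              [5, 9], [6, 8], [5, 8]])   # same toy data as above

inertias = []
ks = range(1, 9)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)   # total within-cluster variation at this k

plt.plot(list(ks), inertias, marker='o')
plt.xlabel('K (number of clusters)')
plt.ylabel('total within-cluster variation')
plt.show()   # the "elbow" where the curve flattens marks a good K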

Association Rule Mining

  • is a rule-based machine learning method for discovering interesting relations between entities

Antecedent (if) and Consequent (then): patterns such as "if milk is purchased, what is the chance the buyer also purchases bread?"

Application: Market Basket Analysis

  • is used to identify associations between items purchased by customers

Ways to measure the association:

x: antecedent, y: consequent, x => y, where x is a list of items

  1. Support: the number of transactions containing both x and y, divided by the total number of transactions. support(x => y) = freq(x, y) / N
  2. Confidence: the number of transactions containing both x and y, divided by the number of transactions containing x. confidence(x => y) = freq(x, y) / freq(x)
  3. Lift: the confidence of the rule divided by the support of y, i.e., how much more often x and y occur together than expected if they were independent. lift(x => y) = confidence(x => y) / support(y)

lift(x => y) > 1: y is likely to be bought if x is bought

lift(x => y) < 1: y is unlikely to be bought if x is bought
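To make these three measures concrete, here is a small sketch over five made-up transactions:

# five hypothetical transactions (illustration only)
transactions = [
    {'milk', 'bread'},
    {'milk', 'bread', 'butter'},
    {'milk'},
    {'bread'},
    {'milk', 'bread'},
]
n = len(transactions)

x, y = {'milk'}, {'bread'}                           # rule: milk => bread
n_x  = sum(1 for t in transactions if x <= t)        # transactions with x
n_y  = sum(1 for t in transactions if y <= t)        # transactions with y
n_xy = sum(1 for t in transactions if (x | y) <= t)  # with both x and y

support    = n_xy / n                 # 3/5 = 0.60
confidence = n_xy / n_x               # 3/4 = 0.75
lift       = confidence / (n_y / n)   # 0.75 / 0.80 ≈ 0.94 (< 1 here)
print(support, confidence, lift)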

Demo for K-Means and Association Rules:
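The embedded demo is not reproduced here; as a stand-in, here is a hedged sketch of association rule mining using the mlxtend library (mlxtend is my assumption, not necessarily what the original demo used):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# hypothetical baskets
transactions = [['milk', 'bread'], ['milk', 'bread', 'butter'],
                ['milk'], ['bread'], ['milk', 'bread']]

# one-hot encode baskets into a boolean DataFrame (rows = transactions)
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# frequent itemsets with support >= 0.4, then rules filtered by confidence
freq = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(freq, metric="confidence", min_threshold=0.5)
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])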

Recommendation Engine

  • A filtering system that predicts and shows items of user interest. Used in digital domains like Amazon and Flipkart. Can significantly boost revenue, CTRs, user experience, conversions, and other important metrics.
  • Similar to a salesperson trained for up-selling and cross-selling: shows various products based on interest (built on browsing-history data).
  • Suffers from the cold start problem: with no history for a new user or item, there is little data to base recommendations on.

(Figure: example of a recommendation engine)

Types:

  1. Collaborative Filtering: collecting and analyzing large amounts of user data (behavior, likes, activity, etc.) and predicting what products the user will like based on similar users’ interests. [Similar customers are grouped in the same cluster.]

1a. User-based Collaborative filtering

  • If two users had similar tastes in the past, they are likely to have similar tastes in the future.

Ex: if user A has bought polo shirts like user B, and user A has additionally bought a polo jacket, then the jacket can be recommended to user B based on past behavior.

Measure of similarity

  • Cosine Similarity
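As a sketch, cosine similarity measures the angle between two users' rating vectors; the ratings below are hypothetical:

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a · b) / (|a| * |b|): 1 = identical taste, 0 = unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# hypothetical ratings by users A and B for the same five items (0 = unrated)
user_a = np.array([5, 3, 0, 4, 4])
user_b = np.array([4, 0, 0, 5, 4])
print(cosine_similarity(user_a, user_b))   # ~0.91: the users are similar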

1b. Item-based Collaborative Filtering: calculate item similarity based on users’ item preferences and recommend the top similar items to the user.

2. Content-based Filtering: recommends products based on product characteristics and keywords.

3. Hybrid Recommendation System: a combination of types 1 and 2. Ex: Netflix. [Clusters are based on items.]

Dimensionality Reduction

  • The process of reducing a dataset with a vast number of dimensions to a smaller set of dimensions.

Types of Dimensionality Reduction

  1. Feature Elimination: removing some dimensions completely when they are redundant with other variables or contribute no information.

Advantage: reduces a vast dataset to a smaller chunk.

Disadvantage: we might lose some valuable data.

2. Feature Extraction: Extracting new variables from old variables

PCA works on Feature Extraction

Applications: Image processing

Principal Component Analysis

  • Reduces the number of random variables in a given dataset by identifying a low-dimensional set of axes.

Ex: classifying a vehicle as car vs. bus. The number of wheels is standard (6 for buses, always 4 for cars) and hence has low variance, whereas the height of the vehicle can vary independently.

Line A has lower variance than line B, since the data spread along A is smaller, so B is the direction of the first principal component. (The figure above depicts the eigenvectors and eigenvalues.)

The eigenvector with the highest eigenvalue is the principal component line (the data is most spread out along that eigenvector).

Note: Eigenvectors are orthogonal to each other

Example:

Suppose we have 3 variables: hours on mobile, hours on internet, and age.

We draw 3 eigenvectors, orthogonal to each other, through the data points in the xy plane. Since the data points barely vary along the third eigenvector (the z direction), we can treat ev3 as zero and remove it (this is the core idea of principal component analysis).

3D is reduced to 2D using principal component analysis
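A hedged sketch of this 3D-to-2D reduction with scikit-learn's PCA, using synthetic data for the example's three variables (in practice, features should usually be standardized first, e.g., with StandardScaler):

import numpy as np
from sklearn.decomposition import PCA

# synthetic data for the three variables (hypothetical values)
rng = np.random.default_rng(0)
mobile = rng.uniform(1, 6, 100)                # hours on mobile
internet = mobile + rng.normal(0, 0.3, 100)    # hours on internet, correlated with mobile
age = rng.uniform(18, 60, 100)
X = np.column_stack([mobile, internet, age])

pca = PCA(n_components=2)     # keep the two highest-variance directions
X2 = pca.fit_transform(X)
print(pca.explained_variance_ratio_)   # share of variance captured by each component
print(X2.shape)                        # (100, 2): 3D reduced to 2D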

Demo for SVM: run the code below in a Jupyter notebook.

from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split

# load the breast cancer dataset bundled with scikit-learn
cancer = datasets.load_breast_cancer()
print("Features:", cancer.feature_names)
print("Labels:", cancer.target_names)
print(cancer.data[0:5])
print(cancer.target)

# 70/30 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=109)

# train a linear-kernel support vector classifier
clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)

# evaluate on the held-out test set
y_pred = clf.predict(X_test)
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.accuracy_score(y_test, y_pred))
