Fuzzy Clusters

Malathi Murugesan
Published in Analytics Vidhya
Oct 12, 2017

Given a set of data points, traditional clustering techniques partition the data into several groups so that the degree of association is strong among data in the same group and weak among data in different groups. In other words, similarity is high between points in the same cluster and low between points in different clusters. Classical clustering techniques produce partitions in which each data point belongs to exactly one cluster. Fuzzy clustering, by contrast, allows a data point to belong to more than one group, so the resulting partition is a fuzzy partition. Each cluster is associated with a membership function that expresses the degree to which individual data points belong to the cluster.
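The following minimal R sketch illustrates this difference. The two membership matrices below have one row per data point and one column per cluster. They encode a hard partition and a fuzzy partition of four hypothetical points, x1 to x4, into two clusters. All values are illustrative only.

```r
# Hypothetical example: 4 data points (rows), 2 clusters (columns).
# Hard partition: each row has a single 1, i.e. one cluster per point.
hard <- matrix(c(1, 0,
                 1, 0,
                 0, 1,
                 0, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("x", 1:4), c("C1", "C2")))

# Fuzzy partition: each row holds membership degrees in [0, 1].
fuzzy <- matrix(c(0.90, 0.10,
                  0.60, 0.40,
                  0.25, 0.75,
                  0.05, 0.95),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("x", 1:4), c("C1", "C2")))

# In both cases every point's memberships sum to 1 ...
rowSums(hard)
rowSums(fuzzy)

# ... but only the fuzzy partition lets x2 belong partly to both clusters.
fuzzy["x2", ]

# A hard assignment can be recovered by taking each point's
# highest-membership cluster.
max.col(fuzzy)
```

Going from fuzzy to hard memberships therefore discards information, such as the fact that x2 is a borderline point. Going in the other direction is not possible.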

Fuzzy Clustering

Fuzzy clustering is also known as soft clustering, because it allows an object to belong to more than one cluster. Consider the following scenario:

An online electronics store called MMCart records its customers’ browsing behavior in a log. The data mining task is to use this log data to group customers according to their search intent. During the time a customer spends in the store, he may browse information about a particular product or search for customer service information. It is difficult to know a customer’s search intent in advance, and no predefined labels are available. This makes it an unsupervised learning problem, so cluster analysis helps. Here a cluster is a group of similar user browsing activities.

Let each session be the period of time a customer spends browsing. Not every session necessarily belongs to only one cluster. For example, suppose user sessions involving the purchase of mobile phones form one cluster, and user sessions that compare the prices of laptop computers form another. What if a user, in one session, orders a phone and also compares several laptop computers? Such a session belongs to both clusters. Clusters of this type are called fuzzy clusters, and they provide the flexibility of allowing an object to participate in multiple clusters.

Fuzzy Set

Given a set of objects X = {x1, x2, …, xn}, a fuzzy set S is a subset of X that allows each object in X to have a membership degree between 0 and 1. In general, a fuzzy set S can be defined as a function F_S: X → [0, 1].

A particular brand of mobile phone is more popular if more units of it are sold. The degree of popularity of a mobile phone, O, is therefore measured by its number of sales. To compute the degree of popularity of a phone, the following formula is used.

The function Popularity(O) defines a fuzzy set of popular mobile phones. For example, given the sales of mobile phones at MMCart shown in the table, the fuzzy set of popular mobile phones is {A(0.07), B(1), C(0.89), D(0.36)}, where the degrees of membership are written in parentheses.
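The text does not reproduce the popularity formula. A common formulation of this example, assumed here and not confirmed by the original, gives a phone full membership once 1,000 or more units are sold and a membership of i/1000 if only i < 1000 units are sold. The R sketch below implements that assumed function. The threshold is an assumption. The unit counts are hypothetical values chosen only so that the output matches the membership degrees quoted above.

```r
# Assumed popularity function: membership 1 if at least `threshold` units
# are sold, otherwise units / threshold. The threshold of 1000 is an assumption.
popularity <- function(units, threshold = 1000) {
  pmin(units / threshold, 1)  # cap membership at 1; names are preserved
}

# Hypothetical unit sales (the original sales table is not reproduced here).
sales <- c(A = 70, B = 1250, C = 890, D = 360)

# The fuzzy set "popular phones": one membership degree in [0, 1] per phone.
popular_phones <- popularity(sales)
popular_phones
```

Because every phone that sells at least the threshold gets membership 1, the function saturates: B is fully "popular" no matter how far past the threshold its sales go.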

Suppose the MMCart online store has six reviews. The keywords contained in these reviews are listed in the table. We can group the reviews into two fuzzy clusters, C1 and C2: C1 is for “mobile phone” and “selfie stick,” and C2 is for “computer.”

The partition matrix, whose entry w_ij is the degree of membership of review i in cluster j, is shown in the figure below:
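A fuzzy partition matrix W has one row per object and one column per cluster. It is usually required to satisfy three conditions:

1. Every entry w_ij lies in [0, 1].
2. Each row sums to 1.
3. Each column total lies strictly between 0 and n, so that no cluster is empty and no cluster contains all n objects with full membership.

The R sketch below checks these conditions. The helper is_fuzzy_partition() is hypothetical. The matrix values are illustrative and not necessarily those in the figure; R4 stands for a review that mentions keywords from both clusters.

```r
# Hypothetical helper: does W (n objects x k clusters) form a fuzzy partition?
#   1. every membership degree lies in [0, 1];
#   2. each object's memberships sum to 1;
#   3. no cluster is empty, and no cluster absorbs every object completely.
is_fuzzy_partition <- function(W, tol = 1e-8) {
  n <- nrow(W)
  in_range   <- all(W >= 0 & W <= 1)
  rows_ok    <- all(abs(rowSums(W) - 1) < tol)
  col_totals <- colSums(W)
  cols_ok    <- all(col_totals > 0 & col_totals < n)
  in_range && rows_ok && cols_ok
}

# Illustrative matrix for six reviews R1..R6 and clusters C1, C2
# (hypothetical values, not necessarily those in the figure).
W <- matrix(c(1,   0,
              1,   0,
              1,   0,
              2/3, 1/3,
              0,   1,
              0,   1),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("R", 1:6), c("C1", "C2")))

is_fuzzy_partition(W)
```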

The sum of squared errors (SSE) can be used to measure how well a fuzzy clustering fits a data set.
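The text does not give the formula. The usual definition, assumed here and not taken from the original, is as follows: for objects o_1, …, o_n, cluster centers c_1, …, c_k and a fuzzifier p ≥ 1, SSE = Σ_i Σ_j w_ij^p · dist(o_i, c_j)^2. A lower SSE means the memberships and centers fit the data more tightly. The R sketch below computes this quantity with Euclidean distance. fuzzy_sse() is a hypothetical helper, and the data, centers and memberships are made up for illustration.

```r
# Hypothetical helper: fuzzy SSE = sum_i sum_j w_ij^p * ||x_i - c_j||^2
# X: n x d data matrix, centers: k x d matrix of cluster centers,
# W: n x k membership matrix, p: fuzzifier (p >= 1). Euclidean distance assumed.
fuzzy_sse <- function(X, centers, W, p = 2) {
  # n x k matrix of squared distances from every point to every center
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(X, 2, centers[j, ])^2)
  })
  sum(W^p * d2)
}

# Hypothetical data: two well-separated pairs of 2-D points
X <- matrix(c(1, 1,
              2, 1,
              8, 8,
              9, 8), ncol = 2, byrow = TRUE)
centers <- matrix(c(1.5, 1,
                    8.5, 8), ncol = 2, byrow = TRUE)

W_hard  <- matrix(c(1, 0,  1, 0,  0, 1,  0, 1), ncol = 2, byrow = TRUE)
W_fuzzy <- matrix(c(0.9, 0.1,  0.8, 0.2,  0.1, 0.9,  0.2, 0.8),
                  ncol = 2, byrow = TRUE)

fuzzy_sse(X, centers, W_hard)   # 1
fuzzy_sse(X, centers, W_fuzzy)  # 10.55
```

Here the fuzzy memberships give a larger SSE, because each point is partly charged for its squared distance to the far cluster. Algorithms such as fuzzy c-means choose both the memberships and the centers so as to make this quantity small.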

k-means clustering can be considered a special case of fuzzy clustering in which every membership degree is either 0 or 1.
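Here is one way to see this. If every membership is 0 or 1, then each w_ij^p is also 0 or 1 for any p ≥ 1. The membership-weighted SSE therefore collapses to the k-means objective, the total within-cluster sum of squares. The R sketch below checks this on the iris measurements with base R's kmeans(). fuzzy_sse() is a hypothetical helper implementing that SSE with Euclidean distance. The seed and nstart values are arbitrary choices.

```r
# Hypothetical helper: sum_i sum_j w_ij^p * ||x_i - c_j||^2 (Euclidean)
fuzzy_sse <- function(X, centers, W, p = 2) {
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(X, 2, centers[j, ])^2)
  })
  sum(W^p * d2)
}

X <- as.matrix(iris[, 1:4])
set.seed(42)                              # k-means starts from random centers
km <- kmeans(X, centers = 3, nstart = 10)

# One-hot membership matrix: row i has a 1 in the column of its k-means cluster
W_hard <- diag(3)[km$cluster, ]

fuzzy_sse(X, km$centers, W_hard)  # equals the k-means objective ...
km$tot.withinss                   # ... reported by kmeans() itself
```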

Implementation in R Programming Language

Let us take the iris data set, which has three classes.

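library(cluster)
data(iris)
iris.x <- iris[, 1:4]                # the four numeric measurements
cl3 <- pam(iris.x, 3)$clustering     # hard (PAM, k-medoids) clustering into 3 groups
op <- par(mfrow = c(2, 2))           # save the current graphics settings
clusplot(iris.x, cl3, color = TRUE)  # plot the clusters on the first two principal components
par(op)                              # restore the graphics settings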

The clusters formed for the three classes are shown in the plot.

We call the fanny() function from the cluster package to compute the membership coefficient of each data point in each cluster.

> fanny_iris <- fanny(iris.x, 3)
> fanny_iris

The membership coefficients of the first six data points for the three clusters are shown.
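The same numbers can be read directly from the object that fanny() returns. The sketch below assumes the components documented for fanny objects in the cluster package: membership, clustering, and coeff (Dunn's partition coefficient and its normalized form, according to that documentation). It prints the first six rows of the membership matrix and checks that each row sums to 1.

```r
library(cluster)

iris.x <- iris[, 1:4]
fanny_iris <- fanny(iris.x, 3)   # defaults: Euclidean distance, memb.exp = 2

# n x k matrix of membership coefficients (rows = flowers, columns = clusters)
head(round(fanny_iris$membership, 3))

# Each row sums to 1, as required of a fuzzy partition
summary(rowSums(fanny_iris$membership))

# Nearest hard clustering: each point goes to its highest-membership cluster
table(fanny_iris$clustering)

# Dunn's partition coefficient and its normalized version: values near the
# top of their range indicate a nearly hard partition
fanny_iris$coeff
```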

> library(factoextra)
> fviz_cluster(fanny_iris, ellipse.type = "norm", repel = TRUE,
+              palette = "jco", ggtheme = theme_minimal(),
+              legend = "right")
> fviz_silhouette(fanny_iris, palette = "jco",
+                 ggtheme = theme_minimal())

The clusters and their sizes are given above.
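If factoextra is not installed, the cluster sizes and silhouette widths can also be read from the fanny object itself. This sketch assumes the silinfo component documented for partition objects in the cluster package. That component holds the silhouette information for the nearest hard clustering.

```r
library(cluster)

fanny_iris <- fanny(iris[, 1:4], 3)

# Cluster sizes of the nearest hard clustering
table(fanny_iris$clustering)

# Silhouette information for that hard clustering
fanny_iris$silinfo$clus.avg.widths   # average silhouette width per cluster
fanny_iris$silinfo$avg.width         # overall average silhouette width
```

Silhouette widths range from -1 to 1. Values near 1 mean a point sits well inside its cluster, and negative values suggest it may be closer to a neighboring cluster.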

For some applications, such as medicine and the study of gene patterns in bioscience, fuzzy clustering is more appropriate than hard clustering, because a patient or a gene can plausibly belong to several groups at once.

Thanks for reading. You can connect with me on LinkedIn.
