Learning from Observation

Hengky Sanjaya
Hengky Sanjaya Blog
3 min read · Mar 24, 2020

Week#6-Intelligent System

In this post, we will discuss a Machine Learning technique in which the machine learns by observing the data.

If you want to know more about the types of machine learning, you can visit this post:
https://medium.com/hengky-sanjaya-blog/supervised-vs-unsupervised-learning-aae0eb8c4878

Clustering Technique

The clustering technique applies when there is no class to be predicted; instead, the instances are to be divided into natural groups.

Clustering is a type of unsupervised learning.

Examples:

  • Marketing: Characterize and discover customer segments for marketing purposes.
  • Biology: Group different species of plants and animals.
  • Libraries: Cluster books on the basis of topics and information.
  • City Planning: Group houses and study their values based on geographical location and other factors.

In clustering, we ideally use semantic similarity to find and extract related items from the data.

We perform distance calculations to measure how similar two items are.

The choice of distance measure determines how the similarity of two elements is calculated, and it influences the shape of the clusters.

  • The Euclidean distance:
    The straight-line distance between two points; the formula derives from the Pythagorean theorem.
  • The Manhattan distance:
    The distance that would be traveled to get from one data point to the other if a grid-like path is followed.
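The two distance measures above can be sketched in a few lines of Python (the function names and example points here are my own illustration, not from any particular library):

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the sum of squared
    # per-dimension differences (Pythagorean theorem in n dimensions).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Grid-path distance: sum of absolute per-dimension differences.
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

Note how the same pair of points gives different distances under the two measures, which is why the choice of measure changes the resulting cluster shapes.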

In the clustering technique, there are two main families of algorithms:

  • Partitional algorithms:
    Construct various partitions and then evaluate them by some criterion.
    - Usually start with a random (partial) partitioning
    - Refine it iteratively (e.g., K-means clustering, model-based clustering)
  • Hierarchical algorithms:
    Create a hierarchical decomposition of the set of objects using some criterion.
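To illustrate the hierarchical family, here is a minimal single-linkage agglomerative clustering sketch, written from scratch for this post (the function name and example data are my own assumptions, not a standard API):

```python
import math

def single_linkage(points, n_clusters):
    # Agglomerative (bottom-up): start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Criterion: find the pair of clusters whose closest members
        # are nearest to each other (single linkage).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the two closest clusters and repeat.
        clusters[i] += clusters.pop(j)
    return clusters

print(single_linkage([(0, 0), (0, 1), (5, 5), (5, 6)], 2))
# [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```

Each merge step records one level of the hierarchical decomposition; stopping at a chosen number of clusters is just one way to cut that hierarchy.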

K-Means Algorithm

K = the number of clusters (and centroids) to generate.

In this algorithm, we first choose K data points to be the initial centroids. We then repeatedly assign each point to its nearest centroid and recompute each centroid as the average of the data points in its group.

This process will stop when:

  • The centroids have stabilized: their values no longer change, meaning the clustering has converged.
  • Points remain in the same cluster.
  • The defined number of iterations has been achieved.
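The loop described above can be sketched in Python. This is a minimal from-scratch version for illustration, not a production implementation; the function and variable names are my own:

```python
import math
import random

def kmeans(points, k, max_iters=100):
    # Choose K data points as the initial centroids.
    centroids = random.sample(points, k)
    for _ in range(max_iters):  # stop after a defined number of iterations
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the average of its group
        # (an empty group keeps its old centroid).
        new_centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        # Stop early once the centroids have stabilized.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, groups

centroids, groups = kmeans([(1, 1), (1, 2), (8, 8), (9, 8)], 2)
print(sorted(centroids))  # [(1.0, 1.5), (8.5, 8.0)]
```

Note that all three stopping criteria appear here: the iteration cap, the stabilized-centroid check, and (implicitly) points no longer changing cluster, since stable centroids imply stable assignments.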

To simulate the K-Means clustering process, you can visit this link to see a demo:
http://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/

Thank you…
