Once Upon a Pixel: What’s Image Clustering? (Using Orange Software)

Priya Shahari
4 min read · Jan 29, 2024

Hey photo pals! 📸 Ready to dive into the ultimate picture party where pixels groove, patterns shine, and chaos transforms into organized bliss? Welcome to the magical realm of Image Clustering — your personal photo wizard! 🪄✨

Data does not always come in a nice tabular form. It can also be a collection of text, audio recordings, video materials or even images. However, computers can only work with numbers, so for any data mining, we need to transform such unstructured data into a vector representation.
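To make "vector representation" concrete, here is a minimal sketch using a made-up 2×3 grayscale image. Flattening raw pixels is only the simplest possible way to get a vector, and it is not what a deep network embedder does; the pixel values and the `flatten` helper are invented purely for illustration.

```python
# A tiny, made-up 2x3 grayscale "image".
# Each number is a pixel brightness (0 = black, 255 = white).
image = [
    [0, 128, 255],
    [64, 192, 32],
]


def flatten(pixels):
    """Turn a grid of pixel rows into one flat list of numbers."""
    return [value for row in pixels for value in row]


vector = flatten(image)
print(vector)       # [0, 128, 255, 64, 192, 32]
print(len(vector))  # 6
```

Every image of the same size becomes a list of the same length, which is what lets a computer compare one picture with another.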

For retrieving numbers from unstructured data, Orange can use deep network embedders. We have just started to include various embedders in Orange, and for now, they are available for text and images.

Orange Software

Imagine your pictures having a secret language — a way to find their look-alikes and form cool groups. That’s what Image Clustering does! It’s like a magical buddy that gathers similar-looking pictures and puts them in special groups, making your photo collection neat and tidy. No more wandering through a sea of photos to find your faves!

Here, let’s take an example of image embedding and show how easy it is to use in Orange. Technically, Orange sends each image to a server, where it is pushed through a pre-trained deep neural network that turns it into a vector of numbers.

Here we have 13 images of animals. First, download the images and unzip them. Then use the Import Images widget from Orange’s Image Analytics add-on and open the directory containing the images.

We can visualize the images in the Image Viewer widget. Here is our workflow so far, with the images shown in Image Viewer:

Workflow
Image viewer

But what do we see in a data table?

Data table

Only some descriptions of the images: the file name, the location of the file, its size, and the image width and height.

This cannot help us with machine learning. As I said before, we need numbers. To acquire a numerical representation of these images, we will send them to the Image Embedding widget.

Now we have the numbers we wanted. From now on, we can apply all the standard machine learning techniques, say, clustering.

Let us measure the distance between these images and see which are the most similar. We use the Distances widget to measure the distance. Cosine distance usually works well for images, but you can experiment on your own. Then we pass the distance matrix to Hierarchical Clustering to visualize similar pairs.
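If you are curious what cosine distance actually computes, here is a rough sketch in plain Python. It is not Orange's own code, and the three-number "embeddings" and their names are made up for illustration; real image embeddings contain many more values.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction, larger means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example.
embeddings = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "cow_1": [0.1, 0.9, 0.3],
}

# Compare every pair of images once and keep the pair with the smallest distance.
names = list(embeddings)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
closest = min(pairs, key=lambda p: cosine_distance(embeddings[p[0]], embeddings[p[1]]))
print(closest)  # ('cat_1', 'cat_2')
```

Cosine distance looks only at the direction of the two vectors, not their length, so two images whose numbers follow the same pattern count as similar even if one set of numbers is larger overall.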

Workflow

This looks very promising! All the right animals are grouped together.

Hierarchical clustering

Think of hierarchical clustering like arranging your wardrobe. You have a mix of clothes, and you want to organize them based on their similarities. Starting with each piece as its own group, you begin pairing similar items together — t-shirts with t-shirts, jeans with jeans. As you continue, you create clusters within clusters, eventually forming a neat hierarchy of outfits based on their resemblance. Hierarchical clustering is like putting order in your wardrobe, making it easy to find the perfect outfit for any occasion! 👕👖👗
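The wardrobe idea can be sketched in a few lines of plain Python. This is a from-scratch illustration under stated assumptions, not what Orange runs internally: the clothes, their two-number descriptions, the Euclidean distance, and the "average linkage" rule for comparing groups are all choices made for this example.

```python
def euclidean(a, b):
    """Straight-line distance between two lists of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(c1, c2, points):
    """Average distance between every pair of items drawn from the two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clustering(points, n_clusters):
    """Start with one cluster per item, then repeatedly merge the two closest clusters."""
    clusters = [[name] for name in points]
    merges = []  # the order in which groups were joined
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], points)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical wardrobe: each item is described by two made-up numbers.
wardrobe = {
    "tshirt_a": [1.0, 1.0],
    "tshirt_b": [1.2, 0.9],
    "jeans_a": [5.0, 5.0],
    "jeans_b": [5.3, 4.8],
    "dress_a": [9.0, 1.0],
}

clusters, merges = hierarchical_clustering(wardrobe, n_clusters=3)
print(merges[0])  # (['tshirt_a'], ['tshirt_b'])
print(clusters)   # [['dress_a'], ['tshirt_a', 'tshirt_b'], ['jeans_a', 'jeans_b']]
```

The two t-shirts are the closest pair, so they are merged first; the jeans follow, and the dress stays on its own. Recording the merges in order is what gives the tree-like picture of clusters within clusters.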

Why Is Image Sorting Awesome?

  1. Tidy and Clean Photo Albums: Give your photo albums a new look! No more disorganized photo stacks. Image clustering is like a magic cleaner for your photos: it groups similar pictures so everything stays neat and organized.
  2. Spotting Cool Patterns: Have you ever worn glasses that reveal hidden patterns? With Image Clustering, you can notice details in your photos that you were previously unaware of. It’s like a treasure hunt within your own images!
  3. Easy Exploration: Browsing your photos becomes as easy as ABC. Because everything is well-organized, you can enjoy your images without getting lost in a photo jungle. It’s like a leisurely stroll through a stunning garden filled with memories!

Wanna read more about image analytics? Here you go:

Human or Bot? Google ReCAPTCHA’s Complex Image Classification and Clustering Dance

Leveraging Orange Software for Image Analysis in Healthcare and Traffic Management
