Let’s talk about our algorithm (and data)

Chang Xiao
May 17

Siggy uses a “content-based” recommendation algorithm. This means that we only need “content data” for the algorithm to work.

In the context of a product recommender, we only need product attributes (name, description, images, and tags) to generate effective recommendations for the user.
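To make the idea concrete, here is a minimal sketch of a content-based recommender that uses only product text. It is an illustration, not Siggy's actual implementation: the product fields, the bag-of-words representation, the cosine similarity measure, and the sample catalog are all assumptions chosen for simplicity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def product_vector(product):
    """Bag-of-words counts built from name, description, and tags."""
    text = " ".join(
        [product["name"], product["description"], " ".join(product["tags"])]
    )
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, catalog, k=2):
    """Return the names of the k products most similar to the target."""
    target_vec = product_vector(target)
    scored = [
        (cosine(target_vec, product_vector(p)), p["name"])
        for p in catalog
        if p is not target
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical catalog, for illustration only.
catalog = [
    {"name": "Diamond Ring", "description": "Round diamond set in white gold",
     "tags": ["diamond", "ring"]},
    {"name": "Diamond Necklace", "description": "Diamond pendant on a gold chain",
     "tags": ["diamond", "necklace"]},
    {"name": "Denim Shorts", "description": "Cut-off denim shorts",
     "tags": ["shorts", "denim"]},
]

print(recommend(catalog[0], catalog, k=1))
```

Note that no ratings, orders, or customer records appear anywhere in the sketch: the only input is the catalog itself.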

There are three primary reasons we chose to start with a “content-based” recommendation algorithm:

  • No dependency on user-generated data
    Siggy does not rely on user-generated data such as ratings to establish relationships between the products, unlike classic collaborative filtering algorithms.
  • Availability of data
    Product information is public and accessible for most e-commerce stores. It is the essential data every store has.
  • Data privacy
    Other popular AI-based recommendation algorithms, such as “items bought together”, need access to order and/or customer data. A content-based algorithm needs neither.

How our algorithm performs

Generally, our algorithm performs well for shops that have a large catalog (e.g. 500+ products), consistent product images, and products that are not gender- or age-specific.

A good example

One example that works well is jewelry, specifically diamond jewelry in this case. Our recommendations work well because:

  • Product images have a white or transparent background, which makes it easier for the algorithm to find products based on image similarities (shape, color, patterns, etc.).
  • Product titles and descriptions contain multiple mentions of the keyword “Diamond”.

(Jewelry example)
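One way to see why a clean background helps is the sketch below. It is not how Siggy compares images (that is not described here). It assumes a simple color-histogram comparison, represents an image as a plain list of RGB tuples, and uses hypothetical helper names and thresholds. Near-white pixels are skipped so that only the product itself contributes to the comparison.

```python
from collections import Counter


def color_histogram(pixels, bins=4, background_threshold=240):
    """Normalized, quantized RGB histogram that ignores near-white pixels."""
    step = 256 // bins
    hist = Counter()
    for r, g, b in pixels:
        # Treat pixels whose channels are all very bright as background.
        if min(r, g, b) >= background_threshold:
            continue
        hist[(r // step, g // step, b // step)] += 1
    total = sum(hist.values())
    if total == 0:
        return {}
    return {bucket: count / total for bucket, count in hist.items()}


def histogram_intersection(h1, h2):
    """Overlap between two normalized histograms, from 0.0 to 1.0."""
    return sum(min(weight, h2.get(bucket, 0.0)) for bucket, weight in h1.items())


# Tiny made-up "images": white background plus a few product pixels.
ring_a = [(255, 255, 255)] * 6 + [(200, 200, 210)] * 4
ring_b = [(255, 255, 255)] * 2 + [(205, 205, 215)] * 8
shorts = [(255, 255, 255)] * 5 + [(30, 60, 120)] * 5

print(histogram_intersection(color_histogram(ring_a), color_histogram(ring_b)))
print(histogram_intersection(color_histogram(ring_a), color_histogram(shorts)))
```

With a busy or inconsistent background there is no simple threshold to separate product from scenery, so background colors would leak into the comparison.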

A tough example

In this case, our recommendations are all over the place. Three of the four recommended products are for men and one is for boys.

This example is especially tough because “Shorts” can be worn by anyone. However, it is clear from the image that this is a specific style targeted towards women.

Our algorithm failed to capture and weigh the more important product attributes, such as the “cut-off” style and the length of the shorts. In the end, it was looking for “Shorts” regardless of the intended audience.

(Non-gender-specific example)
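The failure above can be illustrated with a small sketch of attribute weighting. This is an assumption about one possible remedy, not a description of what Siggy does: the attribute names (`category`, `style`, `audience`), the weights, and the sample products are hypothetical, and in practice these attributes would have to be derived from tags or descriptions.

```python
def jaccard(a, b):
    """Share of values the two attribute sets have in common (0.0 if both empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_similarity(p, q, weights):
    """Weighted average of per-attribute Jaccard similarities."""
    total = sum(weights.values())
    score = sum(
        weight * jaccard(p.get(attr, ()), q.get(attr, ()))
        for attr, weight in weights.items()
    )
    return score / total


# Hypothetical products, for illustration only.
target = {"category": ["shorts"], "style": ["cut-off", "short-length"],
          "audience": ["women"]}
mens = {"category": ["shorts"], "style": ["cargo", "knee-length"],
        "audience": ["men"]}
womens = {"category": ["shorts"], "style": ["cut-off"], "audience": ["women"]}

# Matching on category alone cannot tell the two candidates apart.
category_only = {"category": 1.0}
print(weighted_similarity(target, mens, category_only))
print(weighted_similarity(target, womens, category_only))

# Giving style and audience more weight separates them.
weights = {"category": 1.0, "style": 2.0, "audience": 2.0}
print(weighted_similarity(target, mens, weights))
print(weighted_similarity(target, womens, weights))
```

When only the category matters, both candidates look identical to the target, which mirrors the “Shorts regardless of the intended audience” behavior described above.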

A better example

Dresses are a better example because they have unique patterns and structures, and their product images are consistent, with the dresses modeled in similar ways.

It also helps that there are no “dresses” for men or children in the product catalog.

(Gender-specific example)

Early observations

It is too early to tweak our algorithm, and we are still gathering data on its efficacy, but we can already see that it works well when:

  • There’s a large product catalog.
  • There are consistent product images with minimal background and noise.
  • The items have unique shapes, colors, and/or patterns.
  • The item has a wide intended audience (men, women, children, etc.).

Siggy Recommender

A journey into ML/AI with digital products.
