Let’s talk about our algorithm (and data)
Siggy uses a “content-based” recommendation algorithm. This means that we only need “content data” for the algorithm to work.
In the context of a product recommender, we only need product attributes such as the product name, description, images, and tags to generate effective recommendations for the user.
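To make the idea concrete, here is a minimal sketch of content-based matching over text attributes only. It is not Siggy's actual implementation: the bag-of-words vectors, the cosine similarity measure, the catalog entries, and every function name below are assumptions chosen for illustration, and image features are left out entirely.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def product_vector(product):
    # Combine the text attributes of a product into one bag of words.
    text = " ".join(
        [product["name"], product["description"], " ".join(product["tags"])]
    )
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target_id, catalog, k=2):
    # Rank every other product by similarity to the target product.
    vectors = {pid: product_vector(p) for pid, p in catalog.items()}
    target = vectors[target_id]
    scored = [
        (cosine(target, vec), pid)
        for pid, vec in vectors.items()
        if pid != target_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:k]]


# Hypothetical catalog: only product attributes, no ratings or orders.
catalog = {
    "ring-1": {
        "name": "Diamond Solitaire Ring",
        "description": "A round diamond set in white gold.",
        "tags": ["diamond", "ring"],
    },
    "ring-2": {
        "name": "Diamond Halo Ring",
        "description": "A halo of diamond accents around a center diamond.",
        "tags": ["diamond", "ring"],
    },
    "mug-1": {
        "name": "Ceramic Mug",
        "description": "A glazed ceramic mug for coffee.",
        "tags": ["kitchen", "mug"],
    },
}

print(recommend("ring-1", catalog, k=1))
```

Note that the sketch needs nothing beyond the catalog itself, which is the property that makes a content-based approach usable on a store with no rating or order history.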
There are three primary reasons we chose a “content-based” recommendation algorithm first:
- No dependency on user-generated data
Siggy does not rely on user-generated data such as ratings to establish relationships between the products, unlike classic collaborative filtering algorithms.
- Availability of data
Product information is public and accessible for most e-commerce stores; it is the essential data every store has.
- Data privacy
Other popular AI-based recommendation algorithms, such as “items bought together”, need access to order and/or customer data. A content-based algorithm does not.
How our algorithm performs
Generally, our algorithm has performed well for shops that have a large number of products (e.g. 500+), have consistent product images, and are not gender- or age-specific.
A good example
One example that works well is jewelry, specifically “Diamond” products in this case. Our recommendations work well because:
- Product images have a white or transparent background, which lets the algorithm find products based on image similarities (shape, color, patterns, etc.).
- Product title and description contain multiple mentions of the keyword “Diamond”.
A tough example
In this case, our recommendations are all over the place. Three of the four recommended products are for the opposite gender (men), and the fourth is for boys.
This example is especially tough because “Shorts” can be worn by both genders. However, it is clear from the image that this is a specific style targeted towards women.
Our algorithm failed to capture and weigh the more important product attributes, such as the “cut-off” style and the length of the shorts. In the end, it was looking for “Shorts” regardless of the intended audience.
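The failure mode can be illustrated with a toy comparison. The sketch below is not how Siggy scores products: the `type`, `audience`, and `style` fields, the weights, and the candidate products are all hypothetical. It only shows why matching on the product type alone cannot tell the candidates apart, and how weighting other attributes would change the ranking.

```python
def keyword_score(a, b):
    # Matches on product type only, mirroring "looking for Shorts
    # regardless of the intended audience".
    return 1.0 if a["type"] == b["type"] else 0.0


# Hypothetical weights: audience and style count for more than type.
WEIGHTS = {"type": 1.0, "audience": 2.0, "style": 1.5}


def weighted_score(a, b, weights=WEIGHTS):
    # Sum the weights of matching attributes, normalized to the 0..1 range.
    matched = sum(w for field, w in weights.items() if a.get(field) == b.get(field))
    return matched / sum(weights.values())


def rank(target, candidates, score):
    # Highest score first; ties are broken by product id for stable output.
    return sorted(candidates, key=lambda pid: (-score(target, candidates[pid]), pid))


# Hypothetical target and candidates, loosely based on the example above.
target = {"type": "shorts", "audience": "women", "style": "cut-off"}
candidates = {
    "mens-cargo-shorts": {"type": "shorts", "audience": "men", "style": "cargo"},
    "boys-swim-shorts": {"type": "shorts", "audience": "boys", "style": "swim"},
    "womens-linen-shorts": {"type": "shorts", "audience": "women", "style": "linen"},
    "womens-denim-cutoffs": {"type": "shorts", "audience": "women", "style": "cut-off"},
}

print(rank(target, candidates, keyword_score))
print(rank(target, candidates, weighted_score))
```

With the type-only score every candidate ties, so the order is arbitrary. With the weighted score the women's cut-off shorts rank first, followed by the other women's shorts.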
A better example
Dresses are a better example because they have a distinctive pattern and structure, and their product images are consistent, with the dresses modeled in similar ways.
It also helps that there are no “dresses” for men or children in the product catalog.
Early observations
While it is too early to tweak our algorithm, and we are still gathering data on its efficacy, we have noticed that the algorithm works well when:
- There’s a large product catalog.
- There are consistent product images with minimal background and noise.
- The items have unique shapes, colors, and/or patterns.
- The item has a wide intended audience (men, women, children, etc.).