Product matching via Machine Learning — Yes, we did it!

Price2Spy’s guide through Product matching assisted by Machine Learning

Misha Krunic
GoBeyond.AI: E-commerce Magazine

Price2Spy will soon be launching something no other price monitoring tool in the world offers — Product matching assisted by Machine Learning (ML).

To make sure you don’t miss anything, we invite you to visit our blog, where all of this is covered in more detail.

We are very proud of this project. It took us 18 months of hard work, with a lot of stumbling in the dark. 18 months is a long time for a commercial project, and software companies the size of Price2Spy rarely make such an investment. We did, and we are very happy that we can finally present the results.

These days you will read a lot about various ML projects. Please be aware that ML can be roughly divided into:

  • Numerical problems (for example: predicting the oil price from a number of supply and demand factors, all of them numerical)
  • Text processing (for example: try to identify a degree of similarity between two pieces of text)
  • Image recognition (heavily used by government agencies worldwide)

Product matching combines all 3 of the above — basically, you have 2 products shown on 2 websites, and you need to establish whether they are a match. Their naming might be similar or not, their descriptions will most likely vary, the images used might also have a degree of similarity, and of course, they both have a price, which should be similar, but not necessarily identical.

Let’s try to elaborate on the following example:

  • Product prices are very similar: 28.75 vs 29.35
  • Product names are also very similar, but not identical
  • Volume is identical (75ml)
  • Product images are difficult to compare because the image on the right is skewed
  • So, is it a match or not? Please be patient, we will get back to this question in a minute.

A pretty loose problem, isn’t it? And once you dive into the ML aspects of it, not an easy one. Yet Price2Spy managed to pull it off.

To paraphrase John F. Kennedy: we did it not because it was easy, but rather because it was so difficult!

This is why we decided to share with you the story of this project — I believe it will be a good read both for Machine Learning (ML) enthusiasts and for eCommerce professionals who wonder how their product matching can be done in a more reliable and yet cost-effective way.

Back to our question: the two products above are NOT a match. Sensodyne has 2 very similar products:

  • Advanced Repair and Protect
  • Repair and Protect
  • (so, very close, but not a match!)
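To make the difficulty concrete, here is a minimal sketch (not Price2Spy’s actual method) that scores the example above with naive signals: character-level name similarity from Python’s standard difflib, the relative price gap, and the words that appear in only one name. The field names and the assignment of each price to each product name are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relative_price_gap(p1: float, p2: float) -> float:
    """Absolute price difference as a fraction of the higher price."""
    return abs(p1 - p2) / max(p1, p2)


def words_in_only_one(a: str, b: str) -> set:
    """Words that occur in exactly one of the two names."""
    return set(a.lower().split()) ^ set(b.lower().split())


# Values from the example; which price belongs to which name is assumed.
left = {"name": "Advanced Repair and Protect", "price": 28.75, "volume_ml": 75}
right = {"name": "Repair and Protect", "price": 29.35, "volume_ml": 75}

sim = name_similarity(left["name"], right["name"])
gap = relative_price_gap(left["price"], right["price"])
diff = words_in_only_one(left["name"], right["name"])

print(f"name similarity: {sim:.2f}")
print(f"price gap: {gap:.1%}")
print(f"same volume: {left['volume_ml'] == right['volume_ml']}")
print(f"words in only one name: {diff}")
```

Every naive signal points towards a match: the names are highly similar, the prices differ by only a couple of percent, and the volume is identical. The single extra word is the only thing that separates two different products, which is why simple thresholds on such scores are not reliable enough on their own.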

(Part #1) Product matching via Machine Learning — Introduction to the project

In the last couple of years, we have all witnessed the rise of a new technology: Artificial Intelligence, or as we at Price2Spy prefer to call it, Machine Learning (ML).

The whole concept was new to us, and none of the Price2Spy development crew had any experience with it. Still, we sensed that it had huge potential, and we were eager to learn.

After a couple of months of courses and theoretical introductions, we asked ourselves: how can we apply ML in everyday Price2Spy operations?

We had several candidate projects, but one of them was our favorite from the very start: Product Matching.

Not because it was an easy win. On the contrary, it was the hardest ML problem we could think of — but our clients needed it very badly. That meant we needed it as well.

Product matching is an essential part of Price2Spy’s services. To put it simply, without product matching our clients wouldn’t be able to perform any kind of price comparison.

So far, we’ve had 3 ways to match products:

A) Automatch: a fully automated process, applicable when the client’s products (and the products listed on competitor sites) have something we call a ‘unique identifier’. This can be an EAN, UPC, or ASIN, or in the most general case an MPN (Manufacturer Part Number). As you may guess, this method is not always applicable.

B) Manual product matching: since humans perform the matching, it is always applicable. However, if a client has hundreds of thousands of products and wants results quickly, this becomes a problem: manual matching is simply not cost-effective enough, nor can it be done at the snap of a finger.

C) Hybrid product matching: a combination of A) and B). Automatch provides candidate matches (which are not reliable enough to be trusted automatically), and humans check whether these matches are good (and should be promoted) or bad (and should be rejected).
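The three methods above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Price2Spy’s implementation: the Product class, the normalize helper, the sample identifiers, and the routing rule (exactly one identifier hit is accepted automatically, several hits go to a human reviewer, no hit falls back to manual matching) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Identifier types named in the text.
ID_FIELDS = ("ean", "upc", "asin", "mpn")


@dataclass
class Product:  # hypothetical record layout
    name: str
    ean: Optional[str] = None
    upc: Optional[str] = None
    asin: Optional[str] = None
    mpn: Optional[str] = None


def normalize(value: str) -> str:
    """Strip separators and case so 'ab 100' and 'AB-100' compare equal."""
    return value.replace("-", "").replace(" ", "").upper()


def identifiers(p: Product) -> set:
    """All (field, normalized value) pairs that the product actually has."""
    return {(f, normalize(getattr(p, f))) for f in ID_FIELDS if getattr(p, f)}


def automatch(client_products, competitor_products):
    """Split client products into auto-matched, needs-review and unmatched."""
    index = {}
    for comp in competitor_products:
        for key in identifiers(comp):
            index.setdefault(key, []).append(comp)

    matched, needs_review, unmatched = [], [], []
    for prod in client_products:
        hits = []
        for key in identifiers(prod):
            for comp in index.get(key, []):
                if comp not in hits:
                    hits.append(comp)
        if len(hits) == 1:
            matched.append((prod, hits[0]))    # A) trusted automatically
        elif hits:
            needs_review.append((prod, hits))  # C) candidates for a human
        else:
            unmatched.append(prod)             # B) manual matching only
    return matched, needs_review, unmatched


# Made-up sample data.
client = [
    Product("Widget A", ean="1111111111111"),
    Product("Widget B", mpn="AB-100"),
    Product("Widget C"),
]
competitor = [
    Product("Widget A (shop)", ean="1111111111111"),
    Product("Widget B red", mpn="AB100"),
    Product("Widget B blue", mpn="ab 100"),
]

matched, needs_review, unmatched = automatch(client, competitor)
print([p.name for p, _ in matched])
print([(p.name, len(c)) for p, c in needs_review])
print([p.name for p in unmatched])
```

A product without any identifier ends up in the unmatched list, which is exactly where identifier-based matching stops being applicable.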

The problem is that Automatch cannot handle cases where a match is obvious (or nearly obvious) to the human eye, but a search on the competitor site does not yield any results.

Here are several such examples:

The idea was to introduce a 4th method, universally applicable, which will be reliable enough that it can be trusted. We had a feeling that ML should be helpful, but we had no idea where to start.

But before jumping into the project, we wanted to check whether anyone had done it before us, and whether a solution was available in the public domain.

  • Attribute Extraction from Product Titles in eCommerce (https://arxiv.org/abs/1608.04670): our colleagues from Walmart tackled a problem that is seemingly similar, but that does not actually deal with matching.
  • Product Matching in eCommerce using deep learning (https://medium.com/walmartlabs/product-matching-in-ecommerce-4f19b6aebaca): a continuation of the above study, this article does deal with product matching, which is what we are trying to do as well. However, it deals with matching a single product (while we deal with the problem of matching a whole set of products). To be honest, we got a bit discouraged by the fact that the authors themselves state that the matching accuracy is between 85% and 90% (we aimed for much more).
  • A Machine Learning Approach for Product Matching and Categorization (http://www.semantic-web-journal.net/system/files/swj1470.pdf): this article is helpful, but only if you are very deep into ML. At the beginning of our project, we were simply not at that level.
  • Unraveling product matching in retail with AI (https://towardsdatascience.com/unravelling-product-matching-with-ai-1a6ef7bd8614): this article was posted long after we had embarked on our project. Unfortunately, it does not reveal much about the technical details of the ML implementation.

So, we had to start digging ourselves.

This was only a short introduction to this complex topic. Stay with us to find out more in the following posts:

Product matching via Machine Learning — Preparation and Implementation

Product matching via Machine Learning — The Results and Evaluation

Misha is the founder and CEO of an online repricing tool https://www.price2spy.com/ and an advanced data crawling API service https://www.justlikeapi.io/