E-Commerce & Machine Learning

Geke Pals
Published in emptycart · 5 min read · Aug 10, 2018

In my last post I introduced the problem of product curation and the failure of e-commerce shops to move beyond hard product specs (noise-cancelling, wireless) and provide actually helpful product information (suitable for running), a.k.a. ‘soft specs’. This failure is due to the enormous workload of adding soft specs: manually enriching and updating the information for each and every product in the database. Yet in this age, this shouldn’t be an issue anymore. We have computers, high-speed internet and virtually unlimited cloud storage. And, most importantly, we have Machine Learning.

Machine Learning? Statistics on Steroids!

Machine learning is an AI-buzzword favorite. Companies promote themselves by exclaiming “We use machine learning to optimize our performance” or “Machine learning powers innovation!”, without feeling the need to explain what exactly machine learning is and does. Don’t worry, this won’t be one of those articles that tries to explain the inner workings of machine learning. My aim here is just to make the concept of machine learning somewhat more intuitive and graspable.

Basically, machine learning is statistics on steroids. Statistics is a way to analyze and interpret big chunks of data. It can be used in two ways: to describe the data and the relationship between different variables, or to infer conclusions from the data by using probabilities. In other words, statistics helps us to make sense of the world, and hands us tools to interpret the excessive amount of information available nowadays.

To illustrate, take the striking example of park ranger Roy Sullivan, who was hit by lightning a total of seven times during his lifetime. It seems like the man was cursed by one of the Thunder Gods (pick the one you like), but it’s not that extraordinary if we apply a specific statistical principle to the case: the Law of Truly Large Numbers (not to be confused with the better-known Law of Large Numbers, which is about averages settling down as samples grow). It states that, with a large enough number of opportunities, even a very unlikely event becomes likely to happen. So, given the large number of people on earth, the large amount of time we have, and the large number of lightning strikes, it is actually quite likely that a streak of seven lightning strikes will happen to someone sometime. It was just bad luck for Roy Sullivan that it happened to be him.
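
To make the ‘enough opportunities’ idea a bit more concrete, here is a minimal sketch in Python. The numbers are made up purely for illustration (they are not real lightning statistics), the function name `prob_at_least_once` is my own invention, and the calculation assumes that every opportunity is independent of the others, which real life rarely is.

```python
import math

def prob_at_least_once(p: float, n: int) -> float:
    """Chance that an event with per-opportunity probability p
    happens at least once in n independent opportunities."""
    if p <= 0.0 or n <= 0:
        return 0.0
    if p >= 1.0:
        return 1.0
    # This is 1 - (1 - p)**n, computed via log1p/expm1
    # so that it stays accurate when p is tiny.
    return -math.expm1(n * math.log1p(-p))

# Illustrative numbers only: a hypothetical one-in-a-million event.
rare = 1e-6
print(prob_at_least_once(rare, 1))           # a single opportunity
print(prob_at_least_once(rare, 10_000_000))  # ten million opportunities
```

With a single opportunity the chance stays at roughly one in a million; with ten million opportunities it climbs above 99.9%. Same event, same odds per try, just a lot more tries.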

So, back to machine learning. Machine learning is able to “learn” from chunks of example data by using statistical techniques, without being given explicit rules for what to look for. After learning, the ‘machine’ can make predictions on new (but similar) chunks of data. This means that you could throw data at a machine learning algorithm, wait for the learning process to complete, and check if the predictions it produces make any sense. Depending on the algorithm, it can identify clusters, classify new data into classes, or project multi-dimensional data onto a two-dimensional graph that makes sense to us humans. It’s all pretty cool stuff.

Source: https://xkcd.com/1838/

What should Machine Learning do for product curation?

As discussed, the biggest challenge in adding soft specs to a product database is that it’s a manual, product-by-product process. This is the workload that we want to shift from our precious human hands to an intelligent machine learning brain. The problem here isn’t the part where we automate adding an extra spec to each product, but knowing which value this spec needs to have.

I previously mentioned that one way to tackle this would be to design an expert rule that covers all products in the database. However, this solution is problematic, because it forces the expert to design infallible rules that assign each product to the right soft specs. So why not let the computer design such a rule? As I’ve just explained, machine learning is able to learn by example. So, our algorithm should be able to learn which product belongs to which soft specs, by examining a set of example products. In other words, machine learning could discover what rules we implicitly think of when we manually categorize products into soft specs.

Let’s take a look at an example. Remember how I was browsing Amazon for a pair of headphones that I could run with and use while playing my electric piano? It would be impossible to design a perfect rule that assigns every relevant pair of headphones to the ‘run-and-piano’ spec, and it would be terrible to manually classify every pair individually (there are over 10,000 of them!).

Over 10,000 headphones to classify manually…

In order to have machine learning take over this work, we’d need to hand the algorithm a small set of about 20 headphones that are labeled ‘yes, these are run-and-piano headphones’, and an equally sized set labeled ‘no, these are absolutely not run-and-piano headphones’. We’d then let the algorithm do its magic, and several minutes later the complete dataset of headphones would be classified as either run-and-piano or not.
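
For the curious, here is a rough sketch of what ‘learning from a handful of labeled examples’ could look like. It uses a nearest-centroid classifier, one of the simplest techniques around and not necessarily what a real curation system would use. The three features (weight in kilograms, wireless, sweat resistant), the example values and all function names are hypothetical, and I use three examples per class instead of twenty to keep it short.

```python
def centroid(rows):
    """Average each feature column over a list of feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(positives, negatives):
    """'Learning' here is just remembering the average yes and no example."""
    return centroid(positives), centroid(negatives)

def predict(model, features):
    """True if the product is closer to the average 'yes' example."""
    yes_center, no_center = model
    return squared_distance(features, yes_center) < squared_distance(features, no_center)

# Hypothetical features: [weight_kg, wireless (0/1), sweat_resistant (0/1)]
positives = [[0.03, 1, 1], [0.05, 1, 1], [0.04, 0, 1]]  # labeled 'yes'
negatives = [[0.30, 0, 0], [0.25, 1, 0], [0.35, 0, 0]]  # labeled 'no'
model = train(positives, negatives)

# The rest of the (made-up) catalog, which nobody labeled by hand.
catalog = {
    "light sport earbuds": [0.04, 1, 1],
    "heavy studio over-ears": [0.32, 0, 0],
}
labels = {name: predict(model, feats) for name, feats in catalog.items()}
print(labels)
```

Note that nobody wrote a rule like ‘must be lighter than 100 grams’. The decision boundary falls out of the examples, which is exactly the point.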

Of course, the inner workings of machine learning are a bit more complex. There are many types of algorithms to choose from, so a lot of testing is required to select the right algorithm. We also have to make sure that the data we put in is in the right format for the algorithm to understand. This means converting messy real-life data to clean, consistently formatted data. Lastly, we have to find a way to take free text data (such as product descriptions) into account, which means finding a way to convert text into numbers (computers love numbers!). There’s much more to say here, but by now we should have a basic understanding of the general workings of a machine learning algorithm for the problem that we have.
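
As a small taste of the ‘text into numbers’ step, here is a sketch of a bag-of-words encoding, which simply counts how often each known word appears in a description. It is only one of several ways to do this, and the product descriptions and function names below are invented for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(descriptions):
    """Give every distinct word a fixed position in the output vector."""
    words = sorted({tok for d in descriptions for tok in tokenize(d)})
    return {word: index for index, word in enumerate(words)}

def vectorize(text, vocabulary):
    """Turn a description into a list of word counts (unknown words are ignored)."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector

# Made-up product descriptions.
descriptions = [
    "Wireless sport earbuds, sweat resistant",
    "Wired studio headphones",
]
vocabulary = build_vocabulary(descriptions)
print(vocabulary)
print(vectorize("Wireless earbuds for sport", vocabulary))
```

Each description becomes a list of numbers of the same length, so it can be fed to an algorithm alongside the hard specs.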

Source: http://www.sakhaglobal.com/index.php/2018/02/26/big-data-and-machine-learning-the-perfect-marriage/

The only thing missing now is an answer to several questions: what did the algorithm do? On what basis did it decide whether a product belongs to run-and-piano or not? How can we know if the algorithm classified the products in the right way? This is what I will answer in the third, and final, post of this series:

  1. Introduction: The Problem of Product Curation
  2. E-Commerce & Machine Learning (you are here)
  3. Opening the Black Box of Machine Learning: let’s see what’s happening

Want to know more about our curation system? Find us at meetfeli.com.
