The Core Value of Recommender Systems for Successful Internet Enterprises

Maciej D. Korzec
Published in The Startup
8 min read · Oct 31, 2020

An assessment of the importance of recommender systems

(Source: Combined Unsplash sources @jankolar and @franki by the author)

Most users are aware of individualized recommendations on large internet platforms, for example from the "You might also like" features on Amazon. However, this technology is probably more important and more dominant within big data platforms than many might think:

70% of the 30 largest internet companies use recommender systems within their core business.

During the preparation for a talk on the Netflix architecture, I learned that nearly all the features on the streaming platform are individualized, which aroused my interest in the topic in general. Netflix tries to recommend the most relevant movies, series, advertisements, and overall page to each user. From a set of dozens of images for one movie, it even selects the one that is most likely to be clicked. Netflix has a great tech blog, where you can find all kinds of information about the platform.

My interest in the technical details grew and resulted in a series of articles about recommending similar images with the help of convolutional networks (part 1, 2, 3, 4). The question then arose of how prominent recommender systems are for large internet companies.

This topic is addressed today. For this article the 30 largest internet companies by revenue were analyzed with respect to usage of recommender systems, and we can note that

  1. successful internet companies rely on the cloud and big data catalogs
  2. to compete with other players, and efficiently use data, they must use recommender systems whenever applicable — mostly based on machine learning

Let us continue with the results of the evaluation before giving an idea of what recommender systems are, explaining their competitive role, and concluding with a discussion of their importance for the core business of these companies.

The top 30 internet companies

For the following analysis, the 30 largest internet companies by revenue were considered. The revenue numbers are mostly from 2019, so the COVID-19 effect is not reflected; Expedia’s revenue, for example, has dropped drastically since then.

Let us first have a look at the distribution of these companies by country:

Top 30 internet companies country distribution (Source: Diagram by the author, numbers from this list — 19th of October 2020)

What is most striking is that only 7% of these companies (2) are from Europe, and the portion is unchanged when considering the top 100. It reflects the failure in digitalization of a whole, developed continent, and it shows that the core players in AI, the USA and China, indeed benefit from this expertise.

The next diagram shows which of these companies rely on recommender systems within their core business (major revenue stream):

Usage of Recommender Systems within the top-30 internet companies (Source: Diagram by the author; the top 30 companies from this list were evaluated, 19th of October 2020)

There are companies like PayPal, where recommender systems are not crucial for the overall core business. However, the majority, with products that have huge user bases, data sets, or catalogs, need the technology to fully benefit from running a big data platform. These systems are what drives internet companies nowadays. Hence it is not surprising that 70% of the companies incorporate recommender systems within their core revenue stream.

You can find the table with the companies and assessment in the Resources section at the end of the article.

How recommender systems improve market shares

When petabytes of data have been collected, only one point is certain: the data incurs significant costs for storage and computation.

Everything else depends on a great many factors of business and technical expertise. The successful data-related companies from the top-30 list do have a solid business plan and the expertise to harvest the data. Those with a huge catalog are all able to benefit from the long tail.

One should not forget the business goals when working with big data. Therefore, let us have a look at two quotations from Peter Thiel’s book on success for start-ups [1]:

“Monopoly is the condition of every successful business.”

“A good business takes advantage of the complementarity between computers and humans, using both machines and human employees to solve problems.”

This is exactly the complementarity that recommender systems offer: computers create possibly good selections, and users decide whether the computer indeed made a good guess, which it often does.

In a traditional offline shop, one tries to offer the most popular items to maximize the number of sales.

The long tail (source: the author)

Consequently, the less popular items are never sold. On platforms that can work with a huge warehouse (digital or analog), not offering the items that are most popular for the individual user can lead to churn or non-conversion.

Recommender systems aim to find individual interests (individual popularity). The users then still see only a fraction of the whole catalog, but it is hopefully the most popular fraction in each user’s eyes. In this way, revenue is created from the long tail depicted above, that is, from the items that are not so popular overall. The probability that a user churns decreases: a user who can find a satisfactory item is unlikely to look at other sites.

The importance for preventing churn can be illustrated by the following simplified calculation. If two competing platforms have the same catalog and strategy, but the first one has a much better recommender system, users will very likely churn less from the platform with the better recommender.
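To get a feeling for how much demand can sit in the long tail, here is a minimal sketch. It assumes, purely for illustration, that item popularity follows a Zipf-like law (demand proportional to 1/rank); the function name `tail_share`, the catalog size, and the exponent are hypothetical and not taken from any of the analyzed companies.

```python
def tail_share(catalog_size: int, head_size: int, exponent: float = 1.0) -> float:
    """Share of total demand that falls outside the `head_size` most popular items.

    Assumption: demand for the item at popularity rank r is proportional
    to 1 / r**exponent (a Zipf-like law chosen only for illustration).
    """
    weights = [1.0 / rank ** exponent for rank in range(1, catalog_size + 1)]
    return sum(weights[head_size:]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical catalog of 10,000 items; a shop that stocks only the top sellers
    # gives up whatever share of demand is reported here.
    for head in (100, 1_000):
        share = tail_share(10_000, head)
        print(f"top {head} items stocked: {share:.0%} of demand lies in the tail")
```

Under such an assumption the tail is far from negligible, which is the part of the catalog an offline shop cannot serve and a personalized platform can.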

Assume:

  • Companies A and B both have 100 units of customers
  • Probability to churn within a year from A to B is 10%
  • Probability to churn within a year from B to A is 20%

To calculate the distribution after n years, without taking any other effect into account, apply both churn rates to the two customer bases once per year, n times in a row.

After five years (n=5) this gives:

  • Customer units of A: 128
  • Customer units of B: 72
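The yearly update can be sketched in a few lines. This is a reconstruction from the stated assumptions only (customers move between A and B and nowhere else, so the total stays at 200 units); the function name `simulate_churn` is made up for this example.

```python
def simulate_churn(a: float, b: float, churn_a_to_b: float,
                   churn_b_to_a: float, years: int) -> tuple[float, float]:
    """Apply the yearly churn rates `years` times and return the final customer units."""
    for _ in range(years):
        # Both right-hand sides use last year's values of a and b.
        a, b = (
            a * (1 - churn_a_to_b) + b * churn_b_to_a,
            b * (1 - churn_b_to_a) + a * churn_a_to_b,
        )
    return a, b


if __name__ == "__main__":
    a, b = simulate_churn(100, 100, churn_a_to_b=0.10, churn_b_to_a=0.20, years=5)
    print(f"A: {a:.1f}, B: {b:.1f}")  # A: 127.7, B: 72.3
```

Letting the loop run longer shows the split approaching roughly two thirds for A and one third for B, which is the ratio of the two churn rates.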

Hence, by considering churn alone with one competitor, A wins the market by a significant margin and starts creating the desired monopoly. Now B can try to acquire more new customers, but the cost of customer acquisition is much higher than the cost of keeping a customer. In the long run the battle will be lost, as A generates more cash flow, which adds further positive effects. If all other competitors together stay at 100 units and have a 20% yearly churn, this is even better for A.

This is the reason why Netflix has always been happy with a churn rate below 10%, while the average for over-the-top (OTT) content is estimated to be around 20%. Feel free to check out their nonlinearly growing subscriber numbers here, approaching the 200 million mark at the end of 2020.

Of course, content is key: without it, no recommender in this world will help and no monopoly will be established. But you may fail even with the best content if you lack a good recommender, as it is essential programmatic marketing.

In general, recommender systems are divided into collaborative, content-based, and hybrid systems, and depending on the use case of the platform one needs to decide which fits best.

Main types of recommender systems (source: the author)

Example for a collaborative recommender system:

Whenever you read phrases like "users who bought x also bought y", a collaborative recommender system is running. It tries to find a neighborhood within the set of all users whose interest in items is similar to yours. It relies on the collaboration of the users, hence the name.
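The neighborhood idea can be sketched as follows. This is a deliberately tiny user-based variant that assumes each user is described only by a set of purchased items and that Jaccard overlap is the similarity measure; the user names, items, and the function `recommend` are hypothetical, and production systems typically use far more elaborate models.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Overlap of two item sets, between 0 (disjoint) and 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases: dict, user: str, k_neighbors: int = 2, n_items: int = 3) -> list:
    """Suggest items bought by the most similar users but not yet by `user`."""
    own = purchases[user]
    # The neighborhood: the k users whose purchases overlap most with ours.
    neighbors = sorted(
        (other for other in purchases if other != user),
        key=lambda other: jaccard(own, purchases[other]),
        reverse=True,
    )[:k_neighbors]
    votes = Counter()
    for other in neighbors:
        weight = jaccard(own, purchases[other])
        for item in purchases[other] - own:
            votes[item] += weight  # closer neighbors count more
    return [item for item, _ in votes.most_common(n_items)]


# Hypothetical purchase histories.
purchases = {
    "alice": {"x", "y"},
    "bob": {"x", "y", "z"},
    "carol": {"x", "z"},
    "dave": {"w"},
}
print(recommend(purchases, "alice"))  # ['z']
```

Note that no item metadata is used at all; the recommendation comes purely from what other users did.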

Example for a content-based recommender system:

Recommending the most similar images from a large set of images, based on machine-generated similarity, is an example of a pure content-based recommender system. I wrote about it in my first post on Medium, Effortlessly Recommending Similar Images, and in three follow-up articles that resulted in a demo app that you can access here.
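A content-based lookup of this kind boils down to comparing feature vectors. The sketch below assumes every image has already been turned into a numeric vector (for example by a convolutional network) and uses cosine similarity; the three-dimensional vectors and image names are invented stand-ins, and this is not the code from the linked articles.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity of two equally long feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(features: dict, query: str, k: int = 2) -> list:
    """Return the k items whose feature vectors are closest to the query item's."""
    q = features[query]
    ranked = sorted(
        (name for name in features if name != query),
        key=lambda name: cosine(q, features[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical, hand-made "embeddings"; real ones have hundreds of dimensions.
features = {
    "beach_1": [0.9, 0.1, 0.0],
    "beach_2": [0.8, 0.2, 0.1],
    "forest_1": [0.1, 0.9, 0.2],
    "city_1": [0.0, 0.2, 0.9],
}
print(most_similar(features, "beach_1"))  # ['beach_2', 'forest_1']
```

Here no user behavior is involved: two items are related only because their content looks alike to the machine.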

Example for a hybrid recommender system:

The recommendations on Netflix are hybrids. The system uses all it can get: metadata of the titles, individualized images (content-based), and other users’ behavior and taste, to suggest movies and series that the user is likely to click.
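One simple way to combine the two signals is a weighted blend of their scores. The sketch below shows only this generic idea; it is not a description of how Netflix does it, and the function `hybrid_scores`, the weight `alpha`, and the example scores are assumptions made for illustration.

```python
def hybrid_scores(collab: dict, content: dict, alpha: float = 0.5) -> dict:
    """Blend two score dictionaries; alpha is the weight of the collaborative part.

    Items missing from one source get a score of 0.0 from that source.
    """
    items = set(collab) | set(content)
    return {
        item: alpha * collab.get(item, 0.0) + (1 - alpha) * content.get(item, 0.0)
        for item in items
    }


# Hypothetical scores in [0, 1] from a collaborative and a content-based model.
collab = {"title_a": 1.0, "title_b": 0.0}
content = {"title_a": 0.0, "title_b": 1.0, "title_c": 1.0}
scores = hybrid_scores(collab, content, alpha=0.75)
best = max(scores, key=scores.get)
print(best)  # title_a
```

A practical benefit of such a blend is that new titles without any user history (like `title_c` above) still receive a score from the content side.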

A/B testing

How do you measure success when there are so many degrees of freedom? After a data scientist has created a recommender system that works with some sample data, it is still not guaranteed that it yields a statistically significant benefit in production. Carrying out A/B tests (or A/B/C/D/ …) is hence mandatory, starting with a metric for success that all stakeholders can agree upon. This essentially means that different versions of the software run for different user groups in order to evaluate the performance of different recommender systems.

A platform should use a microservice architecture and log clicks and conversions. Then certain users can be forwarded to the application with the old version of the recommender, and other users can test the new version. If after some time (this can be several months) the evaluation shows a significant improvement, the new model can be deployed to all users. Scalability may become an issue if, for example, a collaborative model works on more than 100 million users.
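As an illustration of what a statistical check can look like, here is a minimal sketch of a two-sided two-proportion z-test on conversion rates. It assumes that conversion rate is the agreed success metric and that the normal approximation is adequate for the group sizes; the function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (nobody or everybody converted): nothing to test.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical result: old recommender 500/10,000, new recommender 580/10,000.
    z, p = two_proportion_z_test(500, 10_000, 580, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be noise; which threshold and which metric to use is exactly what the stakeholders need to agree on beforehand.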

Description how A/B tests may be carried out (Source: the author)
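The forwarding of users to one version or the other can be sketched with a deterministic hash of the user ID, so that a user keeps seeing the same variant on every request without any stored state. This is one common approach, not necessarily the one behind the diagram; the names `assign_variant`, `old_recommender`, `new_recommender`, and the experiment label are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Map a user to a variant, stable across requests and independent per experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the first 32 bits of the hash as a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "new_recommender" if bucket < treatment_share else "old_recommender"


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, assign_variant(uid, "recsys-test", treatment_share=0.1))
```

Including the experiment name in the hash means that a user who lands in the treatment group of one test is not automatically in the treatment group of the next. The logged clicks and conversions then only need to carry the variant label to be evaluated later.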

There is literature on recommender systems available [2] that also covers many more sub-types of recommender systems. If you want to gain deeper insights, it is recommended to go through such comprehensive resources.

Takeaways

Nowadays it is a necessity to use recommender systems to become a big player in the platform economy. All internet companies with large revenue have them where applicable. Companies with good content that make use of this growth potential will always take the lead over those ignoring the long tail. They benefit from customer loyalty and from better churn and conversion rates. Hence, when thinking about a new platform, one always has to ask the question:

Could recommender systems be employed to improve conversion rates?

If the answer is yes, high priority should be given to deriving a scalable concept, a microservice architecture, a suitable analytics approach, and an A/B testing idea, and to implementing them.

Resources

[1] Peter Thiel, Zero to One: Notes on Startups, or How to Build the Future, 2014

[2] Charu C. Aggarwal, Recommender Systems: The Textbook, 2016

Table of top-30 companies with recommender system evaluation:

Top 30 internet companies by revenue (mostly 2019) and assessment of recommender system usage.

Thanks for reading!
