The Cold Start Problem for Recommender Systems

Mark Milankovich
Jul 14, 2015

When facing the cold start problem, recommender systems have several methods for overcoming the initial lack of meaningful data.

So what is the cold start problem? The term derives from cars. When it’s really cold, the engine has trouble starting up, but once it reaches its optimal operating temperature, it runs smoothly. With recommendation engines, “cold start” simply means that the circumstances are not yet optimal for the engine to provide the best possible results. In eCommerce, there are two distinct categories of cold start: product cold start and user cold start. Here we’re going to examine both.

Product cold start

The basics

The right target audience for an advertisement is best determined by looking at the ad’s previous visitors. According to the basic assumption of collaborative filtering, if an ad was already popular with a certain group of people, then others who fit the group’s profile are likely to respond well to it too.

However, each time a new ad is placed on your site, it goes through a cold start phase because it has no user interactions yet. User actions are incredibly important, since they drive both product-to-product recommendations and personalized recommendations based on user history.

If there aren’t enough user actions for a certain ad to lay the foundation for accurate recommendations, the engine will not know when to display it. In other words, the more interactions an ad has collected, the easier it is for the recommendation system to qualify and target it.

News sites, auction sites, eCommerce stores and classified sites all experience the product cold start.

Classified sites often list products by posting date, newest first. Though some ads on classified sites (e.g. ads from babysitters) can stay relevant for a longer period of time, in general, the longer a product has been on the site, the less relevant it is. This is simply because older listings are more likely to have been sold already. It is much like a weather forecast site: no one is interested in reading the forecast from three weeks ago.

The problem

Recommendation engines that run on collaborative filtering recommend each item (products advertised on your site) based on user actions. The more user actions an item has, the easier it is to tell which user would be interested in it and what other items are similar to it. As time progresses, the system will be able to give more and more accurate recommendations.

This, however, creates a major contradiction for classified sites and their recommendation engines. The newest ads are the most relevant ones, yet a recommendation system has far less confidence in recommending them to your users than it has with older items. At the same time, it is simply not a good idea to let older ads dominate the recommendation process.

The solution

Content-based filtering is the method that addresses this problem. For a certain period of time after a product is listed, our system relies primarily on the product’s metadata when creating recommendations, and visitor actions are secondary.
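As a rough illustration of how metadata can take the lead for new items, here is a minimal sketch in Python. It is not the production method described in this article: the Jaccard similarity, the linear weighting and the `warm_after` threshold of 50 interactions are all assumptions chosen for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of metadata tags, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def blended_score(content_sim: float, collab_score: float,
                  n_interactions: int, warm_after: int = 50) -> float:
    """Blend a content-based and a collaborative score for one item.

    A brand new item (0 interactions) is scored purely on metadata.
    The collaborative score takes over linearly until the item has
    collected `warm_after` interactions (a hypothetical threshold).
    """
    w_collab = min(n_interactions / warm_after, 1.0)
    return (1.0 - w_collab) * content_sim + w_collab * collab_score


# A new desk ad compared with an ad the visitor viewed earlier.
new_ad_tags = {"desk", "oak", "office"}
viewed_ad_tags = {"desk", "pine", "office"}
similarity = jaccard(new_ad_tags, viewed_ad_tags)
score = blended_score(similarity, collab_score=0.0, n_interactions=0)
```

With zero interactions the score equals the metadata similarity, so a new ad can be recommended right away instead of waiting for clicks to accumulate.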

We can also distinguish visitors who are only there to browse from determined visitors who know what they are looking for. For example, if someone clicks on everything from phone cases to real estate within a short period of time, the system will assume that this visitor is only browsing and won’t use those clicks for recommendations.
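One simple way to approximate this distinction is to count how many different categories a visitor touches within a short time window. The sketch below is an assumption-laden example, not the heuristic actually used in our engine: the 120-second window and the limit of 3 categories are hypothetical values.

```python
def looks_like_browsing(clicks, window_seconds=120, max_categories=3):
    """Guess whether a visitor is just browsing.

    `clicks` is a list of (timestamp_in_seconds, category) pairs.
    Returns True when the clicks inside the most recent window span
    more than `max_categories` distinct categories.
    """
    if not clicks:
        return False
    latest = max(ts for ts, _ in clicks)
    recent_categories = {
        category for ts, category in clicks
        if latest - ts <= window_seconds
    }
    return len(recent_categories) > max_categories


scattered = [(0, "phone cases"), (20, "real estate"),
             (40, "cars"), (60, "furniture")]
focused = [(0, "desks"), (30, "desks"), (70, "chairs")]
```

Here `looks_like_browsing(scattered)` is true, so those clicks would be left out of the visitor’s profile, while the `focused` session would be kept.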

When it comes to investigating the cold start phenomenon, this is only the tip of the iceberg. Every recommendation solution has a different method to cope with it, and after getting over the rough cold start, the real work of the engine begins.

Visitor cold start

The user or visitor cold start simply means that a recommendation engine meets a visitor for the first time. Because there is no history for this user, the system doesn’t know their personal preferences. Getting to know your visitors is crucial in creating a great user experience for them.

Identifying visitors

Visitors to your site will probably not register upon their very first arrival. It’s actually rather understandable, given the fact that most internet users have already signed up for numerous sites and don’t fancy the idea of going through yet another registration process just to have a look around. What’s more, classified ad sites usually don’t require visitors to be logged in. Your visitors don’t even need to have an account to contact the advertiser and then buy the advertised good. So, your users need to be identified by cookies.

If visitors delete or block cookies, they will be identified as a brand new user each time they visit their favorite classified ads site. The classified site should try to give these users specific identifiers, like unique user IDs. If this cannot be solved, everything the recommendation engine learns about these users will be lost as soon as they leave the site.

However, product-to-product recommendations will persist in any case, whether your users block cookies or not.

A never-ending story

While it’s true that cold start mostly affects new visitors, we must not forget that a similar phenomenon can easily happen with returning users. Consider the following example.

Robert is looking for desks on your site. He’ll be interested in ads for desks for around a week, but after he finds the right one, he will move on. When he visits again in a month, he may be looking for lawn mowers.

So the cold start problem exists all the time, as Robert (and any of your users) will always be interested in new and different things. At the start of each visit, the recommendation system does not know whether the user has arrived with new ideas or is still looking for the earlier items. This is why it is important for the recommendation system to identify the user’s actual, active interest after the first few clicks. Also, note that certain topics of interest are more persistent than others. For example, collectibles (stamps, memorabilia, coins, books, etc.) exhibit a rather long interest span compared to typical consumer goods like desks or lawn mowers.
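The difference in interest span can be modeled, for instance, by letting a past interest fade at a category-specific rate. The following sketch uses exponential decay with a half-life per category. The half-life values and category names are purely hypothetical and would have to be tuned on real site data.

```python
# Hypothetical half-lives in days, for illustration only.
HALF_LIFE_DAYS = {
    "collectibles": 180.0,  # long-lived interest
    "furniture": 7.0,       # e.g. Robert's desk search
    "garden": 7.0,          # e.g. lawn mowers
}
DEFAULT_HALF_LIFE_DAYS = 14.0


def interest_weight(category: str, days_since_last_click: float) -> float:
    """Weight of an earlier interest, halving every half-life."""
    half_life = HALF_LIFE_DAYS.get(category, DEFAULT_HALF_LIFE_DAYS)
    return 0.5 ** (days_since_last_click / half_life)


# One month after the last click, desks matter far less than stamps.
desk_weight = interest_weight("furniture", 30)
stamp_weight = interest_weight("collectibles", 30)
```

When Robert returns after a month, his desk history would carry very little weight, so his first few clicks in the new session would quickly dominate the recommendations.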

Multiple devices

Even if users allow cookies in their browser, they can still experience the cold start. This can happen, for example, when they use someone else’s computer to browse, or simply if they have multiple devices. A classified site cannot link a user’s history across different devices unless the user has an account and logs in on each of them.

The solution

Recommendation systems have efficient ways of handling the visitor cold start problem. Below are the most important types of information that help minimize or eliminate the cold start phase. With the exception of behavioral information, all of this data can be obtained from both new visitors and returning users.

By default, the very first step is to apply a popularity-based strategy. Trending products can be determined from global trends and from what has been popular recently, in the visitor’s region, or at that particular time of day.
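A popularity-based fallback could be sketched as follows. This is a simplified example under stated assumptions: each click counts as one vote that halves in value every 24 hours, and the event format `(item_id, timestamp_in_hours, region)` is invented for the illustration. Time-of-day effects are left out for brevity.

```python
from collections import defaultdict


def trending(events, now, region=None, half_life_hours=24.0, top_n=3):
    """Rank items by recency-weighted popularity.

    `events` is an iterable of (item_id, timestamp_in_hours, region).
    If `region` is given, only events from that region are counted.
    """
    scores = defaultdict(float)
    for item_id, ts, event_region in events:
        if region is not None and event_region != region:
            continue
        age = max(now - ts, 0.0)
        scores[item_id] += 0.5 ** (age / half_life_hours)
    # Highest score first; ties are broken by item id for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


events = [
    ("a", 0, "HU"), ("a", 1, "HU"),    # two old clicks
    ("b", 47, "HU"), ("b", 48, "HU"),  # two fresh clicks
    ("c", 48, "DE"),                   # one fresh click elsewhere
]
global_top = trending(events, now=48)
regional_top = trending(events, now=48, region="HU")
```

Because of the decay, item `b` with two fresh clicks outranks item `a` with two old ones, and a single fresh click on `c` also beats `a` in the global ranking.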

Then as a next step, you can narrow the selection of ads you display for visitors by making use of contextual information:

Geolocating users with either their geoIP or their mobile device’s GPS coordinates.
Knowing the referrer (which site the visitor came from), the device (mobile, desktop), the operating system (iOS, Windows, Android) and the browser type (Chrome, IE, Safari, etc) will help with personalizing the ads you display.
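To make the idea of narrowing by context concrete, here is a small sketch. The `Context` and `Ad` classes, the `mobile_friendly` flag and the fallback rule are all hypothetical and only illustrate the principle. A real system would more likely use such signals as ranking features than as hard filters.

```python
from dataclasses import dataclass


@dataclass
class Context:
    region: str   # e.g. derived from geoIP or GPS
    device: str   # "mobile" or "desktop"


@dataclass
class Ad:
    ad_id: str
    region: str
    mobile_friendly: bool  # hypothetical attribute


def narrow_by_context(ads, ctx):
    """Keep ads from the visitor's region; on mobile, keep only
    mobile-friendly ones. Fall back to all ads if nothing matches."""
    kept = [
        ad for ad in ads
        if ad.region == ctx.region
        and (ctx.device != "mobile" or ad.mobile_friendly)
    ]
    return kept or list(ads)


ads = [
    Ad("1", "Budapest", True),
    Ad("2", "Budapest", False),
    Ad("3", "Vienna", True),
]
mobile_ads = narrow_by_context(ads, Context("Budapest", "mobile"))
desktop_ads = narrow_by_context(ads, Context("Budapest", "desktop"))
```

The fallback to the full list matters: a visitor from a region with no matching ads should still see the popular items rather than an empty box.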

Behavioral information “kicks in” after only 2–3 clicks during the user’s very first visit. This is crucial for uncovering the user’s actual, active interest.
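Putting the steps together, the switch from popularity and context to behavior could look like the sketch below. The threshold of 3 clicks sits at the upper end of the 2–3 clicks mentioned above, and the strategy labels are placeholder names, not identifiers from any real product.

```python
from collections import Counter


def active_interest(session_categories, min_clicks=3):
    """Return the most-clicked category of the current session,
    or None while the session has fewer than `min_clicks` clicks."""
    if len(session_categories) < min_clicks:
        return None
    return Counter(session_categories).most_common(1)[0][0]


def choose_strategy(session_categories, min_clicks=3):
    """Pick a recommendation strategy for the current session."""
    interest = active_interest(session_categories, min_clicks)
    if interest is None:
        # Too few clicks: rely on trending items and context.
        return ("popularity+context", None)
    return ("behavioral", interest)


first_click = choose_strategy(["desks"])
third_click = choose_strategy(["desks", "desks", "chairs"])
```

After a single click the visitor still gets trending, context-filtered ads. By the third click the session’s dominant category (`"desks"` in this example) is used as the active interest.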

Is cold start relevant in your business too? What solution did you come up with for it? Tell us about it in the comments!

We will be posting more about classified sites in the near future. So stay tuned and remember to ask us any questions you may have!

This article was originally featured on December 18, 2013, on the blog of Gravity Research and Development, the company behind Yusp. Gravity R&D is a technology expert serving omnichannel recommendations for major clients on 5 continents.
