Building a new Search Platform

How we’re improving the search experience at mobile.de

--

For the last two years, I’ve been working in the Data team at mobile.de, Germany’s largest online vehicle marketplace, which is part of the eBay Classifieds Group.

This article provides an overview of one of our strategic initiatives — building a new search platform to align the interests of users and dealers on the Search Results Page (SRP).

A relevance-sorted SRP can serve user interests while still displaying sponsored ads that are chosen and positioned based on relevance. A more primitive solution would assign a fixed spot on the SRP to sponsored products, usually disregarding their relevance for the user.

The goal of the project was to improve the search experience and result relevance in comparison with other sorting approaches. As the main metric, we chose the median position of the first clicked ad on the SRP, which had to improve without hurting conversion metrics.

Starting point

Search is a fundamental feature of the mobile.de platform. For decades it followed a clear logic: a wide range of options to filter listings by almost any car attribute, but only a few choices for sorting, based on price, mileage and time.

This simple solution has advantages: users are sure they’re not missing the cheapest deals and dealers know what to do to get to the top of the search results — just offer the lowest price in the category.

The biggest disadvantage is that from the search results alone, users are never really sure they’re getting a good deal. Maybe a car is cheap because it has fewer features, is damaged or has some other kind of flaw. Often, a careful reading of the description and comparison of car features shows that the cheapest option is not relevant, and even its value-for-money ratio is far from the best. At this point, users had to dig deeper into the details of the top 20 search results, which sounds a bit odd in an era of Machine Learning and smart recommenders.

Price labeling

Our main improvement was to introduce price labels for ads.

With a huge car inventory containing over 1.5 million ads, we could use historical data to predict prices for new ads on our platform. As one of the first big data projects at mobile.de, this meant not only training the model itself, but also designing and building the infrastructure to train and serve it.

We built a pipeline to store historical data in Hadoop and the infrastructure to train the model on that data. (The details of how we built the model would fill a separate article.) As the final step, the model is uploaded to cloud storage to make it available to our services.

The price evaluation flow is quite straightforward. The database is owned by one service and all write operations go through it, so the service has all the necessary information for price evaluation.

The database service sends a REST request to the price evaluator with all ad data packed as JSON, and receives a response with an approximated price and a label.
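A simplified sketch of that exchange might look as follows; the endpoint and field names here are purely illustrative, not our actual API:

POST /price-evaluation
{
  "make": "BMW",
  "model": "320d",
  "firstRegistration": "2016-05",
  "mileage": 78000,
  "powerKw": 140,
  "features": ["NAVIGATION", "LEATHER_SEATS"],
  "askingPrice": 17900
}

Response:

{
  "predictedPrice": 19400,
  "priceLabel": "GOOD_PRICE"
}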

All the “black magic” takes place inside the evaluation service. On startup, it loads the model from the cloud into memory. In recent iterations, the model grew to a few gigabytes, which adds a significant startup penalty of up to two minutes.

To prevent downtime on updates, we use a “blue-green” deployment strategy. The evaluator service contains a lot of tunable ETL functions to convert the JSON body into a format accepted by the machine learning model.

After optimization, the price prediction time was reduced to 2–5 ms per ad.

As a final step, the evaluated price is written to the database and indexed into Elasticsearch, allowing users to see price labels on the search results page and filter by them.

Now on the price-sorted search results page, users see that a few of the first (cheapest) cars are actually not the best deals, while a somewhat more expensive listing may be a great offer. At this point you may have a brilliant idea: let’s just sort all the ads by the price rating and be done! Best search solution ever! Or is it?

Solutions based on Price Ratings take the listing itself and some dealer details into account, but miss one important point: the user’s preferences.

Some listings may be an outstanding value for the money, but are just not what the user needs. No one looking for a limousine would buy a Mini or a Smart car just because it’s a good offer. A good recommender system must take into account not only what it recommends, but also to whom. The price rating is important, but it is only one of the aspects on which the final recommendation should be built.

Collecting user information

To give a good recommendation to the customer, we should first learn a bit about what they’re looking for. For this purpose, we have implemented a tracking and profiling solution, which builds user profiles based on ad view and contact events. This user profile service is also worth an article of its own — but for the moment it is enough to know that it works in real time and provides information about the distribution of the main features of the cars the user viewed.

For example, suppose a user spent 10 minutes on our platform and we observed the following distribution of preferred car makes: 50% BMW and 50% Audi, and of preferred car colors: 80% red and 20% green. From then on we can assume certain preferences regarding manufacturer and color, so we can weight these parameters in the Elasticsearch query to surface more cars that match the user’s taste.

User preferences are applied to the query via Elasticsearch’s Function Score Query:

"functions": [
{
"filter": { "match": { "color": "RED" } },
"weight": 3
}
]

In this case, ads that match the score functions get extra weight and move up higher on the results page.
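Combined with the profile example above (50% BMW, 50% Audi, 80% red), a fuller version of such a query might look roughly like the sketch below. The weights and field names are illustrative, and the match_all stands in for the filter query built from the user’s actual search:

{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        { "filter": { "match": { "color": "RED" } }, "weight": 3 },
        { "filter": { "match": { "make": "BMW" } }, "weight": 2 },
        { "filter": { "match": { "make": "AUDI" } }, "weight": 2 }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}

Here score_mode controls how the individual function weights are combined with each other, and boost_mode controls how that combined value is merged with the score of the base query.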

The main challenge of this approach is finding appropriate weights for each parameter. For one customer the color is most important, for another the engine type, and for a third the price-value combination might be central. This means we can’t just go with a “one-size-fits-all” solution. Plus, after adding the first ten features, we faced a so-called combinatorial explosion: every feature we add to the equation introduces much higher complexity and requires much more manual tuning. The personalization approach showed good results, but we were still limited by the maximum number of parameters and the overall complexity of the solution.

Result set re-ranking

Releasing a personalized search solution was a big step that changed the price sorting approach that had been used for over 20 years. At this point, we needed new ideas and a strategy for how to continue improving the platform.

It was obvious that a solution based on weighted Elasticsearch parameters wouldn’t scale endlessly for a higher number of parameters and we knew we were approaching the limits. To go further, we had to decide what to optimize for.

As a measure of ranking quality, we use the Discounted Cumulative Gain (DCG). The main idea is simple: each item in the result set is assigned a relevance score, and each score is discounted by the position at which the item appears, so relevant items near the top contribute more than relevant items further down. The sum of the discounted scores tells us whether one result set contains more or less relevant information than another. To compare result sets fairly, we use the Normalized DCG (NDCG), which divides the DCG by the DCG of the same items in their ideal order. The ranking is perfect, with an NDCG of 1, when all items are sorted by descending relevance.
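In the standard formulation (this is the textbook definition, not something specific to our implementation), the relevance of the item at position i is discounted logarithmically and the result is normalized by the ideal ordering:

\[
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}, \qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
\]

where rel_i is the relevance score of the item at position i and IDCG@k is the DCG of the ideally sorted set, so NDCG always lies between 0 and 1.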

What can be used as a relevancy score for each listing? This is still an open question. Some companies just hire people who manually mark results in the set or at some point ask users how relevant each item is. This is an expensive and time-consuming operation, especially when taking into account that for a machine learning model, we would need at least a hundred thousand such data points.

With user tracking data, we could use click events as users navigate from search results to the details view of each listing. In this case, we are actually predicting the probability of click rather than pure relevancy — but the assumption is still quite valid.

The advantage of this approach is the presence of a strong learning signal — click events and a huge amount of historical data for model training. So we could train a complex machine learning model, with probability of click as a function of ad information, estimated price rating, seller information, user profile and any other information that we find relevant.

One open question was the runtime performance of the model. Because it is computationally expensive, it can’t simply be applied to every document in the index. Instead, we use Elasticsearch’s rescoring feature to apply the model only to the top N results. To prove that the performance is acceptable and to tune it via the size of the rescoring window, we wrote a Gatling load test and produced a series of diagrams.
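Conceptually, the rescoring part of the request looks like the sketch below. The sltr query follows the syntax of the open-source Elasticsearch Learning to Rank plugin; the model name and parameters are hypothetical, and our actual model integration may differ:

"rescore": {
  "window_size": 100,
  "query": {
    "rescore_query": {
      "sltr": {
        "model": "click_probability_model",
        "params": { "user_color_pref": "RED", "user_make_pref": "BMW" }
      }
    }
  }
}

The window_size is exactly the size of the rescoring set mentioned above: only the top 100 hits of the initial query are re-scored by the model.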

Testing the first iteration showed that the Learning-to-Rank model added around 30–40% to our response time, growing the p50 from 22 to 34 ms and the p99 from 41 to 56 ms, a noticeable but acceptable penalty.

At the same time, we saw only a small penalty when growing the rescoring window to a few hundred results, which is more than sufficient for our use case.

The big picture

Now that we’ve looked at the main parts, let’s see how they fit together.

When a new listing is created or edited, we calculate its “recommended” price and add the appropriate price label, which is then indexed into Elasticsearch.

When a user clicks search on the page, we build an Elasticsearch query based on the chosen filters and apply the user’s profiled preferences as weights where possible. For example, if they prefer red BMWs but are now filtering for Audi, we will not push any BMWs into the search results, but we will still boost red cars. As the second step of the query, we add a rescoring function, which is applied to the best N results to sort them by the highest click probability.
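Put together, the final request roughly combines all of these layers; as before, the values are only illustrative and the sltr query reuses the hypothetical model from the rescoring sketch above:

{
  "query": {
    "function_score": {
      "query": { "bool": { "filter": [ { "term": { "make": "AUDI" } } ] } },
      "functions": [
        { "filter": { "match": { "color": "RED" } }, "weight": 3 }
      ]
    }
  },
  "rescore": {
    "window_size": 100,
    "query": {
      "rescore_query": { "sltr": { "model": "click_probability_model" } }
    }
  }
}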

As a result, Elasticsearch returns the most relevant ads, sorted by the highest click probability. The results are then sent to the frontend for presentation.

Summary

The solution described here is definitely not yet final, but it’s an important milestone in the development of our new search platform at mobile.de.

Building this new version of search at mobile.de was a big technical challenge that included a number of sub-projects, each of which is a strategic company initiative deserving a dedicated article of its own.

The solution had to meet strict performance requirements and demonstrate a marked increase in relevance to convince users to switch from the old price sorting. It was a team effort by ten engineers and others who developed the price prediction, user tracking, real-time user profiling and many other related pieces of the project.

Recommended reading: For more information on how we improve vehicle recommendations at mobile.de, see “Deep learning for recommender systems”.
