Launching Similar Homes and Real-Time Personalized Recommendations

Combining Deep Learning and Heuristics to Enhance Online Home Search and Discovery for Buyers, Sellers, and Agents

Gautam Narula
Compass True North
13 min read · Sep 24, 2019


By Gautam Narula, Ran Ding, Samuel Weiss, and Joseph Sirosh

Compass recently launched a set of AI-driven capabilities: Similar Homes on listing pages, and real-time personalized Recommendations on the homepage. In this blog post, we’ll share our process for developing and launching them.

Here’s a quick visual look at these new features:

Figure 1: Similar Homes carousel on listing pages.

Figure 2: Similar Homes card display at the bottom of listing pages.

Figure 3: Personalized Recommendations on the Compass homepage.

Try it out! Go to Compass.com, search for a home to buy, and browse a few listings. Then, return to the homepage and refresh to see your recommendations. With each new listing you browse, your homepage recommendations immediately update. Each of those listing pages has a carousel where you can click through to nearby Similar Homes. Our mission at Compass is to help everyone find their place in the world, and we hope Similar Homes and Recommendations will help you do just that!

The response has been overwhelmingly positive: a 153% increase in homepage clickthrough rate and a 107% increase in engagement actions (such as sharing or saving a listing, or contacting the listing agent) on pages referred by these features.

Motivation

Our goal with Similar Homes and Recommendations is to surface relevant, personalized results that help users discover their perfect homes with ease and serendipity. They are the first of a series of AI-powered features that we are launching as we build the next-generation platform for real estate.

In addition, our team’s role at Compass is not only to build specific AI features and products, but also to enable AI features to be easily used by the rest of the company. When we build a fundamental feature such as this, we want to make it a core primitive with a reliable hardened API that can be used by a variety of internal systems. To that end, we designed Similar Homes and Recommendations as microservices which enable other teams to easily incorporate these capabilities into their own product areas.

Model and Service Development

Learning Similarity from User Behavior

Similarity learning is one of the fundamental concepts of machine learning. It’s used in many common applications such as search ranking, clustering, and recommender systems.

Without data, there are many ways one can define a distance or similarity measure, such as the cosine similarity of raw features. While this type of unsupervised approach is simple to implement, the choice of similarity measure may be arbitrary and may not reflect how users actually interact with the items — in our case, listings.
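For concreteness, a cosine-similarity baseline over raw features looks like the sketch below. The feature vector is purely illustrative and is not our production featurization:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two raw feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative raw features: [bedrooms, bathrooms, price ($100k), sqft (1000s)]
listing_a = np.array([3.0, 2.0, 8.5, 1.8])
listing_b = np.array([3.0, 2.5, 9.0, 2.0])
print(cosine_similarity(listing_a, listing_b))
```

Note the arbitrariness: the result depends entirely on which fields we include and how we scale them, which is exactly the weakness that motivates learning similarity from user behavior instead.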

In our setting, when users explore homes on Compass, they implicitly provide feedback on listing similarity. A user viewing two listings in quick succession offers some evidence that those listings are similar to each other. When aggregated over the behavior of many users, anonymized listing browsing data offers an informative similarity signal we can use to train our Similar Homes algorithm.

Similarity as a Ranking Problem

One initial approach to learning similarity could be to train a binary classifier to label a pair of homes as “similar” or “not similar”. However, such absolute similar/not-similar labels are hard to obtain. In practice, we can more reliably infer relative similarity: listings viewed in quick succession are more similar than listings viewed far apart, because they likely express the same user interest. However, we don’t know the magnitude of that similarity. Therefore, we frame this as a learning to rank (LTR) problem, with the goal of presenting users with the most similar nearby homes, ranked in descending order of similarity.

In the LTR setting, there are three choices of learning objective and corresponding training loss function: pointwise, pairwise, and listwise. Pointwise loss reverts to requiring absolute similarity labels, which we want to avoid. Pairwise and listwise loss functions often achieve state-of-the-art results, with pairwise loss being more straightforward to implement and friendlier to small-batch training.

Data Preparation and Model Development

The “pair” in pairwise loss is a positive (similar) and negative (dissimilar) example.

Figure 4: Getting positive training examples from anonymized user listing view history. Adjacent listings in the chronology are considered “similar”.

Since this pair exists within the context of a given subject listing, which we refer to as the anchor, a straightforward way to organize our training data is in the form of triples, containing an anchor, a positive example (listing viewed immediately after the anchor), and a negative example (a random listing).

Figure 5: An example training triple.
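The triple construction can be sketched as follows, assuming a chronologically ordered view history per user; the listing IDs and the uniform random negative sampling are illustrative:

```python
import random

def make_training_triples(view_history, all_listing_ids, seed=0):
    """Build (anchor, positive, negative) triples from one user's
    chronologically ordered listing views. Adjacent views form the
    anchor/positive pair; the negative is a random other listing."""
    rng = random.Random(seed)
    triples = []
    for anchor, positive in zip(view_history, view_history[1:]):
        negative = rng.choice(all_listing_ids)
        while negative in (anchor, positive):
            negative = rng.choice(all_listing_ids)
        triples.append((anchor, positive, negative))
    return triples

views = ["L101", "L205", "L330"]
pool = ["L101", "L205", "L330", "L400", "L512", "L613"]
print(make_training_triples(views, pool))
```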

With our data organized into training triples, we then needed to determine features that would be useful for learning an effective similarity model. Our listings have hundreds of fields containing structured (e.g., numerical and categorical values) and unstructured data (e.g., text descriptions and images) from which we can derive many such features. We first incorporated intuitive ones such as geographical proximity and differences in number of bedrooms/bathrooms, price, and square footage. We also considered more complex features like embeddings of images and descriptions, architecture style, amenities, distance/commute times from major geographical features (beaches, lakes, parks), year built, nearby school(s) quality, etc.

With our features selected, we took our training triples and split each of them into two pairs: the anchor and positive example, and the anchor and negative example. Each pair was then run through our similarity model, a deep neural network implemented in MXNet. We used a ranking hinge loss as the training objective function. Hinge loss is a popular and well-established pairwise loss function, but there are several other viable choices; see Ranking Measures and Loss Functions in Learning to Rank (2009) for a broader overview. For further context on how this training works, see Learning to Rank Using Gradient Descent (2005), the seminal paper that introduced the RankNet algorithm.

Figure 6: Our model training and loss function. Each training pair (anchor + positive example and anchor + negative example) is run through our model and the output scores are run through the Hinge loss function. m is an additional loss margin that encourages the model to discriminate between positive and negative examples by at least m.
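The loss in Figure 6 is simple to state in code. This standalone sketch uses placeholder scores rather than real model outputs:

```python
def ranking_hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise ranking hinge loss: zero when the positive pair
    outscores the negative pair by at least `margin`, positive
    (and thus gradient-producing) otherwise."""
    return max(0.0, margin - (score_pos - score_neg))

# Well-separated pair: no loss.
print(ranking_hinge_loss(score_pos=2.5, score_neg=0.5))  # 0.0
# Margin violated: positive loss drives a model update.
print(ranking_hinge_loss(score_pos=1.0, score_neg=0.7))  # 0.7
```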

To evaluate model performance, we benchmarked model predictions on a hold-out test set of listing browsing data as well as a dataset of similar homes prepared by Compass agents. Using both allows us to evaluate not only on inferred similarity from user behavior but also on handpicked similarity from domain experts.
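One simple way to benchmark a pairwise model on held-out triples is pairwise accuracy: the fraction of triples in which the positive outscores the negative. The toy scorer below (price proximity) is purely illustrative; other ranking metrics such as NDCG also apply:

```python
def pairwise_accuracy(model_score, triples):
    """Fraction of (anchor, positive, negative) triples where the
    model scores the positive example above the negative one."""
    correct = sum(
        1 for anchor, pos, neg in triples
        if model_score(anchor, pos) > model_score(anchor, neg)
    )
    return correct / len(triples)

# Toy scorer over prices: closer in price = more similar.
def score(a, b):
    return -abs(a - b)

triples = [(500, 520, 900), (800, 790, 100), (300, 310, 305)]
print(pairwise_accuracy(score, triples))
```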

Beyond Learned Similarity

In addition to a learned similarity model, there were several other important considerations that extend beyond purely optimizing for similarity.

Heuristics

The first rule in Google’s Rules of Machine Learning: Best Practices for ML Engineering discusses the value of heuristics, primarily in the absence of sufficient data. We have nonetheless found heuristics helpful even with a large amount of data, especially in early releases of machine learning services.

Heuristics also safeguard our model predictions by providing guardrails against anomalous outcomes. For instance, we have heuristics on basic features such as bedrooms, bathrooms, and price that cap the absolute delta between the anchor and the candidates on those features.
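A guardrail of this kind might look like the following sketch; the specific fields and thresholds are illustrative assumptions, not our production values:

```python
def passes_guardrails(anchor, candidate,
                      max_bed_delta=2, max_bath_delta=2,
                      max_price_ratio=1.5):
    """Drop candidates that differ too much from the anchor on
    basic fields, regardless of the learned model's score."""
    if abs(anchor["beds"] - candidate["beds"]) > max_bed_delta:
        return False
    if abs(anchor["baths"] - candidate["baths"]) > max_bath_delta:
        return False
    ratio = candidate["price"] / anchor["price"]
    if ratio > max_price_ratio or ratio < 1.0 / max_price_ratio:
        return False
    return True

anchor = {"beds": 3, "baths": 2, "price": 900_000}
print(passes_guardrails(anchor, {"beds": 4, "baths": 2.5, "price": 1_100_000}))  # True
print(passes_guardrails(anchor, {"beds": 8, "baths": 6, "price": 4_000_000}))    # False
```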

Heuristics have also been useful for integrating metrics and signals that are too sparse for the model to capture effectively, such as listing quality measures. We use several markers, including listing freshness (recency of major listing updates), popularity (unique visitor view count), and quality (image and listing content quality scores). While we could integrate these signals into the model, their scarcity led us to represent them as separate heuristics in order to emphasize them in the initial release of the algorithms.

Ranker-Based Ensembling and Extensibility

With our similarity model in place and our heuristics selected, we needed to make a few design decisions on how to combine them into one cohesive Similar Homes model. This, in general, is referred to as model ensembling.

The approach we took is to treat each model, whether heuristic-based or learned, as a ranker that produces a rank. By using ranks rather than raw scores, we avoid dealing with numerical outputs that vary wildly in range across components and that would make the ensemble brittle should the underlying numbers change. Our ranker ensembling formula takes the inverse of the rank output by each ranker, applies a sublinear transformation, and then computes a weighted average (based on the relative importance of each subranker) as the final model score.
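A sketch of this ensembling formula follows; the exact sublinear transform and the weights are assumptions (a square root is used here for illustration):

```python
def ensemble_score(ranks, weights, alpha=0.5):
    """Combine per-ranker ranks (1 = best) into one score:
    invert each rank, apply a sublinear transform (a power
    alpha < 1 here), then take a weighted average over rankers."""
    total_weight = sum(weights.values())
    return sum(
        weights[name] * (1.0 / rank) ** alpha
        for name, rank in ranks.items()
    ) / total_weight

# A candidate ranked 1st by the learned model, 3rd on freshness,
# 2nd on popularity, with the learned ranker weighted most heavily.
ranks = {"learned": 1, "freshness": 3, "popularity": 2}
weights = {"learned": 3.0, "freshness": 1.0, "popularity": 1.0}
print(ensemble_score(ranks, weights))
```

Because each component contributes only a rank, adding a new subranker or reweighting an existing one never requires recalibrating raw score ranges.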

One side benefit of ranker-based ensembling is that it is easily extensible. Since this approach is agnostic to ranker type, new rankers can be either additional learned models, heuristic/rule-based, or some combination. As we iterate on this feature, we can easily add new subrankers and change the weights of existing ones. This makes it easier for us to incorporate new features and heuristics as we study key metrics and receive agent and consumer feedback.

Figure 7: Our final model: a learned similarity ranker combined with heuristic rankers.

Extension to Personalized Recommendations

Similar Homes is listing-based discovery, where every user is shown the same set of Similar Homes based on the listing they are currently viewing. Recommendations are personalized discovery, where we recommend items based on each user’s unique interests. Much of the architecture that we built for Similar Homes can be reused for this application as well.

To make recommendations, we can represent a user’s interests by their recent listing view history. This is analogous to Similar Homes, but operating on the set of listings viewed by the user in a session rather than on one anchor listing. However, there are a few differences to consider. First, a user could have a very long list of viewed listings. Second, a user could be viewing homes in different neighborhoods, or even multiple cities, effectively expressing a diverse set of interests. Finally, views carry different weights: a recent view is more informative of a user’s current focus than an older view. Our Recommendations implementation accounts for these nuances.

The approach we took is conceptually illustrated in Figure 8. We first apply a clustering algorithm that creates “interest clusters” from a user’s view history. We then create a synthetic anchor representing each cluster. The Similar Homes algorithm is then run for each synthetic anchor. Finally, we merge the results and present those Recommendations to the user.

Figure 8: An example of generating clusters and synthetic listings from a user’s view history. Each synthetic listing is used to fetch candidate listings which are then sent to the model to be ranked.
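A minimal version of this clustering step might look like the following; the greedy geographic clustering and the fields averaged into each synthetic anchor are illustrative stand-ins for the production algorithm:

```python
import numpy as np

def interest_clusters(viewed, radius_deg=0.5):
    """Greedily group a user's viewed listings by location, then
    build one synthetic anchor per cluster by averaging numeric
    features over the cluster's members."""
    clusters = []
    for listing in viewed:
        point = np.array([listing["lat"], listing["lng"]])
        for cluster in clusters:
            center = np.mean([[m["lat"], m["lng"]] for m in cluster], axis=0)
            if np.linalg.norm(point - center) < radius_deg:
                cluster.append(listing)
                break
        else:
            clusters.append([listing])
    return [
        {
            "lat": float(np.mean([m["lat"] for m in cluster])),
            "lng": float(np.mean([m["lng"] for m in cluster])),
            "price": float(np.mean([m["price"] for m in cluster])),
        }
        for cluster in clusters
    ]

views = [
    {"lat": 42.36, "lng": -71.06, "price": 800_000},    # Boston
    {"lat": 42.34, "lng": -71.10, "price": 900_000},    # Boston
    {"lat": 40.71, "lng": -74.01, "price": 1_200_000},  # New York
]
print(interest_clusters(views))  # two synthetic anchors, one per metro area
```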

The approach described above is known as content-based filtering in the recommender system community. The other popular approach is collaborative filtering, which determines what to recommend based on user co-views. It’s often seen in “users who viewed this also viewed…” applications, and it requires items to accumulate significant user actions before they can be reliably recommended. That requirement gives rise to the cold-start problem, a major hurdle for the thousands of brand-new listings that appear on Compass every day.

On the other hand, content-based filtering determines what to recommend based on attributes of the items (in our case, listings) and a profile of the user preferences (in our case, represented by a user’s listing view history). This approach effectively overcomes the cold-start problem as long as we have access to the attributes of any new listing.

We are certainly not limited to using view history to represent a user’s interests and preferences. In future iterations, we will also consider their searches, favorites, and other interactions with listings. In addition, thanks to the extensibility of our ranker ensembling framework, we can combine collaborative and content-based filtering, leveraging the strengths of both.

Service Development

Real estate is a time-sensitive business. In Similar Homes and Recommendations, we want to be able to recommend any of the thousands of new listings added to Compass daily, and immediately reflect updates to existing listings. In Recommendations, we also would like to immediately react to a user’s changing interests.

The screencast below illustrates this. A user first views listings from the Boston area and receives recommended listings in the same area. Later on, as the user views listings from New York, our recommendations include results from New York immediately.

Figure 9: The current homepage experience with Recommendations.

Compass’ microservice-oriented backend architecture allows us to build Similar Homes and Recommendations as online services that produce predictions in real-time, reflecting the most timely listing information and user preference changes. Additionally, by architecting Similar Homes and Recommendations as microservices, we can easily expose their functionalities through APIs to other product areas at Compass, such as search, agent-client collaboration tools, mobile apps, and many more.

When the Similar Homes and Recommendations services receive a request, we featurize the anchor listing(s) and issue a search call to retrieve an initial pool of candidate listings based on business logic around those features. We then run the candidates through our model and return them in ranked order. A more detailed execution flow is illustrated in Figure 10.

Figure 10: Execution flow of Similar Homes and Recommendations.

The real-time nature of these services imposes stringent latency requirements. To improve latency, we focused on a few key areas: (1) simplifying search calls and reducing the overhead of populating listing details in the results, and (2) reducing the complexity of the online model and moving expensive computation offline (e.g., precomputing and caching extracted features).
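As a toy illustration of point (2), caching featurization means the online ranking path pays the extraction cost at most once per listing. In production this is an offline precompute backed by a store, not the in-process cache shown here:

```python
from functools import lru_cache

@lru_cache(maxsize=100_000)
def listing_features(listing_id):
    """Stand-in for expensive featurization (image embeddings,
    geo lookups, etc.). Repeat requests for the same listing
    are served from the cache instead of being recomputed."""
    # Toy computation; the real pipeline precomputes offline.
    return (hash(listing_id) % 100 / 100.0,)

first = listing_features("L101")
assert listing_features("L101") is first  # second call hits the cache
```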

To get our service up and running quickly, we deployed the core models through Amazon SageMaker and set up a Kinesis Data Firehose to log our predictions for offline analysis. We added a light orchestration service layer to interface with Compass’s internal microservices, such as search, compliance, and client applications. The orchestration service is deployed in Compass’s backend ecosystem, managed by Kubernetes, which allows us to iterate quickly: deployments and rollbacks are on the order of minutes!

Testing, Deployment, and Launch

With our Similar Homes and Recommendations microservices ready to handle requests and return predictions, it was time to begin the process of rolling them out to our users!

Testing and Debugging

We began testing on internal demo sites and stress tested the services. We also collaborated with Compass agents to improve the quality of our results. Their domain expertise led us to consider a broader set of features, in order to improve our recommendation diversity and quality.

We created an internal “debug” endpoint that returns additional information, such as individual ranker scores and performance data, which was helpful for debugging and for improving explainability. We also considered an internal “what if” endpoint that would let us easily run counterfactual comparisons.

Rollout and Launch

After completing internal testing, we rolled out Similar Homes and homepage Recommendations to the public, beginning with 1% of traffic and steadily ramping up to 50%, where we paused for A/B testing.

How would we know our customers are finding these features useful? We instrumented two key user engagement metrics: (1) the click-through rate on recommended listings, and (2) the rate of engagement actions, i.e., whether users who opened a recommended listing went on to take actions such as saving it, sharing it, or contacting an agent.

We were indeed able to observe substantial lift in both metrics. At the end of our testing period, we observed a statistically significant 153% increase in recommendation click-through rate and a 107% increase in engagement actions rate on pages referred by these features.

After concluding A/B testing, we officially launched Similar Homes and homepage Recommendations to 100% of users on August 13, 2019!

AI applications such as Similar Homes and Personalized Recommendations help Compass become a smarter real estate platform for buyers, sellers, and real-estate agents. We are taking the lessons from our initial launch experiences and user feedback to develop the next set of innovations for our customers!

Acknowledgements

One of the Compass Entrepreneurship Principles is Collaborate Without Ego, and in that spirit the AI team offers its thanks to all the partner teams we worked with, including the teams that own our homepage experience, listing pages, search, and infrastructure.

Thanks to George Valkanas, Foster Provost, and Panos Iperotis from the AI team for reviewing drafts of this post and providing feedback.

Thanks as well to the rest of the engineers on the AI team who contributed to the successful launch of Similar Homes and Recommendations: Luran He, William Horton, Bharath Swamy, and Wesley Zeng.

We’re Hiring

I (Gautam Narula) started at Compass in June, and began writing code for this project on day 1. Two months in, with the help of all the wonderful ML engineers and ML scientists on the AI team, the features I contributed to are on every listing page and the Compass homepage! If that kind of speed and impact sounds appealing, join us!

Gautam Narula is a Machine Learning Engineer on the AI Services team, Ran Ding manages the AI Services team in NYC, Samuel Weiss is a Staff Machine Learning Engineer on the AI Services Team, and Joseph Sirosh is the Chief Technology Officer of Compass.
