Predict your rent with Machine Learning

Behind RentRadar, the new AI tool by HousingAnywhere

Massimo Belloni
Data Science @ HousingAnywhere
9 min read · Nov 28, 2019

Just some 10 years ago, it was pretty common to rely on people for the answer to any question of general knowledge. Do you know where I can find a plumber? Do you know at what time the shops close today? And the list goes on. Today, it is indeed very common to rely on the voice assistant of your device to receive the answers to these questions instead. We would expect “Google it” to be a solution to pretty much anything. But is it? Have you ever tried to Google what a fair rent price is for the city where you live or in which you plan to live?

Results range from Quora threads to summaries of expensive market studies or, most often, ads for rental websites. The lack of transparency that still remains in the rental market is astounding, especially for those who are in the process of relocating. The same applies to landlords and accommodation providers. How do they know how to set a price that keeps them booked without being too low? After speaking to several account managers here at HousingAnywhere, we came to the realization that this problem is very real for small and large providers alike, with no answers in sight.

Therefore, we decided to design the solution ourselves, setting off on a path to produce the first (truly) smart pricing algorithm for the rental market. We didn’t want to create yet another market research article or simplified calculator. We wanted to create a tool that would tell users what the fair rent of a certain listing (room or apartment) would be, in a certain city, with certain amenities, at any given time.

In other terms, we set the goal to distill the empirical knowledge of a local, experienced market practitioner and make it available to anybody. Better yet, we aimed to distill not just some empirical knowledge, but the essence of the market dynamics of a specific city at this point in time.

We sit on a large pool of data comprising not just successful market transactions but, more importantly, unsuccessful ones. Research and analyses often carry the implicit bias of looking only at the winners (successful transactions) while neglecting those that fell through the cracks. Our databases, on the other hand, have recorded a multi-year history of interactions between demand and supply, both successful and unsuccessful, across a large pool of European cities.

After months of training city-specific machine learning models and a few more months of internal testing and validation, we can now say that we succeeded in developing one of the most impressive tools that the rental market has ever seen, and we are proud to release it to the public: the RentRadar.

RentRadar

Collecting data and making assumptions

HousingAnywhere is a self-service platform, and our goal is to give our landlords the (tech) tools they need to run their businesses. As we built our product with a global vision, we have listings being published in pretty much every corner of the globe. Being self-service also means that we replaced human-driven vetting with an advanced apparatus of tech products to guide advertisers toward the creation of high-quality listings, as well as to keep out bad actors (scammers). While a human-driven assessment of the quality of a listing will always be limited by personal biases, our pool of market interactions offers a much wider dataset to gauge market sentiment.

Our hypothesis is that, generally speaking, the fair market value is the price that both the buyer and seller would agree on, based on the information with which they have both been supplied. In a marketplace like ours, where we have plenty of good supply, if the price doesn’t match the quality of a property, it simply doesn’t get booked. Easy as that. At the same time, if the tenant doesn’t meet the landlord’s requirements, the booking also doesn’t take place. By restricting our training to all the listings receiving at least an invitation to book, we managed to collect representative prices, which were aligned with the latest market fluctuations.
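The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the actual pipeline: the field names (`monthly_rent`, `invitations_to_book`) and the sample records are hypothetical.

```python
# Hypothetical listing records; field names are assumptions for illustration.
listings = [
    {"id": "a", "monthly_rent": 650, "invitations_to_book": 2},
    {"id": "b", "monthly_rent": 1400, "invitations_to_book": 0},
    {"id": "c", "monthly_rent": 720, "invitations_to_book": 1},
]


def select_training_listings(listings, min_invitations=1):
    """Keep only listings whose price was validated by at least one invitation to book."""
    return [l for l in listings if l["invitations_to_book"] >= min_invitations]


training_set = select_training_listings(listings)
```

Listing "b" is dropped here because nobody was invited to book it, so its price carries no evidence of being accepted by the market.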

Every listing on our platform is described by a set of features (location, size, facilities, etc.) that have a direct or indirect correlation with the final rental price. Not all of the features contribute equally to a solid model as, from a machine learning perspective, their importance changes from city to city. We interviewed our Account Managers to discover which features, based on their experience, most influenced the final rental price. Having air conditioning in a rental space is very important (and expensive) in Milan, but you won’t see requests for AC in cities like Rotterdam or Berlin, for example. This kind of exhaustive search over the set of features could have been performed in an automatic way, but it would have required too much time and effort.

As can be easily guessed, though, the most important feature of a property (and the one that drives its price the most) is its location. Simply using latitude and longitude without the specific context of the urban area is too crude. Therefore, we enriched our data with information coming from external sources to compute distances to important locations, such as metro stations and universities. With the help of our Account Managers, we mapped each city as a grid of points of interest (POIs), keeping in mind what would be considered most relevant for the demand we serve. That proved to be successful, as we noticed that prices fit these partitions better than a more canonical definition of neighborhoods.
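As an illustration of turning raw coordinates into a location feature, here is a minimal sketch that computes the distance from a listing to its nearest point of interest with the haversine formula. The coordinates, the feature name and the choice of haversine distance are assumptions made for the example; the source does not say how distances were actually computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_poi_distance_km(listing, pois):
    """Distance from a listing (lat, lon) to the closest point of interest."""
    return min(haversine_km(listing[0], listing[1], lat, lon) for lat, lon in pois)


# Hypothetical coordinates, for illustration only.
listing = (51.9225, 4.4792)
metro_stations = [(51.9244, 4.4698), (51.9170, 4.4880)]
features = {"dist_nearest_metro_km": nearest_poi_distance_km(listing, metro_stations)}
```

The same helper can be reused for each POI category (universities, stations, and so on), giving one distance feature per category.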

Boosting trees for regression

As in every supervised learning regression problem, we had to define a target variable. To adjust its distribution and to take into account different market dynamics, we chose the natural logarithm of one plus the total monthly cost, that is, log(1 + rent + bills), where bills are added only when not already included in the rent. We decided to start training the models for cities where we had the most data and expertise. Each dataset ranged from hundreds to thousands of listings, and each listing was described by at least 50 different features.
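A minimal sketch of this target transformation and its inverse follows. The function and parameter names are hypothetical; only the log(1 + rent + bills) form comes from the text.

```python
import math


def rent_target(monthly_rent, monthly_bills=0.0, bills_included=False):
    """Regression target: log(1 + total monthly cost)."""
    # Bills are added only when the rent does not already include them.
    total = monthly_rent if bills_included else monthly_rent + monthly_bills
    return math.log1p(total)


def target_to_price(target):
    """Invert the transformation to get back a price in euros."""
    return math.expm1(target)
```

A model trained on this target predicts in log space, so its output has to go through `target_to_price` before being shown as a rent suggestion.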

The type of data at our disposal led us to use boosted decision trees for the task. We used LightGBM and XGBoost, each time basing our choice on the size of the dataset and its particular type of features. In our team, we already had some successful implementations of these models, for example in helping us keep our platform scammer-free. The prediction is determined by the sum of the weights contained in the leaves reached by the decision processes carried out in parallel (kinda). Each different leaf can be marked with an identifier, unique within each different tree of the ensemble. To simplify it: imagine an election system where each voter has the responsibility of electing a person from only a subset of the candidates that are close to their own field of expertise. For example, people with an economics degree will only be able to elect someone for the department of finance, and so on. The parliament, in this case the final suggested rental price, will be composed of all the candidates together.

The process of computing a price for a listing can be seen as assigning different leaves to each sample provided as input. Each trained tree focuses on different characteristics of a listing, describing separate aspects of the rental pricing domain, such as location and amenities. For more in-depth information on these models, refer to our article on an interesting way to use leaves.
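To make the "sum of leaf weights" idea concrete, here is a toy sketch in plain Python. It is not how LightGBM or XGBoost are implemented: the two trees, their thresholds, weights and the `base_score` are invented for illustration, and the "go left when the value is below the threshold" convention is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    leaf_id: int   # identifier, unique within its own tree
    weight: float  # contribution to the prediction (in log space)


@dataclass
class Split:
    feature: str
    threshold: float
    left: "Node"   # taken when sample[feature] < threshold
    right: "Node"


Node = Union[Leaf, Split]


def reach_leaf(tree, sample):
    """Follow the splits until a leaf is reached."""
    node = tree
    while isinstance(node, Split):
        node = node.left if sample[node.feature] < node.threshold else node.right
    return node


def predict(trees, sample, base_score=0.0):
    """Return the summed leaf weights and the id of the leaf reached in each tree."""
    leaves = [reach_leaf(tree, sample) for tree in trees]
    return base_score + sum(leaf.weight for leaf in leaves), [leaf.leaf_id for leaf in leaves]


# Toy ensemble: one tree looks at location, the other at size (invented values).
trees = [
    Split("dist_nearest_metro_km", 1.0, Leaf(0, 0.15), Leaf(1, -0.10)),
    Split("size_m2", 20.0, Leaf(0, -0.20), Leaf(1, 0.25)),
]
sample = {"dist_nearest_metro_km": 0.4, "size_m2": 25.0}

log_price, leaf_ids = predict(trees, sample, base_score=6.3)
price = math.expm1(log_price)  # undo the log(1 + x) target transformation
```

Each tree is traversed independently of the others, and the list of leaf identifiers is the per-sample "leaf assignment" described above.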

A graphical representation of the price suggestion process.

Model performance in the housing market

Testing the performance of our models hasn’t been a walk in the park, either. Rent fluctuations of 50–100€, or even higher for more expensive apartments, are not unlikely, which is far from ideal when performance metrics depend on accurate test data. For this reason, while optimizing the models with R² during training, we decided to validate the predictions with our experts on a weekly basis. They had the last say on our models, sometimes changing the ground truth after looking at the models’ suggestions or trying to understand why the models were underperforming in some areas. With their help, we were able to identify outliers in the training set, as well as features that weren’t contributing to describing the market dynamics.
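For reference, R² (the coefficient of determination) compares the model's squared error with that of always predicting the mean. The sketch below is a plain-Python version for illustration; in practice a library routine such as scikit-learn's `r2_score` would normally be used, and the source does not say which implementation was.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A perfect model scores 1, while a model that always predicts the mean scores 0. Because noisy ground-truth prices inflate the residual term, the score can understate a model that is in fact close to the true market value.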

In the end, the testing process of Metis (the codename of the technology on which the RentRadar was built) took over six months. During this time, our Account Managers used it to monitor and, where needed, change the prices of the listings on our platform, and they noticed how this tool can sometimes give much more accurate rental price suggestions than most experienced landlords. Today, it’s an invaluable technology that is helping us to improve the quality of our service. We designed the RentRadar to empower our users with the same state-of-the-art technology we’re using internally. By sharing both our knowledge and technical expertise, we are proud to help both landlords and tenants accurately price their rental accommodations, and in so doing, we are striving to make the rental market more transparent and easier to navigate.

What’s on the radar?

RentRadar was released to the public a couple of weeks ago. The launch of this latest tool is another step to empower our users by providing them with the best possible technology to run their businesses or find a place on HousingAnywhere. We’re already designing other ways to use it in our product flows. Tenants viewing a property will be able to see the market price for listings similar to the one they’re interested in, giving them a better understanding of the fairness of the price. On the landlord side, the tool suggests the best possible price to advertisers for a listing with features similar to theirs right before they publish it, which also lets them double-check the information they filled in for their listing.

On a more technical level, while working hard to add more cities to the list, we’re continuously improving the tool’s performance and reliability and researching ways to employ more than one algorithm per city model. We know that with GBDTs (gradient-boosted decision trees), we cannot enforce some well-known characteristics of the domain. An example would be the direct correlation between some features and the price: all else being equal, adding another bedroom to an apartment will increase the rent. It may not be obvious, but much simpler algorithms can guarantee this. Unfortunately, though, these are not as powerful as those that we’re currently using. Putting together a strong hybrid model should improve alignment with, and the ability to represent, our markets. Ultimately, creating a refined tool will give everyone the most robust answer to the question: what is a fair rental price for my city?

This post was co-edited by Gianluca Valentini (VP of Corporate Development).
