Decoding the Value of Noise: Integrating Noise Pollution Data into Real Estate Pricing with Openhouse

Published in fromScratch Studio, Jun 20, 2023

Openhouse (openhouse.gr) is a real estate platform that fromScratch Studio is currently releasing. Its distinguishing feature rests on an intuitive, yet often overlooked, fact: buying or renting a property essentially means buying or renting its neighborhood too. In today’s urban environment, properties, whether commercial or residential, are far from isolated. In general, the surrounding environment affects the value of a property, directly or indirectly. For instance, consider a house with characteristics X located in the city center and another with the same characteristics X located somewhere in the countryside. Even though they might be structurally identical, their valuations would be radically different. This might seem like an extreme example, but we observe quite similar cases among properties within the same city.

It should be clear by now how crucial it is not only to pinpoint the factors that influence the value of a property, but also to quantify the extent of that influence. Of course, to do so we need human expertise in areas such as urban planning, engineering, and software development. However, without the necessary data there is not much to accomplish. That is why, at fromScratch Studio, when we say that we love data, we mean it. We think of data as a very valuable asset that, if treated properly, can have a tremendous impact on end users. In our case, treating data properly can potentially help someone, somewhere, find the home of their dreams.

In September 2022, we decided to put some effort into investigating how noise pollution affects property prices. We already had a fairly large database of properties in the city of Thessaloniki, Greece, where the platform is mainly active, but we needed the actual noise data. Needless to say, open datasets were nowhere to be found; we could not find even incomplete or partial ones. On reflection, this is to be expected to some extent. Measuring noise pollution is far from trivial, since it requires specialized equipment and trained personnel. We can’t just scrape the public web for it.

Eventually, after days of relentlessly searching the web, we were fortunate enough to find some official studies conducted by the Hellenic Ministry of Environment and Energy. These studies focused on three municipalities of Thessaloniki and provided average day and night noise pollution levels as highly detailed heatmaps. For some municipalities, they went a bit further and distinguished the noise by source (road network, flight paths, etc.). The excitement didn’t last long, though, because we couldn’t find the underlying noise measurements, which were essentially what we needed. What could we do now?

After a brainstorming session, the engineering team changed the question from “what can we do?” to “what can we do with these heatmaps?”. That’s when we decided to embark on the difficult journey of extracting noise values from the heatmaps themselves. It was an ambitious plan, but the prospect of finally having such data made it very appealing. The most fundamental point we had to accept was that we simply could not extract the noise without losing information. Luckily, our use case does not require extreme precision when it comes to noise. We care about how noise is actually perceived by humans, because that’s what matters when someone evaluates a neighborhood. So our goal narrowed down to extracting the sense-of-noise, which was entirely feasible with our technical approach. As might be expected, things didn’t go entirely as planned. However, with some additional configuration and tuning of the extraction method, we ended up with a brand new sense-of-noise dataset for the area of Thessaloniki. Finally, we could start working on the fun part.

Reconstructed version of the average daily noise in Thessaloniki
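The post does not describe the extraction method itself. The sketch below is a rough illustration only. It assumes each heatmap uses a discrete color legend that maps color bands to decibel ranges, and performs a nearest-color lookup for every pixel. The legend colors and dB values (`LEGEND`) and the `max_dist` cutoff are hypothetical. They are not taken from the ministry's studies.

```python
import numpy as np

# Hypothetical legend: RGB color of each band in the map key -> representative dB value.
# The real heatmaps may use different colors and band edges; these are illustrative only.
LEGEND = {
    (0, 128, 0): 47.5,      # e.g. 45-50 dB band
    (255, 255, 0): 52.5,    # 50-55 dB
    (255, 165, 0): 57.5,    # 55-60 dB
    (255, 0, 0): 62.5,      # 60-65 dB
    (128, 0, 128): 67.5,    # 65-70 dB
}

def heatmap_to_db(image: np.ndarray, legend: dict = LEGEND, max_dist: float = 60.0) -> np.ndarray:
    """Map each RGB pixel of an (H, W, 3) image to the dB value of its nearest legend color.

    Pixels far from every legend color (labels, street names, background) become NaN
    so they can be ignored or filled in later.
    """
    colors = np.array(list(legend.keys()), dtype=float)    # (K, 3)
    values = np.array(list(legend.values()), dtype=float)  # (K,)
    pixels = image.reshape(-1, 3).astype(float)            # (N, 3)
    # Euclidean distance in RGB space from every pixel to every legend color: (N, K)
    dists = np.linalg.norm(pixels[:, None, :] - colors[None, :, :], axis=2)
    db = values[dists.argmin(axis=1)]
    db[dists.min(axis=1) > max_dist] = np.nan
    return db.reshape(image.shape[:2])

# A tiny 2x2 "heatmap": green, near-yellow, red and a white background pixel.
img = np.array([[[0, 128, 0], [250, 250, 10]],
                [[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)
print(heatmap_to_db(img))
```

This mapping assigns each band a single representative value and does not try to recover exact decibels. That is the kind of information loss described above: the result is a coarse sense-of-noise grid, not a measurement.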

The plan was to build a machine learning model capable of predicting housing prices and to train it in two settings. In the first, the model would be trained with the features we already had (the baseline). In the second, it would be trained with the same features plus the new noise index. This would let us compare the two models in terms of accuracy and determine whether noise improves or degrades overall performance. On top of that, we wanted to shed some light on the impact of noise, as a feature, on prices. Technically, this falls under explainability and interpretability. In plain English, explainability tells us which features (for instance, the property size) are important in predicting the target value (the price), while interpretability tries to measure the extent and direction in which a feature affects the target value.
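A minimal sketch of the two-model comparison follows. It relies on two assumptions: synthetic data, and scikit-learn's `RandomForestRegressor` as a stand-in ensemble model, since the real feature set and models are not listed here. Both feature sets are scored on the same cross-validation folds. Permutation importance is shown as one possible explainability technique, not necessarily the one used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 400
# Hypothetical synthetic listings; the real Openhouse features are not public.
size = rng.uniform(40, 150, n)     # property size, m²
floor = rng.integers(0, 8, n)      # floor number
age = rng.uniform(0, 60, n)        # building age, years
noise = rng.uniform(45, 75, n)     # average day noise, dB
# Assumed price model in which noise lowers the price.
price = 1500 * size + 3000 * floor - 400 * age - 2500 * (noise - 60) + rng.normal(0, 5000, n)

X_base = np.column_stack([size, floor, age])   # features we already had
X_noise = np.column_stack([X_base, noise])     # same features plus the noise index
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # identical folds for both models

def cv_mae(X, y):
    """Mean absolute error averaged over the shared cross-validation folds."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    return -scores.mean()

mae_base, mae_noise = cv_mae(X_base, price), cv_mae(X_noise, price)
print(f"baseline MAE: {mae_base:,.0f}  with noise: {mae_noise:,.0f}")

# Explainability: how much the held-out R² drops when each feature is shuffled.
X_tr, X_te, y_tr, y_te = train_test_split(X_noise, price, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, score in zip(["size", "floor", "age", "noise"], result.importances_mean):
    print(f"{name:>5}: {score:.3f}")
```

Permutation importance answers the explainability question (which features matter). It does not say in which direction a feature moves the price; that is what the partial dependence plots further down address.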

Which model should we train? We spent quite some time (weeks, in fact) researching the literature and the latest trends, and evaluated and discussed many methods and techniques thoroughly. Long story short, we decided to use ensemble models, because they have proven to work really well with real estate data. The next step was to build the MLOps infrastructure that would host our experiments. To fully monitor and effectively oversee the entire training process, there must be strict workflows, metrics, and evaluation standards. The approach we adopted is outlined in this Medium article.
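The linked article describes the actual infrastructure. As a hedged illustration of what a fixed evaluation standard can look like, the sketch below scores several scikit-learn ensemble regressors on identical folds with the same set of metrics. The choice of candidates is ours, and `make_regression` supplies placeholder data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Fixed protocol: every candidate sees exactly the same folds and metrics.
CV = KFold(n_splits=5, shuffle=True, random_state=42)
METRICS = {"mae": "neg_mean_absolute_error", "rmse": "neg_root_mean_squared_error", "r2": "r2"}

# Hypothetical candidate set; the study's actual models are not listed in the post.
CANDIDATES = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

def run_experiments(X, y):
    """Cross-validate every candidate under the shared protocol and average each metric."""
    results = {}
    for name, model in CANDIDATES.items():
        cv = cross_validate(model, X, y, cv=CV, scoring=METRICS)
        results[name] = {
            "mae": -cv["test_mae"].mean(),    # sklearn reports errors as negatives
            "rmse": -cv["test_rmse"].mean(),
            "r2": cv["test_r2"].mean(),
        }
    return results

X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
results = run_experiments(X, y)
for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Fixing the folds and metrics up front means any difference between runs comes from the model or the features, not from a lucky split.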

In the first round of experiments, we trained different kinds of ensemble models with different configurations on the entire region covered by the three municipalities for which we had noise data. Unfortunately, the results were quite bad: the model that incorporated the noise index performed slightly worse than the baseline. Adding noise as a feature not only failed to improve the predictions, it actually made them worse! We then looked into the correlation between the price per square meter and the sense-of-noise and found that there is, in fact, no correlation when all three municipalities are treated as one undivided area. If we think about it, this makes total sense. Each region of a city has its own values, needs, goals, and purposes.

Suppose we have two properties, one located in the city center and the other in the suburbs. We expect noise to influence them differently, since someone who wants to buy a house in the center will be more tolerant of noise pollution than someone searching for a house in the quiet suburbs. Basically, one buyer sacrifices the quietness of the suburbs for access to all the goodies a city center provides (public services, entertainment, commerciality, etc.), while the other does the exact opposite.
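Opposite trends can cancel out when areas are pooled, and synthetic data reproduces this. In the sketch below, price per square meter rises with noise in area A and falls with noise in area C. The numbers are invented and are not Openhouse data. Each area shows a strong correlation, while the pooled correlation is close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Synthetic illustration only: two areas with the same noise range but opposite trends.
noise_a = rng.uniform(50, 70, n)
noise_c = rng.uniform(50, 70, n)
ppsqm_a = 3000 + 50 * (noise_a - 60) + rng.normal(0, 100, n)  # "center": louder -> pricier
ppsqm_c = 3000 - 50 * (noise_c - 60) + rng.normal(0, 100, n)  # "suburbs": louder -> cheaper

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

r_a = pearson(noise_a, ppsqm_a)
r_c = pearson(noise_c, ppsqm_c)
r_pooled = pearson(np.concatenate([noise_a, noise_c]), np.concatenate([ppsqm_a, ppsqm_c]))
print(f"area A: {r_a:+.2f}  area C: {r_c:+.2f}  pooled: {r_pooled:+.2f}")
```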

With this in mind, we selected three regions based on their distance from the city center and the number of properties we had in each. Within each region, the correlation between price per square meter and noise was much clearer. We chose to train separate models for each area, and the results were far better than in the initial runs. In all areas, including the noise feature improved the models’ performance significantly. However, the most interesting finding was not the performance gain, but the way noise acts on prices in each region. More specifically, some regions showed diametrically opposite correlations. For instance, in the city center of Thessaloniki, areas with more noise throughout the day have more expensive properties. Again, this is most probably due to the commercial character of the city center. In contrast, in the suburbs, prices drop dramatically as we move to areas with more noise pollution.
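The following is a hedged sketch of why per-area models help when the noise effect flips sign between areas. The data are synthetic, and the opposite slopes are an assumption that mirrors the findings described above. `GradientBoostingRegressor` serves as a stand-in ensemble model. Errors come from out-of-fold predictions, so both setups are judged on unseen data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(2)
# Hypothetical listings: area label, size (m²) and average day noise (dB).
areas = np.repeat(np.array(["A", "C"]), 300)
size = rng.uniform(40, 150, areas.size)
noise = rng.uniform(50, 70, areas.size)
# Assumed opposite effects: noise raises price per m² in A and lowers it in C.
slope = np.where(areas == "A", 40.0, -40.0)
price = (2500 + slope * (noise - 60) + rng.normal(0, 80, areas.size)) * size
X = np.column_stack([size, noise])  # the area label itself is not a feature

cv = KFold(n_splits=5, shuffle=True, random_state=0)

def oof_abs_errors(X, y):
    """Absolute errors of out-of-fold predictions from a fresh model."""
    pred = cross_val_predict(GradientBoostingRegressor(random_state=0), X, y, cv=cv)
    return np.abs(pred - y)

# One model for the whole region: it cannot tell which way noise should push the price.
pooled_err = oof_abs_errors(X, price)

# One model per area, each trained and evaluated only on that area's listings.
per_area_err = np.empty_like(price)
for area in np.unique(areas):
    mask = areas == area
    per_area_err[mask] = oof_abs_errors(X[mask], price[mask])

print(f"pooled MAE: {pooled_err.mean():,.0f}  per-area MAE: {per_area_err.mean():,.0f}")
```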

These correlations are depicted in the following two partial dependence plots, which show how the target value (price) depends on the average day noise in decibels.

Partial dependence plot for the city center (area A)

Partial dependence plot for the suburbs (area C)
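A partial dependence plot like the two above sweeps one feature across a grid. At each grid value, it averages the model's predictions while keeping all other features at their observed values. The sketch below computes that curve by brute force for a synthetic, suburb-like area. The data and model are assumptions. scikit-learn's `PartialDependenceDisplay.from_estimator` can draw such plots directly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def partial_dependence_1d(model, X, feature, grid):
    """Brute-force partial dependence: mean prediction with one column forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # override the feature for every row
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)

rng = np.random.default_rng(3)
n = 500
size = rng.uniform(40, 150, n)     # m²
noise = rng.uniform(50, 70, n)     # average day noise, dB
# Hypothetical "suburb": price per m² assumed to fall as noise rises.
price = (2500 - 40 * (noise - 60)) * size + rng.normal(0, 5000, n)
X = np.column_stack([size, noise])

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, price)
grid = np.linspace(52, 68, 9)      # stay away from the sparsely populated extremes
pd_curve = partial_dependence_1d(model, X, feature=1, grid=grid)
for db, avg in zip(grid, pd_curve):
    print(f"{db:5.1f} dB -> {avg:,.0f}")
```

A downward curve, as in this synthetic example, corresponds to the suburban pattern; a rising curve would correspond to the city-center pattern.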

So, noise pollution is a rather peculiar feature. Unlike, say, the size of a property, where prices almost always rise as size increases and drop as it decreases, noise is not amenable to any general interpretation. It can have fundamentally different effects on properties in different areas of the same city, and what makes those areas different is the set of characteristics that define each area’s actual profile. Usually, these are directly related to its commerciality, the services it provides (public or private), and its urban planning in general. This distinctive nature makes noise a bit difficult to work with. To fully utilize its potential in a machine learning model, urban planning experts should also take part in the process and provide useful insights.

And this concludes the beginning of a long journey. We started with an ambitious plan to incorporate noise pollution data into our platform while having no such data. After some digging and extensive research, our engineering team managed to extract the data and build machine learning models capable of not only predicting housing prices, but also explaining the role of noise in the process. This research has opened up new horizons, both on how noise pollution data can help build more accurate and robust machine learning models and on how noise affects real estate prices.

And the best part is that all this knowledge will not just stay in a paper somewhere online; it will be put to practical use, helping users find the property they truly desire.

Special thanks to Prof. Grigorios Tsoumakas for providing guidance throughout the entire process.

Resources:

https://arxiv.org/abs/2302.13034

Written by George Kamtziridis