Using Machine Learning to find value zones in São Paulo

Guilherme Marmerola
Loft
5 min read · Jun 10, 2021

How can we split the city into value zones with the least dispersion of price per square metre? Different colours indicate different value zones in the city.

Here at Loft, we are continually building our understanding of the real estate market, bringing data and the experience of our partners and founders together. We like to exercise our creativity by exploring and adapting existing concepts with data and algorithms.

One of the most established concepts in the real estate market is that of the neighbourhood. Each person has a different relationship with this concept: some people care little about which neighbourhood they live in, as long as it is close to work; others were born, grew up and created roots in the same neighbourhood, or identify with its lifestyle.

At Loft, in addition to identifying which neighbourhoods suit the different profiles of our clients, we are interested in understanding neighbourhoods through more functional aspects, such as size, population density, presence of green areas, and more. In particular, we wanted to know where the pockets of real estate value are located in the city. That is, we wanted to find out how to divide the city into parts such that property prices within each part are very similar.

To that end, we tested our understanding of the market by creating a new, value-focused concept of neighbourhood and comparing its boundaries with the existing ones.

Simplicity and effectiveness: decision trees

A decision tree is a machine learning algorithm that learns through rule induction. It partitions the training dataset so as to reduce the dispersion of the target variable within each partition. This fits our neighbourhood partitioning problem very well, because that is exactly what we want to do: divide the city into partitions (defined by splits on latitude and longitude) such that homes within each partition have a low dispersion of price per square metre.

In our case, we use a decision tree to divide the city aiming to find partitions with low price dispersion through latitude and longitude rules.
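
To make "reducing dispersion" more concrete, here is a minimal sketch of how a single split along one coordinate could be chosen. It assumes variance as the dispersion measure, which is what a standard regression tree with a squared-error criterion minimises; the article does not say which measure was actually used. The function name `best_split` and the toy longitudes and prices are made up for illustration and are not Loft data.

```python
from statistics import pvariance


def best_split(coords, prices):
    """Return (threshold, score) for the split on `coords` that minimises
    the size-weighted variance of `prices` in the two resulting groups."""
    pairs = sorted(zip(coords, prices))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # identical coordinates cannot be separated by a threshold
        left = [price for _, price in pairs[:i]]
        right = [price for _, price in pairs[i:]]
        score = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        if score < best_score:
            # Place the threshold halfway between the two neighbouring points
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_threshold, best_score


# Toy data (invented): six buildings along one axis, pricier towards the west
longitudes = [-46.70, -46.69, -46.68, -46.67, -46.66, -46.65]
price_per_m2 = [12000, 12500, 11800, 8000, 8300, 7900]

threshold, score = best_split(longitudes, price_per_m2)
print(f"split at longitude {threshold:.3f}")
```

A full tree repeats this search on both coordinates and then recursively inside each resulting partition, stopping when a stopping rule (such as a minimum partition size) is reached.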

In a nutshell, we arrive at the result as follows:

  • We prepared a database of real estate listings that contains estimates of the price per square metre of flats in 2018. At Loft, in addition to the listing database, we use several other data sources for pricing, including more up-to-date and accurate sources such as registrations and transactions on our marketplace. For the purposes of this exercise, however, the listings data is sufficient.
  • Each row in our database is a building, with the median price per square metre of the apartments advertised in that building.
  • We applied a rotation transformation to the latitude and longitude data. This way, the tree can make diagonal splits rather than only north-south and east-west ones.
  • We fitted the tree on the rotated lat-longs, with the constraint that each partition must contain at least 200 buildings.
  • We published the result using kepler.gl, which lets us see the partitions and the average price per square metre of each partition.
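
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Loft's actual pipeline: the data is synthetic, the single 45-degree rotation angle is a guess (the article does not give the angle or say whether several rotations were combined), the helper `rotate` is hypothetical, and degrees of latitude and longitude are treated as if they were the same length. Only the minimum of 200 buildings per partition comes from the text, mapped here to scikit-learn's `min_samples_leaf`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def rotate(lat, lon, angle_deg):
    """Rotate (lon, lat) points about their centroid by `angle_deg` degrees."""
    theta = np.deg2rad(angle_deg)
    x = lon - lon.mean()
    y = lat - lat.mean()
    x_rot = x * np.cos(theta) - y * np.sin(theta)
    y_rot = x * np.sin(theta) + y * np.cos(theta)
    return np.column_stack([x_rot, y_rot])


# Synthetic stand-in for the buildings table (one row per building).
rng = np.random.default_rng(0)
n_buildings = 5000
lat = rng.uniform(-23.65, -23.50, n_buildings)
lon = rng.uniform(-46.75, -46.55, n_buildings)

# Invented price surface: two price levels separated by a diagonal line.
above_diagonal = (lat - lat.mean()) > (lon - lon.mean())
price_per_m2 = np.where(above_diagonal, 12000.0, 8000.0)
price_per_m2 = price_per_m2 + rng.normal(0.0, 500.0, n_buildings)

# After a 45-degree rotation the diagonal boundary is axis-aligned,
# so one ordinary tree split can capture it.
features = rotate(lat, lon, angle_deg=45)

# At least 200 buildings per partition, as in the article.
tree = DecisionTreeRegressor(min_samples_leaf=200, random_state=0)
tree.fit(features, price_per_m2)

# Each leaf of the fitted tree is one partition (value zone).
partition_id = tree.apply(features)
ids, counts = np.unique(partition_id, return_counts=True)
print(f"{len(ids)} partitions, smallest has {counts.min()} buildings")
```

The `partition_id` array can then be attached to each building and exported to a mapping tool for visualisation.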

Results

Below we publish the map with the results, where you can switch between the layer of clusters (partitions) and average price per cluster. It’s worth mentioning again that we used a listings data source from 2018. Therefore, the prices on this map may not accurately reflect current market prices. Nevertheless, that does not defeat the purpose of using this database for making comparisons between neighbourhoods.
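
As a rough illustration of what the "average price per cluster" layer contains, the sketch below aggregates building-level prices by partition. The function name `summarise_partitions`, the output keys, and the numbers are all hypothetical; the standard deviation is included only as one possible way to check how tight each partition is.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarise_partitions(partition_ids, prices):
    """Group building prices by partition and report size, mean and spread."""
    groups = defaultdict(list)
    for pid, price in zip(partition_ids, prices):
        groups[pid].append(price)
    return {
        pid: {
            "n_buildings": len(values),
            "mean_price": mean(values),
            "price_std": pstdev(values),
        }
        for pid, values in groups.items()
    }


# Invented example: five buildings assigned to two partitions
summary = summarise_partitions(
    [0, 0, 1, 1, 1],
    [10000, 12000, 8000, 8200, 7800],
)
for pid, stats in summary.items():
    print(pid, stats)
```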

Example: Jardins

Each person has a slightly different definition, but it is common to refer to the neighbourhoods between Avenida Paulista, Rua Estados Unidos, Avenida Rebouças and Avenida Brigadeiro Luiz Antônio as “Jardins”. Loft was born in these neighbourhoods. Our first office and first apartments were in this area. We are therefore very fond of this region and have a lot of experience operating there. We generally evaluate the behaviour of our algorithms in Jardins to see if they “make sense”.

It is common to divide Jardins into two different neighbourhoods:

  • Jardim América, between Avenida Rebouças and Avenida Nove de Julho
  • Jardim Paulista, between Avenida Nove de Julho and Avenida Brigadeiro Luiz Antônio

The algorithm, on the other hand, divided Jardins into four partitions, splitting both Jardim América and Jardim Paulista in two (the central line runs along Alameda Casa Branca, almost at the boundary between the two neighbourhoods):

Partitioning Jardins into 4 value pockets. Different colours indicate different partitions.

When we switch the visualisation to the price per square metre of each partition, we see higher valuations in the Jardim América region and in the vicinity of Rua Estados Unidos. This pattern is often mentioned in the market, and the data now confirms it :).

Distribution of prices in the partitions of Jardins. Darker reds represent higher prices, while yellow and blue tones represent intermediate prices.

Conclusion

Finally, we show one last visualisation: the average price per partition across São Paulo. We see several interesting patterns, among others: (a) Vila Nova Conceição and a part of Moema coming together into a single zone; (b) different partitions within Higienópolis; (c) price increases around Ibirapuera Park; (d) a small high-value zone between Consolação and Bela Vista.

Distribution of prices in São Paulo. Darker reds represent higher prices, yellow tones represent intermediate prices and dark blues represent lower prices.

Feel free to explore the map we have provided above and create your understanding of the market together with us!

We continue to use data science to address the complexity of the housing market, reinvent the consumption of housing, and eliminate, for our clients, the friction in the process of switching homes.

Do you want to join Loft and build the future of real estate?

Apply for our open positions! — https://jobs.lever.co/loft/

Guilherme Marmerola

Data Science Manager @ Loft. Passionate about how data science empowers us to solve hard problems in nearly every industry.