A hybrid approach to house valuations

A master's thesis on combining machine learning and econometric modeling. Can we optimize accuracy while maintaining interpretability?

Published in R&D Labs, Mar 30, 2018

The role of property valuations

Property valuations play an important role in many applications. In the Netherlands, the Valuation of Immovable Property Act (Wet Waardering Onroerende Zaken, WOZ) requires properties to be valued annually, so that these values can serve as input for local property taxes and for national income and wealth taxes. Property valuation is also important for mortgage applications and for measuring the performance of a real estate portfolio.

Before a property is sold, we cannot know its price. And because the real estate market is illiquid (any given property changes hands only rarely), we cannot simply rely on past transaction prices. Hence, automated valuation models, which use statistical methods to estimate market value, are often used for this task. These models estimate a property's market price from its characteristics. Besides accuracy, transparency is an important requirement for these models: their outputs should be fully explainable to taxpayers and to employees of local tax authorities.
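
To make "estimating a market price from property characteristics" concrete, here is a minimal sketch of a hedonic regression, a classic statistical valuation model: ordinary least squares of log price on a few characteristics, solved through the normal equations in plain Python. The function names, the choice of characteristics (log floor area and building age) and any toy numbers are illustrative assumptions, not the setup used in the thesis.

```python
import math


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def fit_hedonic(features, prices):
    """OLS of log(price) on an intercept plus characteristics; returns [intercept, coef_1, ...]."""
    x = [[1.0] + list(f) for f in features]
    y = [math.log(p) for p in prices]
    k = len(x[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict_price(coefs, characteristics):
    """Back-transform the fitted log price to a price."""
    log_price = coefs[0] + sum(c * v for c, v in zip(coefs[1:], characteristics))
    return math.exp(log_price)
```

Each coefficient of such a log-linear model can be read directly: a coefficient of 0.8 on log floor area means that a home with 1% more floor area is valued roughly 0.8% higher, all else being equal. That kind of direct reading is what makes this model family easy to explain.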

Comparison of automated valuation approaches

In this research, we compared the effectiveness of three automated valuation approaches.

The first is the traditional econometric modeling approach, which has dominated house valuation for many years. Models of this type are typically highly interpretable: the relation between the characteristics and the price is specified directly, and the uncertainty of the predictions can be calculated formally. The model Ortec Finance currently employs belongs to this family: the Hierarchical Trend Model (HTM) [1]. The HTM is a state-space model that combines a time-invariant mapping from property characteristics to price with several stochastic trends, specified for different location and house-type categories, to capture how house prices evolve over time.
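
The stochastic trends are what make the HTM a state-space model. As a rough illustration of that building block only (the HTM in [1] is considerably richer, with hierarchically related trends), here is a Kalman filter for the simplest stochastic trend, the local-level model: an unobserved random-walk price level observed with noise. Treating the two variances as known, and reading each observation as a group's average log-price residual in one period, are assumptions of this sketch.

```python
def local_level_filter(observations, var_obs, var_trend, init_mean=0.0, init_var=1e6):
    """Kalman filter for the local-level (random-walk trend) model:

        y_t  = mu_t + eps_t,       eps_t ~ N(0, var_obs)
        mu_t = mu_{t-1} + eta_t,   eta_t ~ N(0, var_trend)

    A large init_var approximates a diffuse prior on the first level.
    Periods without observations (None) get only the prediction step.
    Returns the filtered trend means and variances for every period.
    """
    mean, var = init_mean, init_var
    means, variances = [], []
    for y in observations:
        # Predict: the random-walk level carries over, its uncertainty grows by var_trend
        var = var + var_trend
        if y is not None:
            # Update: move toward the new observation in proportion to the Kalman gain
            gain = var / (var + var_obs)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        means.append(mean)
        variances.append(var)
    return means, variances
```

Periods without sales are passed as `None`; the filter then carries the trend forward with growing uncertainty, which is one way a model of this kind copes with a thin market.
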
The second approach is machine learning: specifically, we evaluated neural networks (NN), random forests (RF) and gradient boosting (GB) on this task. These algorithms are typically quite accurate, yet they are ‘black boxes’: their inner mechanics are so complex that no direct relation between inputs and outputs can be read from them.
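
To illustrate why such models resist direct interpretation, here is a from-scratch sketch of gradient boosting for squared-error loss with depth-one trees (stumps). It is not the GB implementation used in the thesis; the function names and the default hyperparameters (100 rounds, learning rate 0.1) are illustrative assumptions.

```python
def fit_stump(xs, residuals):
    """Best single split (feature, threshold) under squared error; leaf values are residual means."""
    best = None
    for j in range(len(xs[0])):
        for threshold in sorted({row[j] for row in xs}):
            left = [r for row, r in zip(xs, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(xs, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lmean, rmean)
    if best is None:
        # Every feature is constant: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    j, threshold, lmean, rmean = stump
    return lmean if row[j] <= threshold else rmean


def fit_gradient_boosting(xs, ys, n_rounds=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_gb(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)
```

Even in this toy version, a single valuation is a base value plus a hundred small step functions, each fitted to the errors of the rounds before it; with deeper trees and many characteristics there is no single coefficient to point to.
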
As a third approach, we experimented with a hybrid model that aims to combine the advantages of both: we proposed replacing the time-invariant part of the HTM with a machine-learning method, while keeping the HTM's trend structure.
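
The hybrid idea can be sketched as an additive model: log price = f(characteristics) + trend level of the property's (group, period) cell + noise, where f is a machine-learning regressor. The sketch below fits it by backfitting, alternating between refitting f on trend-adjusted prices and re-estimating each cell's level as the mean residual. This is a simplification under stated assumptions, not the estimation procedure from the thesis: the trends here are plain cell means rather than the HTM's stochastic trends, and the k-nearest-neighbour regressor `fit_knn` is a hypothetical stand-in for the NN or RF component.

```python
from collections import defaultdict


def fit_knn(xs, ys, k=5):
    """Hypothetical stand-in for the ML part: k-nearest-neighbour regression."""
    data = list(zip(xs, ys))

    def predict(row):
        nearest = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], row)))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def fit_hybrid(xs, groups, periods, ys, fit_regressor, n_iter=10):
    """Backfitting sketch for y = f(x) + trend[(group, period)] + noise."""
    cells = list(zip(groups, periods))
    trend = {cell: 0.0 for cell in cells}
    for _ in range(n_iter):
        # (1) Fit the time-invariant part on trend-adjusted log prices
        adjusted = [y - trend[c] for y, c in zip(ys, cells)]
        f = fit_regressor(xs, adjusted)
        # (2) Re-estimate each cell's level as its mean residual
        sums, counts = defaultdict(float), defaultdict(int)
        for row, c, y in zip(xs, cells, ys):
            sums[c] += y - f(row)
            counts[c] += 1
        trend = {c: sums[c] / counts[c] for c in sums}
    return f, trend


def predict_hybrid(model, row, group, period):
    f, trend = model
    # For a period after the data, a random-walk trend is forecast by its last level
    past = [p for (g, p) in trend if g == group and p <= period]
    level = trend[(group, max(past))] if past else 0.0
    return f(row) + level
```

Swapping in a different learner only requires another `fit_regressor(xs, ys)` function that returns a prediction callable; the trend part and the forecasting rule stay untouched.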

We trained our models on Dutch transaction data from 2009 to 2015 and forecast the values of houses sold in 2016. The in-sample and out-of-sample mean absolute percentage errors (MAPE) are given in the accompanying table. Hybrid-NN and Hybrid-RF denote hybrid models in which an NN or an RF, respectively, replaces the time-invariant part of the HTM. The Hybrid-NN model obtained the highest out-of-sample accuracy. Importantly, using the HTM and an NN together gives greater accuracy than using either one alone; the same holds for Hybrid-RF.
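
MAPE averages the absolute prediction error as a percentage of the actual sale price. Below is a minimal sketch of the metric, together with the year-based split described above (train through 2015, evaluate on 2016 sales). Representing transactions as dicts with a `year` key is a hypothetical data layout.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def temporal_split(transactions, last_train_year):
    """Train on sales up to last_train_year; keep later sales as the out-of-sample test set."""
    train = [t for t in transactions if t["year"] <= last_train_year]
    test = [t for t in transactions if t["year"] > last_train_year]
    return train, test
```

With `train, test = temporal_split(transactions, 2015)`, MAPE on the training sales gives the in-sample figure and MAPE on the 2016 sales the out-of-sample one. Splitting by time rather than at random keeps the evaluation honest for forecasting, since no 2016 information leaks into training.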

As mentioned earlier, transparency is as important as accuracy for this task. In terms of interpretability, the HTM is far better than the black-box NN, RF and GB models. Some insight can be extracted from black-box models through tools such as variable importance metrics or partial dependence plots, but the direct relation between inputs and outputs is still lost. With the hybrid method, we aimed for a middle ground between these opposites: it gives up the directly interpretable relation between the characteristics and the price, but retains the well-defined trend structure of the HTM.
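
Both diagnostics mentioned above are model-agnostic: they only need a prediction function. Below is a minimal sketch of a partial dependence curve and of permutation importance, assuming the model is exposed as a `predict(row)` callable over list-like feature rows; the function names and the use of mean squared error as the importance metric are illustrative choices.

```python
import random


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for every row."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, rows, targets, feature, seed=0):
    """Increase in MSE after shuffling one feature's values across rows."""
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    shuffled = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = []
    for row, value in zip(rows, shuffled):
        modified = list(row)
        modified[feature] = value
        permuted.append(modified)
    return mean_squared_error(targets, [predict(r) for r in permuted]) - baseline
```

These summaries describe the model's behavior on average: a partial dependence curve averages over all other characteristics, so it does not recover an exact input-output relation of the kind the HTM specifies.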

Conclusion

In summary, we found that incorporating machine learning into real estate valuation models is promising. The data-driven modeling capabilities of these algorithms can improve accuracy over traditional econometric models. Notably, the hybrid approach, which combines econometric and machine-learning methods, obtained the highest accuracy. Interpretability, however, remains a constraint: if a fully interpretable model is required, an econometric model may still be the best option.

References

[1] Marc Francke. The hierarchical trend model. In Mass Appraisal Methods: An International Perspective for Property Valuers, pages 164–180, 2008.

[2] Maarten-Jan Evers. Machine learning het ultieme WOZ-waarderingsmodel (“Machine learning, the ultimate WOZ valuation model”). In Dutch.

Author: Cihan
Editor’s note: Cihan graduated with honors from Erasmus University Rotterdam on July 18th.