Visualizing multidimensional data with Circle trees

Using real-estate data as a test case for how and why to use Circle trees

Alexander Skorniakov
Nightingale
7 min read · May 1, 2020

--

Data visualization can be an effective decision-making tool with many applications. In particular, it can help you present multidimensional data in an understandable way. In this piece, I want to talk about one specific problem that practitioners in the real estate world encounter, and a solution that seems very interesting and promising: Circle trees!

Real estate developers, especially those involved in residential real estate on a relatively large scale, need to monitor sales of the dwelling stock they create very closely so that they can adjust prices promptly: upwards for properties that are being “washed out” (sold) faster than others, and downwards or frozen for those that are less attractive to potential buyers.

The lots that make up a residential building can be distinguished by a variety of features. Some are straightforward, like total area or living area in square meters, the number of rooms, and the floor on which the lot is located. Others, like view characteristics, are trickier to assign lot by lot at scale and to convert into digital form. You can make some natural assumptions to group the lots in a given building and to project which lots will “wash out” faster than others, but eventually you need to take into account the opinion of your customers, who vote with their bucks.

This is where you run into the problem of compactly visualizing the lots in a given building based on their intrinsic features together with pricing, exposure period, and closed-deal data. Arc- or circle-shaped tree-like structures are a natural choice here. Every level of such a tree represents a certain feature of the lots. Adding a new level, i.e. taking into account one more feature, or dimension, of your data, expands the tree uniformly in all directions, so it stays rather compact while becoming more information-dense.

Pricing in real estate is comparable to what airlines do with tickets. As you know, some seats are better than others, so airlines adjust prices promptly: the opportunity to buy a good seat is priced in line with its demand, yet all of the seats still get sold. This game is played all the time. The question revolves around money, and companies, whether airlines or real-estate agencies, want to “see” the market. So how do we represent real-estate lots in a single overview that also captures the dynamics of demand?

Residential real estate developers look for ways to visually represent the value and demand dynamics of the lots on offer, so that their price adjustment decisions are as fact-based and as shareable within the management team as possible. During my tenure as head of the project valuation group at a local developer in Moscow, one of my group’s tasks for the company’s ongoing projects was to regularly revalue the lots on offer. This revaluation was generally discussed very thoroughly in investment committee meetings.

Limitations of current real-estate solutions

There was a solution that the company stuck to for a while. Each building within a given project was represented by an MS Excel table whose cells were arranged to mimic the building (in terms of the number of floors and entrances), with each cell heavily loaded with numeric and color-based information on the lot.

This solution may seem a bit obvious or simplistic, but it is perfectly practicable, and the idea can be worked out to the smallest detail. To show as much information as possible, we used Excel plots embedded in cells (sparklines) and conditional gradient fills. Below is a published example to give you a sense of what it all looks like.

Example published solution of the discussed problem. 6 buildings are represented, painted cells indicate sold lots, paint saturation and a numeric value within cells indicate percentage of cash received from buyers (as shown in the legend top right of the image). More visualizations of the same style can be found here.

This working solution arguably gave no intuition about how the lots group together, either in terms of exposure or in terms of demand dynamics and price adjustments.

Searching for a new design

I started to think about alternative visualizations that could show the different groups within our lots of interest. Structures like phylogenetic trees seemed a natural fit: nodes would represent different grouping rules, and leaves would represent individual lots. However, when such a tree is drawn linearly, say from left to right, the right side becomes dense and unreadable as the number of leaves grows.

Source: http://etetoolkit.org/gallery/

To avoid this, you can draw such a tree from a center and outwards with its leaves forming a circle or an arc.

Source: http://etetoolkit.org/docs/latest/tutorial/tutorial_drawing.html

After some more research, I found a handy Python library called ete3 which helped me generate arc- or circle-shaped hierarchical structures with pre-integrated color palettes.

You start by creating an empty tree object, then add layers recursively based on the features selected for visualization. Each new layer corresponds to a feature, the branches of this layer correspond to the different values of that feature, and each leaf of the tree represents one of the lots in your building, or, generally speaking, one unit of your data. When you’re done with the layers, you end up with several clusters of units formed from the units’ features. By picking different subsets of features and different orderings when constructing the tree, you can create different clusters and different visual representations of them. For more details on this library, have a look at the complete ete3 documentation.
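The layer-by-layer grouping described above can be sketched in a few lines. This is a minimal illustration and not the code of the original tool: the mock lots, the field names (`config`, `view`, `bedrooms`, `floor`), and the helper names are all assumptions. To stay dependency-free, the sketch builds the hierarchy with plain dictionaries and serializes it to a Newick string, which is a text format that ete3 can read, rather than adding children to an empty ete3 tree directly.

```python
# Hypothetical mock lots; the field names are assumptions for illustration only.
LOTS = [
    {"id": "A101", "config": "linear", "view": "S", "bedrooms": 1, "floor": 1},
    {"id": "A102", "config": "corner", "view": "NW", "bedrooms": 2, "floor": 1},
    {"id": "A201", "config": "linear", "view": "S", "bedrooms": 1, "floor": 2},
    {"id": "A202", "config": "corner", "view": "NW", "bedrooms": 3, "floor": 2},
]


def build_layers(lots, features):
    """Recursively group lots: one tree layer per feature, one branch per value."""
    if not features:
        # No features left: these lots become the leaves of the current branch.
        return [lot["id"] for lot in lots]
    first, rest = features[0], features[1:]
    groups = {}
    for lot in lots:
        groups.setdefault(lot[first], []).append(lot)
    # Branch labels use "_" because characters such as ":", ",", "(" and ")"
    # have a special meaning in Newick.
    return {
        f"{first}_{value}": build_layers(members, rest)
        for value, members in groups.items()
    }


def to_newick(node, name):
    """Serialize a nested dict/list structure as a named Newick subtree."""
    if isinstance(node, list):
        children = ",".join(node)
    else:
        children = ",".join(to_newick(child, label) for label, child in node.items())
    return f"({children}){name}"


def lots_to_newick(lots, features, root="building"):
    """Build the feature hierarchy and return it as a complete Newick string."""
    return to_newick(build_layers(lots, features), root) + ";"


if __name__ == "__main__":
    # Changing the list or its order changes the clustering.
    print(lots_to_newick(LOTS, ["config", "view"]))
    print(lots_to_newick(LOTS, ["bedrooms", "floor"]))
```

If ete3 is installed, a string like this should load with `Tree(newick, format=1)` (format 1 keeps internal node names), and a `TreeStyle` with `mode = "c"` draws the tree as a circle instead of left to right.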

Playing with different subsets of features requires data. For each of our company’s major ongoing projects, we had a very detailed description of every lot (an apartment, an office, or a commercial real estate unit) in every building within that project. It was stored as tables, with rows representing individual lots and over 150 columns representing their features.

Relying on ete3, I built a tool that visually described the lots within each of the projects. Its main use was to help identify clusters of lots that were selling well and those that weren’t, which would in turn justify subsequent price adjustment decisions.
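The idea of spotting clusters that sell well can also be expressed numerically. The sketch below is an illustration under assumptions, not the tool itself: the mock data, the `sold` flag, and the function name are hypothetical. It computes the share of sold lots per cluster, where a cluster is one combination of values of the chosen features.

```python
from collections import defaultdict

# Hypothetical mock data; the field names are illustrative assumptions.
LOTS = [
    {"id": "A101", "bedrooms": 1, "view": "S", "sold": True},
    {"id": "A102", "bedrooms": 2, "view": "NW", "sold": False},
    {"id": "A201", "bedrooms": 1, "view": "S", "sold": True},
    {"id": "A202", "bedrooms": 2, "view": "NW", "sold": False},
    {"id": "A301", "bedrooms": 1, "view": "NW", "sold": False},
    {"id": "A302", "bedrooms": 2, "view": "S", "sold": True},
]


def sold_share_by_cluster(lots, features):
    """Return {cluster key: fraction sold}; a key is a tuple of feature values."""
    counts = defaultdict(lambda: [0, 0])  # key -> [sold count, total count]
    for lot in lots:
        key = tuple(lot[feature] for feature in features)
        counts[key][0] += int(lot["sold"])
        counts[key][1] += 1
    return {key: sold / total for key, (sold, total) in counts.items()}


if __name__ == "__main__":
    # Grouping by view separates sold from unsold lots cleanly in this mock data,
    # while grouping by bedrooms does not.
    print(sold_share_by_cluster(LOTS, ["view"]))
    print(sold_share_by_cluster(LOTS, ["bedrooms"]))
```

In this made-up dataset the view explains demand better than the number of bedrooms, which is the kind of pattern the tree is meant to make visible at a glance.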

Below is an example figure based on a mock dataset. Sectors are painted according to lot configuration (linear, swing, corner, three-way). Branching is based on lot configuration (the innermost level), view characteristics (in this example, cardinal directions such as S for south and NW for north-west), the number of bedrooms, and the floor (the outermost level). Leaves of the tree represent individual lots (apartments) in the building. Green indicates that the lot is sold, and light green highlights that it was sold recently (you can decide what “recent” means for your specific project). White means the lot is still available for sale.
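The leaf coloring rule can be written as a small function. This is a sketch under assumptions: the `sold_on` field, the 30-day default for “recent”, and the color names are hypothetical choices, not the values used in the original tool.

```python
from datetime import date, timedelta


def leaf_color(lot, today, recent_days=30):
    """Pick a leaf color from the sale status of a lot.

    `lot["sold_on"]` is assumed to hold the sale date, or None if unsold.
    `recent_days` is the project-specific definition of "recent".
    """
    sold_on = lot.get("sold_on")
    if sold_on is None:
        return "white"  # still available for sale
    if today - sold_on <= timedelta(days=recent_days):
        return "lightgreen"  # sold recently
    return "green"  # sold earlier


if __name__ == "__main__":
    today = date(2020, 5, 1)
    lots = [
        {"id": "A101", "sold_on": date(2020, 1, 15)},
        {"id": "A102", "sold_on": date(2020, 4, 20)},
        {"id": "A201", "sold_on": None},
    ]
    for lot in lots:
        print(lot["id"], leaf_color(lot, today))
```

In ete3, a color like this could then be assigned to a leaf through a `NodeStyle`, for example by setting its `bgcolor` entry; check the ete3 documentation for the exact styling options.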

You can change the subset and the ordering of the features. For example, let’s make the number of bedrooms the innermost division criterion:

Or use only lot type and view characteristics as features:

Here are the three versions side by side:

In the tool I’ve mentioned, you could also choose which features to use and in what order, so you could look at the data from different angles. The GUI was a simple Tkinter-based one, but of course any other GUI library will do.

So that you can try it yourself, I’ve uploaded a Jupyter notebook with the code to generate images like those shown above, along with a sample dataset, here: https://github.com/askorn1/ete3_practice

ete3 offers many additional features, such as adding charts to tree nodes, using different colors for them, or attaching sibling trees at the leaves of a master tree. With these, I believe it is possible to create rich visualizations with real business value.

Final remarks

Although the focus of our group soon shifted heavily to analysis of new investment opportunities, I am eager to further develop such helpful visualizations and unlock the potential of data visualization for the real estate industry.

Many real estate companies have a regular need to assess and adjust the pricing of their stock. As far as I know, there is no tailored solution like the one I’ve proposed, and a lot of pricing decisions remain qualitative rather than based on data. There is a clear opportunity here!

Beyond the real estate world, if you need to depict multidimensional or hierarchical data, consider using one or more circle tree visualizations, as they are compact in size while remaining information-rich.
