Data: it’s not how big it is, but what you do with it

Written by Nils Kok

“Big data” has quickly emerged from obscurity into a buzzword, rising from non-existent in Google Trends in 2011 to peak popularity in 2016, just last year.

In 2017, the world no longer cares so much about how big data is, but rather what you can do with it. And as I have learned since starting at GeoPhy three months ago, the possibilities are endless. The goal of the GeoPhy blog is to give the real estate industry, and basically every other industry that uses buildings or their surroundings (I’m still trying to think of one that doesn’t…), insight into some of the applications that boundless swaths of data provide. Because without application, data is like crude oil without the refinery.

When we talk about data on buildings and their environment, it all starts with the GeoPhy data lake, which is filled with about 100 million buildings. Each building has a basic set of characteristics (or: features), such as size, year of construction and renovation, use types (that’s plural, as buildings can have multiple parts with different functions), etc. There are financial details, such as the rent roll, occupancy rate, the last sales price, and liens on the building (e.g., a mortgage). And importantly, each building resides in a certain area. That area is different and unique for every building: distance to amenities such as bars, restaurants, events, gyms, and (public) transport, and the density of the “catchment,” measured by the number of people across age cohorts, income cohorts, etc. At a more macro level, there are many more data layers: economics, demographics, and risks, including jobs and job growth, income growth, inflation, flooding, population growth, etc. All of this is “raw” data, from thousands of different sources: public, semi-public, and private. These rivers of data flow into the data lake directly through automated connections (APIs), are imported through spreadsheets, or are harvested from PDFs and webpages through natural language processing (NLP) and crawlers.
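To give a feel for what filling a data lake from many sources implies, here is a minimal sketch of normalizing one spreadsheet-style record onto a common building schema. All field names and the schema itself are hypothetical, invented for illustration; GeoPhy’s actual data model is not described here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical unified building schema (not GeoPhy's real one).
@dataclass
class Building:
    building_id: str
    size_sqm: Optional[float] = None
    year_built: Optional[int] = None
    # A building can have multiple parts with different functions,
    # so use types is a list rather than a single value.
    use_types: list = field(default_factory=list)

def from_csv_row(row: dict) -> Building:
    """Map one raw spreadsheet row onto the unified schema."""
    return Building(
        building_id=row["id"],
        size_sqm=float(row["size_sqm"]) if row.get("size_sqm") else None,
        year_built=int(row["year_built"]) if row.get("year_built") else None,
        use_types=[u.strip() for u in row.get("use", "").split(";") if u.strip()],
    )

row = {"id": "NL-0001", "size_sqm": "12500", "year_built": "1998", "use": "office; retail"}
b = from_csv_row(row)
print(b.use_types)  # ['office', 'retail']
```

Every source (API feed, spreadsheet, crawled PDF) would get its own mapping function like `from_csv_row`, all converging on the same schema.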

For many of GeoPhy’s users, there is already tremendous value in the raw data, but for others our refinery is needed. Our core information products coming out of the data lake are standardized metrics that can be used in the evaluation of buildings and areas: quality, value, and risk. The QualityScore is a 0–100 measure that reflects the quality of an individual building and its surrounding area, relative to buildings of the same property type (each property type has a unique scoring algorithm). Under the quality umbrella, we also have a carbon footprint for each building in the data lake (see our recent blog on carbon footprinting), and a GreenScore that reflects the presence, depth, and quality of sustainability characteristics. Value is based on the GeoPhy AVM, an automated valuation model that predicts a value for every building in the database. This is a bit like an appraiser assessing the value of a building, but rather than walking around each of the 105 million buildings and making subjective judgements, we use the power of data in combination with modern machine-learning techniques. The RiskScore reflects the extent to which an asset and its area are exposed to climate risks, such as flooding (see our flood risk dashboard developed for Rijkswaterstaat, the Dutch agency responsible for 2,700 km of dykes), extreme weather events, etc.
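The core idea behind any AVM can be sketched in a few lines: predict a value for an unseen building from observed transactions of similar buildings, much as an appraiser reasons from comparables. The toy below is a k-nearest-neighbour comparable-sales estimate on invented data; it illustrates the principle only, and is nothing like a production valuation model.

```python
import math

# Invented comparable sales: (size_sqm, distance_to_transit_km, price_eur).
sales = [
    (1000, 0.2, 5_000_000),
    (1500, 0.5, 6_800_000),
    ( 800, 1.0, 3_200_000),
    (2000, 0.3, 9_500_000),
    (1200, 0.8, 4_900_000),
]

def estimate_value(size_sqm: float, transit_km: float, k: int = 3) -> float:
    """Average the prices of the k most similar past sales (toy AVM)."""
    def distance(sale):
        s, t, _ = sale
        # Crude feature scaling so size (in m2) does not swamp distance (in km).
        return math.hypot((s - size_sqm) / 1000, t - transit_km)
    nearest = sorted(sales, key=distance)[:k]
    return sum(price for _, _, price in nearest) / k

print(round(estimate_value(1100, 0.4)))
```

A real model would use far richer features (the building characteristics and area layers described above), proper feature engineering, and a trained learner instead of a hand-picked similarity function, but the shape of the problem is the same.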

Now, for clients to consume this data, we would love to simply feed them information on the buildings and areas they’re interested in, whether it is for the evaluation of a new location for a retail store, a logistics facility for the distribution of your Amazon packages, investment in an office building, development of student housing, etc. That feed would take the form of an API, and we would be done. But the reality is that “consuming an API” is not standard practice yet, and many clients, both companies and governments, have neither the in-house capability to ingest data through an API nor the visualization and analytical capacity needed to ultimately use the information. That’s where GeoPhy turned from a data company into a part-data, part-software company: we developed a series of dashboards that deliver the data in a user-friendly format. The GeoPhy Dashboard provides access to information on individual buildings, the Alpha platform covers portfolios of buildings, and the Metropolis tool exposes data layers such as catchment, proximity to public transport, flood risk, etc. And of course, it does not end there: we also perform data analysis, explaining the effect of airport proximity on office rents, the effect of the government as a tenant on property values, the effect of bars, events, and arts venues on rental growth, the effect of Brexit on the London office market, etc.
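For teams that can consume an API, the client side is genuinely small. The sketch below parses a JSON response and renders the kind of one-line summary a dashboard might show; the response shape, field names, and endpoint implied here are all invented for illustration and are not GeoPhy’s actual API.

```python
import json

# A hypothetical response body, e.g. from GET /buildings/{id}/scores.
# The shape is invented for illustration.
response_body = """
{
  "building_id": "NL-0001",
  "quality_score": 78,
  "green_score": 64,
  "avm_value_eur": 5400000,
  "risk": {"flood": "medium", "extreme_weather": "low"}
}
"""

scores = json.loads(response_body)

def summarize(scores: dict) -> str:
    """One-line summary a dashboard might render for a single building."""
    return (f"{scores['building_id']}: quality {scores['quality_score']}/100, "
            f"value EUR {scores['avm_value_eur']:,}, "
            f"flood risk {scores['risk']['flood']}")

print(summarize(scores))
```

The dashboards essentially do this on the client’s behalf: fetch, parse, and present, so that no in-house engineering is required.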

Whether it is a QualityScore or the automated valuation of real estate, each application of data in combination with analytics is a story in itself. Don’t despair: these stories will come to you, one by one, week by week. So stay tuned, and remember, next time you read about “big data”: it’s not how big (data) is, but what you do with it.