Holistic view on real estate location: insurance focus

Anna Amvrosova
Sep 16, 2022


In our equipment breakdown loss prediction model, tested on a historical data sample from 2019–2020, the total predicted loss error fell by 15 p.p. with the addition of Habidatum mobility metrics alone, and by as much as 27 p.p. with the addition of all Habidatum metrics, including the Location Risk Score, compared to the baseline model.

Insurance is about future risks and losses. It is a business of preparedness based on hints found via data, statistical simulations, estimates and probabilities. There is no other way to sell coverage for future losses than to first simulate them and then price them accordingly.

Most realized risks that led to real losses have left their footprints, both literally and figuratively. Inhabited space, with its characteristics, features and patterns, is a vast field of risk evidence, whether those risks have already materialized or are still waiting for their turn to strike.

Location data is critical for estimating claims exposure and setting relevant policy pricing. Various types of insured risks relate to people’s mobility, concentrations, dwell time at and around a property, and property accessibility patterns.

Many property and liability risks can be explained through the world of real estate, as it hosts and anchors people’s social and business activities, becoming their focal point. However, property functional profiles, both current and ideal, and most of the risks associated with them, derive from the catchment (influence) areas around those properties.

Real estate does not move, but everything around it does, and increasingly so. Real estate location, meaning the area around a property inhabited by people and businesses, is a significant factor in commercial real estate performance, value formation, and the associated risks.

Location is not under the control of property owners, although it represents critical risks and opportunities for owners, creditors and investors in real estate. Location is an area, not a point. An area is hard to grasp and evaluate, and it is especially difficult when there are many of them, as in massive property portfolios comprising thousands of locations.

While a single property owner may know the neighborhood well, the actual area that shapes demand and risks for the property reaches far beyond the neighborhood, being formed by accessibility, travel time and transportation modes. A property’s spatial context (transport connections and critical services, trade areas and their active population, social activity and commercial diversity) matters as a key factor of property value and resilience.

Moreover, when it comes to thousands or hundreds of thousands of properties in a portfolio, one needs an algorithm to pack various indicators of location potential and risk into several aggregated metrics, built independently to be processed quickly for modelling and operational use.

For insurance, location data can yield information such as the speed of first responder reaction to emergencies, recovery time, lost business income due to insured casualties, and much more.

In this brief post, we’d like to present our experience in applying location data to loss prediction models, which we have tested with several national and international insurance companies.

Location is not where you are, it’s where you are not

The first thing we are asked to define and measure is the influence area of a real estate property. This is the area from which emergency services (fire, repair) and first responders will be called if something happens to the property, and from which its visitors usually come. It is relevant for insurance covering damage caused both by external forces (like fire) and by internal causes (like equipment breakdown).

Transportation options are important for keeping the property accessible even if some connections are blocked. Travel time to the destination at risk is a proxy for the damage a property may sustain before emergency services arrive. The size of the population around the property, its mobility patterns and its socio-demographics define the location’s economic profile and help predict associated income losses in case of business interruption.

Our platform provides both absolute accessibility metrics, such as ‘time to reach the property’ or ‘number of emergency services around’, and relative ones, namely a property type-specific Location Risk Score that allows any location in the country to be compared by its commercial strength and associated risks.
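To make the two absolute metrics named above more concrete, here is a minimal sketch in Python. It is not the platform's method: it assumes straight-line (great-circle) distance and a flat average driving speed as a crude stand-in for real road-network travel time, and the function names, the 5 km radius and the 40 km/h speed are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accessibility_metrics(prop, services, radius_km=5.0, speed_kmh=40.0):
    """Count services within a radius and estimate minutes from the nearest one.

    prop and each item of services are (lat, lon) tuples.
    radius_km and speed_kmh are illustrative assumptions, not calibrated values.
    """
    distances = [haversine_km(prop[0], prop[1], s[0], s[1]) for s in services]
    nearby = sum(1 for d in distances if d <= radius_km)
    # Straight-line distance at a flat speed: a rough proxy for drive time.
    minutes = min(distances) / speed_kmh * 60 if distances else None
    return {"services_within_radius": nearby, "minutes_to_reach": minutes}


if __name__ == "__main__":
    property_location = (40.00, -75.00)
    fire_stations = [(40.01, -75.00), (41.00, -75.00)]
    print(accessibility_metrics(property_location, fire_stations))
```

A production version would replace the straight-line estimate with routing over the actual road network, since blocked connections and one-way streets are exactly what the straight-line proxy ignores.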

During the evaluation process, we also validate property characteristics:

  • Its geofence is needed to map accessibility correctly, as hyperlocal precision is critical when the question is whether or not a fire truck can reach the building in time.
  • Its up-to-date function in a standard form (matched to NAICS or SIC codes) is important for building a relevant Location Risk Score that is compatible with Census Bureau classification; the score is always asset type-specific.

The functional profile of properties and the activity inside them attract much attention, so we built a dedicated bundle of metrics for them (described below).

An example of an influence area: home locations of a property’s visitors (by Census block groups) and the 10-minute drive time area (magenta line)

People and businesses moving in and out

Property-specific metrics, as long as they can be built based on data originating from similar sources, are easy to produce. There is also much interest in them and their influence on prediction models. The simplest questions to answer include:

  • Is the property in use, or abandoned?
  • How many visitors are there on average per day or per month?
  • How is the attendance changing over time? Is it stable or volatile?
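The three questions above can be answered from a simple series of daily visit counts. The sketch below is only an illustration: the function name, the "in use" threshold and the volatility cut-off are hypothetical, and the coefficient of variation is one common way to express stability, not necessarily the one used in our metrics.

```python
from statistics import mean, pstdev


def attendance_profile(daily_visits, in_use_threshold=1.0, volatile_cv=0.5):
    """Summarize a list of daily visit counts for one property.

    in_use_threshold and volatile_cv are illustrative assumptions.
    """
    if not daily_visits:
        raise ValueError("daily_visits must not be empty")
    avg = mean(daily_visits)
    # Coefficient of variation: spread relative to the average level.
    cv = pstdev(daily_visits) / avg if avg > 0 else 0.0
    return {
        "in_use": avg >= in_use_threshold,
        "avg_per_day": avg,
        "cv": cv,
        "pattern": "volatile" if cv > volatile_cv else "stable",
    }


if __name__ == "__main__":
    print(attendance_profile([100, 110, 90, 100]))
    print(attendance_profile([0, 0, 0, 0]))
```

Monthly figures follow the same pattern once the daily counts are grouped by month.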

As an example, property attendance and vacancy ratios are often requested by IoT teams to correctly interpret water consumption and leakage data. In this case, the least and the most occupied buildings are of the highest interest.

There are some caveats to keep in mind. For a multi-storey building, the request may be to count people on a specific floor. Identifying height from available mobility data sources is problematic and not precise enough for an accurate count. Occupancy measured within the whole building perimeter can be quite precise, though.

We work with both commercial buildings and multi-family complexes. Single-family houses are out of scope for attendance metrics, as measuring them would fall outside personal data protection and privacy standards. However, all accessibility and functional diversity patterns around single-family houses are covered.

Also, because the panel of mobile devices generating location data changes constantly over time, it is hard to tell from the raw dataset whether attendance at a property has really fallen or whether the drop reflects a data source quality issue. The best providers on the market work hard to solve this, but data quality checks for temporal consistency and spatiotemporal comparability still need to be maintained, especially for hyperlocal high-precision metrics at the building or small grid cell level (such as 50 x 50 m).
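One simple way to picture such a check is to compare the raw device count at a property with the size of the provider's panel in the same period. The sketch below is a deliberately simplified assumption, not our production check: the function name, the inputs and the 10% tolerance are hypothetical.

```python
def classify_change(raw_prev, raw_curr, panel_prev, panel_curr, tolerance=0.1):
    """Label a period-over-period change in observed devices at a property.

    raw_*   : devices observed at the property in each period
    panel_* : total devices in the provider's panel in each period
    tolerance is an illustrative threshold for a 'meaningful' change.
    """
    raw_change = raw_curr / raw_prev - 1
    # Share of the panel seen at the property, compared across periods.
    rate_change = (raw_curr / panel_curr) / (raw_prev / panel_prev) - 1
    if abs(rate_change) <= tolerance:
        # The share is steady: any raw swing comes from the panel itself.
        return "panel drift" if abs(raw_change) > tolerance else "stable"
    return "real change"


if __name__ == "__main__":
    # Raw count halves, but so does the panel: likely a data source effect.
    print(classify_change(1000, 500, 100_000, 50_000))
    # Raw count halves while the panel is unchanged: likely a real drop.
    print(classify_change(1000, 500, 100_000, 100_000))
```

Normalizing by panel size only addresses one source of inconsistency; changes in panel composition or in location accuracy need their own checks.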

An example of strong mobility change: mobility numbers at the property location have increased 4x since 2019, following the addition of North Store Farms in summer 2020; visitor activity dropped in early fall 2021 due to a flood, when major anchors including CVS were closed. No data quality issues were observed in this case.

We often see that shifts in building attendance or occupancy indicate changes in tenants’ functional profiles. The information on tenants in a building may sometimes be inaccurate or out of date. This is where mobility data plays the role of an additional, independent check. A spike may mean that a new tenant has moved in; a decline may mean a temporary or permanent closure of tenant operations. We train financial models on retail performance data and continue working to expand the same approach to other segments of commercial property.

The growing interest in location data “ingredients” and raw data has been slowed by many of the caveats above, but of course not stopped. Data quality is improving immensely and, paradoxically, the better raw data quality becomes, the less interest we see in siloed data ingredients.

Most importantly, it is not raw data and silo ingredients, but smart aggregates that make a real change in understanding insurance risk, providing a holistic view of the location as a risk and an opportunity.

Smart amalgamated data metrics, playing the role of risk benchmarks, are becoming the new normal in modern data-driven finance. They dramatically improve loss prediction and already tell insurers and bankers how to mitigate risks, not just what they need to learn before they can mitigate them.



Anna Amvrosova

Urban researcher and data analyst, connecting physical spaces with the digital world. Passionate about cities, in love with the ocean.