Building a Risk Machine, Part 2: At-Bay’s Cyber Risk Machine

Roy Mill
At-Bay
Apr 26, 2018 · 9 min read

At-Bay provides cyber insurance for the digital age.

Photo credit: Thomas Kvistholt

Understanding risk is the name of the game in insurance. When an insurance company differentiates between good and bad risks they can provide their best customers with lower prices while improving portfolio risk and profitability. That’s why we obsess over understanding cybersecurity risk.

At At-Bay our challenge is to build a system that can evaluate the cybersecurity risk of any company in the world. We call this system our risk machine — a combination of technology, process, and human judgment that receives a potential client as an input and produces an insurance quote (or a decision to decline) as an output. It needs to operate quickly, accurately, and economically. As it turns out, building a risk machine is a challenging and multi-disciplinary problem.

This is part two of a three-part series exploring the challenges of building our risk machine:

Part 1: What is a risk machine?
Part 2: At-Bay’s cyber risk machine
Part 3: Scaling our risk machine

In Part 1 we used car insurance as an example to discuss the basic challenge for providers of any insurance: how can you accurately and efficiently discern good from bad risk? But this shared challenge manifests itself in unique ways for those who build cyber insurance risk machines. Here we first outline 4 differences between cyber risks and traditional risks, then explain how we designed our risk machine to address them.

What makes cyber insurance different?

“Software is eating the world” — Marc Andreessen

Difference 1: software is new

There are decades of structured data on the history of car accidents that car insurance providers can use to model risk. Software has no comparable record. The widespread use of digital technology in households and businesses is a new phenomenon: in 1995 only about 10% of families had a cell phone or internet access, roughly the same share of families that owned a car in 1915. While it is true that businesses adopted software before many households, it is still a very recent phenomenon, and insurers have far less historical data available to build accurate risk models.

Difference 2: software is complex and varied

Most companies do not develop their own cars, but many develop their own software or deeply customize the software they buy. Unlike physical infrastructure, software is easy to customize and configure in many different ways. Software comes in many shapes and forms, so insurers collecting data for their risk models must be prepared to encounter many unfamiliar setups in the wild. Insurance data collection modules need to handle more possibilities, and the risk model that interprets them needs to take these variations into account.

Difference 3: software changes quickly

Not only is software new and complex, but people constantly adjust and reinvent it. Vendors release software continuously. Some releases upgrade old versions to fix bugs or patch vulnerabilities, others add new functionalities. While these upgrades may increase productivity and creative potential, they also create new opportunities for cyber attacks. Cyber risk is quickly evolving, and so should the cyber risk machine.

The speed of cyber change is actually compounded. New innovation increases the possible set of software versions, but the speed at which users download and install them increases the actual set of versions in operation.

Take your workforce’s mobile phones for example. When Apple or Google releases an update to their phone’s operating system, millions of devices worldwide get an “update available” notification within 24 hours. Soon hackers discover a security flaw in this new version, and the millions of devices that already updated their operating systems are now vulnerable to an attack that did not exist just a few hours ago. New hardware requires manufacturing and shipping. New software multiplies and travels at the speed of light.

Examples of recent software fixes for known vulnerabilities in mobile operating systems

Difference 4: software carries common risk factors

At its core, insurance pools risks that are potentially catastrophic to individuals into a bigger fund that has lower risk in aggregate. This gives those individuals peace of mind. But insurance companies can only survive when those risks are relatively uncorrelated. If risks are highly correlated, which happens when common risk factors affect multiple accounts at once, then the aggregate risk does not decrease.

For instance, imagine all residents of a neighborhood pay a premium to an insurance fund to protect their houses. When an electric fault causes someone’s home to burn down, her neighbors’ premiums pay for the claim. The insurance fund can absorb the loss of any given fire, but no single household could have paid for it on its own.

This model breaks if all houses catch fire at the same time. If a nearby forest fire burns down all the houses in the neighborhood at once, then the insurance becomes meaningless because the premiums will not be able to cover the loss affecting all houses. To avoid this situation insurance companies pool risks from distant areas instead of neighboring houses. Insurance becomes less effective as common risk factors become more prominent in a portfolio.
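
The effect of correlation on pooling can be made concrete with a little arithmetic. The sketch below (illustrative numbers, not an actual pricing model) computes the standard deviation of each member’s share of a pool of `n` identically distributed risks with pairwise correlation `rho`:

```python
import math

def pooled_risk_per_member(sigma: float, n: int, rho: float) -> float:
    """Standard deviation of one member's share of a pool of n
    identically distributed risks with pairwise correlation rho."""
    # Var(sum) = n*sigma^2 + n*(n-1)*rho*sigma^2
    variance_of_sum = n * sigma**2 + n * (n - 1) * rho * sigma**2
    return math.sqrt(variance_of_sum) / n

# Independent risks: pooling 1,000 houses shrinks each member's risk ~30x.
print(pooled_risk_per_member(sigma=100.0, n=1, rho=0.0))     # 100.0
print(pooled_risk_per_member(sigma=100.0, n=1000, rho=0.0))  # ~3.16

# Correlated risks (a shared "forest fire" factor): pooling barely helps.
print(pooled_risk_per_member(sigma=100.0, n=1000, rho=0.5))  # ~70.7
```

As the pool grows, the per-member risk falls toward `sigma * sqrt(rho)` rather than toward zero, which is exactly why common risk factors limit how much pooling can help.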

Just as homes share common risks with their neighbors, so do companies share common risks with other companies who use the same technology. Common technology platforms like operating systems, infrastructure, networks, or even protocols can create correlated risks in a cyber insurance portfolio. An outage in an Amazon Web Services data center would bring down many companies at once. An iOS vulnerability would make countless organizations vulnerable at once. These correlated events are cyber “forest fires,” but unlike traditional risks they transcend physical distance. Cyber forest fires ignite across the world in an instant.

Designing a risk machine for cyber

We At-Bayers design, build, and constantly improve our risk machine to rise to the challenges of the digital economy. Here we lay out the design principles we used to build our risk machine.

Principle 1: cybersecurity expertise at the core

Even in the software world, cybersecurity is a specialty. Cyber insurance companies should have experts that specialize in cybersecurity and stay up to date on the latest in the field. For us this is a team of ex-hackers dedicated to cyber risk research. But cybersecurity expertise also extends beyond our risk teams. Many of our engineers and product managers have backgrounds in cyber.

A risk machine can only be designed by combining cybersecurity expertise with other fundamental insurance and technology disciplines. Building tools to collect data requires engineering skills. Interpreting the large, representative datasets those tools collect requires expertise in statistical analysis and data science. And of course, connecting these pieces from end to end requires thoughtful product design.

One example of an expert interdisciplinary cyber team working together, perhaps too eagerly

Principle 2: heuristics before statistics

Insurance relies on the statistics of financial loss, which requires data. But as we mentioned before, the nature of software makes it hard to find representative structured data that answers a cyber insurer’s biggest questions:

  • How do you measure a company’s level of cybersecurity?
  • How do you track it over time?
  • How do different IT setups predict the frequency and kinds of attacks they face?
  • How do those attacks affect financial loss?

Most insurance companies still rate cyber with the parameters they’ve always used for other lines of insurance, like industry and revenue. But these old habits ignore how technology is different. Cyber risk does not have empirical answers based on a long history of standardized observations, and we want to build the world’s most accurate cyber risk machine.

So we start with heuristics from our ex-hackers. Heuristics are basically educated guesses based on intuition. Over their years in the field of cyber intelligence, our team has developed an intuition about how systems work, how cyber criminals gather useful attack data, and what makes a target company’s cybersecurity look good or bad. This intuition lets them choose the parameters that translate technical findings into a quantified risk score, and we implement these rules in the risk machine. These rules aren’t backed by rigorous empirical studies (yet); they are based on expert intuition.
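
To make the idea concrete, here is a minimal sketch of a heuristics-based score. The findings and weights below are hypothetical illustrations, not At-Bay’s actual rules:

```python
# Hypothetical weights an expert might assign to technical findings.
HEURISTIC_WEIGHTS = {
    "outdated_tls": 25,         # server still negotiates TLS < 1.2
    "exposed_admin_panel": 30,  # admin login reachable from the internet
    "no_email_auth": 15,        # missing SPF/DMARC records
    "stale_cms_version": 20,    # CMS several versions behind
}

def risk_score(findings):
    """Translate a set of technical findings into a 0-100 risk score,
    where higher means riskier."""
    penalty = sum(HEURISTIC_WEIGHTS.get(f, 0) for f in findings)
    return min(penalty, 100)

print(risk_score({"outdated_tls", "no_email_auth"}))  # 40
```

When later statistical research shows a weight is wrong, only the dictionary changes, not the machine around it.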

Once heuristics-based rules are configured in the risk machine we can start underwriting and collect more data. With new data and additional statistical research we can then validate or replace our initial heuristics with empirically-validated rules. The ongoing iterative process of guessing, collecting data, and adjusting the model allows us to keep pace with technological change in a way that traditional insurance companies cannot.

Principle 3: ask machines, not (just) people

There is not a long history of standardized software data to inform our models, but the nature of software makes it easier to collect data about it at scale. We build probes that collect data on web, email, or other online services related to a company to a level of detail impossible for other insurance lines. This advantage is a prerequisite for the iterative process of refining our heuristics that we mentioned earlier. If you’re constantly changing the variables in the risk model, you must gather enough data quickly to verify if those new variables work.
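
As a sketch of what one such probe might check: given the HTTP response headers a company’s website sends, flag missing security headers. The header names are real HTTP headers; treating their absence as a risk signal is our illustrative assumption:

```python
# Security headers a well-configured site is expected to send.
EXPECTED_HEADERS = [
    "Strict-Transport-Security",
    "Content-Security-Policy",
    "X-Content-Type-Options",
]

def missing_security_headers(headers):
    """Return the expected security headers absent from a response."""
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h.lower() not in present]

# In production the headers would come from a live HTTP client, e.g.:
#   resp = urllib.request.urlopen("https://example.com")
#   missing_security_headers(dict(resp.headers))
print(missing_security_headers({"Strict-Transport-Security": "max-age=63072000"}))
# ['Content-Security-Policy', 'X-Content-Type-Options']
```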

There are some questions about cybersecurity that you cannot ask machines (we ask for those on our application). Whenever possible, though, our risk machine gives more weight to data we collect from machines than data we collect from people. Machine-collected data is more detailed and, for our purposes, more trustworthy. This bias towards asking machines also has the lucky side effect of removing friction when applying for insurance, making our customers and brokers happy.

Principle 4: build for flexibility

Now we need a risk machine that empowers our experts to constantly change the model’s rules without breaking the machine.

This risk machine must be modular and flexible so that our cyber research team can quickly reconfigure it when they collect a new kind of data, instead of building a new one. Both product and engineering teams address this challenge by generalizing problems. Instead of hard-coding today’s specific rules, we take the rules the risk machine’s designers specify today and build a machine that can handle similar rules tomorrow. The next iteration of the risk model then becomes a simple parameter change.

Think of parameterization as building a letterboard instead of a fixed sign. If your store is always going to sell the same thing, it’s cheaper and more straightforward to just buy a fixed sign (it can also come in neon!). But if you plan on changing the message frequently, it’s better to get a letterboard. This way you can quickly change the message without needing to replace the whole sign.
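
In code, the letterboard corresponds to keeping rules in data rather than in logic. The rules below are hypothetical; the point is that the next model iteration only edits the list:

```python
# Rules live in data, not code: updating the model means editing this
# list, not rebuilding the machine. The rules themselves are hypothetical.
RULES = [
    {"field": "tls_version", "op": "lt", "value": 1.2, "penalty": 25},
    {"field": "open_ports",  "op": "gt", "value": 10,  "penalty": 15},
]

OPERATORS = {"lt": lambda a, b: a < b, "gt": lambda a, b: a > b}

def apply_rules(scan, rules):
    """Sum the penalties of every rule the scan result violates."""
    return sum(
        rule["penalty"]
        for rule in rules
        if rule["field"] in scan
        and OPERATORS[rule["op"]](scan[rule["field"]], rule["value"])
    )

print(apply_rules({"tls_version": 1.0, "open_ports": 3}, RULES))  # 25
```

Supporting a new kind of finding means appending a rule (and perhaps an operator), which is a configuration change rather than an engineering project.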

An example of problem generalization that allows for flexibility

This flexibility helps us account for software that is changing all the time, but it also serves another purpose — mitigating the problem of common risk factors! Those scary cyber forest fires we discussed earlier can be addressed by a risk machine that is aware of forests and of how many of our customers live near them. In cyber terms, when too many companies in our portfolio rely on one platform, our risk machine adjusts prices to attract new companies that don’t use that platform. This dynamic adjustment diversifies the common risk factors in our portfolio.
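
One way such a dynamic adjustment could look in code — the thresholds, loadings, and portfolio data here are illustrative assumptions, not our actual pricing:

```python
def platform_loading(portfolio, platform, cap=0.3, loading=1.2):
    """Price multiplier for a new account on `platform`: load the price
    once that platform's share of the portfolio reaches the cap."""
    share = portfolio.count(platform) / len(portfolio)
    return loading if share >= cap else 1.0

# A book of 100 accounts, keyed by primary cloud platform (hypothetical).
book = ["aws"] * 40 + ["gcp"] * 35 + ["azure"] * 25

print(platform_loading(book, "aws"))    # 1.2 -- 40% of the book: load the price
print(platform_loading(book, "azure"))  # 1.0 -- still diversified
```

Loading concentrated platforms makes new business on them relatively less attractive, steering the portfolio back toward diversification.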

Scaling our cyber risk machine

The principles above help us create a more accurate cyber risk machine. But how can we make that machine cost-efficient? When it comes to building a real functioning risk machine, tough decisions have to be made. In Part 3: Scaling our risk machine we cover the challenges of building a cyber machine at scale, including many that we continue to struggle with every day.

Found this post useful? Kindly tap the 👏 button below and share the story to help others find it! :)

About The Author

Roy Mill is VP Product at At-Bay. Likes data, hummus, and launching software solutions that make people’s lives better. You can connect with him on LinkedIn.

Learn more at at-bay.com.
