2020, the Unexpected Requirement

David Anderson
Published in LibertyIT
Oct 9, 2020

2020 has been a year of change. We have been talking about Serverless First for several years. How does a Serverless Well-Architected system absorb change? I would imagine quite well, but what about 2020-level change? Let’s explore what happened to some of our systems.

The unexpected behaviour.

When we build insurance systems, we are usually quite clear on the traffic. The cloud is a perfect platform if demand changes rapidly, but we can generally predict when this will happen. We understand the ebb and flow of our industry, and we also handle large events quite well: we deal in risk, and we mitigate it appropriately. But what happens when something like COVID comes along and completely changes behaviour across the world?

The rise and fall of traffic due to COVID lockdown.

This year, we observed behaviour in two of our systems that we had never predicted. Thankfully, both are Serverless, Well-Architected cloud systems.

Why is that important?

It’s important because they are incredibly resilient and flexible, and the traffic patterns we observed had never been seen before.

High Demand reduces.

The critical function in insurance is claims — period. At Liberty Mutual, being there for people when they need us most is what we’ve been doing for more than a century.

It’s the time when we need to help and support people. We will always endeavour to improve the experience. Every call, message and click is important, so we build for high capacity, and we don’t mind paying for that.

One of our systems is a Virtual Assistant that we use to answer some queries immediately (it uses NLP to hold a conversation with callers and provide real-time information; learn more at “Bring the Power of AI to your Amazon Connect Contact Center”). Queries like “When will I get my payment?” or “Where do I pick up my rental car?” are simple but significant. We don’t want to keep people on hold. This is a high-throughput system.

When lockdown started, these queries fell dramatically: as of August 2020, we observed a 74% reduction in volume. The team that built this system took a Serverless First approach and performed many Well-Architected reviews. They knew the bottlenecks and built flexibility and resiliency into the system from day one. The primary concern was always “Can this system take the load?” We spent three years tuning and hardening the system so it would perform at scale, potentially during a major catastrophe. When lockdown started and calls fell, the system scaled down and our bill fell with it. The system still performed flawlessly; it was just smaller.

Low Demand increases.

A second system of note is one of our Outgoing Payment Hubs. We have many scheduled payments to external parties, usually low-volume recurring payments that are prepared well in advance. It’s more important that these payments are accurate and secure. The team that built this system took a Serverless First approach and automated much of the compliance, security and auditing for payments. They designed the system to scale gradually over the next few years and used Well-Architected from the start.

Imagine the team’s surprise when they logged on one morning in early April to find an email saying (I’ll paraphrase):

“We have just announced that we are refunding hundreds of millions of dollars, and we are pushing the payments out through the system ASAP.”

No time to re-architect, add capacity or get expert advice. The team knew they had been through Well-Architected, and they knew Serverless scales. They checked some limits and triple-checked a few settings: maybe a few hours’ work, with no significant changes required. The system then scaled to handle a volume of traffic that no-one had planned for. In Q2 2020, we pushed 500,000 additional payments through the system due to COVID, and everything went through without incident. At times like these, when people are waiting for essential payments, your systems can’t fail.

Great teams predict the unexpected.

The engineers who worked on these systems are very disciplined, skilled and conscientious. They decided to build Serverless because they knew they needed high levels of security, performance, scale, flexibility and resiliency, not because it was cool. No-one told them to use the Well-Architected Review process (we use the AWS framework, but we have adapted the process to work for our teams). They are continually searching for ways to be the best they can be, so they made sure they were first in line when we introduced our Well-Architected approach.

AWS Well-Architected Framework

When teams are full of great engineers who understand the business domain and their users’ needs, and who know their responsibility to their policyholders, they do the right thing. No Architect wrote a Non-Functional Requirements (NFR) document, but the teams knew the levels of quality they had to achieve. Nor did these systems take five years to build: both had MVP versions put into production quickly, so the teams had to operate under the usual time pressures, with no time for gold-plating. The cloud provider supplied many of the NFRs and features as part of the Serverless ecosystem.

Responding to change.

There are a lot of books and frameworks that speak highly of responding to change. Building software systems is always challenging, so it’s very reassuring to see that we could be there for our policyholders when they needed us. It’s not about Serverless; it’s about building the right system, the right way, and having the confidence to know when it’s good enough.

Great engineering is as much about confidence and leadership as anything else.


David Anderson
LibertyIT

Senior tech head. Interested in Cloud, engineering, building high performing teams and sharing my stories.