One Love, three days: how British Red Cross handled digital donations from a live event
At Friday, we pride ourselves on building digital services that perform at scale. That claim was put to the ultimate test during One Love Manchester, the event organised to raise funds for those affected by the Manchester Arena bombing two weeks earlier.
We’d built a new donations platform for British Red Cross (BRC) a few weeks previously. Our involvement in One Love began at 1.30pm on Thursday 1 June, when BRC told us they were partnering with the BBC and One Love Manchester, scheduled for that coming Sunday evening.
They wanted to know what it would take for the platform to receive donations from across the world during a huge, live public event. Could we, in three days, take a platform specified to cope with the expected peaks and troughs of BRC’s UK-based campaigns and make it capable of handling huge waves of traffic, over an extended period, from the 43 countries to which the concert would be broadcast?
Yes, we could — and did. The donations platform was architected precisely so it could scale. British Red Cross raised £2.35 million in three hours at an event watched live by 22.6 million people.
As soon as BRC put us on notice, we assembled a team and began building a network operations centre (NOC) in our London office to support the donations platform in the run-up to One Love, during the event, and afterwards.
The platform had already been tested to exceed the peaks predicted for BRC’s UK campaigns, but now we needed to test it at loads an order of magnitude higher.
The team’s job was to operationalise the NOC: set up roles and responsibilities, establish communications channels, and model what-if scenarios such as technical failure, payment gateway failure, operational failure, and a critical incident in or around the event.
We extended real-time analytics into every dimension and level of the platform, including load, users, payments, response times and error rates.
We had to be ready for crazy response rates, so we pushed testing to two to three times the traffic volume expected from the event. We tweaked performance and monitoring until we could sustain up to 180,000 transactions/minute.
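The headroom rule above reduces to simple arithmetic. As a minimal sketch in Python: the expected peak used here is a hypothetical figure chosen only for illustration (the 2x to 3x multiplier and the 180,000 transactions/minute result come from our testing, but the forecast itself is not stated here), and `load_test_targets` is an illustrative helper, not part of the platform.

```python
def load_test_targets(expected_peak_per_min, factors=(2, 3)):
    """Return the load-test target (per minute) for each headroom factor."""
    return [expected_peak_per_min * factor for factor in factors]


# Hypothetical forecast, for illustration only; not a figure from the event.
expected_peak_per_min = 60_000

targets = load_test_targets(expected_peak_per_min)
for factor, target in zip((2, 3), targets):
    print(f"{factor}x headroom: test at {target:,} transactions/minute")
```

The point of testing at a multiple of the forecast, rather than at the forecast itself, is that a live broadcast gives you no second attempt if the forecast turns out to be low.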
The testing was finished by 7pm Friday evening, so we took Saturday off, reconvening at the NOC at 4pm on Sunday. One Love started at 7pm.
Our ops team monitored performance, error rates and load; our devs worked alongside them in case the platform needed changing on the fly; and a release management team was on hand to provide lightweight QA so that any changes, if required, could be deployed seamlessly.
We weren’t disappointed. Ten minutes into the concert a downstream service failed.
We had to invoke one of the contingency scenarios and introduce platform changes. A patch was identified, modified and released in 20 minutes with no interruption to the donations platform, and the change was entirely invisible to users.
Scaling the peaks
Traffic spiked as predicted around specific calls to action, such as when Imogen Heap and then Justin Bieber were onstage. The URL redcross.org/love was broadcast on massive screens and redirected to the donations platform, sending already high traffic screaming upwards.
At peak we processed 398,000 requests a minute. Users saw an average response time of 0.1s whether they were in the UK or the other side of the world.
During the three hours of One Love, nearly 700,000 users hit BRC’s site, with a peak of 33,000 concurrent users and 823 donations a minute. That is higher than the number of register-to-vote users a few hours before the 22 May deadline for the last General Election.
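To put the per-minute figures above on a common footing, here is a small sketch that converts them to per-second rates. The input numbers are the ones quoted in this post; the helper name `per_second` is just illustrative. Note that requests, donations and tested transactions are different units, so the results sit side by side but are not directly comparable.

```python
def per_second(per_minute):
    """Convert a per-minute rate to a per-second rate."""
    return per_minute / 60


# Figures quoted in this post.
peak_requests_per_min = 398_000
peak_donations_per_min = 823
tested_transactions_per_min = 180_000

print(round(per_second(peak_requests_per_min)))        # 6633
print(round(per_second(peak_donations_per_min), 1))    # 13.7
print(round(per_second(tested_transactions_per_min)))  # 3000
```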
During the weekend the website handled more than 1.8 million users, compared to 24,000 the previous weekend. 55% of users were from the USA, 35% from the UK; 94% were using mobiles and the mobile conversion rate jumped by 15%.
The high proportion of giving via mobile is significant: historically BRC had weak mobile capability, and the donation experience had not been responsive.
One Love ended at 10pm and by 11.30pm the traffic spikes had abated, so we left the platform running unattended.
Nevertheless, traffic remained high: 5,000 people were on BRC’s site at 8.30am the next day. Usually it’s about 100 on a Monday morning.
Above and beyond
We architected the donations platform to scale up way beyond the peaks of traffic usually expected from BRC’s campaigns. You have to do this when the purpose of the project is to handle, at short notice, things outside your control, but where you are a critical part of the response.
With Amazon Web Services, you incur the costs of scaling only once you scale up, and they are gone as soon as you scale down. If you architect the solution correctly, to cope with crazy, then you don’t need idle spare capacity wasting money.
When it comes to monitoring, web analytics alone aren’t good enough. You need low-level real-time insight into every part of the platform, and the ability to aggregate it across multiple dimensions into data that can be interpreted and acted on in real time by an empowered and capable team.
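As a rough illustration of what aggregating low-level events per dimension can look like, here is a minimal Python sketch of a rolling-window aggregator. It is not the tooling we ran in the NOC; the class name `RollingMetrics`, the 60-second window and the dimension key `"payments"` are all assumptions made for the example, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class RollingMetrics:
    """Keep per-dimension events for the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s=60):
        self.window_s = window_s
        # dimension key -> deque of (timestamp, latency_s, is_error)
        self.events = defaultdict(deque)

    def record(self, key, timestamp, latency_s, is_error=False):
        """Store one event; assumes timestamps are non-decreasing per key."""
        self.events[key].append((timestamp, latency_s, is_error))

    def snapshot(self, key, now):
        """Drop events older than the window, then summarise what remains."""
        queue = self.events[key]
        while queue and queue[0][0] <= now - self.window_s:
            queue.popleft()
        count = len(queue)
        if count == 0:
            return {"count": 0, "error_rate": 0.0, "avg_latency_s": 0.0}
        errors = sum(1 for _, _, is_error in queue if is_error)
        avg_latency = sum(latency for _, latency, _ in queue) / count
        return {
            "count": count,
            "error_rate": errors / count,
            "avg_latency_s": avg_latency,
        }


metrics = RollingMetrics(window_s=60)
metrics.record("payments", timestamp=0, latency_s=0.2)
metrics.record("payments", timestamp=30, latency_s=0.1, is_error=True)
metrics.record("payments", timestamp=50, latency_s=0.3)
print(metrics.snapshot("payments", now=70))
```

Keeping a separate window per dimension (payments, response time, errors and so on) is what lets a team spot that one part of the platform is degrading while the headline traffic numbers still look healthy.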
We’ve proved that we can cope with pretty much any requirement for business-critical customer-facing platforms needing transaction processing, high-volume reporting or real-world event or incident support. For example: fundraising, public events, emergency response, or travel disruption.
It’s not just technology; it’s the people and their passion for doing it right; it’s about lightweight but robust processes; and it’s about giving people the authority to act in real time.
It’s not rocket science; it’s attitudinal.
This piece was originally posted on the Friday blog, where you can find all of our latest thoughts: https://www.wearefriday.com/thinking