
How I Scaled a Software System’s Performance By 35,000%
Dive into how I resolved a platform’s scaling, stability, and performance issues through caching, jobification, queue separation, and more.
Processing over $20,000,000 in a single day
A previous company I worked for built payment systems and giving day software intended for massive giving days, where we would receive tens of thousands of donations for a single campaign.
One of my responsibilities at that company was to scale the system and ensure it didn’t topple over. At its worst, it would crash on just 3–5 requests per second.
Due to an inefficient architecture, questionable technology choices, and rushed development, the platform had many constraints and was a patchwork of band-aids and gaping performance gaps. A combination of magical spells and incantations would keep the server running throughout the day.
By the time I was done with the platform, it had the potential to manage several thousand requests per second and run thousands of campaigns simultaneously, all for roughly the same operational cost.
How? I’ll tell you!
Analyzing the usage patterns
Before we dive into how I optimized this system, we have to understand its usage patterns and the specific circumstances and constraints we were optimizing under; to do otherwise would be to shoot in the dark.
Giving days have defined starts and stops

Giving days are massive planned events, scheduled months in advance. They start and stop at very specific dates and times. Sometimes those dates are moveable; other times they are not.
There’s an emphasis on sharing
During the campaign, the push to get the word out and drive donations can be intense.
Our system might send out hundreds of thousands of emails at the very beginning of the day, with…