An on-call developer’s worst nightmare (red indicates errors)

How I Scaled a Software System’s Performance By 35,000%

Dive into how I resolved a platform’s scaling, stability, and performance issues through caching, jobification, queue separation, and more.

Joseph Gefroh
21 min read · Jul 22, 2020


Processing over $20,000,000 in a single day

A company I previously worked for built payment systems and giving day software. The software was designed for massive giving days, where we would receive tens of thousands of donations for a single campaign.

One of my responsibilities at that company was to scale the system and ensure it didn’t topple over. At its worst, it would crash on just 3–5 requests per second.

Due to an inefficient architecture, questionable technology choices, and rushed development, it had many constraints and was a patchwork of band-aids and gaping performance gaps. A combination of magical spells and incantations would keep the server running throughout the day.

By the time I was done with the platform, it had the potential to manage several thousand requests per second and run thousands of campaigns simultaneously, all for roughly the same operational cost.

How? I’ll tell you!

Analyzing the usage patterns

