Cutting cloud cost by 99.9% when serving 128 TB data and 5 billion requests

David Mosyan
2 min read · Jul 6, 2023


I was watching NDC Conferences talks and really liked the one called “Have I Been Pwned: Serving billions of requests and terabytes of data without going broke! — Stefán” (the link to the video is at the end). It clearly shows how they progressed and solved challenges around traffic and cost, and I highly recommend watching the entire talk. Here I want to summarize it, with a focus on the cost optimization.

So what challenges did they face running a website that serves billions of requests? The first one is, of course, read operations from the data storage. The storage holds millions of text files, and that number keeps growing. Storing a huge amount of data is not very expensive, but reading it gets quite expensive when you have billions of requests every month.

This is how much data they served in a month: 5 billion requests and 126.84 TB of data transfer, with an estimated Blob Storage cost of over $12,000. The first thing I thought during the talk was: why not cache the data? It’s static data, right? Yes, but the problem was not latency, it was the sheer volume of served traffic, so they started using Cloudflare CDN.
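To get a feel for why that bill adds up, here is the blended unit cost implied by nothing more than the numbers above (back-of-envelope arithmetic derived from those figures, not actual Azure pricing):

```python
# Back-of-envelope unit costs implied by the monthly figures quoted above.
requests_per_month = 5_000_000_000
egress_gb = 126.84 * 1024          # 126.84 TB expressed in GB
monthly_cost_usd = 12_000          # "over $12,000" per the talk

print(f"~${monthly_cost_usd / egress_gb:.3f} per GB served")                         # ≈ $0.092
print(f"~${monthly_cost_usd / (requests_per_month / 1e6):.2f} per million requests")  # ≈ $2.40
```

At that scale, even a few cents per GB turns into serious money every month, which is why pushing the reads out to a cache pays off so dramatically.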

Cloudflare’s CDN is fast and cheap, and it brought the monthly cost from over $12,000 down to $63. But then they hit another problem. The API allowed downloading big ZIP files, and Cloudflare’s CDN supports that up to 15 GB. Once they received a new password list from the FBI (225 million passwords), the biggest file (a ZIP combining the small text files) grew beyond 15 GB.
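By the way, the caching itself mostly comes down to response headers: for Cloudflare to serve the files from its edge instead of hitting Blob Storage, the origin has to mark them as cacheable. Here is a minimal sketch assuming Azure Blob Storage and the azure-storage-blob Python SDK; the container name, connection string and TTL are made up, and this is not necessarily how HIBP configures it:

```python
# pip install azure-storage-blob
# Mark every blob in a container as long-lived static content so a CDN
# (e.g. Cloudflare) can cache it at the edge instead of re-reading storage.
from azure.storage.blob import BlobServiceClient, ContentSettings

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("hash-ranges")   # hypothetical container name

for blob in container.list_blobs():
    container.get_blob_client(blob.name).set_http_headers(
        content_settings=ContentSettings(
            content_type="text/plain",
            cache_control="public, max-age=2678400",       # ~31 days; pick a TTL that fits your update cadence
        )
    )
```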

The short-term solution to the ZIP problem was to disable the big ZIP file download. The long-term solution was to create an open-source downloader for the big files.
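The real downloader is an open-source tool, but the idea behind it is easy to illustrate: instead of one giant ZIP, fetch the many small per-prefix text files directly from the public range API. A simplified Python sketch follows (sequential for clarity, whereas the actual tool downloads in parallel; the output folder is just for illustration):

```python
# Download the data set as many small range files instead of one huge ZIP.
# Illustration only -- not the official downloader.
import urllib.request
from pathlib import Path

out_dir = Path("pwned-ranges")             # hypothetical output directory
out_dir.mkdir(exist_ok=True)

# Every 5-character SHA-1 prefix (00000..FFFFF) has its own small text file.
for i in range(16 ** 5):
    prefix = f"{i:05X}"
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        (out_dir / f"{prefix}.txt").write_bytes(resp.read())
```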

To summarize what they did to reduce the cost (from the summary slide in the presentation): serve the static data through Cloudflare’s CDN instead of straight from Blob Storage, and replace the huge ZIP download with an open-source downloader that fetches the small files instead.

I highly recommend watching the entire presentation as it contains some other useful information.

Follow for more articles. Hope this was helpful. Until next time :)
