How to store petabytes of data economically

Dick Tang
Nov 3, 2018

How do you store your precious images and videos? Most likely you buy one or two hard disks, attach them to your computer (via SATA, eSATA, USB 3, etc.), format them, and drop the files in. That becomes a problem once you want shared storage for your teammates. You can buy a NAS and enable SMB or NFS sharing, but that still does not scale.

If you are familiar with AWS, the first thing that comes to mind is probably AWS S3 (Simple Storage Service). Dropbox relied on this cloud object storage service before moving onto its own infrastructure [1]. S3 costs around $0.02/GB/month (or about $0.01 if you pick the Infrequent Access class). AWS also provides AWS Storage Gateway as a hybrid storage solution. Google Cloud and Microsoft Azure offer similar services at similar (or slightly lower) prices.

If your data is mainly archival (very cold, and you can wait hours before retrieving it), you may pick Amazon Glacier, which offers a much better price of around $0.004/GB/month. Keep in mind that the pricing model has some catches when you retrieve data, since it more or less assumes you will not retrieve it (or will retrieve only a small portion).

There are also providers such as OVH and Scaleway that offer cloud object storage at around $0.01/GB/month [5] [6].
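To get a feel for what these per-GB prices mean at petabyte scale, here is a minimal sketch that multiplies them out. It assumes decimal units (1 PB = 1,000,000 GB) and covers storage only; request, retrieval and network transfer fees are ignored. The function name and the dictionary are illustrative, and the prices are the approximate figures quoted above, not current list prices.

```python
# Assumption: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000

# Approximate storage prices quoted in this article (USD per GB per month).
PRICES_PER_GB_MONTH = {
    "AWS S3 (standard)": 0.02,
    "AWS S3 (Infrequent Access)": 0.01,
    "Amazon Glacier": 0.004,
    "OVH / Scaleway object storage": 0.01,
}


def monthly_cost_usd(price_per_gb_month, petabytes=1.0):
    """Storage-only monthly cost in USD; retrieval and transfer fees are not modelled."""
    return round(price_per_gb_month * petabytes * GB_PER_PB, 2)


if __name__ == "__main__":
    # Print the options from cheapest to most expensive for 1 PB.
    for name, price in sorted(PRICES_PER_GB_MONTH.items(), key=lambda kv: kv[1]):
        print(f"{name}: ${monthly_cost_usd(price):,.0f} per month for 1 PB")
```

Even at the cheapest rate listed, a petabyte is a four-figure monthly bill, so the retrieval and transfer terms left out here deserve a close look before committing.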

If you want to avoid cloud storage, you may think of buying storage servers from Dell, EMC, NetApp, etc. But you may not need to go that traditional route: a backup service startup ran into this problem and decided to build its own storage server, the Backblaze Storage Pod [2] [3]. They even released the detailed design for others to build on. It is much cheaper than the solutions from those storage vendors.

Cost comparison among solutions (2009). Source: https://www.backblaze.com/blog/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

Backblaze later productized this as another product: Backblaze B2 Cloud Storage ($0.005/GB/month) [7]. There are several integrations, including the Minio S3 Gateway, which exposes B2 through an S3-compatible API.

Update:

There is a new challenger, a company called Wasabi [8], offering a very cheap price of $0.0049/GB/month with no extra network transfer cost for data retrieval. It sounds quite promising.

Update 2:

OVH also offers a bare-metal plan, STOR-24T, with 4x 6TB HDDs (effectively 18TB under RAID 5) [9]. That works out to about $0.006/GB/month.
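The arithmetic behind those two numbers can be sketched as follows. RAID 5 spends one disk's worth of capacity on parity, so 4 x 6TB leaves 18TB usable. The monthly server price computed here is back-calculated from the $0.006/GB/month figure, not a quoted OVH price, and the function names are hypothetical. Decimal units (1 TB = 1,000 GB) are assumed.

```python
# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1_000


def raid5_usable_tb(disk_count, disk_size_tb):
    """Usable RAID 5 capacity: one disk's worth of space goes to parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disk_count - 1) * disk_size_tb


def implied_monthly_price_usd(price_per_gb_month, usable_tb):
    """Monthly server price implied by a per-GB rate (back-calculated, not a quote)."""
    return round(price_per_gb_month * usable_tb * GB_PER_TB, 2)


if __name__ == "__main__":
    usable = raid5_usable_tb(disk_count=4, disk_size_tb=6)
    print(f"Usable capacity: {usable} TB")
    print(f"Implied server price: ${implied_monthly_price_usd(0.006, usable)} per month")
```

Note that this covers one server with a single level of redundancy; reaching a petabyte this way means running dozens of such machines and handling replication across them yourself.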

Summary

Nowadays there are many ways to handle petabytes of data, especially in the cloud. Do not default to: “Let’s buy hundreds of servers to keep all that data”.

References

[1] https://techcrunch.com/2017/09/15/why-dropbox-decided-to-drop-aws-and-build-its-own-infrastructure-and-network/

[2] https://www.backblaze.com/blog/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

[3] https://www.backblaze.com/b2/storage-pod.html

[4] https://www.backblaze.com/blog/2012/06/06/netflix-you-flatter-us/

[5] https://www.ovh.ie/public-cloud/storage/object-storage/

[6] https://www.scaleway.com/object-storage/

[7] https://www.backblaze.com/b2/cloud-storage.html

[8] https://wasabi.com/

[9] https://www.ovh.com/sg/dedicated-servers/storage/1801fs07.xml

Dick Tang

Director of DevOps Engineering @HK01 . Former @9GAG SRE Engineer . OSS advocator