Building The World’s First IPFS Data Center — Part 1
Building a datacenter focused specifically on high-quality IPFS infrastructure, one that truly removes reliance on third-party cloud providers, has been an extreme challenge, but also an incredible learning experience. In this blog post I’ll give an overview of how and why we designed the infrastructure, along with some pointers that apply to anyone trying to run a production-capable IPFS system.
To arrive at the point of being ready to build out a production datacenter for IPFS, we went through many different test scenarios and configurations. We were fortunate to have a good six-month chunk of time dedicated to testing our setups, although this may not be realistic for many people or organizations. We’re hoping that by sharing our experience, you’ll be able to leverage our work and knowledge to make your IPFS experience a better one.
All versions of IPFS we used were the go-ipfs implementation. During our testing period we ran three go-ipfs releases, starting at 0.4.16 and ending at 0.4.18. This post won’t cover any particular version but will instead focus on trends that we noticed throughout the updates.
Our test environment consisted of three nodes in total: two located in our datacenter and one located off-site. All the nodes were joined together with IPFS Cluster to facilitate replication.
Picking the right servers in general is something that should be done with care, and IPFS is no exception. One of the key decisions is determining what components of the server you want to spend the most money on, and subsequently what components are the most critical.
During our testing phase, we noticed that CPU processing capability was by far the biggest factor in whether our IPFS nodes were sluggish or as quick as lightning. However, CPU processing capability comes down to three different components: the cycle limit (clock speed, in GHz) of a core, the number of cores, and the cache size.
Initially we were pretty set on the cycle limit being the biggest differentiating factor, but as we started pushing peer count (1200 for the low watermark, 1500 for the high watermark) it became clear that the number of active cores/threads would increase, as opposed to any single core/thread being put under significant load. Even when adding files or pinning content, individual cores weren’t being pushed to their absolute limit; instead, the average load across the cores would increase.
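For readers who want to try similar peer counts, here is a minimal sketch of setting the watermarks. It assumes they map to the Swarm.ConnMgr section of the go-ipfs config file (the LowWater and HighWater keys of the "basic" connection manager); the helper name with_connection_watermarks is hypothetical and only manipulates a config dictionary, it does not touch a running node.

```python
import json


def with_connection_watermarks(config, low_water=1200, high_water=1500):
    """Return a copy of an IPFS config dict with connection watermarks set.

    Assumption: the watermarks live under Swarm.ConnMgr, as in the go-ipfs
    config file. The default values are the peer counts quoted in this post.
    """
    if low_water >= high_water:
        raise ValueError("low watermark must be below the high watermark")

    # Round-trip through JSON to deep-copy the (JSON-compatible) config.
    updated = json.loads(json.dumps(config))

    conn_mgr = updated.setdefault("Swarm", {}).setdefault("ConnMgr", {})
    conn_mgr["Type"] = "basic"
    conn_mgr["LowWater"] = low_water
    conn_mgr["HighWater"] = high_water
    return updated


if __name__ == "__main__":
    # Print the fragment that would be merged into the node's config file.
    print(json.dumps(with_connection_watermarks({}), indent=2))
```

Expect CPU load to spread across more cores as these numbers go up, as described above, so raise them gradually and watch the per-core averages.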
In terms of memory usage, we never noticed exceptionally high amounts of memory being taken up by IPFS. The actual memory usage was almost always within the same range, +/- 5GB depending on activity. When just running IPFS and IPFS Cluster we never had to increase the maximum memory limit to more than 32GB. The only reason we went beyond that, and the reason our current IPFS nodes have 128GB of memory, is that they also run the API endpoints for Temporal. Otherwise, we find it hard to imagine needing more than 64GB.
Figure: CPU usage under inactivity
Figure: CPU usage under pinning
At its core, IPFS is a storage protocol, so how you implement your storage is a crucial design factor. One of the big things we noticed with IPFS on the public network is that speed can be very unpredictable. Within the datacenter, communication between the nodes is extremely fast, but unfortunately that isn’t where the bulk of IPFS traffic flows; most of it comes from the internet.
While we did have one of our nodes’ datastores running on an SSD, we were never able to hit write speeds at which an SSD proved beneficial over traditional HDDs. This is largely because when pinning requests come in, you’re reaching out to the rest of the network for that data, so the network rather than the disk is the bottleneck. Even when requesting popular content, which resulted in significantly faster download times, we weren’t able to realize the true benefits of an SSD. Given this “gotcha” of the IPFS network, we determined it wasn’t practical to sacrifice performance in other areas simply to have SSD-based storage systems.
The one exception is direct file uploads, where you can most definitely realize SSD speed benefits. However, because we wanted to support all functionality that IPFS has to offer, we decided to omit SSD-based storage systems initially. When we expand our datacenter in the future, we will add a smaller-capacity SSD storage system to facilitate faster file uploads. As a foundation to build upon, though, HDD-based storage was the best option.
In terms of storage durability and fault tolerance, there are quite a few options one can leverage to protect against sudden hardware failure. In general it is safe to assume that while IPFS nodes will be written to, the decentralized nature of IPFS makes it far more likely that nodes will be used to supply data, which means ensuring data availability over long periods of time is hugely important.
To accommodate this, we settled on RAID6. While write speed takes a huge hit, long-term durability and fault tolerance improve significantly. With our 16 drives, RAID6 also stripes reads across the equivalent of 14 data drives, so in the best case read speeds improve by about 14x over a single drive, which is great for serving data to the IPFS network.
One of the other big issues with storage fault-tolerance methods is that total storage capacity can be drastically reduced. While we initially considered RAID10, it would’ve resulted in a massive reduction in total storage capacity, since half of the drives hold mirror copies. RAID6 always gives up two drives’ worth of capacity to parity, so the more drives you have, the smaller the fraction of total capacity you lose.
With our drive count (16), RAID6 resulted in a total capacity loss of just 4TB, while RAID10 would have resulted in a total capacity loss of 16TB.
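The arithmetic behind those numbers can be sketched as follows. The 2TB drive size is an inference from the figures above (4TB lost to two parity drives), not something stated elsewhere in this post, and the function names are made up for illustration.

```python
def raid6_capacity_loss_tb(drive_count, drive_size_tb):
    """Capacity given up by RAID6: always two drives' worth of parity."""
    if drive_count < 4:
        raise ValueError("RAID6 needs at least 4 drives")
    return 2 * drive_size_tb


def raid10_capacity_loss_tb(drive_count, drive_size_tb):
    """Capacity given up by RAID10: half the drives hold mirror copies."""
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("RAID10 needs an even number of drives, at least 4")
    return (drive_count // 2) * drive_size_tb


if __name__ == "__main__":
    drives = 16          # drive count from this post
    size_tb = 2          # assumed drive size, inferred from the 4TB figure
    raw_tb = drives * size_tb

    for name, loss in (
        ("RAID6", raid6_capacity_loss_tb(drives, size_tb)),
        ("RAID10", raid10_capacity_loss_tb(drives, size_tb)),
    ):
        usable = raw_tb - loss
        print(f"{name}: lose {loss}TB, keep {usable}TB of {raw_tb}TB raw")
```

Note that the RAID6 loss stays constant as drives are added, while the RAID10 loss grows with every pair of drives.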
At the time of this blog post, the two main datastore types for IPFS are Badger and Flatfs. While Badger is significantly faster and better at handling large volumes than Flatfs, the main concern is that it’s relatively new and not as stable. Badger-backed datastores for IPFS have been known to become corrupted in the past, which is not a good thing. However, we also don’t want to sacrifice the amazing performance benefits gained by using Badger.
To account for the corruption risk, we leverage both Flatfs and Badger. One of our nodes in the datacenter runs Badger, while the other runs Flatfs. Our off-site node, which provides better data availability in case our datacenter goes offline, also runs Flatfs in order to reduce the chance of data corruption.
Going back to our decision to have a high amount of CPU processing, this directly ties into a non-default setting called HashOnRead. As per the config documentation, HashOnRead verifies blocks as they are read from disk, at the expense of CPU processing. Because it gives us additional verification of the integrity of stored blocks, it’s absolutely worth it. When the documentation says it increases CPU load, it means it: the increase is substantial.
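To make the trade-off concrete, here is a small conceptual sketch of what hashing on read means. This is not go-ipfs code: VerifyingBlockstore and CorruptBlockError are hypothetical names, and plain SHA-256 hex digests stand in for the content identifiers IPFS actually uses. The point is only that every read pays for a full re-hash of the block.

```python
import hashlib


class CorruptBlockError(Exception):
    """Raised when a stored block no longer matches its content hash."""


class VerifyingBlockstore:
    """Toy in-memory, content-addressed store with optional hash-on-read."""

    def __init__(self, hash_on_read=False):
        self.hash_on_read = hash_on_read
        self._blocks = {}

    def put(self, data):
        # The key is derived from the content, as in content addressing.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key):
        data = self._blocks[key]
        # With hash-on-read enabled, re-hash the bytes on every read and
        # refuse to serve them if they no longer match the key.
        if self.hash_on_read and hashlib.sha256(data).hexdigest() != key:
            raise CorruptBlockError(key)
        return data


if __name__ == "__main__":
    store = VerifyingBlockstore(hash_on_read=True)
    key = store.put(b"hello ipfs")
    store._blocks[key] = b"bit rot"  # simulate on-disk corruption
    try:
        store.get(key)
    except CorruptBlockError:
        print("corrupt block detected on read")
```

Without the check, the corrupted bytes would be served to the network as if they were valid, which is exactly what the extra CPU cost buys protection against.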
While not a configuration setting, there is an IPFS command, ipfs repo verify, which verifies all stored blocks. We haven’t settled on a frequency to run this command, but when combined with HashOnRead we personally believe once a week is sufficient.
By using HashOnRead we already take a big performance hit, so anything that can be done to get a bit of it back is a huge thing. One of the biggest ways we found to speed up pinning of data was to enable the BloomFilterSize setting. The size you need in order to see proper benefits will largely depend on the size of your IPFS repository. However, because the filter also consumes memory, you will want to make sure your system has enough memory to accommodate it. Without a bloom filter, you can more than likely get by with 32GB of memory, as opposed to the 64GB we mentioned earlier.
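As a rough starting point for picking a value, the sketch below uses the textbook bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n items at false-positive rate p. It assumes an optimal number of hash functions, which may not match what go-ipfs uses internally, and it assumes BloomFilterSize is expressed in bytes, so treat the result as an estimate to tune from rather than an exact setting. The function name and the example numbers are illustrative only.

```python
import math


def bloom_filter_size_bytes(expected_blocks, false_positive_rate):
    """Estimate a bloom filter size in bytes using the textbook formula.

    bits = -n * ln(p) / (ln 2)^2, assuming an optimal number of hash
    functions. Real implementations may differ, so this is only a guide.
    """
    if expected_blocks <= 0:
        raise ValueError("expected_blocks must be positive")
    if not 0 < false_positive_rate < 1:
        raise ValueError("false_positive_rate must be between 0 and 1")

    bits = -expected_blocks * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


if __name__ == "__main__":
    # Hypothetical repository sizes, not measurements from our nodes.
    for blocks in (1_000_000, 10_000_000, 100_000_000):
        size = bloom_filter_size_bytes(blocks, 0.01)
        print(f"{blocks:>11,} blocks -> about {size / 2**20:.1f} MiB")
```

The filter grows linearly with the number of blocks, which is why the right value depends so heavily on repository size.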
Additional performance enhancements come from our custom software, Temporal, and from a redundant architecture that allows tasks to be processed concurrently and requests to be load-balanced across two nodes. In the coming weeks we’ll be publishing another post detailing these optimizations, which we believe are the biggest reason why our nodes are as stable, performant, and reliable as they are.
A thank you to everyone in the community
We here at RTrade Technologies would like to thank everyone who has been helping and guiding us along the way, with special thanks to Juan Benet and the Protocol Labs team for the amazing job of making all this possible, as well as to all the other open-source projects building out the IPFS ecosystem.
Our core values as a company are trust and transparency. We want to help educate this wonderful community and help progress this technology the right way. We have learned so much, and Temporal has grown up and matured and is finally ready for production launch in the coming weeks, as the first interface to offer all IPFS services.
Quality is something we strive for, and this is starting to show through partnerships, companies utilizing Temporal, and great user feedback from our API/interface testers. If you have any questions about IPFS or Temporal and all the great things it has to offer, please feel free to come say hi!