The Black Art of Synchronizing Ethereum Nodes
Not too long ago, I suddenly found myself attempting to put together a plausible distributed application (dApp) for the Ethereum blockchain in time for the CryptoInvestSummit last October, with only a month to prepare. Few things motivate me more than the fear of public humiliation, so I managed to put together a white paper I was proud of (something better than a high-tech variation on a penny-stock scam). That turned out to be the easy part.
I have done some time in financial-industry corporate IT, so I have some familiarity with high-throughput online transaction processing databases. Years earlier, I had developed a Bitcoin payment system, and even back then ZFS seemed perfect for blockchains: it allows a large amount of data to be stored cheaply on redundant consumer-grade disks, while the parts of the blockchain in active use can be served from faster solid-state storage.
I had long used a home-hosted Bitcoin node because the expense of colocation never came close to the handful of payments I ever received in Bitcoin. My customers overwhelmingly preferred their credit cards.
My first successful attempt at building an Ethereum node (one that would synchronize) consisted of an i5-3570K Ivy Bridge from 2012 that was still barely holding on in 2018, a 256 GB Samsung EVO SATA3 SSD, and 4 green 2TB hard drives (mixed brands). After a week or so I was synchronized with the Ethereum blockchain and feeling very proud of myself. The correct way to set this up using FreeBSD is to partition the SSD into main, swap, ZFS L2ARC cache, and ZIL segments. I would recommend sizing the L2ARC to at least twice the total RAM, and the ZIL can be something like 8GB/16GB. You really don’t want to be swapping on an Ethereum node, so I just use 4GB for swap. Ethereum software evolves quickly, so you want to account for the inevitability of bugs (memory leaks in particular).
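As a sketch of that layout, assuming a 256 GB SSD appearing as ada0 and 16GB of RAM (the device name, labels, and exact sizes are illustrative; check yours with `geom disk list`), the gpart steps might look like:

```shell
# Illustrative FreeBSD partitioning of the SSD: swap 4G, ZIL 16G,
# L2ARC at roughly 2x RAM, remainder for the system.
gpart create -s gpt ada0
gpart add -t freebsd-swap -l swap0   -s 4G  ada0
gpart add -t freebsd-zfs  -l zil0    -s 16G ada0
gpart add -t freebsd-zfs  -l l2arc0  -s 32G ada0
gpart add -t freebsd-zfs  -l system0        ada0   # rest of the disk
```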
One of the best things about ZFS is that, since it is old, it has the kind of complete documentation one rarely sees anymore. The manual describes the ZIL as intended to prevent data loss during a power outage. In practice, the ZIL also improves synchronous write performance, but if you lose power or otherwise kill geth in a less than graceful manner, you will still need to spend hours resynchronizing your blockchain. Be careful with your service management scripts!
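A minimal sketch of the stop logic such a script should have (the PID file path is a hypothetical example): geth flushes its database and exits cleanly on SIGINT, so signal it and wait rather than letting the system kill it outright at shutdown.

```shell
# Signal geth to shut down gracefully and wait for it to finish.
pid=$(cat /var/run/geth.pid)
kill -INT "$pid"
# A hard kill here forces hours of resynchronization later.
while kill -0 "$pid" 2>/dev/null; do
    sleep 1
done
```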
With some trial and error, I quickly found that go-ethereum (geth) was preferable because it is so much easier to build than Parity, and if you want to keep your Ethereum nodes synchronized, which you must, you will find yourself constantly pulling in new updates from GitHub.
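The update cycle amounts to little more than this (it requires git, Go, and a C compiler):

```shell
# First time: fetch the source.
git clone https://github.com/ethereum/go-ethereum.git
cd go-ethereum
# On every subsequent update: pull and rebuild.
git pull
make geth                # binary lands in build/bin/
./build/bin/geth version # sanity-check the new build
```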
Around this time my home Ethereum node started to lose sync on Friday nights and during wet weather, most likely due to Netflix-related bandwidth congestion. I also found it impossible to synchronize over a WiFi connection (wired to the same router would work). Among other things, bandwidth is a real factor in synchronizing Ethereum nodes. Since upgrading to gigabit Internet at home, I have had no issues at all; network performance now seems on par with my local colocation provider.
When it was time to actually work on the dApp, my single Ethereum node fell out of sync, making it impossible to execute anything on the blockchain. Since I had only a little time before the convention, I frantically contacted a few dedicated hosts that had good deals on servers with fast storage and low RAM (16GB).
It is difficult to size a dedicated server because by 2017 the Ethereum blockchain had already grown to 1.7TB. At the end of 2018 I could still use 256GB-class SSD servers, although one can’t be sure for how much longer. I asked one of the dedicated hosts whether they could build a machine similar to the one I was using at home.
Adventures in Colocation
The dedicated host came back with an outrageous quotation, so I started looking into colocation options. I found one company, U.S. Dedicated, in the same building as my dedicated host. I dropped the dedicated server, in part because I honestly felt sick about the SSDs failing. I had experienced so many hardware failures early on that I was afraid I was killing some poor small business with my shenanigans. Since then I have had no further hardware failures, so who knows?
I set out to build the blockchain machine of my dreams. I got really lucky and found some 256GB NVMe x2 SSDs in a closeout for $30. This is more than enough storage for a hybrid system, and in fact may be the fastest storage you can use with an affordable “true” server board (one with IPMI/BMC). I have had excellent success with AMD processors recently in terms of price/performance, so I took a gamble and went with the Ryzen platform. The downside is that there are currently no Ryzen boards with IPMI/BMC. The upside is that I was able to buy three motherboards and two processors for about $350.
IPMI/BMC is a remote management subsystem built into “server”-class motherboards. On the low end, such boards cost around $220, plus another $200 for a low-end Xeon. While that will certainly work almost as well as a low-end Ryzen, some hosts such as U.S. Dedicated will not accept a machine without an IPMI/BMC interface, while other hosts in the Dallas area, such as DallasColo, actually charge extra for using that port! Definitely shop around!
The most expensive part is the RAM. According to Netflix’s experiences (https://medium.com/netflix-techblog/serving-100-gbps-from-an-open-connect-appliance-cdb51dda3b99) with this type of machine, the speed of the RAM contributes significantly to I/O performance. For this reason I would recommend DDR4-2666, but pay attention to the motherboard: I put four DDR4-2666 modules in as an experiment, and while all 32GB was recognized, the motherboard only reported them as DDR4-2400. I took two out, and we were back up to DDR4-2666.
The final build looked something like this:
- Ryzen 3 2200G (Quad-Core 3.5GHz-3.9GHz Turbo, roughly equivalent to the Ivy Bridge)
- 2x 8GB DDR4–2666
- 256GB M.2 NVMe x2 SSD
- 4x 2TB Seagate Barracuda Green
- ASROCK B450M PRO (at least it’s not the gaming model)
- Standard ATX power supply
- “Industrial 2U case”
Total cost: ~$700
Instead of an expensive server power supply I use a standard ATX Antec EarthWatts Green EA-380D 380W Bronze. It is oversized for this machine, but I am very confident in the design, having used it for years. This required a 2U “industrial”-type case it would barely fit into, which can be purchased very inexpensively.
Once again, the best way to configure this for FreeBSD is to partition the main drive into a ZFS L2ARC cache partition (at least 32GB, maybe more since this partition will see more wear), a ZIL partition (8GB/16GB is plenty), swap (4GB is fine for me), and then use the remainder for the system partition which runs the software.
Another great thing about FreeBSD is the Handbook, where you can easily find instructions on how to set up ZFS and relocate your home folder to the storage pool. It is absolutely vital that you store the blockchain in the ZFS storage pool. What makes this setup so economical is that 256GB-class SSDs are very cheap and serve as a buffer for the slower hard disks, so you get 4TB of fast, redundant storage at a fraction of the cost of storing the blockchain entirely on SSDs.
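Putting the pieces together, pool creation might look like this, assuming the four disks appear as ada1 through ada4 and the SSD partitions carry GPT labels zil0 and l2arc0 (all names here are assumptions; substitute your own):

```shell
# Four 2TB disks in RAID-Z for redundancy, with the SSD partitions
# attached as the log (ZIL) and cache (L2ARC) devices.
zpool create tank raidz ada1 ada2 ada3 ada4 \
    log gpt/zil0 cache gpt/l2arc0
# Keep home directories (and thus the blockchain data) on the pool.
zfs create -o mountpoint=/home tank/home
```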
I have one in the colocation center and one in my home, and both have no problem staying synchronized. I have a strong feeling the current dearth of execution on blockchain technology is a result of attempting to develop on a “rocking” platform. While using a testnet will free you from the problems of node competition, you will spend a lot of time setting up a testnet with a miner, and may end up with a system that will not work in the chaotic environment of the public Ethereum network.
Specific Guidance on Synchronizing Ethereum Nodes
1. I/O throughput and network capacity are crucial
I/O throughput is not something that really comes up in benchmarks, but it is currently impossible to synchronize with the Ethereum network using only magnetic disks. The point of using a hybrid storage pool is that you don’t need to store the entire blockchain on expensive SSDs. You can’t have one without the other: I have a dedicated machine with two fast magnetic HDDs and enough network capacity to reach 200 peers, but with all the tuning I could muster it was still never able to close the last 100 blocks to full synchronization.
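To see whether the hybrid pool is actually absorbing the sync workload, you can watch it live while geth runs (the pool name tank is an assumption):

```shell
# Per-device throughput every 5 seconds; the log and cache devices
# should be taking most of the small random I/O.
zpool iostat -v tank 5
# ARC hit/miss counters on FreeBSD; a high hit ratio means the
# cache hierarchy is doing its job.
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
```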
2. Flexibility is key
I use two different types of servers: traditional SSD-based servers running Ubuntu 18.04 LTS and hybrid-storage servers running FreeBSD 12.0. If the blockchain gets too big, I can fall back on the FreeBSD nodes. I am even thinking about trying ZFS on Ubuntu now that FreeBSD is considering using the Linux branch of ZFS. Yes, it is a major pain to maintain two sets of startup scripts, but it is even worse to not have a plan B in such a chaotic environment.
3. Three seems to be the practical minimum number of Ethereum nodes
If you want to be able to consistently run your dApp, it is almost certain that one of the nodes will be temporarily out of sync for whatever reason, so you want a backup. The third machine makes everything less stressful and provides a staging ground to test new configurations. Additionally, owning more than one node gives you the ability to designate your other nodes as static, trusted, or boot nodes, which will help you bring an out-of-sync node back into sync faster.
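As a sketch, assuming a default geth IPC path and a placeholder enode URL (get the real one from `admin.nodeInfo.enode` on the other node):

```shell
# Add one of your own nodes as a trusted peer at runtime via the
# admin API (the enode URL below is a placeholder).
geth attach --exec \
    'admin.addTrustedPeer("enode://abcd...@203.0.113.1:30303")' \
    ipc:$HOME/.ethereum/geth.ipc
# To make peers permanent, list their enode URLs in a JSON array in
# <datadir>/geth/static-nodes.json, which geth reads at startup.
```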
4. Configuration and tuning pay dividends
Rather than throw hardware at the problem, some well-thought-out configuration can significantly improve performance, costing you only time. In the fast-changing landscape of Ethereum, documentation can consist of commit comments and reading source code. One easy optimization on FreeBSD is to use the aesni kernel module (you can enable it without rebooting using kldload). This dramatically reduces processor usage for encryption, and it is a particularly effective optimization because encryption of a single data stream cannot use more than a single core. It might be necessary to rebuild the openssl port, update make.conf to use this version of openssl, and then rebuild all ports that depend on OpenSSL.
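A minimal sketch of enabling the module, both immediately and across reboots:

```shell
# Load the AES-NI driver into the running kernel (no reboot needed).
kldload aesni
# Persist it across reboots.
echo 'aesni_load="YES"' >> /boot/loader.conf
# Confirm the module is loaded.
kldstat | grep aesni
```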
5. Benchmark on your actual workload
The corollary of tuning is benchmarking on a real-world workload. There are tons of benchmarks available online that might give you some idea of a particular component’s performance, but you must look at your particular workload. Particularly with an adaptive cache like ZFS’s L2ARC, it takes some time to saturate (a matter of minutes), so simply dividing the work performed by the runtime may not give an accurate metric. Taking this a step further, even ‘desktop’ Ryzens dramatically outperform the type of Xeon found in low-end dedicated servers, such as the E3-1220 V5, at encryption. Given the de facto requirement for SSL from Google, this single factor might outweigh the Xeon’s advantage in other workloads.
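A quick way to measure the encryption side of this on your own hardware is OpenSSL’s built-in benchmark; the -evp flag goes through the EVP layer, which picks up AES-NI when it is available:

```shell
# Throughput for AES-256-GCM across block sizes; run it once with the
# aesni module loaded and once without to see the difference.
openssl speed -evp aes-256-gcm
```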
6. A plurality of Ethereum nodes that connect to my nodes are Amazon EC2 images
I am not sure whether that is better or worse, but EC2 is much more expensive, and these servers have some salvage value when it is all over, unlike the Amazon bills. The costs for dedicated hardware are also much more predictable than the literal calculus you need to estimate the cost of a suitable EC2-based system.
As mentioned extensively in the online Ethereum documentation, the P2P nature of the Ethereum protocol paints a target on you, specifically on your Ethereum wallet. I would not run any services other than ssh and your Ethereum nodes. I would speculate that a lot of the issues with practical dApps are a direct result of this development process: on a testnet it would seem you only need a single node to connect to the IPC interface, but on the public Ethereum network you have to contend with “bad peers” that will push you out of sync, and you might find yourself having to rearchitect the system you developed on a testnet.
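A minimal pf sketch of that lockdown on FreeBSD, assuming geth’s default P2P port of 30303 (review the ruleset against your interfaces before enabling it):

```shell
# Allow only ssh and the Ethereum P2P port; block everything else in.
cat >> /etc/pf.conf <<'EOF'
block in all
pass in proto tcp to port 22               # ssh
pass in proto { tcp, udp } to port 30303   # Ethereum P2P (geth default)
pass out all keep state
EOF
sysrc pf_enable=YES
service pf start
```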