The Challenges of Running Blockchain Infrastructure

Published in Bison Trails · 9 min read · Jun 5, 2019

Running a node is almost like magic — anyone can be part of a special cohort securing the chain, participating in consensus, and having an unstoppable entryway to a global economy. But if running a node is the most important thing you can do on a blockchain, why is it so hard?

Proof of Work chains are secured by professional miners with significant capex and opex, making participation inaccessible to the average enthusiast or organization. But with this new cohort of Proof of Stake chains (e.g., Cosmos, Tezos, Decred, and ETH 2.0), we are seeing a democratization of blockchain participation from a small group of scaled professionals to a wide variety of investors, exchanges, custodians, dapps, enterprises, and businesses.

However, running reliable, secure infrastructure is much more challenging than simply spinning up generic protocol nodes. These difficulties come down to 5 areas: software, hardware, maintenance, security, and operations. Bison Trails exists to solve them.

Green Node & Protocol Software

A vital detail often forgotten is just how young these protocols are and how unstable the node software is. As with most software, many years of refinement are often required to iron out all the kinks, and this is doubly true for distributed networks. Nodes already prone to going down are further stressed by frequent protocol updates that aren’t always audited and can introduce bugs.

Even Bitcoin, arguably the most stable and secure cryptocurrency, is no exception. A change in how blocks are checked was implemented in 2016 with a bug that made it possible for a malicious actor to crash the majority of nodes and miners on the network. The bug went unpatched for almost 2 years; thankfully, it was not exploited during that time.

Bugs are certainly problematic, but even standard, generic protocol nodes can be troublesome.

If the protocol and node software teams are doing great work, the node launch can go off without a hitch, but over time, more and more troubleshooting will be required as settings, requirements, and configurations change, forcing the user to become an expert not just in that specific protocol’s infrastructure, but also any interdependencies. Livepeer, as an example, is a video streaming protocol built on top of Ethereum that had its nodes temporarily crash due to an update rolled out for Ethereum Geth nodes.

Thorny Hardware Requirements and Costs

While self-sovereign purists will insist on running nodes on bare-metal servers, this is not viable for the vast majority of potential network participants. Cloud infrastructure providers have built up enterprise-grade offerings for things like uptime, availability, APIs, threat detection, load balancing, storage, and more that would be impossible for most participants to replicate. Applications built on top of the cloud take advantage of all of these technological breakthroughs. Even Bitcoin, the oldest, most widely adopted, and most distributed blockchain, has more than 25% of its nodes hosted with just 3 cloud providers.

Node requirements change as the blockchain grows, and while a smaller virtual instance might have sufficed previously, you’ll have to upgrade frequently to keep up with a maturing chain. This requires manually winding down instances and spinning up new nodes, all with the accompanying troubleshooting. Alternatively, you can get a large instance at the start, but then you’re significantly overpaying until the node grows into it.

Throwing a node up on us-east-1a (AWS) because you reside in NYC and haven’t tried other cloud providers creates its own problems. There are some obvious hotbeds of crypto activity on the US coasts, in parts of Europe, and in certain regions of Asia, which can cause significant node centralization. If everyone hosts their nodes with the same providers in the same data centers, intentional and accidental downtime begin to pose a real risk to the network.

Source: When Amazon Web Services Goes Down, So Does a Lot of the Web by nymag

This is a huge issue for any dapp or business interacting with a blockchain and an even bigger problem for the chain itself. Attackers will know that nodes have gone offline and that those nodes will soon attempt to reconnect and resync from surrounding peers, so they may launch eclipse attacks and scam unsuspecting users. This is even worse in participatory networks: validators that get knocked offline are in danger of being punitively slashed (on certain networks), and their absence can significantly lower network security, opening the door for malicious attacks.
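To make the concentration risk described above concrete, here is a minimal sketch that measures how much of a node fleet sits in a single provider and region. The inventory format, the function names, and the 50% threshold are all assumptions chosen for illustration; they are not taken from any protocol or from the Bison Trails platform.

```python
from collections import Counter


def placement_shares(nodes):
    """Return the fraction of nodes in each (provider, region) pair."""
    counts = Counter((n["provider"], n["region"]) for n in nodes)
    total = sum(counts.values())
    return {place: count / total for place, count in counts.items()}


def concentrated_placements(nodes, max_share=0.5):
    """List placements that hold more than max_share of all nodes."""
    shares = placement_shares(nodes)
    return sorted(place for place, share in shares.items() if share > max_share)


# Hypothetical inventory, for illustration only.
example_nodes = [
    {"provider": "aws", "region": "us-east-1"},
    {"provider": "aws", "region": "us-east-1"},
    {"provider": "aws", "region": "us-east-1"},
    {"provider": "gcp", "region": "europe-west1"},
]

print(concentrated_placements(example_nodes))
```

With this sample inventory, three of the four nodes share one placement, so that placement is flagged. A real check would also want to consider availability zones and network-wide data, not just your own fleet.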

Time and Resource Consuming Maintenance

Blockchains and the node client you’re using are the epitome of living software and need constant babysitting for both regularly scheduled updates and urgent ones. There’s no alerting system when you’re out of date and there’s no way to auto-update if you manually run your own infrastructure. A backwards-compatible update may not matter to an enthusiast, but dapps and businesses can be significantly affected by opcode updates, API changes, gas fee increases, and many other factors.

Source: Parity Technologies Fixes Node Vulnerability, Urges All Ethereum Nodes to Update
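Since nothing alerts you when a node falls out of date, operators typically end up writing their own check. The sketch below compares a running version against the latest known release. It assumes purely numeric dotted versions with an optional leading "v", and it leaves open how the latest version is obtained (release feed, mailing list, manual entry); both function names and the version strings are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as 'v1.4.2' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_outdated(running, latest):
    """True if the running version is older than the latest known release."""
    return parse_version(running) < parse_version(latest)


# Hypothetical version strings, for illustration only.
print(is_outdated("v1.4.2", "v1.5.0"))
```

Comparing tuples of integers rather than raw strings avoids the common mistake of treating 1.10.0 as older than 1.9.9. Pre-release suffixes such as "-rc1" are not handled here and would need extra parsing.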

Although soft forks are backwards compatible, hard forks can be particularly disruptive, not just because they lack backwards compatibility but also because they happen so rarely. Unless you’re Monero and hard forking every 6 months, most teams just don’t have a lot of practice with them. It’s similar to Extensive Decision Making in behavioral sciences: we’re bad at executing in unfamiliar or infrequent circumstances with high degrees of economic, performance, and psychological risk. This impacts not only the protocols themselves, but also the teams building products, tooling, and services on top.

When you first spin up a node, it can take hours or even days to fully sync the chain, depending on the hardware and configurations you use. If it goes down, resyncing can take minutes or hours, depending on how far behind the node fell and why it crashed. Running backup nodes is challenging because redundancy depends less on the number of nodes than on their diversity: it requires familiarity with different node clients and hosting environments.
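As a rough illustration of why resync time depends on how far behind a node fell, the following sketch estimates catch-up time while accounting for the fact that the chain keeps producing blocks during the resync. The function name and the sample numbers are assumptions; real sync speed varies with hardware, peers, and client, and is rarely constant.

```python
def estimate_catch_up_seconds(blocks_behind, sync_rate, block_time):
    """Estimate seconds until a lagging node reaches the chain tip.

    blocks_behind: how many blocks the node is missing right now.
    sync_rate: blocks per second the node can download and verify.
    block_time: average seconds between new blocks on the chain.
    """
    chain_rate = 1.0 / block_time
    net_rate = sync_rate - chain_rate  # progress made relative to a moving tip
    if net_rate <= 0:
        return float("inf")  # the node never catches up at this speed
    return blocks_behind / net_rate


# Hypothetical numbers: 10,000 blocks behind, 50 blocks/s sync, 10 s block time.
print(estimate_catch_up_seconds(10_000, 50.0, 10.0))
```

With these sample values the estimate comes out a little over 200 seconds. The infinite case matters in practice: an undersized instance that verifies blocks more slowly than the network produces them will never finish syncing.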

Myriad Security Vulnerabilities

Blockchain assets are bearer instruments by default, so security could not be more important. You can get away with lower security on simple read-write nodes, but once you move into the participatory space (e.g., ETH 2.0 or Tezos) you begin having an immense amount of value at risk of theft or slashing. Generic protocol nodes keep keys and secrets in locations that are easy for anyone to access. Key and information leakage is a widespread problem that can happen explicitly through improper key and secret management, or implicitly through improper infrastructure setup, incorrect use of cryptographic primitives, and many other ways. Best practices for securing nodes go beyond storing things correctly, and people who aren’t experts often make bad or uninformed choices in this area.
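One narrow example of the key-exposure problem described above is a key file left readable by other users on the same machine. The sketch below checks POSIX permission bits on a file. It is only an illustration of a single check on Unix-like systems, not a key management scheme, and the function name is hypothetical.

```python
import os
import stat


def is_private_file(path):
    """True if no group or other permission bits are set on the file."""
    mode = os.stat(path).st_mode
    exposed_bits = stat.S_IRWXG | stat.S_IRWXO  # any access for group or others
    return (mode & exposed_bits) == 0
```

A file with mode 600 passes this check, while the common default of 644 does not. Passing it says nothing about encryption at rest, backups, or who can become the owning user, which is where most of the harder decisions lie.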

Laborious Operations Management

One of the most challenging parts of running your own participatory nodes is interacting with them. You either need to read the documentation and utilize command line interfaces for validating, voting, delegating, making proposals, participating in governance, etc., or you’re stuck delegating to other entities and using their web UI. There’s a sovereignty vs usability tradeoff that puts funds at risk and increases the administrative headache of being an active participant on the protocol.

As a dapp or business that needs reliable, performant infrastructure, you either need to run the nodes yourself or trust your business to a 3rd party provider (e.g., Infura). Running and managing many nodes independently, even on a single chain, requires significant engineering resources for upkeep and optimization (e.g., load balancing). There are also advanced configurations that can optimize a node’s performance for the task at hand or limit its capabilities for added security. Different users have different needs, but having each user decide on their own and change values by hand can produce subpar results. Whether it’s API access settings or whitelisting addresses, it’s better not to roll your own.
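To give a sense of the upkeep that load balancing across your own nodes involves, here is a minimal round-robin sketch that skips nodes marked as down. The class name, method names, and endpoint labels are assumptions for illustration. A production setup would add active health checks, timeouts, and retries, which is exactly the engineering cost the paragraph above refers to.

```python
class RoundRobinPool:
    """Hand out node endpoints in rotation, skipping ones marked as down."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.healthy = set(self.endpoints)
        self._index = 0

    def mark_down(self, endpoint):
        self.healthy.discard(endpoint)

    def mark_up(self, endpoint):
        if endpoint in self.endpoints:
            self.healthy.add(endpoint)

    def next_endpoint(self):
        # Try each endpoint at most once per call, starting where we left off.
        for _ in range(len(self.endpoints)):
            candidate = self.endpoints[self._index]
            self._index = (self._index + 1) % len(self.endpoints)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy endpoints available")


# Hypothetical endpoint labels, for illustration only.
pool = RoundRobinPool(["node-a", "node-b", "node-c"])
print(pool.next_endpoint(), pool.next_endpoint())
```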

The scary bit is that all of these challenges compound with each additional chain you interact with! The hours spent setting up, troubleshooting, and worrying about blockchain infrastructure quickly add up as each investor, dapp, and business does it in parallel. The question becomes: should you have to be an infrastructure expert to interact with a blockchain?

Bison Trails exists to democratize access to blockchain participation. You shouldn’t have to be an infrastructure expert to run nodes. Netflix is an expert in algorithmic content curation and Lyft is an expert in ride sharing — both use AWS. Neither needs to be an expert in infrastructure. We’ve built a multi-cloud, multi-region infrastructure orchestration platform on that belief and are incredibly excited to provide a solution to the 5 challenging areas outlined above.

Optimized Software

We’ve developed a proprietary framework for service, workload, and resource orchestration to automate the management, coordination, and organization of all our customers’ nodes across all our supported blockchains, regions, and cloud providers. In short, this means that we’re able to maximize our service and uptime while minimizing costs. We turn generic, one-size-fits-all protocol nodes into advanced, optimized, dedicated machines specific to our users’ use cases.

Modularized, Capital-Efficient Hardware

We built our self-healing infrastructure from the ground up with the following goals: proactive problem detection, limiting damage and interruptions, minimizing diagnosis time, and quick resolution. Our infrastructure is elastic — optimally scaling up and out depending on each protocol’s unique characteristics. Modularizing components creates highly performant and capital-efficient nodes while expert networking enables and secures the platform.

Easy Maintenance

Bison Trails uses cutting-edge technologies unique to each chain to optimize node deployment and management. Our infrastructure has built-in backup components engineered for efficiency, speed, and availability that automatically take the place of any failed components, ensuring no loss of service. We strive for five nines, or 99.999% uptime, the “holy grail” of availability. Although most interactions for initial setup and ongoing maintenance are automated, Bison Trails offers email support during working hours (8–8 EST), ramping to 24/7 phone and email support. This extends beyond simple platform questions to technical and blockchain-specific inquiries.
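For a sense of how demanding five nines is, the arithmetic below converts an availability percentage into a yearly downtime budget. The function name is illustrative, and a 365-day year is assumed.

```python
def downtime_budget_minutes(availability_percent, days=365):
    """Minutes of allowed downtime over the period at a given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} minutes per year")
```

At 99.999%, the budget works out to roughly 5.26 minutes of downtime per year, compared with about 8.76 hours at 99.9%. A single manual resync after a crash can consume that entire budget.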

Best-in-Class Security

Bison Trails is a technology and security company at its core. We never take custody of user funds and employ best-in-class security practices to protect our customers’ information and privacy. No security is absolute, so we employ modern security principles that significantly increase the cost of a breach to an attacker. We use advanced intrusion detection and modularized component obfuscation to reduce attack vectors on our infrastructure. All keys and secrets are layered, encrypted, and never exposed outside of a secure environment.

Intelligent Operations

Advanced metrics, monitoring, and alerting give you real-time visibility into your nodes. They also allow us to quickly identify and resolve issues. We have layers of insights not just into our infrastructure stack, but also the underlying protocols, and can quickly work with protocol core devs to remedy network-wide problems. We maintain a number of sentry nodes that serve as the public-facing side of our node infrastructure, obfuscating where participatory network stakes are actually located and enabling Bison Trails to quickly respond to DDoS attacks.
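As a simple illustration of the kind of signal node monitoring relies on, the sketch below flags nodes whose reported block height lags the highest height seen across a fleet. This is a generic example, not a description of the Bison Trails monitoring stack; the function name, node labels, and lag threshold are assumptions.

```python
def lagging_nodes(heights, max_lag=5):
    """Return names of nodes more than max_lag blocks behind the best height.

    heights: mapping of node name to the latest block height it reports.
    """
    if not heights:
        return []
    tip = max(heights.values())
    return sorted(name for name, height in heights.items() if tip - height > max_lag)


# Hypothetical readings, for illustration only.
readings = {"node-a": 1000, "node-b": 998, "node-c": 950}
print(lagging_nodes(readings))
```

Here only node-c is flagged, since it is 50 blocks behind while node-b is within the tolerance. Comparing nodes against each other catches a single stalled node, but not a fleet that is stalled together, so an external reference height is a useful second signal.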

We believe that everyone should run their own nodes and our mission is to make that a reality. We’re currently supporting top tier exchanges, custodians, investors, projects, and dapps across six networks: Ethereum, Cosmos, Tezos, Algorand, Decred, and Livepeer. Follow us on Twitter and join our mailing list to keep in touch and be one of the first to join our public Beta and build a truly decentralized world.

Viktor Bunin is Protocol Specialist at Bison Trails, where he researches emerging technologies and translates those insights into direction on business development, product, operations, and marketing. You can follow him on Twitter.

Written by Viktor Bunin