Dzmitry Baranau

Building the Backbone of Agnostic: A Closer Look at Our Blockchain Infrastructure

Arnaud Briche
Published in Agnostic
7 min read · Jan 24, 2024


Introduction

At Agnostic, our mission is to democratize access to well-structured blockchain data. We aim to provide a fast, user-friendly, and robust way to query the vast volumes of data generated by smart contract blockchains. At the heart of our operations lies a fundamental process: aggregating extensive data from blockchain nodes, decoding it, and indexing it in diverse ways to enable fast and flexible queries. Because we retrieve data from the blockchain continuously, fast and dependable access to blockchain nodes is a prerequisite for delivering on that mission.

I. Blockchain Archive Nodes: A Vital Component

Blockchain archive nodes are the unsung heroes of the blockchain ecosystem. These specialized nodes play a pivotal role in Agnostic’s mission to provide comprehensive and reliable blockchain data.

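Archive nodes are a distinct type of blockchain node that stores the complete history of a blockchain. This includes every block and transaction, and also the full state (balances, contract storage) as it existed at every block since genesis. Full nodes typically prune older state and keep only recent snapshots, so they cannot answer queries about historical state. Archive nodes are the historical record-keepers of blockchain networks.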

They allow Agnostic to access historical blockchain data, enabling users to trace the evolution of smart contracts, token transfers, and more over time.
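The following minimal Python sketch makes the distinction concrete with a historical state query over Ethereum's JSON-RPC API. `eth_getBalance` takes an address and a block number. When that block is old, a node that has pruned historical state will typically fail to answer, while an archive node can. The helper name `eth_get_balance_request` is illustrative, and the transport (HTTP, WebSocket) is left out.

```python
import json

def eth_get_balance_request(address: str, block_number: int, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 eth_getBalance call pinned to a historical block."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        # Block numbers are hex-encoded quantities without leading zeros.
        "params": [address, hex(block_number)],
    }
    return json.dumps(payload)

# Asking for state at an old block: only an archive node can reliably serve this.
request = eth_get_balance_request("0x0000000000000000000000000000000000000000", 1_000_000)
```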

II. The Cost Challenge: Why Self-Hosted Nodes?

While pursuing fast and dependable node access for Agnostic's indexers, we quickly recognized that relying on third-party RPC providers would be financially unsustainable. These providers typically charge by request volume and add a premium for historical (archive) data, which indexers depend on.

Drawing on our collective experience managing large infrastructure, including bare-metal servers in previous roles, we opted to self-host. Making our node fleet stable and robust was no small feat, however, and took considerable trial and error.

Our self-hosted node strategy was guided by several key technical decisions:

  • We prioritized using Erigon-based clients whenever available, as they proved to be more cost-effective and easier to fine-tune.
  • For reliable performance and satisfactory availability, we used fast NVMe disks in a RAID configuration on bare-metal servers.
  • In cloud environments, we used the fastest network-attached disks available, because local instance disks are ephemeral and a no-go for persistent data.
  • We used ZFS for its snapshots and transparent compression.

The result? We successfully established and maintained a robust fleet of archive nodes that deliver both speed and reliability while remaining cost-effective.

III. Balancing Act: Scaling Our Blockchain Infrastructure While Keeping Costs Under Control

From the inception of Agnostic, we made a deliberate choice to address our infrastructure scaling needs by creating multiple, mostly independent Points of Presence (PoPs). This approach involves building relatively small, geographically distributed clusters that operate without shared infrastructure components like networking, storage, or control planes.

Within each PoP, we ensure there is at least one archive node for each supported blockchain, so indexers can access data locally. However, as a reliable infrastructure provider, we also need redundancy to withstand failures. Done naively, that means a pair of archive nodes for each blockchain within each PoP. This boosts resilience, but it doubles the number of archive nodes and, with it, their cost.
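The cost of local redundancy is easy to quantify. The following minimal sketch uses a hypothetical fleet of 3 PoPs and 5 supported blockchains; the article gives neither number.

```python
def archive_node_count(pops: int, chains: int, nodes_per_chain_per_pop: int) -> int:
    """Archive nodes needed when every PoP serves every chain from local nodes."""
    return pops * chains * nodes_per_chain_per_pop

# Hypothetical fleet: 3 PoPs, 5 supported blockchains.
single = archive_node_count(pops=3, chains=5, nodes_per_chain_per_pop=1)
paired = archive_node_count(pops=3, chains=5, nodes_per_chain_per_pop=2)
print(single, paired)  # 15 30
```

With pairs, every PoP or chain added costs two archive nodes instead of one.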

In our experience, most node failures stem from issues such as failed software updates or temporary hardware or network glitches, leading to downtimes on the order of hours. This prompted us to rethink our approach: is there a cheaper way to absorb these transient failures than paying for a redundant node for every blockchain in every PoP, just in case?

Certainly, we believe there’s room for improvement.

IV. The Genesis of Agnostic's Proxy: A Cheap but Effective High Availability Strategy

We set out to devise a cheaper yet equally reliable solution. The idea was to let a PoP draw on the resources of other PoPs whenever its local infrastructure became unavailable. This would achieve high availability while keeping operational costs low, a principle that guides much of what we do at Agnostic.

This pivotal moment marked the birth of our in-house blockchain RPC proxy system — a cornerstone of Agnostic’s infrastructure.

V. Efficient Traffic Management with Node Proxy: Smart Routing for Blockchain RPC Traffic

The core configuration of our proxy revolves around a fundamental concept: pools. Each pool comprises nodes organized into tiers (tier-1, tier-2, tier-3, and so on). Our proxy maintains ongoing, out-of-band communication with these nodes, gathering both general health data and blockchain-specific information, such as the height of the highest available block.
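Here is a sketch of the out-of-band height check for EVM chains. It assumes the proxy asks each node for its head using the standard `eth_blockNumber` JSON-RPC method; the article does not say which calls it actually makes. Function names are hypothetical, and the HTTP transport and polling loop are omitted.

```python
import json

def block_number_request(request_id: int = 1) -> str:
    """JSON-RPC 2.0 request asking a node for its latest block number."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "method": "eth_blockNumber", "params": []})

def parse_block_number(response_body: str) -> int:
    """Extract the head height from an eth_blockNumber response.

    Raises ValueError if the node returned a JSON-RPC error, which a
    health checker would record as an unhealthy sample.
    """
    response = json.loads(response_body)
    if "error" in response:
        raise ValueError(f"node error: {response['error']}")
    # The result is a hex-encoded quantity.
    return int(response["result"], 16)

print(parse_block_number('{"jsonrpc": "2.0", "id": 1, "result": "0x121eac0"}'))  # 19000000
```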

Additionally, within each pool, a designated probe node is configured. This probe node serves as a reference point for calculating a lag metric for every other node in the pool. This metric is derived from the difference between the highest block number reported by the probe node and the corresponding number reported by each individual node in the pool.
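The lag metric itself is a subtraction. Here is a minimal sketch with hypothetical node names and heights:

```python
def compute_lags(probe_height: int, node_heights: dict[str, int]) -> dict[str, int]:
    """Lag of each node, in blocks, relative to the probe node's head."""
    return {node: probe_height - height for node, height in node_heights.items()}

lags = compute_lags(19_000_100, {"local-1": 19_000_100, "local-2": 19_000_088})
print(lags)  # {'local-1': 0, 'local-2': 12}
```

A node that is momentarily ahead of the probe gets a negative lag, which passes any maximum-lag check.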

For every pool, we define a maximum allowable lag, and nodes that exceed it are marked as unhealthy. Incoming traffic to a pool is distributed among the healthy nodes of the lowest non-empty tier, that is, the lowest tier that still has at least one healthy node. For instance, as long as at least one tier-1 node remains healthy, all traffic destined for the pool goes to the healthy tier-1 nodes.
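The routing rule can be sketched as follows. The sketch assumes each node's lag has already been computed against the probe node, and that a separate general health check reports whether the node is reachable. The `Node` type and `eligible_nodes` function are illustrative, not the proxy's actual code.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    tier: int                # 1 = most preferred
    lag: int                 # blocks behind the probe node
    reachable: bool = True   # outcome of the general health check

def eligible_nodes(nodes: list[Node], max_lag: int) -> list[Node]:
    """Healthy nodes of the lowest tier that has at least one healthy node."""
    healthy = [n for n in nodes if n.reachable and n.lag <= max_lag]
    if not healthy:
        return []
    best_tier = min(n.tier for n in healthy)
    return [n for n in healthy if n.tier == best_tier]

pool = [Node("local-1", 1, 0), Node("local-2", 1, 40), Node("pop-b", 2, 1)]
print([n.name for n in eligible_nodes(pool, max_lag=10)])  # ['local-1']
```

The article does not say how traffic is spread within the selected tier (for example, round-robin or least connections), so the sketch stops at choosing the candidates.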

Let’s consider an example of a pool configured for Ethereum Mainnet in PoP A:

  • Probe Node: Alchemy RPC provider
  • Tier-1: Ethereum archive nodes from the local Kubernetes (K8S) cluster
  • Tier-2: Node Proxy of PoP B
  • Tier-3: Node Proxy of PoP C
  • Tier-4: Infura RPC provider
[Figure: Example topology of Node Proxy deployment]

This setup ensures that traffic primarily flows to the local nodes. Traffic is directed to PoP B, and then to PoP C, only when the local nodes fail or are under maintenance. In the rare case where all self-hosted nodes are unavailable, we fall back to an RPC provider (Infura in this example), albeit at a premium. This buys us crucial time to fix in-house node issues while service continues.
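A pool like this could be expressed as plain configuration. The sketch below is hypothetical: the keys, endpoint names, and lag threshold are placeholders rather than Agnostic's actual configuration format. `healthy` stands for the set of targets that currently pass the health and lag checks.

```python
# Hypothetical configuration for the Ethereum Mainnet pool in PoP A.
POOL_ETH_MAINNET_POP_A = {
    "probe": "alchemy",
    "max_lag_blocks": 10,  # assumed value; the article gives no threshold
    "tiers": [
        ["eth-archive-0.pop-a"],  # tier-1: local K8S archive node(s)
        ["node-proxy.pop-b"],     # tier-2: Node Proxy of PoP B
        ["node-proxy.pop-c"],     # tier-3: Node Proxy of PoP C
        ["infura"],               # tier-4: paid RPC provider fallback
    ],
}

def active_tier(config: dict, healthy: set[str]) -> list[str]:
    """Healthy targets of the first tier that has any; empty if none."""
    for tier in config["tiers"]:
        up = [target for target in tier if target in healthy]
        if up:
            return up
    return []

everything = {t for tier in POOL_ETH_MAINNET_POP_A["tiers"] for t in tier}
print(active_tier(POOL_ETH_MAINNET_POP_A, everything))  # ['eth-archive-0.pop-a']
print(active_tier(POOL_ETH_MAINNET_POP_A, everything - {"eth-archive-0.pop-a"}))  # ['node-proxy.pop-b']
```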

VI. Simplified Application Configuration: How Agnostic’s Proxy Enhances Indexer Performance

When building high-performance indexers, we apply various optimizations to get the most out of node RPC APIs. One of the most impactful is extensive request batching. However, batching becomes problematic when the same application talks to RPC APIs served by many different kinds of nodes.
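For reference, a JSON-RPC 2.0 batch is a JSON array of request objects sent in a single HTTP body. The server replies with an array of responses. The specification allows these responses in any order, and they are matched back to requests by `id`. Here is a minimal sketch that batches `eth_getBlockByNumber` calls; the helper name is illustrative.

```python
import json

def build_batch(block_numbers: list[int]) -> str:
    """Encode several eth_getBlockByNumber calls as one JSON-RPC 2.0 batch."""
    batch = [
        {
            "jsonrpc": "2.0",
            "id": request_id,  # used to match each response to its request
            "method": "eth_getBlockByNumber",
            # False: return transaction hashes instead of full transaction objects
            "params": [hex(number), False],
        }
        for request_id, number in enumerate(block_numbers, start=1)
    ]
    return json.dumps(batch)

body = build_batch([18_000_000, 18_000_001, 18_000_002])  # one HTTP request, three calls
```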

For instance, we used to set very high batch size limits on our self-hosted nodes to maximize hardware utilization, while RPC providers tended to impose smaller limits. This left us with a dilemma. We could either compromise on performance by using settings conservative enough for every node, or risk lower availability by tailoring our application exclusively to the self-hosted nodes.

As part of our ongoing investment in the in-house proxy layer to enhance availability, we made a strategic decision to address these inconsistencies at the proxy level. This approach allowed us to fine-tune indexer configurations aggressively for ideal scenarios (when accessing self-hosted nodes), while simultaneously ensuring transparent and graceful handling of nodes with more conservative settings.
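One way a proxy can reconcile a large client batch with a more conservative upstream is to split the batch into chunks no larger than the upstream's maximum batch size, then reassemble the responses in the client's original order. This sketch rests on two assumptions. First, `send` is a hypothetical stand-in for the HTTP call to the upstream. Second, every request carries a unique `id`; notifications without an `id` are not handled.

```python
from typing import Callable

def forward_in_chunks(batch: list[dict], max_batch_size: int,
                      send: Callable[[list[dict]], list[dict]]) -> list[dict]:
    """Split a client batch to respect an upstream's batch limit,
    then return the responses in the client's original order.

    Assumes every request has a unique "id"; `send` is a hypothetical
    stand-in for the HTTP call to the upstream node.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    responses_by_id = {}
    for start in range(0, len(batch), max_batch_size):
        chunk = batch[start:start + max_batch_size]
        for response in send(chunk):
            responses_by_id[response["id"]] = response
    # Batch responses may arrive in any order; match them back by id.
    return [responses_by_id[request["id"]] for request in batch]

# Usage: a fake upstream that accepts at most 2 calls per batch
# and answers in reverse order.
def fake_upstream(chunk: list[dict]) -> list[dict]:
    assert len(chunk) <= 2
    return [{"jsonrpc": "2.0", "id": r["id"], "result": r["params"][0]}
            for r in reversed(chunk)]

client_batch = [{"jsonrpc": "2.0", "id": i, "method": "eth_getBlockByNumber",
                 "params": [hex(i), False]} for i in range(1, 6)]
print([r["result"] for r in forward_in_chunks(client_batch, 2, fake_upstream)])
# ['0x1', '0x2', '0x3', '0x4', '0x5']
```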

The proxy now enforces each upstream's limits, such as maximum batch size and rate limits, regardless of the client's settings. Clients can therefore keep aggressive settings while the proxy adapts their traffic to whichever node ends up serving it. This gives us both performance and robustness.
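The article does not say which rate-limiting algorithm the proxy uses. A token bucket is a common choice and is used here purely for illustration. In this sketch, which is an assumption rather than something taken from the source, each call in a batch costs one token. The clock is injectable so the behavior can be checked deterministically.

```python
import time

class TokenBucket:
    """Allows `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

fake_now = [0.0]
bucket = TokenBucket(rate=10, capacity=10, clock=lambda: fake_now[0])
print(bucket.try_acquire(cost=8))  # True: a batch of 8 calls fits
print(bucket.try_acquire(cost=5))  # False: only 2 tokens left
fake_now[0] = 0.5                  # half a second later, 5 tokens refilled
print(bucket.try_acquire(cost=5))  # True
```

The article does not specify whether the proxy rejects an over-limit request or queues it until tokens are available. This sketch only reports whether a request fits.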

VII. Securing Access and Monitoring Performance

Our proxy layer also handles security and monitoring. Authentication is the foundation: only authorized clients are allowed access.

To give untrusted clients controlled access, we issue JSON Web Tokens (JWTs) that encode limits such as maximum batch size, allowed methods, and rate limits. These safeguards help prevent disruption and misuse. In parallel, the proxy collects real-time metrics that give insight into infrastructure performance, health, and usage patterns. Together, these measures keep blockchain data access stable, secure, and transparent.
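The article does not detail the token format. This sketch assumes HMAC-SHA256 (`HS256`) signed JWTs and hypothetical claim names `max_batch` and `methods`, and shows a proxy-side check using only the Python standard library. Because the signature covers the payload, a client can read its limits but cannot raise them without invalidating the token. A production system would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url_encode(data: bytes) -> str:
    # JWT segments are base64url without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _b64url_decode(segment: str) -> bytes:
    # Restore the padding stripped during encoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token (normally the auth service's job; shown for the demo)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"

def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the token's claims, or raise ValueError if it is invalid."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError if malformed
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and time.time() >= claims["exp"]:
        raise ValueError("token expired")
    return claims

def check_request(claims: dict, batch: list[dict]) -> None:
    """Enforce the limits carried by the token (claim names are hypothetical)."""
    if len(batch) > claims.get("max_batch", 1):
        raise PermissionError("batch too large for this token")
    allowed = set(claims.get("methods", []))
    for call in batch:
        if call.get("method") not in allowed:
            raise PermissionError(f"method not allowed: {call.get('method')}")

# Usage with a placeholder secret and claims.
secret = b"demo-secret"
token = sign_hs256({"sub": "client-42", "max_batch": 50,
                    "methods": ["eth_blockNumber", "eth_getLogs"]}, secret)
claims = verify_hs256(token, secret)
check_request(claims, [{"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}])
```

A rate-limit claim could feed a per-client rate limiter in the same way; it is omitted here to keep the sketch short.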

Conclusion: A Glimpse into Agnostic’s Infrastructure Success

In our exploration of Agnostic’s blockchain infrastructure, we’ve unveiled its intricacies, challenges, and innovative solutions, showcasing the significance of our mission.

Our Commitment to Efficiency: At the core of Agnostic’s infrastructure is an unwavering dedication to efficiency. We self-host blockchain nodes, optimize costs, and establish a resilient architecture to deliver outstanding blockchain data services without imposing exorbitant expenses on users.

Resilience and Prudent Redundancy: We prioritize resilience by implementing multiple, mostly independent Points of Presence (PoPs). Geographically distributed resources and redundancy fortify our infrastructure against potential failures, ensuring uninterrupted service for our users.

Cost-Effective Redundancy: Our innovative approach to handling node failures via a proxy mesh network underscores our commitment to cost-effectiveness. By leveraging our distributed node fleet, we optimize costs while preserving availability.

This glimpse into Agnostic’s infrastructure success signifies more than technological achievements; it underscores our steadfast dedication to simplifying blockchain development and data access. Looking ahead, we remain devoted to innovation and excellence, poised to refine and expand our infrastructure to serve our growing user base, leading the charge in advancing the blockchain data landscape. Join us as we propel Agnostic into new frontiers of blockchain technology, where accessibility, efficiency, and reliability converge to redefine what’s possible in blockchain data services.
