Chainode Tech develops a Validator and Bridge Status Monitoring Solution using Grafana & Prometheus for the Celestia Network

Chainode Tech
May 8, 2023

Switzerland, May 9th 2023 — At Chainode Tech, our mission as a provider of Web3 infrastructure and staking services is to constantly seek out new and innovative solutions to advance the blockchain ecosystem and push the boundaries of decentralization. Along the way, we have discovered novel approaches and cutting-edge solutions like sidechains, sharding, Proof of History, and most recently, Modularity.

Celestia is a cutting-edge blockchain platform that is transforming the way we think about modularity in the decentralized landscape. We have followed the Celestia project closely since its early stages and were excited to be selected as one of the 1,000 participants in the Blockspace Race Incentivised Testnet, which is currently underway.

In this article we will delve into the fundamental concept of modularity in blockchain systems, explore the innovative solutions provided by Celestia, and introduce a monitoring tool developed by our team specifically for the Celestia ecosystem.

So: What is Modularity?

Modularity, at its core, refers to the design principle that emphasizes the separation of a system’s components into distinct, manageable units or modules. In the context of blockchain, modularity allows for the creation of flexible, customizable networks that can adapt to a wide range of use cases and requirements. However, existing blockchain platforms often lack true modularity, which leads to issues like network congestion, high transaction fees, and limited scalability.

Enter Celestia, a pioneering blockchain platform that tackles these challenges by offering a modular framework that is both highly flexible and technically robust. By decoupling consensus and execution, Celestia allows developers to build scalable applications on top of a shared, secure, and interoperable infrastructure. This groundbreaking approach not only fosters innovation and collaboration but also paves the way for a more accessible and efficient decentralized ecosystem.

Chainode Tech’s Monitoring Tool for Celestia Validator and Bridge Status using Grafana & Prometheus

To ensure the smooth functioning of Validators and Bridges operating on a network, it is crucial to use a comprehensive monitoring tool. With this goal in mind, we at Chainode Tech recently developed and publicly released a Monitoring Solution for the Celestia Network, available on GitHub at https://github.com/Chainode/CelestiaTools. Our in-house solution empowers users to monitor their validators, Celestia bridges, and the hardware that runs both components. By doing so, users can gain valuable insights into the state of their machines and the overall performance of the Celestia Network, enabling them to take proactive measures to improve network health at any given time.

Our monitoring solution for the Celestia Network is built upon Prometheus, a robust tool specifically designed for storing time-series data, including metrics. To complement Prometheus, we’ve integrated Grafana, which allows users to visualize the data stored in Prometheus, as well as data from other sources such as Telegraf. With this integration, our Monitoring Solution gives users a comprehensive toolkit to make informed decisions and optimize the performance of their validators and bridges on the Celestia Network.

Our team wrote an additional exporter in Go to expose Celestia bridge metrics to Prometheus. The code can be found in the file celbridge_export.go.

The exporter queries the bridge for its local height and for the network height, and exposes both values as Prometheus metrics. The listen port for Prometheus, the endpoint used to connect to the bridge, and the p2p network on which the bridge node is active can all be configured.
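
To make the idea concrete, here is a minimal sketch of such an exporter using only the Go standard library. It is not the code from celbridge_export.go: the metric names, flag names, default values, and the `heightSource` seam are all assumptions made for illustration, and the bridge query is replaced by a stub that returns fixed heights.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// heights holds the two values the exporter publishes.
type heights struct {
	Local   int64 // height the bridge node has synced to
	Network int64 // height the bridge reports for the network
}

// heightSource abstracts "ask the bridge for its heights". How the real
// exporter queries the bridge is not shown here; this is a hypothetical seam.
type heightSource func() (heights, error)

// renderMetrics formats the heights in the Prometheus text exposition format.
// Metric and label names are illustrative assumptions.
func renderMetrics(h heights, network string) string {
	return fmt.Sprintf(
		"# TYPE celestia_bridge_local_height gauge\n"+
			"celestia_bridge_local_height{network=%q} %d\n"+
			"# TYPE celestia_bridge_network_height gauge\n"+
			"celestia_bridge_network_height{network=%q} %d\n",
		network, h.Local, network, h.Network)
}

// metricsHandler serves the rendered metrics, or a 503 if the bridge query fails.
func metricsHandler(src heightSource, network string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h, err := src()
		if err != nil {
			http.Error(w, "bridge query failed: "+err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderMetrics(h, network))
	})
}

func main() {
	// All flag names and defaults below are hypothetical.
	listen := flag.String("listen", ":8380", "address Prometheus scrapes")
	endpoint := flag.String("endpoint", "http://localhost:26658", "bridge endpoint")
	network := flag.String("p2p-network", "blockspacerace", "p2p network name")
	flag.Parse()

	// Stub source with fixed values; a real exporter would query *endpoint here.
	src := func() (heights, error) { return heights{Local: 100, Network: 102}, nil }

	// Exercise the handler once in-process so the sketch terminates.
	// A long-running exporter would instead call:
	//   http.Handle("/metrics", metricsHandler(src, *network))
	//   log.Fatal(http.ListenAndServe(*listen, nil))
	rec := httptest.NewRecorder()
	metricsHandler(src, *network).ServeHTTP(rec, httptest.NewRequest("GET", "/metrics", nil))
	fmt.Printf("listen=%s endpoint=%s\n%s", *listen, *endpoint, rec.Body.String())
}
```

Keeping the bridge query behind a function value means the formatting and HTTP parts can be tested without a running bridge node. The actual flags and setup steps are documented in the repository.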

The technical guide for setting up all the components of this monitoring solution can be found in the above-mentioned repository. This article focuses on the metrics displayed in the Grafana dashboard we developed.

The Grafana dashboard we developed is organized into three categories, which are as follows:

1. Celestia Validator Overview

This section offers a comprehensive overview of both the validator’s health and the network’s status. The displayed parameters include:

  • Block Height: The current consensus height as seen by the validator;
  • Validator Voting Power: This metric displays the validator’s total voting power on the network, calculated based on received delegations;
  • Online: Shows the total number of validators in the network;
  • Unconfirmed Tx: Shows the number of unconfirmed transactions;
  • Failed Tx: Shows the number of failed transactions in the memory pool;
  • Validators Missing: This displays the current number of validators that are not participating in consensus;
  • Voting Power of Byzantine: This shows the combined voting power of the Byzantine validators. A Byzantine validator is one that is considered harmful to the network; double signing is one behaviour that makes a validator Byzantine;
  • Mempool Size: The size of the memory pool, which in principle corresponds to the number of unconfirmed transactions;
  • Connected Peers: The number of peers the validator node is connected to;
  • Total Bonded Tokens: The total number of staked tokens;
  • % of (Missing + Byzantine): This metric represents the percentage of voting power held by missing and Byzantine validators;
  • Block Size (KBytes): The block size in KB;
  • Byzantine Validators: The number of validators that are Byzantine;
  • Block Time: This metric displays the block time in seconds;
  • Bridge Local Height: This represents the local height of the bridge node;
  • Bridge Network Height: This represents the network height for the bridge;
  • Status Bridge Node: Displays the status of the operated bridge node. The status can take one of three values:

a. Synced: the difference between the bridge network height and the bridge local height is 0;

b. Syncing: the difference between the bridge network height and the bridge local height is higher than 1;

c. Error: the difference between the bridge network height and the bridge local height cannot be computed. This means at least one of the two values is erroneous or unavailable and requires your attention.
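
The three states can be sketched as a small Go function. This is an illustration under stated assumptions rather than the dashboard's actual logic: the function name and the `ok` parameter are hypothetical, a difference of exactly 1 (which the description above does not cover) is treated as Syncing, and a local height ahead of the network height is treated as Error.

```go
package main

import "fmt"

// bridgeStatus maps the two bridge heights to a dashboard status.
// ok must be false when either height could not be read.
func bridgeStatus(networkHeight, localHeight int64, ok bool) string {
	if !ok {
		return "Error" // the difference cannot be computed
	}
	diff := networkHeight - localHeight
	switch {
	case diff == 0:
		return "Synced"
	case diff > 0:
		// Assumption: a lag of exactly 1 block also counts as Syncing.
		return "Syncing"
	default:
		// Assumption: a local height ahead of the network is an error.
		return "Error"
	}
}

func main() {
	fmt.Println(bridgeStatus(1200, 1200, true)) // Synced
	fmt.Println(bridgeStatus(1200, 1150, true)) // Syncing
	fmt.Println(bridgeStatus(0, 0, false))      // Error
}
```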

2. Validator Detailed Charts

This section contains four time-series charts that allow you to track specific values over time, providing you with valuable insights and trends.

  • Validator Voting Power: Here you can track the variation of the validator voting power over time. If the voting power suddenly drops to 0, your validator has a problem that requires immediate attention;
  • Connected Peers: You can track the variation of the number of connected peers over time;
  • Block Size (KBytes): You can track the variation of the block size over time;
  • Tx: This allows you to monitor how the number of transactions varies over time.

Other valuable charts to track are:

  • The status of your validator over time, based on block height. It’s important to ensure that the height never reaches 0 or gets stuck, and time-series charts in Grafana can help define alerts for this purpose.
  • The difference between the bridge network height and the local height of the bridge node over time. You can set a Grafana alert on this time series that fires when the difference exceeds a threshold. The ideal threshold is 0, but a higher value such as 5 avoids false alerts caused by small technical hiccups or short catch-up phases.
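
The two alert conditions above can be expressed as simple predicates. The Go sketch below only illustrates the logic; in practice the conditions would be configured as Grafana alert rules. The function names and the sample window are hypothetical, and the threshold of 5 is taken from the text.

```go
package main

import "fmt"

// lagAlert reports whether the bridge trails the network by more than
// threshold blocks.
func lagAlert(networkHeight, localHeight, threshold int64) bool {
	return networkHeight-localHeight > threshold
}

// heightStuck reports whether the last `window` height samples show no
// progress: the latest height is 0 or is not above the first sample in
// the window.
func heightStuck(samples []int64, window int) bool {
	if window < 2 || len(samples) < window {
		return false // not enough data to judge
	}
	recent := samples[len(samples)-window:]
	last := recent[len(recent)-1]
	return last == 0 || last <= recent[0]
}

func main() {
	fmt.Println(lagAlert(1005, 1002, 5))                     // false: lag of 3 is tolerated
	fmt.Println(lagAlert(1010, 1002, 5))                     // true: lag of 8 exceeds 5
	fmt.Println(heightStuck([]int64{100, 101, 101, 101}, 3)) // true: no progress
	fmt.Println(heightStuck([]int64{100, 101, 102, 103}, 3)) // false: height advances
}
```

Evaluating the stuck condition over a window of samples, rather than a single pair, plays the same role as the higher threshold: it keeps one slow block from triggering an alert.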

3. Hardware Overview

This section enables you to monitor the hardware underlying your validator and bridge nodes.

Here you will be able to:

  • Monitor uptime of your server;
  • Observe the number of CPU cores and the amount of RAM and swap;
  • Monitor the usage of CPU, RAM, and swap;
  • Monitor the current open file descriptors;
  • Monitor the free space on your disks;
  • Monitor the average system load of the underlying hardware for your validator;
  • Monitor the CPU usage and the disk I/O utilization in %;
  • Monitor the disk read and write rate (IOPS);
  • Monitor the disk read and write capacity;
  • Monitor the disk read and write time;
  • Monitor the network traffic;
  • Monitor the TCP connection situation.

If you have any feedback or questions about the Monitoring Dashboard, you can write to us on Discord or Telegram, or contact us at contact@chainode.tech.

About Chainode Tech

Founded in early 2019, Chainode Tech is a Web3 infrastructure and service provider based and registered in Zug, Switzerland. We focus on bootstrapping innovative Web3 protocols, providing the necessary infrastructure and tooling, and offering staking and validator services on cutting-edge Distributed Ledger Technology (DLT) protocols. Networks we have joined on Mainnet so far (since their early Testnet phases) include Solana, Sui, Avalanche, The Graph, Wormhole, and Axelar Network, among others. We were also selected to join Celestia’s Blockspace Race Testnet.

The team behind Chainode Tech is highly motivated and experienced, with backgrounds in Software & System Engineering, System & Integration Architecture, DevOps, Marketing, Business Growth and Capital Investments. The project is now active on 15+ Mainnet networks and multiple Testnets, and runs a robust RPC infrastructure fleet covering 25+ networks.

