PARSIQ Exposed | Tsunami API

Part 2, with CTO Daniil Romazanov. Edited by Konstantin Konukov


Welcome back! This is Part II of a three-article series about PARSIQ Network.

  1. Introduction
  2. Tsunami API
  3. Data Lakes

In this part, we are going to talk more about Tsunami API, namely to answer the following questions:

  • what are the thoughts behind its creation?
  • what problems is it striving to solve?
  • what is it as a technology?

I would like to mention that Tsunami API is proprietary software and our intellectual property. While the source code of our API is unlikely to become public, I am eager to share more about the fundamental technology we have built, which is a cornerstone of everything we are going to build in the future.

There is a reason why we keep using the “fundamental technology” term. I perceive Tsunami API as a sort of blockchain node, but instead of providing efficient access to write data to the blockchain, it grants efficient access to read data from it.

There is clearly a conceptual difference between Tsunami and a blockchain node because the former doesn’t imply decentralization, at least at the current stage, where we can’t guarantee it concurrently with efficiency. Although Tsunami is a non-decentralized product, that doesn’t mean it can’t be trusted; I will elaborate on how we maintain consistent data later in this article.

What is data?

Before jumping into the specifics of Tsunami, we should define what “data” means in the case of Web3. In the world of Web2, it can be anything and live anywhere. You are limited only by your imagination, and the current tech stack is quite extensive. Data can be in any format and stored anywhere, with as many copies of the information as the owner requires.

The situation in Web3 is different. We have been given a data framework, EVM in our case, that is pretty limited due to the nature of blockchain and the fact that every node has to store everything.

EVM limits us to the following meaningful data points:

  • Blocks
  • Transactions
  • Logs, known as Events
  • External Function Calls
  • Contract Creates
  • Contract Self-Destructs

The EVM offers a few dozen more opcodes, but they are mostly irrelevant for data use cases. Essentially, this is all the data an EVM-like blockchain produces, meaning any solutions / APIs you see in the space are built atop this data. Taking into account the background we had, Tsunami API was a logical next step for us. You can read more about it in the Introduction article.
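
To make these data points a bit more concrete, here is a minimal sketch of how they could be modeled as types. The field names below are illustrative and do not mirror Tsunami’s actual schema.

```typescript
// Illustrative shapes of the meaningful EVM data points (not Tsunami's actual schema).
type Hex = string; // 0x-prefixed hex string

interface Block {
  number: number;
  hash: Hex;
  parentHash: Hex;
  timestamp: number;
}

interface Transaction {
  hash: Hex;
  blockNumber: number;
  from: Hex;
  to: Hex | null;     // null for contract creations
  value: bigint;
  input: Hex;
}

interface Log {
  transactionHash: Hex;
  address: Hex;       // emitting contract
  topics: Hex[];      // topic0 is the event signature
  data: Hex;
}

// Trace-level records that plain JSON-RPC does not expose directly:
interface InternalCall { from: Hex; to: Hex; value: bigint; input: Hex }
interface ContractCreate { creator: Hex; contract: Hex }
interface ContractSelfDestruct { contract: Hex; beneficiary: Hex }
```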

What problems is Tsunami API solving?

There were several issues that we intended to solve with Tsunami API:

  1. Infrastructure maintenance
  2. Data accessibility & availability
  3. Interoperability
  4. Rich data
  5. Flexibility of access

Note that this list only covers the pain points addressed by Tsunami; the ones specific to Data Lakes will be discussed in the next article.

Let’s review each of these issues in a bit more detail.

Infrastructure maintenance

This one is simple. We wanted to relieve our clients of the burden of infrastructure maintenance that comes with a bunch of headaches. This means 24/7 node monitoring, keeping up with updates, hard forks, and frequent breakdowns.

Data accessibility & availability

This implies that data is accessible no matter what you are requesting and when you request it — whether it is a recent state or a transaction from five years ago. In addition, you shouldn’t worry that data might be unavailable when you request it, as the backend you request from (Tsunami API, in our case) has close to 100% uptime. This means running multiple nodes for each chain you support and maintaining load balancing for them so that your project always has access to the required data.

Interoperability

No matter how many blockchain platforms you are on, you will always long for the same interface to access them, period. This means that every blockchain platform exposes a slightly or completely different data interface, and you will have to standardize this data yourself.
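
To illustrate what that standardization burden looks like, here is a rough sketch of the kind of adapter layer you end up writing yourself when there is no unified interface. The chain-specific shapes and field names are purely illustrative.

```typescript
// Hypothetical raw event shapes from two different platforms (illustrative only).
interface EvmLog { address: string; topics: string[]; data: string; blockNumber: string } // hex-encoded
interface OtherChainEvent { contract: string; eventSignature: string; payload: string; height: number }

// The unified shape your application actually wants to work with.
interface NormalizedEvent {
  chain: string;
  contract: string;
  topic0: string;
  data: string;
  blockNumber: number;
}

function fromEvm(log: EvmLog): NormalizedEvent {
  return {
    chain: "ethereum",
    contract: log.address,
    topic0: log.topics[0],
    data: log.data,
    blockNumber: parseInt(log.blockNumber, 16),
  };
}

function fromOtherChain(e: OtherChainEvent): NormalizedEvent {
  return {
    chain: "other-chain",
    contract: e.contract,
    topic0: e.eventSignature,
    data: e.payload,
    blockNumber: e.height,
  };
}
```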

Rich data

In the case of Web3 data, “rich” implies that all meaningful data points (see the “What is data?” section) are covered to the fullest extent. This means you would otherwise have to extract all of the data points on your own, including logs and internal smart contract transactions, none of which is trivial to do.

Flexibility of access

This one is tricky because each piece of information on the blockchain has different fields available from which to choose. Examples of these fields can be found in our documentation. We had to provide access to data in a manner where you can find it by any combination of these fields for any timespan. This means that for some of your data needs, you would otherwise have to index data off the blockchain on your side, because the vanilla Web3 API comes with limitations on filtering data. For instance, if you wanted to get every internal transaction that occurred within your smart contract, you would have to find every transaction that interacted with your smart contract and then traverse a giant debug trace for every transaction to find entities with the right opcode.
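
To show what “index it yourself” means in practice, here is a rough sketch of the vanilla approach to the internal-transactions example above: pull a Geth-style debug trace for each candidate transaction and walk the call tree. The RPC URL is a placeholder, and you would still have to enumerate the candidate transactions yourself.

```typescript
// Sketch of the "do it yourself" path Tsunami is meant to replace:
// walk a Geth callTracer trace to find internal calls into a given contract.
const RPC_URL = "https://example-node.invalid"; // placeholder archive node

interface CallFrame {
  type: string;          // CALL, DELEGATECALL, CREATE, SELFDESTRUCT, ...
  from: string;
  to?: string;
  value?: string;
  input?: string;
  calls?: CallFrame[];   // nested internal calls
}

async function rpc<T>(method: string, params: unknown[]): Promise<T> {
  const res = await fetch(RPC_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  return (await res.json()).result as T;
}

// Recursively collect every internal call that targets the contract of interest.
function collectCallsTo(frame: CallFrame, target: string, out: CallFrame[] = []): CallFrame[] {
  if (frame.to?.toLowerCase() === target.toLowerCase()) out.push(frame);
  for (const child of frame.calls ?? []) collectCallsTo(child, target, out);
  return out;
}

async function internalCallsTo(txHash: string, contract: string): Promise<CallFrame[]> {
  const trace = await rpc<CallFrame>("debug_traceTransaction", [txHash, { tracer: "callTracer" }]);
  return collectCallsTo(trace, contract);
}
// ...and before you can even call this, you still have to find every candidate
// transaction yourself, block by block.
```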

That is what the concept of Tsunami was born out of: a resilient, up-to-date, flexible API to extract any information from various blockchain platforms as quickly as it takes to make a single HTTP request.

How does Tsunami work?

While the user interacts with Tsunami through a single interface, in reality, Tsunami consists of four essential components:

  • Node pools
  • Ukulele
  • Tsunami DB
  • API

Node pools

For every supported blockchain platform, we run a set of our own nodes. The combination of these nodes is different and depends on the platform size, technical background, and how often they tend to break. Typically, each platform will have a minimum of two archive nodes and several full nodes. The nodes in a single node pool, like Ethereum, are spread out in different geographical locations to ensure the fastest block propagation to our infrastructure.

Running our own nodes is not uncommon for a data provider. However, our case is unique because every node we operate is a forked version of the original client, mostly Geth or Erigon. The nodes have additional RPC tracing methods that can extract information about an entire block in roughly 50 milliseconds. This provides every transaction, event log, and internal transaction in one request, making it a crucial tool for us.
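
For illustration only, here is roughly what such a block-level tracing call could look like from a client’s perspective. The method name and response shape below are made up, since the fork’s actual RPC surface is not public.

```typescript
// Sketch of the idea behind the forked node's block-level tracing endpoint.
// "custom_traceBlock" is a made-up method name used purely for illustration.
const NODE_URL = "https://example-forked-node.invalid"; // placeholder

interface BlockTrace {
  block: { number: number; hash: string; parentHash: string; timestamp: number };
  transactions: { hash: string; from: string; to: string | null; value: string }[];
  logs: { address: string; topics: string[]; data: string; transactionHash: string }[];
  internalCalls: { from: string; to: string; value: string; transactionHash: string }[];
}

async function traceBlock(blockNumber: number): Promise<BlockTrace> {
  const res = await fetch(NODE_URL, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "custom_traceBlock",          // hypothetical method on the forked node
      params: ["0x" + blockNumber.toString(16)],
    }),
  });
  return (await res.json()).result as BlockTrace; // everything the block contains, in one round trip
}
```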

Efficiently extracting data from nodes is paramount for delivering data with minimal lag, benefiting both historical and real-time capabilities. When it comes to real-time data, quickly processing a block is vital so we can promptly deliver webhooks to our clients once the block is validated or mined.

Ukulele

Despite its humorous name, Ukulele is an integral part of Tsunami, serving as a gateway between a blockchain and Tsunami DB, responsible for:

  • monitoring new blocks
  • keeping track of chain reorganizations
  • ensuring data consistency

As a genuinely time-critical component, Ukulele is coded in Rust and forms a significant part of Tsunami API. There is a Ukulele version for each blockchain platform, each with distinct features tailored to the platform’s requirements. Now let’s delve deeper into its responsibilities.

Monitoring new blocks

We must be promptly notified when a new block is added to the blockchain. Ukulele maintains a connection to every node in the pool and monitors for recent block events. Due to the wide distribution of nodes, we ensure that Ukulele acknowledges new blocks as soon as they are propagated to any of our nodes. This is how Tsunami API remains up-to-date with the blockchain, minimizing the lag between the blockchain and our databases.
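
Ukulele itself is written in Rust, but the “first announcement wins” idea can be sketched in a few lines of TypeScript. The node URLs are placeholders, and the subscription uses the standard eth_subscribe / newHeads JSON-RPC mechanism.

```typescript
// Sketch of "first announcement wins" block monitoring across a node pool.
import WebSocket from "ws"; // npm package "ws"

const NODE_WS_URLS = ["wss://node-eu.invalid", "wss://node-us.invalid", "wss://node-asia.invalid"];
const seen = new Set<string>(); // block hashes already processed

function handleNewBlock(head: { hash: string; number: string }, source: string): void {
  if (seen.has(head.hash)) return; // another node already announced this block
  seen.add(head.hash);
  console.log(`block ${parseInt(head.number, 16)} first seen via ${source}`);
  // ...hand the block over to tracing / ingestion here
}

for (const url of NODE_WS_URLS) {
  const ws = new WebSocket(url);
  ws.on("open", () => {
    // Standard JSON-RPC pub/sub subscription to new block headers.
    ws.send(JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_subscribe", params: ["newHeads"] }));
  });
  ws.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.method === "eth_subscription") handleNewBlock(msg.params.result, url);
  });
}
```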

Keeping track of chain reorganizations (“reorgs”)

For us, keeping track of reorgs is a matter of the utmost importance. In a reorg, certain transactions may get dropped out of the blockchain, which can:

  • impact the accuracy of our Tsunami outputs, and we can’t afford to have false data in our responses
  • make calculations incorrect; for instance, when you calculate token balances, you need to count only the transactions that remain on-chain

In summary, when a reorg occurs, Ukulele deletes all irrelevant data from Tsunami DB, informing all other system components that certain blocks have been reorged and a recalculation is required, e.g., in every data lake.
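
As a rough sketch of that bookkeeping, the snippet below detects a fork via a parent-hash mismatch, prunes everything above the common ancestor, and notifies downstream consumers. The helper functions are stubs standing in for Tsunami’s internal plumbing.

```typescript
// Sketch of reorg handling: detect a fork, prune affected blocks, notify consumers.
interface BlockHeader { number: number; hash: string; parentHash: string }

const canonical = new Map<number, BlockHeader>(); // our current view of the chain

// Stubs for illustration only.
async function nodeHashAt(height: number): Promise<string> { return "0x0"; }  // hash the node reports at this height
async function deleteBlocksFrom(height: number): Promise<void> {}             // purge reorged rows from Tsunami DB
async function notifyReorg(height: number): Promise<void> {}                  // tell other components to recalculate

async function onNewBlock(head: BlockHeader): Promise<void> {
  const parent = canonical.get(head.number - 1);
  if (parent && parent.hash !== head.parentHash) {
    // The new block does not extend our chain: walk back to the common ancestor.
    let forkPoint = head.number - 1;
    while (forkPoint > 0 && canonical.get(forkPoint)?.hash !== (await nodeHashAt(forkPoint))) {
      canonical.delete(forkPoint);
      forkPoint--;
    }
    await deleteBlocksFrom(forkPoint + 1); // drop everything above the ancestor
    await notifyReorg(forkPoint + 1);      // trigger recalculation, e.g. of token balances in data lakes
  }
  canonical.set(head.number, head);
}
```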

Ensuring data consistency

As a non-decentralized system (for now), it is imperative that our clients can trust the accuracy of our data to the fullest extent possible. For that purpose, Ukulele has a consistency check mechanism that compares block traces from various nodes and their implementations. On a rare occasion of data inconsistency between nodes, we handle each case individually to ensure data integrity around the clock.
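
A simplified version of such a check could compare fingerprints of the same block trace fetched from several nodes, as sketched below. The trace-fetching call is stubbed out, and the real mechanism is certainly more involved.

```typescript
// Sketch of the consistency check: ask several nodes (possibly different
// implementations) for the same block trace and compare fingerprints.
import { createHash } from "node:crypto";

async function fetchBlockTrace(nodeUrl: string, blockNumber: number): Promise<unknown> {
  // ...call the node's tracing RPC here (see the earlier sketch); stubbed out for brevity.
  return {};
}

function fingerprint(trace: unknown): string {
  // A hash over the serialized trace is enough to detect divergence
  // (in practice you would canonicalize field order first).
  return createHash("sha256").update(JSON.stringify(trace)).digest("hex");
}

async function checkConsistency(nodeUrls: string[], blockNumber: number): Promise<boolean> {
  const prints = await Promise.all(
    nodeUrls.map(async (url) => fingerprint(await fetchBlockTrace(url, blockNumber)))
  );
  const consistent = prints.every((p) => p === prints[0]);
  if (!consistent) {
    // Flag the block for manual investigation instead of ingesting questionable data.
    console.warn(`inconsistent traces for block ${blockNumber}`, prints);
  }
  return consistent;
}
```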

Tsunami DB

Tsunami DB is a robust relational database that stores all indexed data. The underlying technology is classical PostgreSQL, a truly bullet-proof veteran of databases. However, merely having a database with terabytes of data and billions of entities wouldn’t suffice. Queries to this kind of database would take minutes to extract data, whereas we aim to provide responses within a second. To achieve such an ambitious goal, the solution must be much more sophisticated than just having a database. For each supported blockchain platform, Tsunami DB has two to three geographically separated clusters of servers that hold and serve the data.

[Diagram: Tsunami DB cluster layout; every figure represents a physical machine]

Clusters don’t utilize any form of replication; on the contrary, they are 100% autonomous. This means that in the event of a disaster, a switchover is unnecessary, and we can continue to fulfill users’ requests from various clusters.

The API maintains open connections to all clusters and performs health checks every second. The data is always sourced from the most up-to-date cluster. Each cluster has its own Ukulele, which ensures everything is in order.
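
A minimal sketch of that routing logic might look like the following. The cluster URLs and the health probe endpoint are hypothetical placeholders.

```typescript
// Sketch of cluster selection: poll every autonomous cluster each second and
// route reads to whichever one has the most recent block.
interface ClusterHealth { url: string; latestBlock: number; healthy: boolean }

const CLUSTERS = ["https://cluster-a.invalid", "https://cluster-b.invalid", "https://cluster-c.invalid"];
let health: ClusterHealth[] = [];

async function probe(url: string): Promise<ClusterHealth> {
  try {
    const res = await fetch(`${url}/latest-block`);           // hypothetical health endpoint
    const { number } = (await res.json()) as { number: number };
    return { url, latestBlock: number, healthy: true };
  } catch {
    return { url, latestBlock: -1, healthy: false };          // cluster is down, skip it
  }
}

// Refresh health information every second.
setInterval(async () => {
  health = await Promise.all(CLUSTERS.map(probe));
}, 1000);

// Pick the healthy cluster with the most recent data for each incoming query.
function pickCluster(): string {
  const best = health.filter((h) => h.healthy).sort((a, b) => b.latestBlock - a.latestBlock)[0];
  if (!best) throw new Error("no healthy cluster available");
  return best.url;
}
```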

What happens when a user requests data from Tsunami API?

Imagine a user who queries Tsunami API to get Uniswap’s pool events from block 0 to the latest. What happens then? The API decides where to route the query based on each cluster’s latest block.

Once the cluster is determined, the query begins extracting data. It simultaneously queries all available table shards, typically 32 to 128, to retrieve results as fast as possible. Once all the shards have responded, the machine handling the query sorts the results and applies dataset limits. After this process is complete, the data is sent back to the API and served to the user who requested it in the first place.
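
As a rough sketch of that fan-out, assuming a PostgreSQL layout with one table per shard (the table and column names are illustrative, not Tsunami’s actual schema):

```typescript
// Sketch of the shard fan-out: fire the same filter at every shard table in
// parallel, then merge, sort, and apply the limit on the coordinating machine.
import { Pool } from "pg"; // node-postgres

const pool = new Pool({ connectionString: "postgres://user:pass@cluster-a.invalid/tsunami" }); // placeholder
const SHARD_COUNT = 32; // illustrative; the real count varies per platform

type EventRow = { block_number: number; log_index: number; contract: string; topic_0: string; data: string };

async function queryEvents(contract: string, fromBlock: number, toBlock: number, limit: number): Promise<EventRow[]> {
  // One query per shard, all running concurrently.
  const perShard = await Promise.all(
    Array.from({ length: SHARD_COUNT }, (_, shard) =>
      pool
        .query(
          `SELECT block_number, log_index, contract, topic_0, data
             FROM events_shard_${shard}
            WHERE contract = $1 AND block_number BETWEEN $2 AND $3
            ORDER BY block_number, log_index
            LIMIT $4`,
          [contract, fromBlock, toBlock, limit]
        )
        .then((r) => r.rows as EventRow[])
    )
  );
  // Merge the partial results, re-sort globally, and cut to the requested limit.
  return perShard
    .flat()
    .sort((a, b) => a.block_number - b.block_number || a.log_index - b.log_index)
    .slice(0, limit);
}
```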

The sharding system is designed to improve the user experience by utilizing both vertical and horizontal scaling, rather than relying on node data traversal.

API

The API service acts as a thin protective layer that shields the infrastructure behind standard HTTP interfaces. It offers a range of methods for accessing data from Tsunami DB while safeguarding the infrastructure from unauthorized access and preventing any changes to the databases.

What comes next?

The question of “What comes next?” is a tricky one for Tsunami. We have developed a solid foundational technology that enables additional products to be built upon it. Aside from the obvious increase in supported blockchain platforms, we have a few options to consider:

  • improving data access tools, bringing more interfaces than just a RESTful API (e.g., GraphQL) to suit more use cases and reduce data and network overheads
  • integrating non-EVM chains into the ecosystem — challenging, but inevitable
  • enriching Web3 data by including and mixing Web2 data — but this job is for Data Lakes, not Tsunami

In addition, we have several features in our backlog that will cater to a wider range of use cases. These include mass data exports that are suitable for AI and machine learning applications.

Bottom Line

As we approach the conclusion of this article, I would like to recap and emphasize the differentiating features of Tsunami API. Its unique four-component setup guarantees fast, high-quality, and consistent performance, including but not limited to:

  • our own node forks with additional tracing software for achieving millisecond updates
  • the multiple functions of the Ukulele system to monitor, track, and retain consistent data
  • geographically dispersed Tsunami DB, which is 100% autonomous and has contingency plans in place
  • highly secured API, which provides excellent user access and experience while minimizing the risk of database contamination

PARSIQ’s Tsunami API enables users to develop a variety of Web3 data products on top of it, one of which is our very own creation: Data Lakes. And this is just the beginning, as in the future, even other data providers can be built atop our APIs, providing a unified interface to Web3 data.

Even considering the current bear market, today’s crypto world is full of data providers. Some are still in development, some are in the early stages, and some may fail. Nevertheless, Tsunami API remains an ultimate weapon, providing access to all the data the blockchain has to offer. 🚀

In case you missed last week’s news — we just released our SDK! With PARSIQ SDK, you can collect, organize, and interpret data from distributed ledgers, transforming raw information into actionable intelligence. Read more below.
