Solana Congestion: Explained
What is the lifecycle of transactions on Solana?
Sending transactions on Solana is very different from sending them on EVM (Ethereum Virtual Machine) systems. When a user sends a transaction to the Solana network, they can choose between two paths: send the transaction directly to the current block producer (via a TPU client), or send it to an RPC (1) node that propagates it to the block producer. In practice, users almost exclusively send via public RPCs (Helius/Triton).
Most transactions follow the latter path and are sent to an RPC node that attempts to forward them to block producers. Solana’s leader (i.e. block producer) schedule is known in advance of every epoch (roughly 2 days), which allows an RPC node to broadcast user transactions directly to the current and next leader for inclusion.
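To make the forwarding step concrete, here is a minimal sketch of how an RPC node could pick its targets from a known leader schedule. The function name and the list-based schedule are hypothetical and chosen only for illustration; this is not the actual RPC implementation.

```python
from typing import List, Optional, Tuple


def forward_targets(schedule: List[str], current_slot: int) -> Tuple[str, Optional[str]]:
    """Return (current leader, next distinct leader) for a slot.

    `schedule[i]` is the (hypothetical) identity of the leader for slot i
    of the epoch, known in advance.
    """
    current = schedule[current_slot]
    # Scan forward until the leader changes; that validator is "next in line".
    for leader in schedule[current_slot + 1:]:
        if leader != current:
            return current, leader
    return current, None  # no other leader remains in this epoch's schedule


# Example schedule where each leader holds several consecutive slots.
example_schedule = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
current_leader, next_leader = forward_targets(example_schedule, 2)
```

With the example schedule, slot 2 belongs to leader A and the next leader in line is B, so the transaction would be broadcast to both.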
Given Solana’s transaction processing structure, a user relies on the block producer to validate and commit the transaction. Therefore, the user submits it and awaits confirmation that it has been executed in a block. If the attempted transaction does not get included within a specific amount of time, the user is alerted that it has expired.
Time on Solana is divided into predetermined intervals, called slots, that last between 400ms and 600ms. During each slot, a validator selects transactions to process, packages them into a block, and proposes this block to the network. As each block is finalized, a final hash (the blockhash) is recorded. Each transaction submitted to the network must include a recent blockhash. This blockhash is used as a timestamp, and it is part of Solana’s secret sauce for performance.
When processing a transaction, validators check whether its recent blockhash is among the 151 most recent stored hashes. This limit is referred to as the max processing age. If the transaction’s blockhash (its timestamp) is older than this age, the transaction is considered expired. In practice, a transaction is valid for only about 60–90 seconds.
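The expiry rule and the 60–90 second figure can be sketched in a few lines. The function names and the exact boundary condition below are assumptions made for illustration, not the validator's actual code; ages are counted in blocks, and skipped slots are modeled with a simple skip rate.

```python
MAX_PROCESSING_AGE = 151  # number of recent blockhashes kept, per the text


def is_expired(blockhash_height: int, current_height: int,
               max_age: int = MAX_PROCESSING_AGE) -> bool:
    """True when the blockhash has aged out of the recent-hash window.

    Assumption: the window holds the `max_age` most recent blocks,
    so ages 0 .. max_age - 1 are still valid.
    """
    return current_height - blockhash_height >= max_age


def estimated_lifetime_seconds(slot_ms: float, skip_rate: float = 0.0,
                               max_age: int = MAX_PROCESSING_AGE) -> float:
    """Rough wall-clock lifetime of a transaction using the newest blockhash.

    Skipped slots produce no block, so on average each block takes
    1 / (1 - skip_rate) slots.
    """
    return max_age * (slot_ms / 1000.0) / (1.0 - skip_rate)


fast = estimated_lifetime_seconds(400)   # shortest slots from the text
slow = estimated_lifetime_seconds(600)   # longest slots from the text
```

With 400ms and 600ms slots and no skipped slots, this gives roughly 60 and 91 seconds, which matches the range quoted above.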
If a transaction is successfully included in a proposed block, the user receives a confirmation. The RPC server reports the confirmation, and the user can verify it in the chain explorer of their choice.
How do validators decide which transactions to choose?
Unlike on other blockchains, Solana transactions are continuously streamed to a validator for execution. The validator, in turn, continuously streams portions of a block, called shreds, to the network. Since there is no mempool that external builders or validators pick from, transactions are queued for processing on arrival by a scheduler: the algorithm that determines how transactions are processed by the validator. Most validators use the default scheduler algorithm that is included in the Agave client.
The default scheduler implementation is multi-threaded, with each thread maintaining a queue of transactions waiting for execution. The current scheduler uses six threads: four for non-vote transactions and two for vote transactions. Incoming transactions are randomly assigned to a thread’s queue. Each of these queues is ordered by priority fee and arrival time.
This is where the “jitter” comes in. Jitter is the term used in Solana for the degree of randomness in which transactions get included in blocks. Since transactions are assigned randomly to one of the four non-vote threads, each maintaining its own queue of transactions waiting for execution, the priority fee has only a limited effect. The fee only affects a transaction’s placement within its randomly assigned queue, so low priority fee transactions can be included before high priority fee transactions. Essentially, priority fees help with transaction ordering intra-thread, not inter-thread.
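A toy model shows how this plays out. The sketch below is a simplification under stated assumptions, not the Agave scheduler: each thread is a priority queue ordered by fee and then arrival, every thread executes one transaction per "round", and the thread assignment is passed in explicitly so the example is reproducible (in the real scheduler it is random). All names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Tx = Tuple[str, int]  # (transaction id, priority fee)


def build_queues(txs: List[Tx], assignment: List[int],
                 num_threads: int = 4) -> List[List[Tuple[int, int, str]]]:
    """Place each transaction in its assigned thread's priority queue.

    `txs` is in arrival order; `assignment[i]` is the thread index of txs[i].
    """
    queues: List[List[Tuple[int, int, str]]] = [[] for _ in range(num_threads)]
    for arrival, ((tx_id, fee), thread) in enumerate(zip(txs, assignment)):
        # Negative fee so the highest fee pops first; arrival breaks ties.
        heapq.heappush(queues[thread], (-fee, arrival, tx_id))
    return queues


def execution_rounds(queues: List[List[Tuple[int, int, str]]]) -> Dict[str, int]:
    """Let every thread execute one transaction per round; map tx id -> round."""
    working = [list(q) for q in queues]  # copies keep the heap ordering
    rounds: Dict[str, int] = {}
    round_no = 0
    while any(working):
        for q in working:
            if q:
                _, _, tx_id = heapq.heappop(q)
                rounds[tx_id] = round_no
        round_no += 1
    return rounds


txs = [("whale", 900), ("high", 500), ("low", 1)]
rounds = execution_rounds(build_queues(txs, assignment=[0, 0, 1]))
```

Here "low" sits alone in thread 1 and executes in the first round, while "high" waits behind "whale" in thread 0 and executes one round later, even though it paid 500 times the fee.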
What role do fees play?
For the purpose of this article, we will skip a discussion about the inefficiencies of base and priority fees on Solana. In brief, the base fee on Solana is arbitrarily set at 5,000 lamports, or 0.000005 SOL. This is around $0.0007 at current prices. Similar to EVM, senders can include a priority fee to help prioritize their transaction intra-thread.
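As a rough illustration of the arithmetic, the sketch below adds a priority fee to the base fee. It assumes a single signature and that the priority fee equals the compute unit limit multiplied by a compute unit price quoted in micro-lamports, rounded up to a whole lamport; treat that formula and the helper names as assumptions of this example.

```python
LAMPORTS_PER_SOL = 1_000_000_000
BASE_FEE_LAMPORTS = 5_000            # base fee quoted in the text
MICRO_LAMPORTS_PER_LAMPORT = 1_000_000


def priority_fee_lamports(cu_limit: int, cu_price_micro_lamports: int) -> int:
    """Priority fee in lamports, rounded up (assumed formula, see lead-in)."""
    total_micro = cu_limit * cu_price_micro_lamports
    return -(-total_micro // MICRO_LAMPORTS_PER_LAMPORT)  # ceiling division


def total_fee_sol(cu_limit: int, cu_price_micro_lamports: int) -> float:
    """Base fee plus priority fee for a single-signature transaction, in SOL."""
    lamports = BASE_FEE_LAMPORTS + priority_fee_lamports(cu_limit, cu_price_micro_lamports)
    return lamports / LAMPORTS_PER_SOL


no_priority = total_fee_sol(200_000, 0)          # base fee only
with_priority = total_fee_sol(200_000, 10_000)   # hypothetical CU limit and price
```

With no priority fee the total is the 0.000005 SOL base fee; with the hypothetical settings above, the priority fee adds 2,000 lamports for a total of 0.000007 SOL.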
So what about the congestion?
The congestion on Solana can be attributed to three overlapping issues:
Root cause #1
Solana’s extremely cheap transaction price can be a double-edged sword. A user who wants to increase the chances of a transaction landing in a block is incentivized to spam the network with repeated submissions, hoping that one of them lands in the thread with the shortest queue. Getting a transaction to the front of a queue, by contrast, requires paying a high priority fee.
This is the main cause of Solana’s recent congestion. Thousands of transactions, propagated by spam bots, clog the block producers and cause delays long enough for transaction blockhashes to expire, at which point the transactions are invalidated and dropped.
While the current setup allows block producers to forward messages to the next validator in line, there are limits on the number of transactions that can be forwarded. Transactions can only be pushed to the next block producer in line, who can in turn push them forward. This effectively spreads the load and allows the network to self-organize in the face of accelerated demand.
Root cause #2
Two major catalysts pushed demand for Solana blockspace to new heights. First, the popularity of memecoins on Solana skyrocketed with the price action of $BONK, which rose 252x between October 2023 and March 2024. As memecoin prices exploded, copycat memecoins launched thanks to launchpads such as pump.fun, which make token launches incredibly easy. Twitter influencers and even some VCs joined in with their own tokens. On average, Solana saw 6,000 new SPL token launches per day in March.
At the same time, Solana’s protocol usage spiked. In particular, a program called Ore launched with a significant demand for blockspace. Ore has accounted for 25.35% of Solana’s TPS since its launch in early April. Phoenix, a new central limit order book (CLOB), accounted for 14% of blockspace over the observed congestion period. By comparison, Metaplex’s Bubblegum program claimed around 3.7%.
When people talk about congestion, they typically mean the fail rate for non-voting transactions, which is shown in the following chart.
Empirically, about 50% of the transactions included in a block on Solana currently fail. These failures generally reflect bot arbitrage attempts that no longer meet their required conditions by the time they are included in a block. In the period from March 2024 through April 2024, the fail rate clearly increased from 75% to 80%. That significant rise affects user experience, as transaction failures become more routine.
The chart above tells only part of the congestion story, as it covers valid transactions that were processed but failed, typically due to slippage or other limits. It excludes all the transactions that were dropped due to expired blockhashes.
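The difference between a failed and a dropped transaction can be sketched as a small classifier. This is a simplified illustration for a buy order with a slippage limit expressed in basis points; the function and its parameters are hypothetical and do not represent any particular DEX program.

```python
def classify(blockhash_expired: bool, quoted_price: float,
             execution_price: float, slippage_bps: int) -> str:
    """Classify the outcome of a hypothetical buy order."""
    if blockhash_expired:
        # Never reaches a block, so it does not appear in the fail rate.
        return "dropped"
    max_price = quoted_price * (1 + slippage_bps / 10_000)
    if execution_price > max_price:
        # Included in a block, but the trade condition is not met.
        return "failed"
    return "succeeded"


outcome = classify(False, quoted_price=1.00, execution_price=1.02, slippage_bps=100)
```

In this example the price moved 2% against a 1% slippage limit, so the transaction is processed and counted as failed rather than dropped.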
Root cause #3
Solana’s transaction fees are currently not optimized, which only exacerbates the congestion issue. Many protocols do not implement priority fees, which leads to failed transactions on the network. In addition, many programs do not optimize their compute unit (CU) usage. Solana blocks are capped at 48 million CUs, so reducing the number of CUs that programs use increases the number of transactions that can land in a block.
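A quick calculation shows why compute unit usage matters. The sketch below gives only an upper bound: it assumes every transaction in the block uses the same number of CUs and ignores any other limits; the per-transaction figures are hypothetical.

```python
BLOCK_CU_CAP = 48_000_000  # block-wide compute unit cap from the text


def max_transactions_per_block(cu_per_tx: int, block_cu_cap: int = BLOCK_CU_CAP) -> int:
    """Upper bound on transactions per block if each one uses `cu_per_tx` CUs."""
    if cu_per_tx <= 0:
        raise ValueError("cu_per_tx must be positive")
    return block_cu_cap // cu_per_tx


unoptimized = max_transactions_per_block(200_000)  # hypothetical CU usage
optimized = max_transactions_per_block(100_000)    # same program, half the CUs
```

Halving the compute units per transaction doubles the bound, from 240 to 480 transactions per block in this example.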
But I heard discussions about QUIC being the primary cause for congestion.
Well, yes and no. QUIC is a protocol developed by Google. It is the transport layer currently used by Solana. Technically, it’s a wrapper built on top of UDP, a base transport layer that does not have congestion control. Think of QUIC as a set of rules on how data is sent between devices on Solana. This is how they understand and communicate with each other. QUIC allows block leaders to cut user connections or rate limit them based on specific criteria. As such, block leaders can cut connections off during high demand, limiting spam and preventing a shutdown.
The protocol was implemented on top of UDP after the NFT craze and DDoS attacks on the Solana network in 2021 and 2022. Those events caused the network to halt multiple times, as validating nodes were unable to process the transaction load and shut down.
The benefit of QUIC is that it combines the speed, resiliency, and security of TCP and UDP, the two base transport layers (each with some tradeoffs, detailed in the diagram above). However, it adds another layer of complexity, as validating nodes must manage connections in addition to processing transactions. The end result is that the network can become congested to the point of being unusable from a UX perspective, but it does not halt. Another key issue is that leaders cannot specify how and which connections are dropped. Connections are dropped randomly, which makes it difficult to limit spam.
I heard about stake-weighted quality of service in terms of QUIC — what’s that all about?
QUIC is the transport layer to the current block producer, which acts as the leader. To reduce spam and low-quality data transfer, some connections to the leader are prioritized over others. The priority is set by validator stake: the higher the stake, the more trust in the validator, which earns it a better quality of service. This logic is called stake-weighted QoS. Staked QUIC connections receive 80% of the leader’s available processing space, while 20% is left for others.
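The 80/20 split can be sketched as a simple allocation. This is an illustrative model only: it assumes the staked share is divided among staked peers in proportion to their stake, and the names and the capacity unit are hypothetical rather than taken from the validator code.

```python
from typing import Dict


def allocate_capacity(capacity: int, stakes: Dict[str, int],
                      staked_share_pct: int = 80) -> Dict[str, int]:
    """Split a leader's capacity between staked peers and everyone else."""
    staked_pool = capacity * staked_share_pct // 100
    total_stake = sum(stakes.values())
    allocation: Dict[str, int] = {}
    if total_stake > 0:
        for peer, stake in stakes.items():
            # Pro-rata share of the staked pool (assumption of this sketch).
            allocation[peer] = staked_pool * stake // total_stake
    # The remainder is shared by all unstaked connections.
    allocation["<unstaked>"] = capacity - staked_pool
    return allocation


shares = allocate_capacity(1000, {"validator_a": 3, "validator_b": 1})
```

With a capacity of 1,000 units, validator_a (75% of the stake) receives 600, validator_b receives 200, and all unstaked connections share the remaining 200.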
What happens in practice is that RPC providers that do not operate validators enter into agreements with heavily staked validators. The validators allocate a portion of their stake-weighted capacity to the affiliated RPCs to take advantage of. The RPCs then use it to deliver transactions to the leader for block inclusion.
When can I use Solana again?
On April 15th, the Anza team (previously Solana Labs) released patch v1.17.31 on mainnet beta. This patch is the first in a series of improvements aimed at stabilizing the network and easing congestion. It improves the way block leaders recognize high-priority transactions and connections. This reduces the ability of attackers to “spam” the network, which causes widespread transaction drops. In addition, low-staked connections are now treated as unstaked. Within the QoS scheme, this gives greater priority to trustworthy RPC providers and regular users.
This patch, along with the short-term market sentiment, has already helped alleviate most of the problem. However, it does not solve the fundamental issues. In particular, optimization of transaction fees, a better scheduling algorithm, and more network control are needed. Minimizing jitter in blocks is key: users need to understand what is required to successfully land a transaction so that they can make a choice instead of being forced to spam to increase the chance of block inclusion.
The most anticipated upcoming upgrade is an improved fee mechanism in the scheduler algorithm. In the upgraded scheduler, transactions will be split more smoothly among the threads, with a deterministic approach to landing transactions that reduces spam. In the meantime, Solana-based companies have started implementing their own upgrades to alleviate congestion. Drip.Haus introduced an update that allows users to choose which NFTs are put on-chain and which aren’t. Kamino Finance, along with other DeFi protocols, focused on ensuring that their liquidation engines ran smoothly and optimized connections to improve transaction landing rates.
In the long term, a broader discussion around the economic viability of Solana will need to take place. Despite localized fee markets that allow for higher fees during heavy use, the current economic incentive models are subject to debate. They clearly lead to system inefficiencies that will continue to throttle the network until something is changed.
A major improvement is also expected with the upcoming release of Firedancer, the new Solana client. It is developed completely independently of the current Solana Agave client and is expected to alleviate congestion. Hopefully, Firedancer will mark the beginning of a large step up in network performance.
//////
Notes:
\1\ RPC stands for Remote Procedure Call. In crypto, it refers to a technology that allows applications to communicate with a blockchain. An RPC node acts as an interface between the blockchain network and its applications.
\2\ In reality, the recent blockhash can actually last longer than 151 slots. Slots may be skipped and age checks use “block height” not “slot height”. Since slots are skipped occasionally, the actual age of a blockhash can be a bit longer. Historically, the skip rate is about 5% so the expected lifetime of a transaction that uses the most recent blockhash is about 1min 19s based on the most recent data.
\3\ Public RPC nodes, by default, try to forward transactions to leaders every two seconds until either the transaction is finalized or the transaction’s blockhash expires. Given that blockhashes generally expire within 60 to 90 seconds, each user has around 30 to 45 attempts to be included in a block prior to the blockhash expiring.
\4\ A good example of why the transaction fail rate spiked is the slippage limit associated with memecoins. As memecoins are extremely volatile assets and are traded on multiple venues at once, it is possible for the memecoin price to move outside of the slippage limit before the transaction gets processed by the block producer. The transaction is processed and not dropped, but it fails as the trade condition is not met. There are more reasons for transaction failures, but slippage is one of the key ones.
Thank you to the Squads team and RockawayX team for the review and insightful comments.
Sources:
https://www.umbraresearch.xyz/writings/lifecycle-of-a-solana-transaction
https://www.binance.com/en/square/post/6477413118761
https://solana.com/docs/intro/transaction_fees#why-pay-transaction-fees
https://www.umbraresearch.xyz/writings/solana-fees-part-1 Solana Transaction Fee Mechanisms
https://mirror.xyz/eclipsemainnet.eth/GTrrYpmxSY1ubQ0SEV5akF7EZXgA2XFr7FEjM3CUuaE
https://medium.com/@Burgeonxyz/understanding-solana-leader-rotation-mechanism-99a6544725be
https://www.helius.dev/blog/all-you-need-to-know-about-solana-and-quic
https://coinfomania.com/solana-becomes-memecoin-chain/
https://www.youtube.com/watch?v=yeV2i8bfSMs
https://cryptorank.io/news/feed/d909f-solana-finally-easing-network-congestion-with-latest-release