Stellar Load Testing Results for the Kin Ecosystem

The Kin Ecosystem needed a more predictable blockchain in terms of transaction confirmation time and fees, and Stellar was our winning candidate.

Now, we’re using Kik’s nine years of experience and knowledge in network engineering and scaling to test the Stellar public blockchain network for Kin. Our aim: To see if Stellar indeed stands up to its performance claims.

TLDR

Results are good.

Now that you’ve calmed down, let’s get into the details.

Requirements

First, a short introduction to load testing. According to Wikipedia:

Load testing is the process of putting demand on a software system or computing device and measuring its response. Load testing is performed to determine a system’s behavior under both normal and anticipated peak load conditions. It helps to identify the maximum operating capacity of an application as well as any bottlenecks and determine which element is causing degradation.

In our case, the system in question is the Stellar public network. We wanted to observe the network’s operating capacity and identify any possible bottlenecks in advance. In order to accomplish this, we decided to collect the following metrics:

  1. Median confirmation time: The amount of time it takes for consensus to be reached in the network for new ledgers (“blocks”), and whether this metric stays consistent over time under load.
  2. Operation count: The number of operations (“transactions”) included in each ledger.
  3. Network responsiveness: Node response time to incoming transactions while receiving and handling other ones at the same time.

Terminology

Some of the above terminology might seem strange, and you might wonder what “operation” or “ledger” really means. The terminology used by Stellar and its federated consensus model is somewhat different from the common terminology used in popular proof-of-work projects like Bitcoin and Ethereum. The following is a short explanation of these terms. Some of the definitions are taken straight from the official Stellar guide:

  1. Ledger: Stellar uses the term ledger to represent a state of the network at a given point in time. Thus, the last ledger is the current state, the genesis ledger is the first state in history, and closing a ledger means applying a set of transactions to the current ledger. In other words, a ledger can be thought of as the equivalent of a block in Bitcoin terminology, and updating or closing a ledger as adding a block.
  2. Operation: An operation is an individual command that mutates the ledger. In Bitcoin or Ethereum you would call this a transaction.
  3. Transaction: Transactions are made up of a list of operations. This is what you transmit on the network. This practically means you can submit multiple ledger mutation actions in a single request to the network.
  4. Stellar Core: Stellar Core is responsible for communicating directly with and maintaining the Stellar peer-to-peer network. This can be seen as an equivalent to a Bitcoin or Ethereum node in their respective networks.
  5. Horizon: Stellar provides a web application that makes it much easier to interact with the Stellar network. It provides a simple interface that allows you to submit transactions to the network, check the status of accounts, subscribe to event streams, etc. This is a good solution for mobile clients, who can avoid implementing a Stellar Core node, and just submit an HTTP request to a remote node instead.
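
To make the relationship between these terms concrete, here is a minimal sketch of how operations, transactions, and ledgers nest. It is an illustrative model only, not the Stellar SDK or the Stellar Core data format: the class names and fields are assumptions made up for this example, and the 100-operation cap is the per-transaction protocol limit quoted in this post.

```python
from dataclasses import dataclass, field
from typing import List

# Per-transaction operation limit quoted in this post (protocol version at the time).
MAX_OPS_PER_TX = 100


@dataclass
class Operation:
    """One ledger mutation, e.g. a payment. Fields are illustrative only."""
    kind: str
    source: str
    destination: str
    amount: int


@dataclass
class Transaction:
    """What a client transmits to the network: an ordered list of operations."""
    source_account: str
    operations: List[Operation] = field(default_factory=list)

    def add(self, op: Operation) -> None:
        # Refuse to grow past the per-transaction operation limit.
        if len(self.operations) >= MAX_OPS_PER_TX:
            raise ValueError("transaction already holds the maximum number of operations")
        self.operations.append(op)


@dataclass
class Ledger:
    """The network state produced by applying a set of transactions (a "block")."""
    sequence: int
    transactions: List[Transaction] = field(default_factory=list)

    def operation_count(self) -> int:
        # Total operations across all transactions in this ledger.
        return sum(len(tx.operations) for tx in self.transactions)
```

In this model, the "operation count" metric from the requirements is simply `Ledger.operation_count()` evaluated for each closed ledger.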

Network Performance Considerations

Stellar’s website states that a ledger closes on average every 5 seconds. This can also be verified “live” using the Stellar Dashboard. In addition, the protocol allows up to 50 transactions per ledger, and up to 100 operations per transaction. This means that up to 5,000 operations can be included in a single ledger, which works out to an average of 1,000 operations and 10 transactions per second.
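
The arithmetic behind these limits can be checked with a few lines of Python. This is a sketch using only the figures quoted above; the function and constant names are made up for illustration.

```python
LEDGER_CLOSE_SECONDS = 5   # average ledger close time
MAX_TX_PER_LEDGER = 50     # protocol limit quoted above
MAX_OPS_PER_TX = 100       # protocol limit quoted above


def capacity(ops_per_tx: int):
    """Return (transactions per second, operations per second) on average."""
    if not 1 <= ops_per_tx <= MAX_OPS_PER_TX:
        raise ValueError("ops_per_tx must be between 1 and MAX_OPS_PER_TX")
    tx_per_second = MAX_TX_PER_LEDGER / LEDGER_CLOSE_SECONDS
    return tx_per_second, tx_per_second * ops_per_tx


print(capacity(100))  # (10.0, 1000.0): every transaction packed full
print(capacity(1))    # (10.0, 10.0): one operation per transaction
```

The second call is the case that matters for single-operation clients: the transaction cap, not the operation cap, becomes the bottleneck.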

The most common use in the network for Kin Ecosystem clients (e.g. mobile and web apps) is to transmit transactions containing a single operation — most likely a payment operation to or from a digital service. This makes aggregating operations impractical, limiting us to 10 operations per second on average for the current protocol version. However, Stellar are aware of this and have mentioned that this network limitation is arbitrary and configurable. There is ongoing discussion about modifying the cap from being threshold based to operation based. (See the following GitHub issues for more information: stellar-core #1030 and stellar-protocol #75).

The conclusion from the above is that we need to test the common scenario for the Kin Ecosystem — which is to see if indeed the Stellar network can handle processing a series of 10 transactions per second (with 1 operation each) consistently over a period of time.


Test Structure and Considerations

  1. Set up four load testing instances in different locations around the globe: East and West-Coast US, Central Europe, and Southeast Asia. This should average out differences in network performance when submitting transactions from different locations.
  2. Each instance controls 120 accounts with enough funds for each to generate tens of thousands of transactions over the course of the tests.
  3. Each instance runs Stellar Core and Horizon locally. Transactions are submitted from the load testing application to the local Stellar Core via an HTTP request to the local Horizon. In turn, Stellar Core transmits the transactions to the Stellar network. The core app operates in “watcher” mode, i.e. it only submits transactions and receives updates from the network, but does not participate in consensus or help other nodes catch up with the ledger history. This decision was made in order to observe how other nodes in the network handle load, without relying on our own nodes for validation.
  4. Once a test starts, each instance uses its accounts to transmit transactions to the network at a rate of 2 transactions per second. Summing all instances together results in 8 transactions per second.
  5. This leaves enough space for 2 additional transactions (for a total of 10) to be transmitted by other users of the Stellar network at the same time. This was done in order to avoid consuming all available transaction space in the network for the duration of the tests.
  6. Each submitted transaction contains a single payment operation, since this resembles Kin’s expected scenario as described above.
  7. Each test runs for three hours, in order to verify that the performance we measured is consistent over time. Furthermore, we conducted four tests at different hours of the day over the course of two days.
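
The pacing described in the list above can be sketched as a simple schedule generator. This is not the actual load testing application: the function name is hypothetical, and rotating through the accounts in round-robin order is an assumption about how the 120 accounts were used, since the list does not spell that out.

```python
from itertools import cycle, islice
from typing import Iterator, List, Tuple


def submission_schedule(
    accounts: List[str],
    tx_per_second: float,
    duration_seconds: float,
) -> Iterator[Tuple[float, str]]:
    """Yield (offset_seconds, account) pairs, evenly spaced in time.

    Accounts are used in round-robin order (an assumption for this sketch).
    """
    total = int(tx_per_second * duration_seconds)
    interval = 1.0 / tx_per_second
    for i, account in enumerate(islice(cycle(accounts), total)):
        yield i * interval, account


# One instance: 120 accounts, 2 transactions per second, three hours.
accounts = [f"account-{n}" for n in range(120)]
schedule = list(submission_schedule(accounts, 2, 3 * 3600))
print(len(schedule))   # 21600 transactions per instance per test
print(schedule[120])   # (60.0, 'account-0'): each account is reused once a minute
```

Under this assumption, every account gets a full minute between its own transactions, which is far longer than the confirmation times reported in the results.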

80 Operations per Second Load Test

An additional load test run was done with a different setup. Instead of using 1 payment operation, we used 10 payment operations per transaction. This results in a total of 80 payment operations transmitted per second on average for the duration of the test.

While this will not be our expected scenario in the near future, we were interested to see if the Stellar network can indeed handle this kind of load.

Transaction Scale

Over the course of three hours, each test submits 86,400 transactions to the Stellar public network (8 transactions per second for 10,800 seconds). This results in 432,000 submitted transactions across the five test runs.
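
These totals follow directly from the test structure, as the short calculation below shows. The variable names are illustrative; the input figures are the ones given in this post.

```python
INSTANCES = 4
TX_PER_SECOND_PER_INSTANCE = 2
TEST_HOURS = 3
TEST_RUNS = 5  # four single-operation runs plus the 10-operation run

tx_per_second = INSTANCES * TX_PER_SECOND_PER_INSTANCE
tx_per_test = tx_per_second * TEST_HOURS * 3600
total_tx = tx_per_test * TEST_RUNS

print(tx_per_second)  # 8
print(tx_per_test)    # 86400
print(total_tx)       # 432000
```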

Metrics

The metrics we measured are as follows:

  1. Time difference between submitting a transaction and committing it to a ledger (adding it to a block): How fast a transaction was confirmed on the network.
  2. Time difference between submitting a transaction and receiving Horizon’s (client interface) response: How fast the interface replied whether the transaction was confirmed or not.
  3. Transaction success rate: How many transactions out of the total submitted were successful and committed to a ledger, according to Horizon.

Measuring Techniques

The load testing app outputs timestamped logs on transaction submission and on Horizon response, along with the transaction result. In addition, we scanned all ledgers (blocks) which included our transactions and measured their commit time, i.e. when they were added to the blockchain.

This information was then parsed and processed into spreadsheet tables in order to generate percentile and time charts. These charts are the final test results given to you below.

Due to a difference in instance clocks between our test instances and other nodes participating in consensus in the network, we noticed time skews and had to time-shift our measured timestamps by about 2.5 seconds forward in order for them to be “aligned” with the ledger time.
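
As a rough illustration of this processing step, the sketch below computes commit latencies with the clock shift applied and then reads off percentiles. The actual processing was done with spreadsheets; this Python version is a stand-in, the function names are hypothetical, and the nearest-rank percentile method is an assumption (a spreadsheet may interpolate differently).

```python
import math
from typing import List, Sequence

# Approximate clock shift reported above; local timestamps are moved forward.
CLOCK_SKEW_SECONDS = 2.5


def commit_latencies(
    submit_times: Sequence[float],
    ledger_close_times: Sequence[float],
    skew: float = CLOCK_SKEW_SECONDS,
) -> List[float]:
    """Seconds from (skew-corrected) submission to ledger close, per transaction."""
    return [
        close - (submit + skew)
        for submit, close in zip(submit_times, ledger_close_times)
    ]


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no values given")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up timestamps (seconds) purely to show the calculation.
latencies = commit_latencies([100.0, 101.0], [107.5, 106.5])
print(latencies)                  # [5.0, 3.0]
print(percentile(latencies, 50))  # 3.0
print(percentile(latencies, 95))  # 5.0
```

The same two functions apply to the Horizon response metric by passing response timestamps in place of ledger close times; whether a skew correction is needed there depends on whether both timestamps come from the same clock.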

Open Sourcing the Load Testing Application

The load testing application, the log processing code, and the data used for plotting the charts are open sourced at github.com/kinfoundation/stellar-load-testing. For the benefit of the community, we will release follow-up posts with further implementation details.

Results

Transaction Submit vs. Ledger Commit Time Difference

Transaction Commit to Ledger Time

This chart shows how long it took for submitted transactions to be committed to a ledger, in milliseconds, by percentiles for several test runs. The results show the following:

  • 50% of transactions are committed to a ledger 3–5 seconds after submission.
  • 75% of transactions are committed after 6–7.5 seconds.
  • 95% of transactions are committed within 9.5 seconds.

Since the average ledger close time is 5 seconds, this means that every transaction has a 95% chance to be added to the next ledger or the one right after. Pretty fast!

Note that the “heavier” 80 operations per second test had the same (good) performance.

Transaction Submit vs Horizon Response Time Difference

Horizon Response Time to Submitted Transactions

This percentile chart shows how long it took for transactions submitted to Horizon to receive a response. Horizon is responsible for submitting a transaction to Stellar Core, waiting for it to achieve consensus, caching this information, and returning a response. Understandably, its operations take slightly longer than Stellar Core’s. The results show:

  • 50% of submitted transactions receive a response after around 6.5–10 seconds.
  • 75% up to 12.5 seconds.
  • 95% up to 15 seconds.

Note that even if Horizon has not responded yet, the ledger commit time results above indicate there’s a good chance the transaction has already been committed to a ledger.

Transaction Success Rate (Horizon)

The success rate of the four main load tests was near 100%. Very impressive. Failed transactions succeeded after re-transmission.


Shoutout to the Stellar Development Team

A big thank you goes to the Stellar development team.

We had plenty of technical and design questions during the load testing process, and the Stellar team proved to be an amazing group to work with. They were quick to respond and professional.

One event of note is that during the load tests, minor performance issues were discovered in Horizon, which resulted in higher response times than expected. The dev team was made aware of the problem and quickly turned to deal with it, and a fix is on the way in the following weeks. You can follow its progress on GitHub: see horizon #316.

Conclusion

Load testing a whole network is a challenging mission, which involves many long and thorough design, coding, processing, and reporting tasks.

At the end of it all, the results show that Stellar was indeed the best candidate for the Kin Ecosystem. It meets our most important requirements — namely predictable fees, network stability, and short confirmation times. In addition, Stellar enjoys the support of a strong and responsive development team to improve performance over time and mitigate bugs as they pop up.

As mentioned above, we will continue to release posts and keep you informed about our Stellar implementation.

Please feel free to ask questions here on Medium and on our Subreddit if you have any.