Matching Engine — TPS and Latency Metrics from the INCX Lab
As promised in our AMA session, here is an update on our matching engine's performance.
The matching engine is the heart of any exchange: the speed at which it can match orders determines the throughput and latency of the entire exchange. More often than not, matching-engine performance is what is quoted to indicate the speed of an exchange. Beyond speed, a matching engine must also provide fairness guarantees. Let's start by reviewing the fairness guarantees the INCX matching engine provides.
Like most exchanges, the INCX matching engine guarantees that orders are sorted by price and then by time:
- On the buy (bid) side, orders are sorted in descending order of price: the order with the highest bid price is first and the one with the lowest bid price is last. If prices are equal, ordering is by time; the order that arrived earlier gets preference.
- On the sell (ask) side, orders are sorted in ascending order of price: the order with the lowest ask price is first and the one with the highest ask price is last. If prices are equal, ordering is by time; the order that arrived earlier gets preference.
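These price-time rules can be written down as sort comparators. A minimal sketch in Go (the `Order` type and its fields are illustrative, not the INCX engine's):

```go
package main

import (
	"fmt"
	"sort"
)

// Order is a simplified resting order; the fields are illustrative,
// not the INCX engine's actual types.
type Order struct {
	ID    int
	Price int64 // price in minor units (e.g. cents)
	Time  int64 // arrival sequence number (earlier = smaller)
}

// sortBids orders the bid side: highest price first,
// ties broken by earlier arrival.
func sortBids(bids []Order) {
	sort.Slice(bids, func(i, j int) bool {
		if bids[i].Price != bids[j].Price {
			return bids[i].Price > bids[j].Price
		}
		return bids[i].Time < bids[j].Time
	})
}

// sortAsks orders the ask side: lowest price first,
// ties broken by earlier arrival.
func sortAsks(asks []Order) {
	sort.Slice(asks, func(i, j int) bool {
		if asks[i].Price != asks[j].Price {
			return asks[i].Price < asks[j].Price
		}
		return asks[i].Time < asks[j].Time
	})
}

func main() {
	bids := []Order{{ID: 1, Price: 100, Time: 2}, {ID: 2, Price: 101, Time: 3}, {ID: 3, Price: 100, Time: 1}}
	sortBids(bids)
	fmt.Println(bids[0].ID, bids[1].ID, bids[2].ID) // 2 3 1
}
```

In practice a production book keeps both sides permanently sorted (for example, in per-price-level queues) rather than re-sorting, but the comparators capture the guarantee.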
We have tested our matching engine exhaustively to verify that these fairness guarantees hold.
A matching engine is by nature single threaded, especially one that has to provide fairness guarantees: matching of one order must complete before the next can be picked up, or fairness cannot be guaranteed. There are, however, academic papers that describe concurrent matching engines. To pick the best possible algorithm and data structure for our matching engine, we experimented with multiple implementations, ranging from single threaded to multi threaded to designs built on very specialized data structures. We tried six different implementations in all. In this post we describe the data sets and setups we used to run the performance tests, and we evaluate the relative performance and strengths of four of those implementations. Since we do not currently intend to open-source our matching engine, we will not spend much time on implementation details.
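To make the single-threaded model concrete, here is a minimal sketch in Go of a sequential matching loop. The types and the plain FIFO "book" are illustrative stand-ins, not the INCX implementation:

```go
package main

import "fmt"

// Order is illustrative; a real engine also carries price and quantity.
type Order struct {
	ID  int
	Buy bool
}

// runEngine drains the inbound queue strictly in order: order N is fully
// processed before order N+1 is looked at, which is what preserves the
// time-priority guarantee. Plain FIFO slices stand in for a real
// price-time order book.
func runEngine(inbound []Order) [][2]int {
	var buys, sells []int
	var trades [][2]int
	for _, o := range inbound {
		if o.Buy {
			buys = append(buys, o.ID)
		} else {
			sells = append(sells, o.ID)
		}
		// Cross the book: match the oldest buy with the oldest sell.
		for len(buys) > 0 && len(sells) > 0 {
			trades = append(trades, [2]int{buys[0], sells[0]})
			buys, sells = buys[1:], sells[1:]
		}
	}
	return trades
}

func main() {
	trades := runEngine([]Order{{1, true}, {2, false}, {3, true}, {4, false}})
	fmt.Println(trades) // [[1 2] [3 4]]
}
```

Because a single thread of execution owns the book, no locks are needed and fairness falls out of the processing order.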
Hardware: 2.2 GHz Intel Core i7
Test data:
- An equal number of bid and ask orders
- A 10% spread between the orders
- Order volumes ranging from 100,000 to 10 million orders
Performance results for four of the six variations:
1. Single threaded / non-concurrent matching engine (Sequential Go): This implementation was written in Go; the idea was to leverage Go's speed and performance while keeping the engine simple and efficient.
Result: 800,000 transactions per second (TPS).
Average latency: 1.25 microseconds per order
2. Single threaded / non-concurrent matching engine (Sequential Java): This implementation was in Java, and we admit we were pleasantly surprised to get better performance from Java. It is a testament to the maturity of Java's data structures.
Result: 1.3 million transactions per second (TPS).
Average latency: 0.8 microseconds per order
3. Multi threaded / concurrent matching engine (lock-free): We wanted to capture the data from this implementation because of the sheer amount of work that went into it. The idea was to squeeze as much parallelism out of the hardware as we could, but at the end of the day we had to maintain the fairness and correctness guarantees, and performance actually suffered.
Result: 500,000 TPS.
Average latency: 2 microseconds per order
4. Single threaded / non-concurrent matching engine with a specialized data structure: Based on the results from implementation #2, we spent most of our time on single threaded engines while trying out different data structures. This particular test gave us the best numbers, so it is captured here. The engine itself is simple, guarantees fairness, and is actually easier to maintain. The data structure used is a specialized buffer that greatly reduces the time an order takes to pass through the matching logic.
Result: 2.4 million transactions per second (TPS)
Average latency: 0.4 microseconds per order
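The post does not disclose what the specialized buffer is, but a common way to cut per-order latency is a preallocated ring buffer, which reuses fixed slots instead of allocating on the hot path. The following is a generic sketch of that technique only, not the actual INCX data structure:

```go
package main

import "fmt"

// RingBuffer is a fixed-capacity, preallocated FIFO queue. Because slots
// are reused rather than allocated per order, the hot path performs no
// allocation and stays cache-friendly.
type RingBuffer struct {
	buf        []int64
	head, tail int
	size       int
}

func NewRingBuffer(capacity int) *RingBuffer {
	return &RingBuffer{buf: make([]int64, capacity)}
}

// Push appends a value; it reports false if the buffer is full.
func (r *RingBuffer) Push(v int64) bool {
	if r.size == len(r.buf) {
		return false
	}
	r.buf[r.tail] = v
	r.tail = (r.tail + 1) % len(r.buf)
	r.size++
	return true
}

// Pop removes and returns the oldest value; ok is false when empty.
func (r *RingBuffer) Pop() (v int64, ok bool) {
	if r.size == 0 {
		return 0, false
	}
	v = r.buf[r.head]
	r.head = (r.head + 1) % len(r.buf)
	r.size--
	return v, true
}

func main() {
	rb := NewRingBuffer(4)
	rb.Push(10)
	rb.Push(20)
	v, _ := rb.Pop()
	fmt.Println(v) // 10
}
```

Fixed capacity also gives predictable, allocation-free latency, at the cost of having to size the buffer for peak load.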
A graph with the above results:
For the Java-based matching engine you will note that performance at lower loads is worse than at higher loads. This is because the JVM takes time to warm up, and some initial setup costs are incurred at the beginning.
It is important to note that this is the performance of a single matching engine. Our architecture is sharded: we run a separate matching engine for each crypto pair. Our overall throughput in transactions per second is therefore the sum across all the matching engines we run, which scales with how many crypto pairs we support.
So if we support 50 crypto pairs, our throughput will be over 100 million transactions per second (50 × 2.4 million TPS = 120 million TPS).
Adding a new crypto pair increases our overall throughput without affecting the performance of existing pairs.
At INCX, we are laying the foundation for an ultra-scalable, stable, simple, secure, and compliant crypto exchange that trades at lightning speed. I think we are off to a great start!
Chief Architect, INCX
Telegram (Announcements Channel): https://t.me/INCXico
Telegram (Discussion Group): https://t.me/InternationalCryptoX