Developer Diary — Limit Order Book Saga

Andrew | Stack Finance
Published in Stack Finance · 4 min read · Sep 25, 2023
Credit: r/algotrading

I naively assumed building a high-performance limit order book would be straightforward. The textbook algorithms seemed simple enough (there's a quick sketch in code just after this list):

  • Add order in O(log n) time by inserting into a sorted tree
  • Cancel in O(1) with a hash lookup based on unique order ID
  • Execute in O(1) by removing from the inside of the book — the best bid or ask at the edge of the tree
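
To make that idealized picture concrete, here is roughly what it looks like in Python: a sorted list of prices standing in for the tree, a FIFO queue at each price level, and a hash map from order ID to order for cancels. The `Order` and `TextbookBook` names are my own shorthand for this post, and the comments already flag where the tidy complexity claims start to fray.

```python
import bisect
from collections import deque
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    side: str    # "buy" or "sell"
    price: int   # price in integer ticks, to avoid floating-point comparisons
    qty: int

class TextbookBook:
    """The idealized book: sorted price levels, FIFO queues, hash lookups."""

    def __init__(self):
        self.levels = {"buy": {}, "sell": {}}   # side -> {price -> deque of resting orders}
        self.prices = {"buy": [], "sell": []}   # side -> sorted list of prices with resting orders
        self.by_id = {}                         # order_id -> Order, for "O(1)" cancels

    def add(self, order: Order) -> None:
        side_levels = self.levels[order.side]
        if order.price not in side_levels:
            # the "O(log n)" insert -- the list shift hiding behind bisect is the first catch
            bisect.insort(self.prices[order.side], order.price)
            side_levels[order.price] = deque()
        side_levels[order.price].append(order)
        self.by_id[order.order_id] = order

    def cancel(self, order_id: str) -> None:
        order = self.by_id.pop(order_id, None)  # the O(1) hash lookup
        if order is None:
            return
        queue = self.levels[order.side][order.price]
        queue.remove(order)                     # the O(n) scan hiding inside the level
        if not queue:                           # drop empty price levels
            del self.levels[order.side][order.price]
            self.prices[order.side].remove(order.price)

    def best_bid(self):
        bids = self.prices["buy"]
        return bids[-1] if bids else None       # highest resting buy price

    def best_ask(self):
        asks = self.prices["sell"]
        return asks[0] if asks else None        # lowest resting sell price
```

Even this toy version hints at the trouble ahead: the "O(1)" cancel still scans a queue, and the sorted price list shuffles elements every time a new level appears.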

An orderly world on paper, soon to be shattered by the ruthless chaos of production reality.

In theory, assigning each order a unique ID made lookups and cancellations instant. But in practice, with hundreds of thousands of orders flooding the book each second, finding the right ID among a haystack of in-flight requests quickly became very O(n). What I wouldn’t give now for the simplicity of sequential IDs! Alas, the exchanges in their infinite wisdom have each devised their own convoluted identifier schemas optimized for uniqueness rather than orderliness. As an aggregator, I have little choice but to bend to the will of the exchanges. But that doesn’t make indexing their cryptically labeled firehoses of data any easier!
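
One way to tame those cryptic identifiers (a sketch of a common approach, not something my book does yet) is to translate each exchange-assigned ID into a small sequential internal integer the moment it enters the system, and key every hot-path structure on that integer instead. The `IdTranslator` name and its methods below are hypothetical.

```python
import itertools

class IdTranslator:
    """Intern each exchange's opaque order ID as a compact sequential integer,
    so downstream lookups never touch the long string keys again."""

    def __init__(self):
        self._counter = itertools.count()
        self._to_internal = {}   # (exchange, exchange_order_id) -> internal int
        self._to_external = {}   # internal int -> (exchange, exchange_order_id)

    def intern(self, exchange: str, exchange_order_id: str) -> int:
        key = (exchange, exchange_order_id)
        internal = self._to_internal.get(key)
        if internal is None:
            internal = next(self._counter)
            self._to_internal[key] = internal
            self._to_external[internal] = key
        return internal

    def resolve(self, internal_id: int):
        """Map back to (exchange, exchange_order_id) when acking or logging."""
        return self._to_external.get(internal_id)
```

With the translation done once at the edge, cancels and executions inside the book stay plain dictionary hits on small integers, however baroque each venue's ID scheme happens to be.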

Executing orders also seemed efficient in simplistic simulations. But introduce concurrent threads accessing the book simultaneously and see how long that remains true. Race conditions care little that your single-threaded algorithm is O(1). The moment multiple threads start mutating the book concurrently, performance degrades rapidly. Lock contention alone could fill volumes. Clearly more work is needed before I can check off the execution operation as complete.
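
The blunt first aid for those race conditions is a single coarse lock around every book operation. It restores correctness, but it also manufactures exactly the contention I just complained about, since every thread queues behind the same mutex. A minimal sketch, wrapping the hypothetical TextbookBook from earlier:

```python
import threading

class LockedBook:
    """Coarse-grained thread safety: one mutex guards every mutation.
    Correct, but every caller serializes on the same lock under load."""

    def __init__(self, book):
        self._book = book
        self._lock = threading.Lock()

    def add(self, order):
        with self._lock:
            self._book.add(order)

    def cancel(self, order_id):
        with self._lock:
            self._book.cancel(order_id)

    def best_bid(self):
        with self._lock:
            return self._book.best_bid()
```

A common alternative is to give the book a single owning thread fed by a message queue, so the data structure itself never needs a lock at all.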

The data volumes, too: 20GB per day and 100k messages per second seemed impressive on paper. My humble prototype starts sweating bullets once the message rate passes 500/sec. “Just optimize it,” they always say breezily when I complain of performance woes. If only it were that easy! I am no optimization guru able to magically squeeze multiples more speed out of existing algorithms. This will take patience and many late nights with the profiler finding bottlenecks.
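
Step one of those late nights is measuring where the time actually goes. Below is a rough benchmarking sketch (it reuses the Order and TextbookBook classes from the first snippet, and the message mix is made up): replay synthetic adds and cancels, report sustained messages per second, then let cProfile show which calls dominate.

```python
import cProfile
import random
import time

def replay(book, n_messages=100_000):
    """Replay a synthetic stream of adds and cancels, then report throughput."""
    live_ids = []
    start = time.perf_counter()
    for i in range(n_messages):
        if live_ids and random.random() < 0.4:
            # cancel a random resting order
            book.cancel(live_ids.pop(random.randrange(len(live_ids))))
        else:
            order = Order(order_id=str(i),
                          side=random.choice(["buy", "sell"]),
                          price=random.randint(9_900, 10_100),
                          qty=1)
            book.add(order)
            live_ids.append(order.order_id)
    elapsed = time.perf_counter() - start
    print(f"{n_messages / elapsed:,.0f} messages/sec sustained")

if __name__ == "__main__":
    replay(TextbookBook())          # first the raw throughput number...

    profiler = cProfile.Profile()   # ...then where the time actually goes
    profiler.enable()
    replay(TextbookBook())
    profiler.disable()
    profiler.print_stats("cumulative")
```

Even a crude run like this turns "it feels slow" into a ranked list of functions to attack first.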

So here I remain, nursing a messy hairball of trees, arrays, maps, and duct tape. It just barely hangs together in simplified tests, but quickly buckles under any real-world load. At this point, I should probably swallow my pride and just integrate a proprietary limit order book framework that handles these demands out of the box. But that would be the sensible thing to do, and I’m not quite ready to admit defeat yet!

No, I clearly enjoy pain and suffering, as evidenced by my insistence on building this from scratch. But each failure brings new learnings! I shall soldier on until this beast is capable of ingesting the firehoses of data today’s exchanges spew out 24/7.

On paper, limit order books are simple — efficient data structures and algorithms operating in tidy mathematical time bounds. But theory and practice collide violently when optimizations need to scale up by orders of magnitude. I naively thought this would be easy. The hard lessons of production have thoroughly humbled me. Yet I cannot quit now when I’ve come so far!

Yes, building this book has been the coding equivalent of Sisyphus repeatedly rolling his boulder up the hill, only to chase it down again each time it rolls backwards. But I’m slowly getting that boulder higher with each attempt! And someday, someday soon I hope, I will crest that hill and finally glimpse the promised land of a production-ready limit order book.

But while my algorithms seemed flawless on paper, reality has been unforgiving to their naive purity.

The indexed tree data structure, so performant in theory, buckles under the weight of real-world order volumes. My meticulously crafted hashes are reduced to wastefully probing empty buckets as IDs grow increasingly sparse. And don’t even get me started on the havoc concurrent threads have wreaked on operations assumed atomic in textbook land!

Yet still the market data torrents pour in, indifferent to the creaking Rube Goldberg machine tasked with processing their relentless flow. No quarter given by the dreaded 1ms latency limit.

On paper, everything seemed simple — elegant algorithms and data structures dancing in harmonious time complexity bounds. But theory and practice collide violently when taken to production scale.

But from each setback comes new wisdom! And so I toil on, determined to transform this tangle of trees, locks, and duct tape into a system able to withstand the demands of real-world capital markets.

The exchanges continue to spew their cryptically labeled firehoses of data, optimized for uniqueness rather than order. Meanwhile, orders flood in at rates that leave my indexing schemes swamped. Executions deadlock threads as parallelism wreaks havoc on textbook serial algorithms.

Yet still there is hope! Each failure brings new insight on bottlenecks. Tiny gains in efficiency accumulate like compounding interest. What is programming, if not determination in the face of countless obstacles?

The road stretches long ahead, but the Limit Order Book shall be built!
