It’s Unhip to be N-Squared

Transactional Contention is $O(N²)

“I don’t even know what this is about” — Huey Lewis

Distributed programming environments tend to work with eventual consistency; most services give you a pretty good if slightly stale view of data, trading off consistency for speed / availability / efficiency. Consistency turns out to be expensive in a distributed environment.

But if you work on back-end systems, you’ll eventually find you have to count and/or update things in a strongly consistent manner.

Examples might include updating a leaderboard in a game when results are coming in concurrently and unpredictably. Or you might want to total up some amounts in a database which is also being written to unpredicably. Or there’s always the classic banking scenario: transfer $X from account Y to account Z.

Programmers tend to think in terms of critical sections in these situations. You want something like this:

doing stuff which doesn't need to be consistent
...
Begin Critical Section
# only one thread of execution can be in here at a time
Read some things
Do some calculations
Update some stuff
End Critical Section
...
doing more stuff which doesn't need to be consistent

Usually, the only tool for this in a database or distributed back-end system is a transaction, which looks like this:

doing stuff which doesn't need to be consistent
...
Begin Transaction
Read some things
Do some calculations
Update some stuff
Commit Transaction
...
doing more stuff which doesn't need to be consistent

This looks like the ticket!

But wait, the semantics are slightly different…

In a critical section, usually modelled with something like a semaphore, mutex, monitor, or the like, the idea is that many threads (or processes or etc) try to enter together. One gets in, others go in a queue. When the thread in the critical section completes, one from the queue is woken up and given the critical section, and so on until the queue is empty.

In a transaction, however, things work differently. Many threads of execution can enter (begin) the transaction and try to do things. These things don’t take affect until the transaction tries to commit. At that time, the system checks to see if any other transaction has committed while this one was underway (changing the data under our transaction). If not, our transaction can be committed; all actions take affect. Otherwise, the transaction rolls back; it fails, and no actions take affect. If a transaction rolls back, it can be retried until it succeeds.

These are superficially similar constructs. The most important way in which they are similar is that the protected data is only operated on by one thread of execution at a time, and that thread both reads and writes consistently. When the critical section’s queue is empty, or all transactions have been retried until success, all processing has been applied serially and successfully.

But the important way in which they are different is that, while a critical section implements a queue to store waiting threads/processes, a transaction lets many threads/processes go ahead, but then causes all but one of them to fail. The failed transactions do expensive work, and need to be retried.

If you’re working in a distributed system, and you’re thinking in terms of critical sections, you might be tempted to ignore this distinction. But you really shouldn’t. Under load (which is when you really care), the performance of critical sections and transactions is remarkably different.

Critical Section performance under load

Say you have N tasks to perform in a critical section, each of which takes one unit of time (henceforth one tick) to complete. What does the performance look like?

At tick 0 (zero), you run N tasks. One completes, the rest are enqueued.

At tick 1, 2, …, N-1, a task is dequeued, given the critical section, and run to completion.

At tick N, there is no more work to do.

We can see that this strategy:

  • Runs in O(N) time
  • Does O(N) units of work
  • Uses O(N) resources for the queue
  • If we are paying for processing, it costs $O(N)

If one unit of work (one transaction) costs you 1 cent, then this critical section approach costs you roughly $N/100.

If that’s the model you’ve got in your head for transactions, you’ll have a bad time.

Transaction performance under load

Let’s look at the same scenario for transactions. You have N tasks to perform in a transaction, each of which takes one tick to complete. What does the performance look like?

Let’s try a simulation for N = 100 transactions.

In the critical section example above, at 1 cent per unit of work, this was costing roughly $1.

I’ve built a simple transactional contention simulator which you can play with. Click the graphs below to go to the simulator and play with variables.

Here’s the result for N=100:

Transactional contention for N = 100, all tasks enqueued together.

The blue area is successfully completed transactions. The red area is the initial 99 tasks which don’t complete (are rolled back and must retry). The orange area is retried transactions which have rolled back and must be retried again.

This looks like a lot! Here’s some more data from the sim:

Total work is 5050 units, for 100 tasks! There’s also an efficiency calculation; how much of the work we do is actually useful? About 2% apparently.

In terms of money? We’ve done 5050 units of work. So at 1 cent per unit of work, that’s $50.50. Compared to the case of $1 for critical sections, that doesn’t seem good at all.

Let’s think about the general case for N transactions:

At tick 0, you run N tasks. One completes, the rest fail, to be retried in the next tick.

At tick 1, 2, …, N-1, previously failed tasks are retried. One runs to completion, the rest must be retried again.

A tick N, we have run N tasks successfully, there is no more work to do.

This strategy still runs in O(N) time. Which is good I guess.

It still uses some kind of queue, implicitly or explicitly, containing all the tasks to retry. This will require O(N) resources.

How much work does it do?

Tick 0: we do N units of work (1 commit, N-1 rollbacks)

Tick 1: we do N-1 units of work (1 commit, N-2 rollbacks)

Tick N-1: we do 1 unit of work (1 commit, 0 rollbacks)

In total, that’s N + N-1 + … + 1 units of work, or N(N-1)/2.

ie: We are doing O(N²) units of work.

So how much does this cost? Assuming we pay the same for each unit of work, we are spending $O(N²).

Yeah, laugh it up Bezos!

This is great for platform providers, but not so good for us.

Could we try varying some things to bring the cost down?

Spreading out your transactions

How about we try these transactions more slowly? What if we do 100 transactions, but try adding only 10 per tick?

Here’s the graph:

Transactional contention for N = 100, adding 10 at a time.

Still looks bad. Here are the totals:

Almost as bad as adding them all at once.

Alright, if we add them one at a time, they’ll get processed as quickly as we add them, and it’ll be like the critical section case, O(N) all the way. On the other hand, we’ll have to pre-serialise our workload, which kind of defeats the purpose.

How about adding 2 tasks at a time?

Transactional contention for N = 100, adding 2 at a time.

That looks a bit better, but still bad. It’s adding tasks two at a time for the first 50 ticks (adding two, processing one), then processing what’s left at one per tick for the second 50 ticks (adding zero, processing one).

If we go to the general case, this is roughly half the work of the case where we add them all at once. ie: kn²/4, or still O(n²).

The summary of all this is, that if you use transactions to manage real contention (where you are attempting transactions faster than you can clear them), you will end up doing O(N²) work and spending $O(N²).

I didn’t write this article for some theoretical fun. I hit this regime all the time. Generally, if you write a distributed algorithm which has a transaction-like chokepoint (which you can easily do accidentally) and you allow unbounded contention, you get O(N²) behaviour.

Limited retries, exponential backoff

In practical settings, we don’t do unlimited retries. Instead, we fail after some fixed amount of attempts.

But this isn’t very helpful. Instead of spending $(N²), you end up spending $(N*retries), which is still a big number, AND your system fails (permanent transaction failures are never pretty). This behaviour tends to look like your system going away for a long period of time, resources scaling out crazily, your logs filling up with crap and failures, and your bills getting really big. Not fun.

Exponential backoff doesn’t help much either; I think it’s more helpful for the platform provider than the customer. The graphs above tend to spread out in time a bit more, and be a bit more concave, but not enough to help. It’ll make your algorithm take longer, and most likely you’ll still get permafails.

Both of these mechanisms help cushion blowouts, but don’t do anything to fix a poorly designed algorithm.

You can still use transactions

You can use transactions. They still have fundamental properties which you will absolutely need. And, you can use O(N²) algorithms.

But, you must keep N very small. Small like less than 10.

N < 10? Just use ‘em.

Here’s the situation for adding all transactions immediately, for some small values of N:

N = 3, efficency = 0.5
N = 5, efficiency = 0.33
N = 7, efficiency = 0.25

For small N, the level of waste is fairly tolerable in return for consistency where you need it.

Transactional Contention is a sometimes food

But you really have to be careful here. N needs to be constrained to always be small. If your N is “always less than 7”, say, you’ll go fine. If it is “usually less than 5, but might occasionally get much larger”, this wont be so good. You’re going to find those “occasionally” cases cause cost blowouts and failures. And at scale, “occasionally” tends to become “almost always, somewhere in your system”.

How do we constrain N to small numbers?

You’ll find that most algorithms for scaling systems are really trying to constrain N to a small constant. They have some transaction-like thing (a contested resource) with transaction-like semantics, or worse. An example of worse-than-transactions is where you overload a machine, and cause it to crash; if you’re lucky, your resilient system will kill and restart machines for you, but the cost/efficiency/speed of your system is going to make you weep.

One of the best techniques for constraining N is sharding.

Sharding is where we divide up a keyspace into smaller pieces (shards), and only require transaction coordination within each shard.

If we use adaptive sharding (or recursive sharding),

then we can split an unconstrained problem size down to pieces with constant small N, using a hierarchy to fan out over the whole space, then fan back in, combining results up to a single result.

In Summary

Transactions look like critical sections, and we can be tempted to use them as a panacea for contention issues. But they will spend your money at $O(N²), which is bad. You can and should use them, but you must constrain N to very small numbers, and to do this you’ll need sophisticated algorithms.

In future articles I’ll present some tools and techniques for doing this in Python App Engine apps, and keeping your code simple and tractable in the process.