The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors

Frank Wang
Frankly speaking
Jan 9, 2019

This is part of a week(-ish) blog series where I deep-dive on a cool technology. I am an investor at Dell Technologies Capital in Silicon Valley, and occasionally reminisce about my previous life in academia. Follow me on Twitter and LinkedIn.

This week, I won’t be talking about security but instead about building scalable software. I really enjoy this work by a former labmate, Austin Clements (who now works at Google on Go), because it identifies an important guiding rule for designing scalable software. It also lets developers figure out a priori whether a piece of software can be made scalable.

Problem

Software must become increasingly parallel to keep up with hardware, but scaling with parallelism is hard.

Kernel scalability matters because so many applications depend on the OS kernel: if the kernel doesn’t scale, those applications won’t scale either. Making the kernel scale is hard, though, because it has to support the combined parallelism of all the applications running on top of it, and the workloads it faces tend to be diverse and unknown in advance.

Below is a picture of the traditional approach to scalable software. It’s a feedback loop: run a workload, plot its scalability, build a differential profile to find the top bottleneck, fix that bottleneck, and repeat until the software scales.

This approach has had success in practice because it focuses developer effort, but it has serious drawbacks. It requires an enormous amount of work, new workloads and more cores keep exposing new bottlenecks, and the real bottlenecks may lie in the interface design itself. This work focuses on a systematic way to find and resolve bottlenecks in interface design.

For example, a lock prevents parallel creat calls because POSIX requires creat to return the lowest available file descriptor. The solution is to change the interface so that any available file descriptor may be returned, as sketched below.
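To make the contrast concrete, here is a minimal Python sketch (my own simplification, not the kernel’s actual code): the lowest-fd contract forces every allocation through shared state under one lock, while a relaxed contract lets each thread draw from its own range.

```python
import threading

class LowestFdTable:
    """POSIX-style contract: return the lowest unused fd. Every allocation
    must consult the same shared set under one lock, so concurrent creat
    calls conflict by design."""
    def __init__(self):
        self.lock = threading.Lock()
        self.used = set()

    def alloc(self):
        with self.lock:                  # all threads serialize here
            fd = 0
            while fd in self.used:
                fd += 1
            self.used.add(fd)
            return fd

class AnyFdTable:
    """Relaxed contract: any unused fd is acceptable. Each thread draws
    from its own disjoint range and never touches shared state, so
    concurrent allocations are conflict-free."""
    def __init__(self, num_threads, slab=1024):
        self.next_fd = [tid * slab for tid in range(num_threads)]

    def alloc(self, tid):
        fd = self.next_fd[tid]           # thread-private, no sharing
        self.next_fd[tid] += 1
        return fd
```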

Solution: Scalable commutativity rule

Whenever interface operations commute, they can be implemented in a way that scales.

The rule enables reasoning about scalability throughout the software design process.

A single contended cache line can wreck scalability. The plot below shows how one contended cache line ruins scalability for Exim (a mail server used as a benchmark).

So, what is scalable on today’s multicores? We say that two or more operations are scalable if they are conflict-free: no core writes a cache line that another core reads or writes during those operations.

The intuition behind the rule is a chain of implications: if operations commute, then

  1. their results are independent of order, so
  2. communication between them is unnecessary, and
  3. without communication, there are no conflicts.

Take the following reference counter example.

In this case, R2 doesn’t commute because its result (the count) depends on how many other operations ran before it. But consider a modified R2 that returns “ok” instead of the number: the result no longer depends on order, the operations commute, and a conflict-free implementation (for example, per-core counters) becomes possible.
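The R1/R2 names refer to operations in a figure I can’t reproduce here, so here is my own minimal Python sketch of the same idea: a counter whose operation returns the new count doesn’t commute, while one that returns only “ok” does and can be sharded per core.

```python
class ExactRefCount:
    """Returns the new count, so the result depends on how many other
    increments ran first -- concurrent increments don't commute."""
    def __init__(self):
        self.count = 0

    def inc(self):
        self.count += 1
        return self.count

class OkRefCount:
    """Returns only "ok". The result is independent of order, so the
    counter can be sharded per core; each inc touches only its own core's
    slot, and communication happens only when checking for zero."""
    def __init__(self, ncores):
        self.per_core = [0] * ncores

    def inc(self, core):
        self.per_core[core] += 1
        return "ok"

    def dec(self, core):
        self.per_core[core] -= 1
        return "ok"

    def is_zero(self):
        return sum(self.per_core) == 0

# Two concurrent incs on ExactRefCount return 1 and 2 -- which caller gets
# which depends on order. On OkRefCount both callers always get "ok".
```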

As part of the work, Austin built Commuter, a tool that, given an interface specification and an implementation, finds the scalability bottlenecks. For more information on the tool, I encourage you to read his paper and check out the GitHub repo.
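To give a feel for what checking commutativity means, here is a toy sketch. This is not Commuter’s actual API (the real tool symbolically executes a Python model of the POSIX interface and then tests implementations for cache-line conflicts); this toy just brute-forces both orderings of two operations against a concrete model, and all names here are mine.

```python
def commutes(make_state, op_a, op_b):
    """True if each operation returns the same result and the final state
    is the same regardless of which operation runs first."""
    s1 = make_state()
    a_then_b = (op_a(s1), op_b(s1))      # (a's result, b's result)

    s2 = make_state()
    r_b = op_b(s2)
    r_a = op_a(s2)
    b_then_a = (r_a, r_b)

    return a_then_b == b_then_a and s1 == s2

# Example: two creats that must return the lowest free fd don't commute,
# because whichever runs second gets a different number.
def creat_lowest(state):
    fd = min(set(range(len(state["fds"]) + 1)) - state["fds"])
    state["fds"].add(fd)
    return fd

print(commutes(lambda: {"fds": set()}, creat_lowest, creat_lowest))  # False
```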

Does it work?

Commuter was able to find non-scalable cases in Linux: operations that commute but whose implementations still conflict on shared cache lines. Below is a more detailed picture of why certain interfaces are not scalable.

To conclude, whenever interface operations commute, they can be implemented in a way that scales. The rule lets developers figure out whether scalability is even possible before any implementation effort is spent, which saves unnecessary work and focuses effort on scalability goals that are actually achievable.

If you have questions, comments, future topic suggestions, or just want to say hi, please send me a note at frank.y.wang@dell.com.
