This post may interest you if you like Julia, have a rough understanding of when you’d typically use its broadcasting feature, and want to learn how I re-used that feature to get a significant performance optimization in a setting without axes and indices.

The math problem

A common maths challenge is to find out whether a certain polynomial can be written as a linear combination of some other polynomials. For example, is


a linear combination of these polynomials

x^2 + y
x*y + y^2

or not? In fact, it is, but it is not obvious how to find the specific…
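(The target polynomial above was an image in the original and is lost here, so the sketch below uses a hypothetical target, 2x² + xy + y² + 2y, which equals 2·(x² + y) + 1·(xy + y²). For the simplest reading of the question — the combination’s coefficients are constants — membership reduces to a small linear system: the coefficient of every monomial must match. If polynomial coefficients are allowed, the problem becomes ideal membership and needs heavier machinery. A stdlib-only Python sketch, although the post itself works in Julia:)

```python
from fractions import Fraction

def lin_comb_coeffs(basis, target):
    """Find constants c_i with sum(c_i * basis[i]) == target, or None.

    Polynomials are dicts mapping monomials -- tuples of exponents,
    e.g. x^2 -> (2, 0), x*y -> (1, 1) -- to coefficients.
    """
    monos = sorted(set().union(target, *basis))
    # One linear equation per monomial: coefficients on both sides must match.
    rows = [[Fraction(b.get(m, 0)) for b in basis] + [Fraction(target.get(m, 0))]
            for m in monos]
    n = len(basis)
    pivots, r = {}, 0
    for col in range(n):                       # Gauss-Jordan elimination
        p = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if p is None:
            continue                           # free variable: leave it at 0
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][col]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots[col], r = r, r + 1
    # A row "0 = nonzero" means the system is inconsistent: not a combination.
    if any(all(v == 0 for v in row[:n]) and row[n] != 0 for row in rows):
        return None
    return [rows[pivots[c]][n] if c in pivots else Fraction(0) for c in range(n)]

# x^2 + y and x*y + y^2, as in the post
b1 = {(2, 0): 1, (0, 1): 1}
b2 = {(1, 1): 1, (0, 2): 1}
# hypothetical target: 2*x^2 + x*y + y^2 + 2*y
t = {(2, 0): 2, (1, 1): 1, (0, 2): 1, (0, 1): 2}
coeffs = lin_comb_coeffs([b1, b2], t)   # equals [2, 1]
```

Exact rational arithmetic via `Fraction` avoids false negatives from floating-point round-off in the consistency check.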

It’s a well-known trope that if your code never runs, it probably doesn’t work. This is particularly true of error handling, error recovery and failover code, and it has led to the development of infrastructure (like Netflix’s Simian Army) in which operators make sure that their systems are constantly in failure mode, ensuring that these code paths and mechanisms are exercised continuously. In addition, at the human level, it forces “downstream” developers to deal with failure scenarios resiliently and gracefully.


In this post, we’ll show how we apply similar reasoning to the slightly different domain of (web) server tuning. We run our web servers on bare metal and, consequently, we need to decide how many processes to run on a single machine, how many requests each worker serves, and how to control their memory use. In our postmortem process (modelled in part on Etsy’s), it turned out that instability in this configuration had caused a few outages, including a user-facing one, and we set out to solve this for good.

The quirks of our system

Let’s have a look at the properties that define our system. Our web application servers listen for external requests on an nginx process, which forwards them over a Unix socket to a Plack server managed by uWSGI. At server startup, the latter process loads code and some shared data, and then uses fork(2) to spawn the worker processes that actually render HTTP responses. …
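The preload-then-fork pattern described above can be sketched in a few lines. This is illustrative Python, not our actual Perl/uWSGI stack, and all names are made up; the point is that data loaded before fork(2) is shared copy-on-write between workers, so no worker pays the load cost again.

```python
import os
import sys

# Loaded once in the master, before forking; workers inherit these
# pages copy-on-write. Hypothetical payload for illustration.
SHARED_DATA = {"templates": ["home", "search"]}

def serve_requests():
    # Stand-in for the real request loop: a worker would accept
    # connections from the Unix socket here. In this toy version it
    # just reports how many templates it inherited.
    return len(SHARED_DATA["templates"])

children = []
for _ in range(2):                    # uWSGI would decide the worker count
    pid = os.fork()
    if pid == 0:                      # child: become a worker
        sys.exit(serve_requests())    # exit status 2 in this toy example
    children.append(pid)

# Master: reap the workers and collect their exit statuses.
statuses = [os.waitpid(pid, 0)[1] >> 8 for pid in children]
print(statuses)                       # [2, 2]
```

In the real setup the master also has to decide when a worker should stop serving and be replaced, which is exactly the configuration whose instability is discussed here.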

When my coworker @tvdw mentioned, a while ago, that memory throughput was now the main bottleneck of his number cruncher, I remember filing that away as a relatively esoteric situation. In theory, I understood caching effects, and that registers and CPU caches are much quicker to access than main memory, but this had never had any practical implications for me.

I recently got into a situation where a priority queue operation was my main bottleneck. After splitting the operations out line by line, it turned out that array indexing was the slow part. …
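Cache effects of this sort can be made visible with a toy measurement. The sketch below is Python (the post’s own experiments were in Julia), and in CPython interpreter overhead blunts the effect considerably, so treat any timings as illustrative only: the same summation runs twice, differing solely in memory-access order.

```python
import array
import random
import time

N = 1_000_000
data = array.array("q", range(N))    # one flat block of 64-bit ints

seq_order = list(range(N))           # cache-friendly: sequential walk
rand_order = seq_order[:]
random.shuffle(rand_order)           # cache-hostile: random jumps

def total(order):
    # Identical arithmetic either way; only the access pattern differs.
    s = 0
    for i in order:
        s += data[i]
    return s

t0 = time.perf_counter(); seq_sum = total(seq_order); seq_t = time.perf_counter() - t0
t0 = time.perf_counter(); rand_sum = total(rand_order); rand_t = time.perf_counter() - t0

assert seq_sum == rand_sum == N * (N - 1) // 2
print(f"sequential {seq_t:.3f}s, random {rand_t:.3f}s")
```

On typical hardware the random-order pass is slower because nearly every access misses the cache; in a compiled language the gap is far more dramatic than what CPython shows.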

When we introduce new features on our website, sometimes it’s not simply the behaviour of our users that changes. The behaviour of our own systems can change, too.

Photo by Peter Nguyen on Unsplash

For example: A new feature might improve conversion (changing a user’s behaviour) while also slowing down our site rendering (changing the behaviour of our systems). This becomes interesting when you realize that the second effect might influence the first — a rendering slowdown might decrease conversion, for instance.

Sometimes, the opposing effects that turn up in our results make for interesting investigation and developments. At, …


Timo Kluck

PhD student in mathematical physics at Utrecht University.
