No, this is not really a post about the upcoming Designing for Scalability with Erlang/OTP. Erlang is nearly unique among programming languages in that almost all of the books on it are in the good-to-great range. This book is going to be no exception.
No, this is not about Erlang per se, as other languages have the same problem. But Erlang is a poster child for “scalable distributed 99.999999% cloud developer-to-user ratio <insert the next current rave here>”.
Recently I tweeted:
Every single book on #Erlang spends 99% of text on reiterating the same basic principles. Most of them are worse than LYSE
Where LYSE is, of course, the excellent Learn You Some Erlang For Great Good.
This is a bit harsh. Maybe not 99%. And not necessarily worse. So, what am I complaining about? Most books mostly concern themselves with “draw some circles”. Reality is quite often “draw the rest of the owl”. There is almost no info on the intermediate steps in between.
Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged in order to accommodate that growth.
To scale horizontally (or scale out) means to add more nodes to a system, such as adding a new computer to a distributed software application.
To scale vertically (or scale up) means to add resources to a single node in a system, typically involving the addition of CPUs or memory to a single computer.
So, in the case of Erlang, which is described as, praised for being, and marketed as scalable, distributed, durable, resilient, etc., it would be really nice to read up on at least some of these things:
- setting up multiple nodes
- testing a distributed app
- deploying a distributed app
- handling failover
- handling load balancing
- handling netsplits (and not only in Mnesia: if we can add a process on node B to a supervisor on node A, how do we handle netsplits, timeouts, restarts, etc.?)
- discovery of nodes
- tracing
- profiling
- various VM options and their impact
- securing connections between nodes
- logging
- debugging
- crash dumps
- remote inspection
- mitigating overflowing mailboxes
- SSL
- sockets
- working from behind firewalls
- flood protection
- slow requests
- timeouts
- sessions
- latency
- <add your own>
Funnily enough, out-of-the-box Erlang:
- has zero answers to some of these questions, so it would be nice to find out how they can be solved (consul/etcd for node discovery? how? and similar questions)
- already has excellent tools, but very little info on the best way to use them: profiling, tracing, remote inspection (redbug, do you use it? etc.)
- has performance problems with poorly documented workarounds or third-party solutions for some situations (running across thousands of nodes or thousands of CPUs comes to mind)
- has support for some scenarios, but very little info on them (there’s a total of about 1000 words on large-scale testing with Common Test in the docs, for example)
Unfortunately, most existing books emphasize beyond measure one single aspect of Erlang: OTP. Even though OTP is no doubt essential to creating robust, scalable, distributed applications, people looking for the answers to the questions above already grok OTP :) They already know how to draw the circles.
If you’re looking for answers, though, the landscape in Erlang books is quite bleak.
Among the commercially available books, it looks like only “Mastering Erlang: Writing Real World Applications” doesn’t fall into the trap of spending all of its chapters on OTP with only one or two chapters dedicated to something else. (Update: I’m not even sure this book exists. It looks like it’s been “Coming soon” since 2010. Except perhaps here)
The other notable exception is the excellent “Erlang in Anger”, which answers quite a lot of the questions above.
We’re quite ready to move beyond drawing circles. Some of us are not yet ready to draw the entire owl ;) Hjälp!