Foraging for the Fallacies of Distributed Computing (Part 1)

Vaidehi Joshi
Jun 21, 2019 · 10 min read
Image for post
Image for post
Foraging for the fallacies of distributed computing!

So much of computing is based on assumptions. We design systems operating on a set of assumptions. We write programs and applications assuming certain parts of their systems will work a certain way. And we also assume that some things can potentially go wrong, and we (hopefully) attempt to account for them.

One big issue with building computer-y things is that, even though we’re often dealing with complex systems, we aren’t always capable of reasoning about on a big-picture level. Distributed systems are certainly a great example of this (you knew where I was going with this, didn’t you?). Even a “simple” distributed system isn’t so simple, because by definition it involves more than one node, and the nodes in the system have to communicate and talk to one another through a network. Even in a small distributed systems with just two nodes, we make certain assumptions about how that system is going to work.

Here’s the thing about making assumptions, though: they can be wrong! As it turns out, there are some common, incorrect assumptions that we often make about distributed computing. In fact they are so ubiquitous that they have a name! I’m talking, of course, about the eight fallacies of distributed computing. And what are they? Well, it’s time to find out.

Assumptions gone wrong

James Gosling, who was a fellow at Sun Microsystems at the time — and would later go on to create the Java programming language — classified these first four fallacies as “The Fallacies of Networked Computing”. A bit later that decade, Peter Deutsch, another Fellow at Sun, added fallacies five, six, and seven. In 1997, Gosling added the eighth and final fallacy. So much of the work done to codify this list was inspired by work that was actually happening at Sun Microsystems at the time. And while Deutsch is the one who is often credited with having “created” the eight fallacies, clearly it was a combined, group effort (as is so often the case in computing!).

Alright, so now that we’ve got some history under our belt, let’s get into some of the practical aspects of this problem. Let’s first start with understanding what we’re talking about when we say a “fallacy” of distributed computing. What does that mean, exactly?

Image for post
Image for post
What exactly IS a fallacy when it comes to distributed computing?

Well, a fallacy is a kind of assumed fact or belief; however, even though it is assumed to be true, that is usually not the case in practice. A fallacy is just a belief that is a misconception. But what does that mean in the context of computing? Well, a fallacy made about a distributed system is an assumption about the system made by the developers, which often ends up being inaccurate and wrong in the long run. In other words, they are bad assumptions to make.

We can think of the eight fallacies as eight common misconceptions that developers of distributed systems often fall prey to, which we want to avoid.

Image for post
Image for post
The 8 fallacies of distributed computing

So, let’s take a look at these eight fallacies, shall we?

The eight fallacies of distributed computing are as follows:

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn’t change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogenous.

In this post, we’ll focus just on the first four fallacies, which we now know were the original “Fallacies of Networked Computing”! Conveniently, these first four fallacies are actually all anchored around one central, key character: the network. And, as we’re about to see, the network is a tricky little fellow. Of course, it’s kind of difficult to have a distributed system without a network so we have to get used to dealing with this tricky character.

Let’s learn about what makes the network hard to contend with, and why some of these fallacies are easy to assume.

The trouble with networks

This brings us to fallacy one: The network is reliable. Remember, these are fallacies that we’re dealing with, so the network being reliable is a true misconception; the network is not reliable!

Image for post
Image for post
Fallacy #1: The network is reliable (it is not).

So many things within a network can go wrong. Hardware can fail, the power can go out, the signal could get spotty, or the network could even become compromised! It might seem obvious that a network isn’t reliable, but as designers of a distributed system, we can sometimes forget this when we are writing code that is going to run on a distributed system. We write that code perhaps operating under this assumption, but when the network inevitably becomes unreliable at some point down the road, we realize that we made a bad assumption and it ends up affecting our system adversely. Thus, it becomes our job to keep this fallacy in mind and not forget that the network cannot be relied upon.

Now, before we get too much further into networks — not to mention the fallacies associated with them — there are some important terms for us to learn. When we talk about messages being sent through a network, we’re usually referring to data being sent across the network. But we can talk about data and the way it is sent in two different ways: latency and bandwidth.

Image for post
Image for post
Latency versus bandwidth

Latency is the measure of how delayed some data is to arrive, based on wherever it was sent. We can think of latency as the speed at which our data travels from one place to another, or how long it takes for the data to arrive from node A to node B in our system. Latency is a concept that many developers may already be quite familiar with, because we often refer to the latency of data in the context of milliseconds. For example, if it took a request 400 milliseconds to be served from a server (node A) to a client (node B), we can say that the latency of that data being sent was delayed by 400 milliseconds, or that it took 0.4 seconds for the message travel from the server to the client. Many performance-minded developers are often thinking of latency when they consider what to optimize or improve in their applications or systems.

Bandwidth, on the other hand, is a measure of how much data can be sent from one point to another in a certain period of time. We often talk about bandwidth in the context of internet speed (for example, megabits per second or Mbps) because we’re referring to how much data can be sent over our specific network, or the capacity of our network to send some amount of data in a given span of time.

While latency is the actual speed at which data gets to its destination via the network, bandwidth is the capacity of network to send a certain amount of data to its destination. Understanding the difference between these two concepts is important to wrapping our heads around the second and third fallacies, so let’s learn about those two next!

Some fallacies hurt more than others

Image for post
Image for post
Fallacy #2: Latency is zero (it is not).

This is perhaps an easy fallacy to fall prey to since, in our local environments or in the development environment, we may experience a much lower latency; in other words, we likely experience a short amount of delay time when it comes to sending or receiving data. However, that low local latency doesn’t accurately reflect the delay of what other parts of the system might feel — especially given different networks.

We’ve already learned about local-area networks (LAN) and wide-area networks (WAN). We know that in a WAN, data has to travel further from one node to another, since the network may span large geographical distances. This is in contrast to a LAN, which is a network with devices in the same building or room, and thus the data being sent around has much less of a distance to travel.

We cannot assume that our system will always operate on a local-area network (in fact, most distributed systems operate on a WAN), and thus we know that our data has a physical distance to travel. By proxy, we should not assume that there will be no “delay” or zero latency between some data being sent and that data being received. For example, a request that has to travel across the world from Portland to Paris will have more latency than a request traveling from a machine in one room to another.

Some latency is a limitation of networks; we should be wise to not build a system that assumes that such a delay is nonexistent. We will certainly be bitten by such a false assumption in the future.

Image for post
Image for post
Fallacy #3: Bandwidth is infinite (it is not).

The third fallacy is similar to this false assumption as well: Bandwidth is infinite. Just as latency is not zero, the capacity of a network to send data is not unlimited. As computing technology has improved over the years — and certainly since the early 90’s when the Fallacies were coined — the bandwidth of our networks keeps getting better and better, and we’re able to send more data across our data in a given amount of time.

However, even with these improvements, networks still don’t have unlimited capacity and cannot send infinitely large amounts of data. When we design a distributed system, it would behoove us to not assume that any amount of data can be sent across the network. For example, we may have some user of our system that has a network with a limited bandwidth and a lower capacity; for that user’s experience, sending a large amount of data (such as large images or vendored, dependent files) would mean that we’d be sending far more data than their network was capable of sending over the network in a reasonable amount of time.

Finally, we come to fallacy four, which is personally just my favorite one: The network is secure. There are just simple so many ways that a network can be attacked or compromised, from unencrypted messages to vulnerabilities in dependencies to open source code, or even just bugs in third-party softwares that our system might depend on (not to mention bugs in our own systems, too!).

Image for post
Image for post
Fallacy #4: The network is secure (it DEFINITELY is not!).

The misconception that the network is “secure” is a hard one to overcome, since, if we really start to think about it, no network is really secure. Indeed, the only way to truly protect our networks and be 100% secure is to just not connect to the network! Of course, because we are dealing with distributed systems…this isn’t really easy to do, since we know that a network is required for communication between nodes in our system.

So, what are we to do? Well, we can try to remember that true security is a fallacy, and that we can just do what is in our power to try to prevent breaches and attacks when we design, build, and test our systems. Hopefully, at least that’ll keep the security issues at bay for awhile.

In part two of this post, we’ll look at the remaining four fallacies. I promise that the sad little blue network blob will make a return appearance! We’re not done with it just yet.

Resources

  1. Fallacies of Distributed Computing Explained, Arnon Rotem-Gal-Oz
  2. Understanding the 8 fallacies of Distributed Systems, Victor Chircu
  3. Debunking the 8 Fallacies of Distributed Systems, Ramil Alfonso
  4. Fallacies of Distributed Systems, Udi Dahan
  5. The Fallacies of Distributed Computing Reborn: The Cloud Era, Brian Doll
  6. Deutsch’s Fallacies, 10 Years After, Ingrid Van Den Hoogen

baseds

Exploring the basics of distributed systems, every…

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store