Nancy Lynch — Distributed Systems Pioneer
Today’s pioneer is the author of the Distributed Algorithms book, considered by many as the distributed systemss bible. For Morgan & Claypool Publishers, she’s the series editor of the Synthesis Lectures on Distributed Computing Theory. She’s an ACM Fellow and has won the Dijkstra Prize, twice. Let’s try to understand the impact her work has had both in academia and in the industry.
Back in the ‘70s as networks of computers started to be more common, and people tried to use those computers to solve tasks in groups, a new set of problems appeared in the computing industry: how to make those nodes agree on a value during a computation, for example: That a record should be deleted from a database and all its replicas. Today’s pioneer is a mathematician that worked in producing proofs, more specifically impossibility proofs that showed that certain systems couldn’t be built.
Why are there systems that cannot be built? When we have a distributed set of computers that are trying to cooperate, they need a way to share information, for example, by sending messages over a network. The various kinds of agreement protocols try to cope with different failure scenarios that could happen whenever a distributed system is in place. What happens if the network connection breaks? What happens if a node is slow to respond? What if the node doesn’t respond at all?
Distributed Systems algorithms try to answer these questions by stating things like, in a process group, one process will be the leader that will coordinate the other processes. Another algorithm could say, after certain timeout, a node will be considered dead. If a node crashes, we will remove it from the group, and we’ll add another node that will start from scratch. Once up, it will ask its peers about the current state of the system, and so on.
Each of these algorithms requires a certain number of messages to reach consensus, or certain amount of rounds, and so on. What Lynch and colleagues tried to prove is what are minimum amount of messages that certain algorithm requires, or how many rounds an algorithm needs in order to succeed. They call these kind of results Impossibility Results.
What Good Are Impossibility Results?
Most obviously, impossibility results tell you when you should stop trying to devise or improve an algorithm.
Still there are developers, that when told that the system they are trying to create is unfeasible, they will keep trying to build it nonetheless. Does this mean they are obstinate? Not at all says Lynch.
This shows that these developers are willing to reach compromises, and settle for a solution that might work with “sufficiently high probability”. Also, it brings honesty into their design:
So it seems that this impossibility result has had the beneficial effect of helping (or forcing) system designers to clarify their claims about their system.
This means you have to explain how is your system going to behave under the different kind of failures that could happen during the execution of an algortihm in a distributed system.
This information [impossibility proofs] can be useful both for theoretical research and for systems development work.
What Kind of Impossibility Proofs?
There’s many kinds of impossibility proofs, here some of them.
Number of processes: how many processes are required for an algorithm to reach agreement in case of failures. If we have 3t processes where t is the number of failing process showing Byzantine faults, then Byzantine agreement is impossible. Nancy Lynch wrote a paper with her colleagues called “Easy impossibility proofs for distributed consensus problems”, providing an elegant way of presenting that impossibility. This is just but one of the many results of this kind, depending on the properties of the system being studied.
Number of rounds: how many rounds a distributed consensus algorithm needs for reaching agreement. In this case she proved with Mike Fischer that t+1 was the lower bound on number of rounds requried for Byzantine agreement. Here again, t is the amount of failing processes.
Asynchronous Impossibility Results: Finally we reach her most famous paper, for which she received the Edsger W. Dijkstra Paper Prize in Distributed Computing. The famous FLP paper (for Fischer, Lynch and Paterson).
The paper is called Impossibility of Distributed Consensus with One Faulty Process.
In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assumethat the message system is reliable–it delivers all messages correctly and exactly once. Nevertheless, even with these assumptions, the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement.
Just consider the restrictions they imposed to the system: a reliable message system that delivers messages correctly and exactly once! Still if one node fails, the whole algorithm fails.
Thus, this important problem has no robust solution without further assumptions about the computing environment or still greater restrictions on the kind of failures to be tolerated!
Thanks to this paper, research in consensus algorithms took a new turn, since an intuition that was shared by many system designers, now was proven as an impossibility. This lead to advances in consensus algorithms that integrated techniques like failure detectors. Without FLP, people would still be trying to solve impossible problems!
In Nancy Lynch we see a person that strives for formal methods, for proofs, so people stop doing work that will lead nowhere, or at least, makes them reconsider their methodology. Her work is a masterclass on letting people know what problems might be worthy of research, and which ones simply have no solution.
Advent Calendar — Help us make it a book!
From December 1st until December 24th we plan to release one article each day, highlighting the life of one of the many women that have made today’s computing industry as amazing as it is: From early compilers to computer games, from chip design to distributed systems, we will revisit the lives of these pioneers.
Each article will come with an amazing illustration by @SebastianNavasF
If you want to see these series to become a book with expanded articles and even more illustrations by Sebastián, then subscribe to our newsletter below.
- Illustration: Sebastián Navas
- Fischer, Michael. Lynch, Nancy. Paterson, Michael. Impossibility of Distributed Consensus with One Faulty Process. ACM Journal, 1985.
- Lynch, Nancy. A Hundred Impossibility Proofs for Distributed Computing. ACM Journal, 1989.
- Wikipedia Profile: https://en.wikipedia.org/wiki/Nancy_Lynch