Design Review Checklist for Distributed Systems

Distributed system design made “slightly” simpler

Kislay Verma
The Startup

--

This article was originally published on my website — https://kislayverma.com/programming/design-review-checklist-for-distributed-systems/

Distributed system design is a hard problem, made all the worse because the design process gives no direct feedback. Problems stemming from faulty design often show up as scalability problems, resilience problems or data issues. However, solving those problems is often the equivalent of addressing the symptoms and not the disease — we may be able to patch up the system enough to keep it running, but the underlying design issues remain and can be triggered again under different circumstances. It takes a lot of effort, not to mention organizational wrangling, to be able to analyze design related root causes when the system is failing in production.

As with an earlier post on code reviews for distributed systems, this article is a simplistic checklist of things that I look out for when reviewing design of a distributed system functionality (anything which requires multiple systems to work together).

I think of distributed design issues in terms of three buckets : Consistency-vs-Availability, Domain Coupling, and Observability. The first two often leak into each other…

--

--