Today's technological advances have made many things possible and easy to do: anyone can access a service within a few seconds. Building systems that are available and robust has long been a challenge for engineers and researchers, and one answer to this and many related questions is the distributed system. Mind you, this is not something new, but we are seeing it more and more these days, and it has become the go-to architecture for large systems.
What is a distributed system?
A distributed system is essentially a set of software and hardware components that coordinate with each other by exchanging messages over a network to achieve a common goal. That goal could be complex in nature, such as processing big data or running a simulation like an MMO (massively multiplayer online game), or something more nuts and bolts, such as a sensor network for tsunamis or earthquakes that triggers alerts when certain events occur.
The key aspect in distributed systems is the interconnection between the components via some network. This relationship brings about a couple of interesting concepts that should be considered:
- Concurrency — Each member of the distributed system does its work simultaneously, in parallel with the others.
- No global clock — Synchronizing time between members is hard. Even within a network, message delays can throw the synchronization off.
- Independent failures — Members of the distributed system may fail at one point or another while other portions of the distributed system could still be running.
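Because there is no global clock, many systems order events with logical clocks instead of wall-clock time. The sketch below is a minimal Lamport clock; it is a standard technique from the literature rather than something this article prescribes, and the `Node` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical process that keeps a Lamport logical clock."""
    name: str
    clock: int = 0

    def local_event(self) -> int:
        # Any internal step advances the clock by one.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is also an event; the message carries the new timestamp.
        self.clock += 1
        return self.clock

    def receive(self, msg_timestamp: int) -> int:
        # On receipt, move past both our own clock and the sender's timestamp.
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock


a, b = Node("a"), Node("b")
a.local_event()          # a.clock == 1
ts = a.send()            # a.clock == 2, message stamped 2
b.receive(ts)            # b.clock == max(0, 2) + 1 == 3
print(a.clock, b.clock)  # prints: 2 3
```

If event x causally precedes event y, x always gets the smaller timestamp. The converse does not hold, which is why vector clocks are used when a system must detect truly concurrent events.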
Why distributed systems?
A distributed system is characterized by several properties, which also happen to be its main benefits, including:
- Resource sharing
- Fault tolerance
A distributed system is based on the divide and conquer principle: by splitting responsibilities and reducing the scope of each component, we get a good performance/price ratio, increased reliability, incremental growth, and increased fault tolerance.
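One concrete form of splitting responsibilities is partitioning data so that each node owns only a slice of the keys. Here is a minimal sketch, assuming simple hash-mod-N placement; the `owner` function and the node names are hypothetical. Production systems often prefer consistent hashing instead, because with hash-mod-N adding or removing a node moves most keys.

```python
import hashlib


def owner(key: str, nodes: list[str]) -> str:
    """Return the node responsible for `key` using hash-mod-N partitioning."""
    # Use a stable hash: Python's built-in hash() is randomized per process,
    # so different nodes would disagree about placement.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
for user_id in ["user-1", "user-2", "user-3", "user-4"]:
    print(user_id, "->", owner(user_id, nodes))
```

Every node that runs the same function with the same member list agrees on who owns a key, without any coordination at lookup time.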
What are the requirements?
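To build a distributed system, we need to keep several things in mind, since everything has its cost. Distributed systems may look like the next savior and the leading design style of the future, but mastering them and doing it the right way means meeting some requirements. The items below are better known as the fallacies of distributed computing: assumptions developers tend to make about the network that never fully hold in practice. Read each one as a property we would like to have and must actively design for, not one we get for free: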
- The network is reliable — The network will carry most of the data exchanged between services, so an unreliable network will simply cause chaos.
- Latency is zero — With zero latency, the system could process requests in record time, approaching the performance of a traditional system under low load.
- Bandwidth is infinite — A local DBMS that runs out of bandwidth will eventually stall every system that depends on it. The same applies to a distributed system: because responsibilities are segregated, services act as sources of data for one another, and the last thing we want is a bandwidth limit in a system that depends on continuous data transfer.
- The network is secure — Applications that rely on the network must be secured, and distributed systems rely on it more than most.
- Topology doesn't change — We need a highly portable system that can operate in any environment, or at least handle topology changes and different configurations.
- Transport cost is zero — Information constantly circulates over the network, so to obtain a reasonable performance/cost ratio we need to keep this cost as low as possible.
- The network is homogeneous — This is usually not a problem in small applications, but in complex systems it becomes an issue unless standard, uniform formats and protocols are used, because such systems mix different languages, operating systems, devices…
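None of these properties can be taken for granted over a real network, so code that calls another service should expect failures. A common defensive pattern is to retry transient errors with exponential backoff and random jitter. The sketch below is illustrative only: `call_with_retry`, `TransientError`, and the default delays are hypothetical, and the flaky remote service is simulated.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, resets)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Delay grows as base * 2^attempt, capped at max_delay, then randomized
            # so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


# Simulated remote call that fails twice before succeeding.
attempts = {"n": 0}


def flaky_service() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("connection reset")
    return "ok"


print(call_with_retry(flaky_service, sleep=lambda _: None))  # prints: ok
```

Retries are only safe for idempotent operations: a request that actually succeeded but timed out on the way back may otherwise be applied twice.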
What must we consider?
There are many types of distributed systems, each designed to meet specific business requirements. Based on the priority of requirements, the following considerations may change, but in general, most developers will need to keep these in mind when designing a distributed system:
- Availability — Operational characteristic of a system where it is always ready to handle requests.
- Scalability — The ability of a system to increase its capacity to handle more load. This could be achieved by adding more servers to a cluster.
- Performance — Speed at which a system is able to handle requests.
- Cost — Total cost of ownership of the system. This could include hardware, software, development, testing, hosting, and cloud infrastructure.
- Manageability — Maintenance, updates, migration, scaling, and diagnostics should all be manageable.
- Reliability — Able to adapt to the load and respond properly during exceptional conditions.
- Heterogeneity — Ability to support a variety of devices and protocols.
- Fault Tolerance and Failure Management — Designing a system with the expectation of failure makes it more fault tolerant.
- Concurrency — When multiple parts work at the same time (a given for distributed systems).
- Migration and Load Balancing — Closely related to reliability, fault tolerance, and failure management.
- Security — Ensuring proper authorization and authentication between the users and components of the system is key to ensuring the confidentiality and integrity of the data.
- Modularity — Many small subsystems and modules forming a larger system in a way that can be configured and reused in different ways.
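Scalability, load balancing, and fault tolerance all meet in the load balancer that sits in front of a pool of service replicas. Here is a minimal round-robin sketch, assuming a hypothetical `RoundRobinBalancer` whose replicas are marked up or down by some external health check. Real load balancers add active health probing, weighting, and connection draining on top of this.

```python
import itertools
from typing import Iterable


class NoHealthyReplicaError(RuntimeError):
    """Raised when every replica in the pool is marked down."""


class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips unhealthy replicas."""

    def __init__(self, replicas: Iterable[str]) -> None:
        self.replicas = list(replicas)
        self.healthy = set(self.replicas)
        self._cycle = itertools.cycle(self.replicas)

    def mark_down(self, replica: str) -> None:
        self.healthy.discard(replica)

    def mark_up(self, replica: str) -> None:
        if replica in self.replicas:
            self.healthy.add(replica)

    def next_replica(self) -> str:
        # Walk at most one full cycle looking for a healthy replica.
        for _ in range(len(self.replicas)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise NoHealthyReplicaError("no healthy replicas available")


lb = RoundRobinBalancer(["svc-1", "svc-2", "svc-3"])
lb.mark_down("svc-2")  # e.g. its health check failed
print([lb.next_replica() for _ in range(4)])  # prints: ['svc-1', 'svc-3', 'svc-1', 'svc-3']
```

Scaling out is then a matter of adding replicas to the pool, and a failed replica only reduces capacity instead of taking the service down.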
What are the essential elements of your distributed system?
A few elements are essential in a distributed system. A complex system that meets all the requirements we listed, and a few others, isn't simple to implement, but we're not the first to do it, so there are a few things we've learned from the pioneers and crazy engineers who came before us.
In this article I will recommend some tools that will let you kick-start your distributed system, but further exploration is needed for production-grade apps.
To put your system in place successfully, you must embrace DevOps. Briefly explained, following DevOps principles makes sure you have robust development and delivery pipelines, with improved quality assurance and shorter delivery times. Sounds very promising? Hell yeah, just read this article.
Speaking of infrastructure, I will mention a few tools you might already know: the Docker/Kubernetes combo.
To get portability across environments and isolation we need Docker, and to orchestrate our Docker containers we need Kubernetes (K8s).
If you are new to Docker and orchestration, just search for them; you'll find plenty of resources and awesome videos.
Beyond that, these tools help us achieve availability, scalability, load balancing, and maintainability.
To set up your infrastructure, you'll use: CI/CD, Linux, a Docker image registry, Kubernetes, Ingress NGINX, Let's Encrypt…
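Kubernetes can only keep a service available if it can tell when a container is unhealthy, and one common way is an HTTP endpoint that its liveness or readiness probes call. The sketch below uses only Python's standard library; the `/healthz` path, `HealthHandler`, and `probe` are assumptions for illustration, and in a real deployment you would point the probe's `httpGet` path and port at whatever your service exposes.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers a hypothetical /healthz path for orchestrator health probes."""

    def do_GET(self) -> None:
        if self.path == "/healthz":
            # A real service might also check its database or broker connection here.
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format: str, *args) -> None:
        # Probes fire periodically; skip per-request access logs.
        pass


def probe(port: int, path: str = "/healthz") -> tuple[int, bytes]:
    """Issue one GET, roughly what an HTTP probe does, and return (status, body)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


# Demo: bind to an OS-chosen port on localhost and probe it once.
# Inside a container you would bind ("0.0.0.0", <fixed port>) instead.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, body = probe(server.server_address[1])
print(status, body.decode())  # prints: 200 {"status": "ok"}
server.shutdown()
server.server_close()
```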
Application Based Infrastructure
We are not done yet: we need to set up a few more things to meet the minimum requirements for kick-starting a distributed system project. We need:
- A message broker to exchange messages and commands between services (e.g., RabbitMQ).
- An ACID-compliant DBMS (or several) to manage data concurrently (SQL Server, MySQL, Neo4j…).
- A tracing tool to follow information as it circulates across services over the network; this helps us optimize and diagnose efficiently (e.g., Jaeger).
- A remote log server/sink to centralize log entries, and trust me, logging is very, very important, especially in distributed systems (e.g., Datalust Seq, MySQL).
- A caching service like Redis.
- A time series database to store service metrics, plus a way to collect those metrics: either your own implementation or an existing library (e.g., AppMetrics for ASP.NET Core). For the database itself you can use InfluxDB, for example.
- A dashboard to visualize cluster and service metrics; Grafana is a good choice, as it is easy to use and customize.
Architectural/Design Patterns to consider
You can also explore more patterns, at different levels, on Microservices.io.
If you are planning to implement a distributed system, make sure it is worth it: it's not the best choice for everything, it has drawbacks for small systems, and developing one can be painful if you don't pay attention to the details. Study your use case wisely, then choose!