Key Characteristics of a Distributed System: System Design 🖥

Published in rtkal · 5 min read · May 17, 2020

A distributed system is a system whose components are located on different networked machines and communicate and coordinate their actions by passing messages to one another in order to achieve a common goal.

In its simplest definition, a distributed system is a group of computers working together so as to appear as a single computer to the end user. 👨‍💻
Examples - banking systems, massively multiplayer online games, and sensor networks.

1. Concurrency

There is a possibility that several clients will attempt to access a shared resource at the same time. Concurrency control is the procedure, used for example in a database management system (DBMS), for managing simultaneous operations so that they do not conflict with one another.
Example → Rajesh and his friend went to electronic kiosks at the same time to buy a movie ticket for the same movie and the same show time. However, there is only one seat left for that show in that particular theatre 🎦.
Without concurrency control, it is possible that both moviegoers will end up purchasing a ticket for the same seat. Concurrency control does not allow this to happen. Both moviegoers can still read the information in the movie seating database, but only the buyer who completes the transaction first gets the ticket.
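The ticket example can be sketched in a few lines of Python. This is only a minimal illustration, assuming a single in-memory seat counter protected by a lock; the `SeatInventory` class and the `simulate` helper are hypothetical names, and a real booking system would rely on database transactions instead.

```python
import threading


class SeatInventory:
    """Hypothetical in-memory seat counter guarded by a lock."""

    def __init__(self, seats_left):
        self.seats_left = seats_left
        self._lock = threading.Lock()

    def buy_ticket(self):
        # Only one thread at a time may check and update the seat count,
        # so the "check then decrement" step cannot interleave.
        with self._lock:
            if self.seats_left > 0:
                self.seats_left -= 1
                return True
            return False


def simulate(buyers, seats_left=1):
    """Let every buyer try to purchase at the same time."""
    inventory = SeatInventory(seats_left)
    results = {}

    def attempt(buyer):
        results[buyer] = inventory.buy_ticket()

    threads = [threading.Thread(target=attempt, args=(b,)) for b in buyers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, inventory.seats_left


results, remaining = simulate(["Rajesh", "Friend"])
print(results, remaining)
```

Exactly one of the two buyers gets `True`; which one depends on thread scheduling, so the winner can differ between runs.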

2. Scalability

Scalability is the capacity of a system to handle a growing amount of work by adding resources to the system.
In other words, it is the ability of a system (hardware or software) to continue to function well when it (or its context) changes in size or volume in order to meet a user need 👨‍💻

👉 Horizontal Scaling → This means you scale by adding more servers to your pool of resources. Example → MongoDB, which can spread data across servers through sharding.

👉 Vertical Scaling → This means you scale by adding more power (CPU, RAM, storage, etc.) to the existing server. Example → MySQL, which is often scaled by moving to a bigger server.
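To make the difference concrete, here is a toy capacity model. It assumes, purely for illustration, that capacity grows linearly with CPU cores and that each core serves 100 requests per second; the `Server` class and both helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Server:
    cpu_cores: int
    requests_per_core: int = 100  # assumed figure, for illustration only

    def capacity(self):
        return self.cpu_cores * self.requests_per_core


def scale_vertically(server, extra_cores):
    """Return a bigger version of the same server (more power)."""
    return Server(server.cpu_cores + extra_cores, server.requests_per_core)


def scale_horizontally(pool, new_server):
    """Return a pool with one more server in it (more machines)."""
    return pool + [new_server]


def total_capacity(pool):
    return sum(s.capacity() for s in pool)


pool = [Server(cpu_cores=4)]
vertical = [scale_vertically(pool[0], extra_cores=4)]
horizontal = scale_horizontally(pool, Server(cpu_cores=4))

print(total_capacity(pool))        # 400
print(total_capacity(vertical))    # 800
print(total_capacity(horizontal))  # 800
```

Both routes reach the same capacity in this simplified model. In practice, vertical scaling is limited by the largest machine you can buy, while horizontal scaling adds the work of distributing load and data across servers.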

3. Reliability

Reliability is the probability of a system or system element performing its intended function under stated conditions without failure for a given period.
👉 A distributed system is reliable if it keeps delivering its services even when one or several of its software or hardware components fail 📉.
How? 🤔 → If any component fails, it is immediately replaced by another healthy component.
Example → Consider a large electronic commerce store (Amazon), where one of the primary requirements is that a user transaction should never be canceled due to the failure of a machine. For instance, if a user has added an item to their shopping cart, the system is expected not to lose it. A reliable distributed system achieves this through redundancy of both the software components and the data. If the server carrying the user’s shopping cart fails, another server that has an exact replica of the shopping cart should replace it.
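The shopping cart case could look roughly like the sketch below. It assumes a simple scheme where every write goes to all live replicas; `CartReplica` and `ReplicatedCartStore` are hypothetical names, and the sketch does not handle a replica that comes back with stale data.

```python
class CartReplica:
    """Hypothetical server holding a copy of every shopping cart."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.carts = {}


class ReplicatedCartStore:
    def __init__(self, replicas):
        self.replicas = replicas

    def add_item(self, user, item):
        # Write the item to every live replica so each one holds a full copy.
        for replica in self.replicas:
            if replica.alive:
                replica.carts.setdefault(user, []).append(item)

    def get_cart(self, user):
        # Read from the first replica that is still alive.
        for replica in self.replicas:
            if replica.alive:
                return list(replica.carts.get(user, []))
        raise RuntimeError("no replica available")


server_a = CartReplica("server-a")
server_b = CartReplica("server-b")
store = ReplicatedCartStore([server_a, server_b])

store.add_item("rajesh", "headphones")
server_a.alive = False  # the server carrying the cart fails

print(store.get_cart("rajesh"))  # ['headphones']
```

The cart survives the failure because `server-b` already held a replica when `server-a` went down.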

4. Availability

Availability refers to the percentage of time that the infrastructure, system or a solution remains operational under normal circumstances(conditions) in order to serve its intended purpose.
Example → You bought a server for one year. There are 365 days in one year. After the year is over, you find that the server was sometimes up and sometimes down during that period. Let’s assume:
Total downtime per year = 36.5 days
Total uptime per year = 365 - 36.5 = 328.5 days
Availability = (Uptime / (Uptime + Downtime)) * 100 = (328.5 / (328.5 + 36.5)) * 100 = 90%. 🙂
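The same arithmetic as a small Python helper, in case you want to plug in your own numbers. The function names are made up for this sketch, and the 99.9% target in the usage lines is just an example figure.

```python
def availability_percent(uptime_days, downtime_days):
    """Availability = uptime / (uptime + downtime) * 100."""
    total = uptime_days + downtime_days
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100 * uptime_days / total


def allowed_downtime_days(target_percent, period_days=365):
    """How much downtime a given availability target leaves per period."""
    return period_days * (1 - target_percent / 100)


print(availability_percent(328.5, 36.5))  # 90.0
print(allowed_downtime_days(99.9))        # about 0.365 days, roughly 8.8 hours
```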

👉 High Availability → If availability approaches 100%, it is called high availability. To achieve this, the system contains backup components to which it automatically switches (fails over) when an active component stops working. The failover should appear seamless to users and should not interrupt services.
An aircraft ✈️ that can be flown for many hours a month without much downtime can be said to have high availability, because it always has an extra jet engine available. 😜

5. Efficiency

Efficiency is about delivering top-quality results as cheaply as possible, balancing speed, quality and cost. High speed, high quality and minimum cost together mean good efficiency. 🙂

How do we estimate the efficiency of a distributed system? Assume an operation that runs in a distributed manner and delivers a set of items as its result. Two usual measures of its efficiency are the response time (or latency), which denotes the delay to obtain the first item, and the throughput (or bandwidth), which denotes the number of items delivered in a given time unit (e.g., a second). These measures are useful to qualify the practical behaviour of a system at an analytical level, expressed as a function of the network traffic. The two measures correspond to the following unit costs:

number of messages globally sent by the nodes of the system, regardless of the message size;
size of messages representing the volume of data exchanges.

The complexity of operations supported by distributed data structures (e.g., searching for a specific key in a distributed index) can be characterised as a function of one of these cost units.
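A rough way to observe these measures is sketched below. `fetch_items` is a hypothetical stand-in for a distributed operation (it just sleeps before yielding each item), and the message list passed to `message_costs` is made-up sample data.

```python
import time


def fetch_items(n, delay_s=0.001):
    """Hypothetical stand-in for a distributed operation yielding items."""
    for i in range(n):
        time.sleep(delay_s)  # pretend we wait on the network
        yield f"item-{i}"


def measure(operation):
    """Return (latency to first item in seconds, items per second, item count)."""
    start = time.perf_counter()
    first_item_at = None
    count = 0
    for _ in operation:
        if first_item_at is None:
            first_item_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    latency = first_item_at - start if first_item_at is not None else None
    throughput = count / (end - start) if end > start else 0.0
    return latency, throughput, count


def message_costs(messages):
    """The two unit costs: number of messages and total size in bytes."""
    return len(messages), sum(len(m) for m in messages)


latency, throughput, count = measure(fetch_items(5))
print(f"latency={latency:.4f}s throughput={throughput:.1f} items/s")
print(message_costs([b"lookup key=42", b"value=hello"]))
```

The timing values vary from run to run, which is one reason the cost analysis usually counts messages and bytes rather than seconds.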

Generally speaking, the analysis of a distributed structure in terms of ‘number of messages’ is over-simplistic. It ignores the impact of many aspects, including the network topology, the network load, and its variation, the possible heterogeneity of the software and hardware components involved in data processing and routing, etc. However, it is quite difficult to develop a precise cost model that would accurately take into account all these performance factors; therefore, we have to live with rough but robust estimates of the system behaviour.

6. Serviceability

Serviceability refers to the ease of recovering from (or preventing) failures; how effectively/efficiently the system can be kept running. It focuses on things like accessibility of hardware components and availability of replacement components.

7. Manageability

Manageability is the ease of diagnosing and understanding problems when they occur, the ease of making updates or modifications, and how simple the system is to operate.

In other words, it refers to the ease with which the system can be monitored and maintained to keep it performing, secure, and running smoothly. Its focus is more directly on the system administrators than on the users, providing the tools and controls to facilitate that job.

8. Fault Tolerance

Fault tolerance refers to the ability of a system (computer, network, cloud cluster, etc.) to continue operating without interruption when one or more of its components fail.

👉 The ability of a system, such as an OS, to tolerate and recover from faults without failing can be provided by hardware, software, or a combined solution that leverages load balancers.

👉 Fault-tolerant systems use backup components that automatically take the place of failed components, ensuring no loss of service.
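A minimal failover sketch follows, assuming a failed component signals its failure by raising `ConnectionError`. The `Component` class and `handle_with_failover` function are hypothetical; real systems typically add health checks, retries and timeouts on top of this idea.

```python
class Component:
    """Hypothetical service instance that may be healthy or down."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def handle_with_failover(components, request):
    """Try each component in order and return the first successful answer."""
    errors = []
    for component in components:
        try:
            return component.handle(request)
        except ConnectionError as exc:
            errors.append(str(exc))  # remember the failure, try the backup
    raise RuntimeError("all components failed: " + "; ".join(errors))


primary = Component("primary", healthy=False)
backup = Component("backup")

print(handle_with_failover([primary, backup], "ping"))  # backup handled ping
```

The caller never sees the failure of the primary; it only sees an error if every component in the list is down.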

9. Consistency

In distributed systems, a consistency model is a contract between the system and the developer who uses it. A system is said to support a certain consistency model if operations on memory respect the rules defined by the model.
Consistency models are a powerful abstraction that helps to describe a system in terms of its observable properties.
Example → From a practical point of view, a consistency model answers questions like: If Client 1 adds an object to a distributed data store at 1:00 pm and Client 2 tries to read the same object at 1:01 pm, will Client 2 be able to get the new object? Or will it get a 404 because the resource was not found?
The answer is that…it depends. There are many factors to consider to answer such a question. One of them is the consistency model adopted by the distributed data store.
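To see why it depends, here is a toy store with one primary and one replica, where writes reach the replica only when `sync()` is called. This models just one possible behaviour (often called eventual consistency); the class and method names are hypothetical, and `None` plays the role of the 404.

```python
class EventuallyConsistentStore:
    """Toy store: writes land on the primary and reach the replica on sync()."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []

    def put(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))  # not yet visible on the replica

    def sync(self):
        # Apply the queued writes to the replica in the order they arrived.
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()

    def read_from_primary(self, key):
        return self.primary.get(key)  # None means "not found"

    def read_from_replica(self, key):
        return self.replica.get(key)


store = EventuallyConsistentStore()
store.put("photo-1", "cat.png")            # Client 1 writes at 1:00 pm

print(store.read_from_primary("photo-1"))  # cat.png
print(store.read_from_replica("photo-1"))  # None (Client 2 gets the "404")

store.sync()                               # replication catches up
print(store.read_from_replica("photo-1"))  # cat.png
```

Whether Client 2 sees the object at 1:01 pm depends on which copy it reads and whether replication has caught up, and that is exactly what a consistency model spells out.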

=== I hope this article helps you with a basic understanding of the key characteristics of distributed system design. ===