Designing Scalable Systems: Part 1

Ram Karnani
5 min read · Jan 1, 2019

--

How do you design a system?

Let's look at a simplified view of a payments company where the following requests could come in:

System 1
  1. Money transfer requests come in, and a load balancer distributes them across different machines. Each machine executes the write transaction and commits it to our SQL db.
  2. A user requests transaction reports; these requests go through the load balancer to the machines, which execute read transactions on our SQL db to generate the reports and serve them to the user.
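The flow above can be sketched with a toy in-memory model. All names here (`AppServer`, `LoadBalancer`, the dict standing in for the SQL db) are made up for illustration; a real setup would use an actual load balancer and database, but the request routing looks roughly like this:

```python
import itertools

class AppServer:
    """An app server that runs transactions against the shared SQL db."""
    def __init__(self, name, db):
        self.name = name
        self.db = db                      # shared store: user -> balance

    def transfer(self, sender, receiver, amount):
        # Write transaction: debit sender, credit receiver, then "commit".
        self.db[sender] = self.db.get(sender, 0) - amount
        self.db[receiver] = self.db.get(receiver, 0) + amount

    def report(self, user):
        # Read transaction: serve a balance report for the user.
        return {"user": user, "balance": self.db.get(user, 0)}

class LoadBalancer:
    """Distributes incoming requests across app servers, round-robin."""
    def __init__(self, servers):
        self._next = itertools.cycle(servers)

    def route(self):
        return next(self._next)

db = {"alice": 100, "bob": 50}
lb = LoadBalancer([AppServer("app-1", db), AppServer("app-2", db)])

lb.route().transfer("alice", "bob", 30)   # write path
print(lb.route().report("bob"))           # read path
```

Note that no matter which server handles a request, every transaction lands on the same single database, which is exactly where the bottleneck discussed next comes from.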

This system should work without any glitches but is it scalable?

In a situation where the number of users and transactions keeps increasing, wouldn't our SQL db become a bottleneck? How? Because there is an upper limit to how many concurrent reads/writes a single database can perform.

This simple yet powerful use case helps us refine our initial question:

How would you design a scalable system?

  1. To conquer this problem of scalability and handle the ever-increasing load, one solution could be vertical scaling, i.e. using more powerful computers with lots of RAM, CPUs etc. to handle the user load and work on the data. But again, there is an upper limit to the performance you can achieve through hardware upgrades alone.
  2. Another solution is to use commodity hardware and perform horizontal scaling, i.e. work with distributed systems.

Restating: our goal is scalability, which can be achieved through horizontal scaling, i.e. using a distributed system instead of a centralized one.

A simple solution to the problem stated above would be the following architecture:

  1. Write operations happen on the SQL db and are propagated through Kafka to an HDFS ingest pipeline.
  2. HDFS will either serve the requests directly (in the case of a streaming data access pattern) or first move the data to a fast-serving NoSQL store like HBase (the reporting DB), which then serves the requests.
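To make the data flow concrete, here is a toy simulation of that write path using plain Python lists and queues in place of the real systems. Kafka, HDFS and HBase are of course real distributed systems with their own client APIs; everything below is a stand-in to show the shape of the pipeline, not how you would actually talk to them:

```python
from collections import deque

# Stand-ins: SQL db (write path), Kafka topic, HDFS, HBase (reporting DB).
sql_db, kafka_topic, hdfs, hbase = [], deque(), [], {}

def commit_write(txn):
    # 1. The write transaction commits to the SQL db ...
    sql_db.append(txn)
    # ... and is published to a Kafka-like topic as a change event.
    kafka_topic.append(txn)

def run_ingest_pipeline():
    # 2. The ingest pipeline drains the topic into HDFS ...
    while kafka_topic:
        txn = kafka_topic.popleft()
        hdfs.append(txn)
        # ... and also loads it into a fast reporting store (HBase-like).
        hbase.setdefault(txn["user"], []).append(txn)

def serve_report(user):
    # Reports are served from the reporting DB, off the write path.
    return hbase.get(user, [])

commit_write({"user": "alice", "amount": 30})
run_ingest_pipeline()
print(serve_report("alice"))
```

The point of the design is visible even in the toy: reads for reports never touch the SQL db that is busy accepting writes.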

Now, as you can see, there can be multiple approaches to solve a problem but all of these approaches require an understanding of how distributed systems work.

Simply stated, a distributed system is a set of computers connected via a network and talking to each other. But in such an environment, how would you answer the following questions:

— How to maintain data state?
— How to perform the computations on data?

You can think of it as similar to the OOP concept of a class:
Fields are analogous to maintaining state, and methods are equivalent to computations on data.
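The analogy in a few lines (a made-up `Account` class, just to pin down the mapping):

```python
class Account:
    def __init__(self, balance):
        self.balance = balance      # field: the data state we maintain

    def deposit(self, amount):      # method: a computation on that state
        self.balance += amount
        return self.balance
```

In a distributed system, the hard part is that the "fields" live on many machines at once, and the "methods" may run on any of them.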

  1. The question of maintaining data state requires an understanding of topics like the CAP theorem, NoSQL vs. RDBMS, distributed caching, HTTP caching etc.
  2. And the question of computations on data requires an understanding of:
    — Network patterns
    — Load balancers
    — Parallel computing
    — Event-driven architecture

I will talk about the CAP theorem in this blog and cover the rest of the topics in subsequent posts.

The CAP theorem states that in the presence of a network partition, you can achieve either consistency or availability, but not both, i.e. only two types of systems are possible: CP or AP.

Hey, but what are CP and AP? Here's what each letter stands for:

C = Consistency = data should be consistent on read/write, i.e. every client should see a single, up-to-date view of the data.
A = Availability = data should be highly available, i.e. every read/write request receives a response, with no downtime.
P = Partition Tolerance = the system keeps operating even when a network partition splits the nodes into groups that cannot talk to each other.

Let's dig deeper into CAP.

Network Partitioning:

Imagine a simple scenario where nodes are connected to each other using some network pattern, and let's say a network issue arises between two subsets of nodes. We now have two separate groups that a read/write request can reach: Network-1 and Network-2.

With this partition in mind, let's consider both types of systems.

  1. CP: Consistency with network partitioning: Imagine a write request comes in; it could go to either Network-1 or Network-2. But since the network issue is present, you can't keep both networks consistent with each other. To preserve consistency, you would have to make one of the networks unavailable, so that all requests, whether read or write, are served by Network-1 alone.
  2. AP: Availability with network partitioning: Now consider a scenario where both networks remain available, i.e. both can serve read/write requests. Because of the partition, you can't keep the two sides consistent. So although the system is available at all times, it can only achieve eventual consistency, once the network issue is resolved.
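The two behaviours can be sketched with a toy in-memory model. Everything here (`Node`, `cp_write`, `ap_write`, `heal`, the naive merge) is invented for illustration; real systems use quorum protocols and versioned conflict resolution rather than a blind dictionary merge:

```python
class Node:
    def __init__(self, name):
        self.name, self.data, self.reachable = name, {}, True

def cp_write(node, key, value):
    # CP: if the node is cut off, refuse the request entirely.
    # We lose availability on that side but never serve divergent data.
    if not node.reachable:
        raise RuntimeError("unavailable: node is partitioned away")
    node.data[key] = value

def ap_write(node, key, value):
    # AP: always accept the write. We stay available, but the other
    # side of the partition may now hold stale or conflicting data.
    node.data[key] = value

def heal(a, b):
    # When the partition resolves, replicas reconcile (naive merge here),
    # which is what "eventual consistency" refers to.
    merged = {**a.data, **b.data}
    a.data, b.data = dict(merged), dict(merged)

n1, n2 = Node("network-1"), Node("network-2")
n2.reachable = False                  # the partition: n2 is cut off

try:
    cp_write(n2, "balance", 90)       # CP side refuses rather than diverge
except RuntimeError:
    pass

ap_write(n1, "balance", 100)          # AP: both sides keep serving writes
ap_write(n2, "balance", 90)           # ...and temporarily disagree
heal(n1, n2)                          # partition heals, replicas converge
print(n1.data == n2.data)
```

Note the trade-off is visible directly: the CP path answers "no" during the partition, while the AP path answers "yes" and pays for it with a reconciliation step afterwards.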

These constraints exist because network partitions can and do happen in the world of distributed systems.

But what about the centralized systems?

In an RDBMS-like system, we have a centralized database, so network partitioning doesn't apply. That is why such systems can provide the ACID properties and support both consistency and availability, whereas NoSQL stores are typically distributed databases, i.e. data is partitioned across multiple machines, and hence they can run in either CP mode or AP mode.

Here is a Venn diagram to illustrate the CAP theorem:

CAP Theorem

So, this was a high-level overview of the CAP theorem's terms, and I hope you are left with questions like:

  1. How can I achieve availability?
  2. What could be different levels of consistency?

These are huge topics in themselves and deserve a lot of discussion. Until I cover them in my next blog, here are some pointers for these questions:

1. Availability can be achieved using fail-over mechanisms or using replication, where the replication mode can be either active (push) or passive (pull).
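The push/pull distinction can be sketched in a few lines (again, `Primary` and `Replica` are made-up names for illustration; real replication also handles failures, ordering and acknowledgements):

```python
class Replica:
    def __init__(self):
        self.log = []

    def pull_from(self, primary):
        # Passive (pull): the replica periodically fetches entries it missed.
        self.log.extend(primary.log[len(self.log):])

class Primary:
    def __init__(self, push_replicas=()):
        self.log = []
        self.push_replicas = list(push_replicas)

    def write(self, entry):
        self.log.append(entry)
        # Active (push): the primary forwards each write as it happens.
        for r in self.push_replicas:
            r.log.append(entry)

active, passive = Replica(), Replica()
primary = Primary(push_replicas=[active])

primary.write("txn-1")
primary.write("txn-2")
print(active.log)          # already up to date: pushed on every write
passive.pull_from(primary)
print(passive.log)         # caught up only after pulling
```

Push keeps replicas fresh at the cost of extra work on every write; pull batches the catch-up work but replicas lag between pulls.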

2. There are different levels of consistency; the prominent ones are strict consistency, sequential consistency, linearizable consistency and eventual consistency.
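As a taste of the weakest of these levels, here is one common way eventual consistency is implemented: last-writer-wins with timestamps. The `Replica` class below is a made-up sketch (real systems typically use vector clocks or version vectors instead of raw timestamps):

```python
class Replica:
    """Last-writer-wins replica: one simple route to eventual consistency."""
    def __init__(self):
        self.store = {}               # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.store[key] = (ts, value)

    def read(self, key):
        return self.store[key][1]

    def merge(self, other):
        # Anti-entropy pass: for each key, keep the newest write.
        for key, (ts, val) in other.store.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, val)

r1, r2 = Replica(), Replica()
r1.write("x", "a", ts=1)              # concurrent writes on both sides
r2.write("x", "b", ts=2)

r1.merge(r2); r2.merge(r1)            # replicas exchange state
print(r1.read("x"), r2.read("x"))     # both converge to the newer write
```

After the merge both replicas agree, which is exactly the "eventual" guarantee: divergence is allowed temporarily, convergence is promised once replicas talk.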
