Amazon system design and architecture

Kethan Pothula
Aug 1, 2020


This article is based on the “CODEKARLE” System Design Series.

https://www.codekarle.com/

Introduction

In this article, we discuss a solution to the system design interview question in which you are asked to design an e-commerce application like Amazon.

Requirements

Functional requirements

1. Search

Customers should be able to search for the goods they want to buy. Along with the results, we need to tell them whether we can deliver to their location.

If the customer lives in a remote area where we cannot deliver, we need to say so on the first page itself; otherwise it leads to a bad user experience.

If we can deliver, we need to show the maximum time required for delivery.
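The search requirement combines two lookups: matching items and checking deliverability. Below is a minimal sketch, assuming a hypothetical pincode-based serviceability table (`SERVICEABILITY`) and a plain dict as the catalog. In the actual design the text matching would run against Elasticsearch, not a substring check.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical serviceability table: pincode -> maximum delivery days.
# A pincode missing from the table means we cannot deliver there.
SERVICEABILITY = {"560001": 2, "110001": 3}


@dataclass
class SearchResult:
    item_id: str
    title: str
    deliverable: bool
    max_delivery_days: Optional[int]


def search(catalog, query, pincode):
    """Match items by case-insensitive substring on the title and annotate
    every hit with whether (and how fast) we can deliver to the pincode."""
    days = SERVICEABILITY.get(pincode)  # None -> not serviceable
    results = []
    for item_id, title in catalog.items():
        if query.lower() in title.lower():
            results.append(SearchResult(item_id, title, days is not None, days))
    return results


catalog = {"i1": "Running Shoes", "i2": "Shoe Rack", "i3": "Water Bottle"}
for result in search(catalog, "shoe", "560001"):
    print(result)
```

Computing deliverability once per query (not per item) keeps the check cheap, which matters because it runs on the very first page the user sees.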

2. Cart/wishlist

The cart and the wishlist are where users keep the items they want to buy and the items they like and want to save for later.
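A minimal in-memory sketch of the cart and wishlist operations, assuming a hypothetical `CartService` class; a real service would persist this state in a database, not in Python dicts.

```python
from collections import defaultdict


class CartService:
    """In-memory sketch of per-user carts and wishlists (hypothetical API)."""

    def __init__(self):
        self.carts = defaultdict(dict)      # user_id -> {item_id: quantity}
        self.wishlists = defaultdict(set)   # user_id -> {item_id, ...}

    def add_to_cart(self, user_id, item_id, qty=1):
        cart = self.carts[user_id]
        cart[item_id] = cart.get(item_id, 0) + qty

    def remove_from_cart(self, user_id, item_id):
        self.carts[user_id].pop(item_id, None)

    def add_to_wishlist(self, user_id, item_id):
        self.wishlists[user_id].add(item_id)

    def move_to_cart(self, user_id, item_id):
        # Move a saved item from the wishlist into the cart with quantity 1.
        if item_id in self.wishlists[user_id]:
            self.wishlists[user_id].discard(item_id)
            self.add_to_cart(user_id, item_id)


svc = CartService()
svc.add_to_cart("u1", "i1", 2)
svc.add_to_wishlist("u1", "i2")
svc.move_to_cart("u1", "i2")
print(svc.carts["u1"])  # {'i1': 2, 'i2': 1}
```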

3. Checkout

Checkout is where the customer confirms the order and pays for the goods.
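The usual difficulty in checkout is keeping inventory and payment in step. Here is a single-process sketch, assuming a hypothetical `charge` callback for the payment gateway: stock is reserved first, and if the payment fails the reservation is released. It is not concurrency-safe; a real system would do the reservation inside a strongly consistent database transaction.

```python
class OutOfStock(Exception):
    pass


class PaymentFailed(Exception):
    pass


def checkout(cart, inventory, charge):
    """Reserve stock for every cart line, then charge; roll back on failure.

    cart: {item_id: qty}, inventory: {item_id: available qty},
    charge: hypothetical callable returning True if the payment succeeds.
    """
    reserved = {}
    try:
        for item_id, qty in cart.items():
            if inventory.get(item_id, 0) < qty:
                raise OutOfStock(item_id)
            inventory[item_id] -= qty
            reserved[item_id] = qty
        if not charge():
            raise PaymentFailed()
        return {"status": "PLACED", "items": dict(cart)}
    except (OutOfStock, PaymentFailed):
        # Release whatever was reserved so other customers can buy it.
        for item_id, qty in reserved.items():
            inventory[item_id] += qty
        raise


stock = {"i1": 5}
print(checkout({"i1": 2}, stock, lambda: True), stock)
```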

4. View orders

In this section, customers can view their current and past orders. They can also track a current order and see its expected delivery date.
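A small sketch of how current and past orders could be separated and an expected delivery date derived. The status names and the rule "order date plus the maximum delivery days shown at search time" are assumptions for illustration, not part of the original design.

```python
from datetime import date, timedelta
from enum import Enum


class OrderStatus(Enum):
    PLACED = 1
    PACKED = 2
    SHIPPED = 3
    OUT_FOR_DELIVERY = 4
    DELIVERED = 5


def split_orders(orders):
    """Separate current (not yet delivered) orders from past (delivered) ones."""
    current = [o for o in orders if o["status"] is not OrderStatus.DELIVERED]
    past = [o for o in orders if o["status"] is OrderStatus.DELIVERED]
    return current, past


def expected_delivery(order):
    # Hypothetical rule: promised date = order date + max delivery days.
    return order["placed_on"] + timedelta(days=order["max_delivery_days"])


orders = [
    {"id": "o1", "status": OrderStatus.SHIPPED,
     "placed_on": date(2020, 8, 1), "max_delivery_days": 3},
    {"id": "o2", "status": OrderStatus.DELIVERED,
     "placed_on": date(2020, 7, 1), "max_delivery_days": 2},
]
current, past = split_orders(orders)
print([o["id"] for o in current], expected_delivery(current[0]))  # ['o1'] 2020-08-04
```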

Non-functional requirements

1. Low latency

The system should have very low latency, because lag while using the application leads to a bad user experience. Most of the user-facing flows therefore need to be low latency.

2. High availability

Components such as search and order placement need to be highly available.

3. High consistency

Components that handle payments and user information must be highly consistent.

Figure: System design and architecture diagram of Amazon (photo by sandeepkaul on GitHub)

Database

Our design uses several clusters to store and process data. Some of them are:

ELASTICSEARCH CLUSTER

An Elasticsearch cluster is a group of nodes that share the same cluster.name attribute. As nodes join or leave the cluster, it automatically reorganizes itself to distribute the data evenly across the available nodes.
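To make "reorganizes itself to distribute the data evenly" concrete, here is a toy rebalancer over shard ids. It is a simplified model, not Elasticsearch's actual allocator (which also places replica shards and honours other allocation settings); `rebalance` and the node names are hypothetical.

```python
def rebalance(allocation, nodes):
    """Return a new node -> shards mapping for the nodes now in the cluster.

    Shards on nodes that left go to the least-loaded survivors; then single
    shards move from the fullest to the emptiest node until shard counts differ
    by at most one. Nodes that are already balanced keep their shards.
    """
    if not nodes:
        raise ValueError("cluster has no nodes")
    new = {n: list(allocation.get(n, [])) for n in nodes}
    orphans = [s for n, shards in allocation.items() if n not in new for s in shards]
    for shard in orphans:
        min(new.values(), key=len).append(shard)
    while True:
        fullest = max(new.values(), key=len)
        emptiest = min(new.values(), key=len)
        if len(fullest) - len(emptiest) <= 1:
            return new
        emptiest.append(fullest.pop())


alloc = {"node-a": [0, 3], "node-b": [1, 4], "node-c": [2, 5]}
alloc = rebalance(alloc, ["node-a", "node-c"])            # node-b leaves
print(alloc)  # {'node-a': [0, 3, 1], 'node-c': [2, 5, 4]}
alloc = rebalance(alloc, ["node-a", "node-c", "node-d"])  # node-d joins
print(alloc)  # {'node-a': [0, 3], 'node-c': [2, 5], 'node-d': [1, 4]}
```

Moving only the shards needed to even out the counts, rather than recomputing every assignment, limits how much data has to be copied between nodes.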

CASSANDRA CLUSTER

Cassandra is a peer-to-peer distributed system made up of a cluster of nodes in which any node can accept a read or write request.
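Because any node can accept a request, the node that receives it acts as a coordinator and forwards it to the replicas that own the key. Below is a toy token ring showing how those replicas are found, assuming one token per node and SimpleStrategy-style placement (the next nodes clockwise). MD5 stands in for Cassandra's default Murmur3 partitioner, and virtual nodes are ignored.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hash ring: a key lives on the first node whose token is
    >= the key's token (wrapping around), plus the next rf - 1 nodes."""

    def __init__(self, nodes, rf=3):
        if not nodes:
            raise ValueError("ring needs at least one node")
        self.rf = min(rf, len(nodes))
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(value):
        # MD5 as a deterministic stand-in for Cassandra's Murmur3 hash.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, key):
        start = bisect.bisect_left(self.tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = TokenRing(["n1", "n2", "n3", "n4", "n5"], rf=3)
# Whichever node receives the request computes the same replica list.
print(ring.replicas("order:1234"))
```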

REDIS CLUSTER

Redis is an open-source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, geospatial indexes with radius queries, and streams.
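A common way to use Redis as a cache is the cache-aside pattern: read from the cache, and on a miss load from the database and store the value with an expiry. The sketch below uses a small in-memory `TTLCache` as a stand-in for Redis so it runs without a server; with the redis-py client the equivalent calls would be `get` and `setex(name, time, value)`. The `item:<id>` key format is an assumption.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis GET/SETEX semantics (value plus expiry)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired key
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_item(item_id, cache, db, ttl=300):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"item:{item_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db[item_id]          # slow path, e.g. the item DB
    cache.setex(key, ttl, value)
    return value
```

The TTL bounds how stale a cached item can get, which trades a little freshness for the low latency the design asks for.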

ITEM DB MONGO CLUSTER

There are two different distributed configurations of MongoDB. The first is a “replica set”, where several servers carry the same data, to protect against failure. The second is a “sharded cluster”, where several servers each carry only a fragment of the whole data, to achieve higher performance and carry larger data sets.
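In a sharded cluster, the shard-key space is split into chunks, and each request is routed to the shard that owns the chunk containing its key (in MongoDB, the mongos router does this using metadata from the config servers). Here is a simplified range-based router, assuming hypothetical split points and shard names; each chunk covers an inclusive lower bound and an exclusive upper bound.

```python
import bisect


class RangeShardRouter:
    """Simplified sharded-cluster routing: sorted split points divide the
    shard-key space into chunks, and each chunk lives on one shard."""

    def __init__(self, split_points, shards):
        # n split points define n + 1 chunks, so we need n + 1 shard names.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per chunk")
        self.split_points = sorted(split_points)
        self.shards = shards

    def shard_for(self, key):
        # bisect_right: a key equal to a split point belongs to the upper
        # chunk (inclusive lower bound, exclusive upper bound).
        return self.shards[bisect.bisect_right(self.split_points, key)]


router = RangeShardRouter(["g", "p"], ["shard0", "shard1", "shard2"])
print(router.shard_for("apple"), router.shard_for("mango"), router.shard_for("zebra"))
# shard0 shard1 shard2
```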

KAFKA CLUSTER

Kafka is used for stream processing, website activity tracking, metrics collection and monitoring, log aggregation, real-time analytics, complex event processing (CEP), ingesting data into Spark and Hadoop, CQRS, message replay, error recovery, and as a guaranteed distributed commit log for in-memory computing (microservices).
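One property worth seeing concretely is how keyed messages keep their order. A minimal sketch, assuming a hypothetical order-events topic with three partitions: the record key is hashed to pick a partition, so all events for one order land in the same partition in the order they were sent. `zlib.crc32` is only a deterministic stand-in for the murmur2 hash that Kafka's default partitioner uses.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Same key -> same partition; crc32 stands in for Kafka's murmur2.
    return zlib.crc32(key.encode()) % num_partitions


log = defaultdict(list)  # partition -> append-only list of (key, event)
events = [("order-1", "PLACED"), ("order-2", "PLACED"), ("order-1", "SHIPPED")]
for key, event in events:
    log[partition_for(key)].append((key, event))

print(dict(log))
```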

Kafka consumer

In Kafka, each topic is divided into a set of logs known as partitions. Producers write to the tail of these logs, and consumers read the logs at their own pace. Kafka scales topic consumption by distributing partitions among a consumer group, which is a set of consumers sharing a common group identifier.
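The sketch below shows how partitions can be spread over a consumer group so that each partition is read by exactly one consumer in the group. It follows the idea behind Kafka's round-robin assignment strategy (sort, then deal partitions out in turn), but `assign_round_robin` is a hypothetical helper, not Kafka's implementation.

```python
def assign_round_robin(partitions, consumers):
    """Give every partition to exactly one consumer, dealing them out in turn
    over the consumers sorted by id."""
    if not consumers:
        raise ValueError("consumer group is empty")
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


print(assign_round_robin(range(6), ["c2", "c1"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

Note that adding more consumers than there are partitions does not add throughput: the extra consumers simply receive no partitions.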

Spark jobs

Spark has its own definition of a “job”, taken directly from its glossary:

Job: A parallel computation consisting of multiple tasks that get spawned in response to a Spark action (e.g. save, collect); you’ll see this term used in the driver’s logs.

So in this context, let’s say you need to do the following:

  1. Load a file with people's names and addresses into RDD1
  2. Load a file with people's names and phones into RDD2
  3. Join RDD1 and RDD2 by name, to get RDD3
  4. Map on RDD3 to get a nice HTML presentation card for each person as RDD4
  5. Save RDD4 to file.
  6. Map RDD1 to extract zip codes from the addresses to get RDD5
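  7. Aggregate on RDD5 to get a count of how many people live in each zip code as RDD6
  8. Collect RDD6 and print these stats to stdout.

Only step 5 (save) and step 8 (collect) are actions; the loads, the join, and the maps are lazy transformations. So this program runs two Spark jobs, and each job executes the transformations it depends on.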
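The data flow of the eight steps can be traced with a plain-Python sketch that uses lists and dicts in place of RDDs. It mirrors what each step computes, not Spark's lazy, distributed execution, and the input records are made-up stand-ins for the two files.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the two input files.
addresses = [
    ("Asha", "12 MG Road, 560001"),
    ("Ravi", "4 Park Street, 700016"),
    ("Meena", "9 Brigade Road, 560001"),
]
phones = [("Asha", "555-0101"), ("Ravi", "555-0102")]

rdd1 = addresses                       # step 1: (name, address)
rdd2 = dict(phones)                    # step 2: name -> phone
# step 3: inner join by name (Meena has no phone, so she drops out)
rdd3 = [(name, addr, rdd2[name]) for name, addr in rdd1 if name in rdd2]
# step 4: one HTML card per person
rdd4 = [f"<div><b>{n}</b><p>{a}</p><p>{p}</p></div>" for n, a, p in rdd3]
# step 5 (save rdd4 to a file) is omitted to keep the sketch side-effect free
# step 6: the zip code is the text after the last comma of the address
rdd5 = [addr.rsplit(", ", 1)[1] for _, addr in rdd1]
rdd6 = Counter(rdd5)                   # step 7: people per zip code
print(dict(rdd6))                      # step 8: collect and print
# {'560001': 2, '700016': 1}
```

Steps 4 and 5 depend only on the join, while steps 6 to 8 depend only on RDD1, which is why Spark can treat the save and the collect as two separate jobs.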

Rest service

A RESTful web service is a lightweight, maintainable, and scalable service built on the REST architecture. It exposes an API from your application to the calling client in a secure, uniform, and stateless manner, and the client performs predefined operations through that API.
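"Stateless" means every request carries everything the server needs (verb, resource path, credentials), so any server instance behind the load balancer can answer it. A framework-free sketch of one hypothetical endpoint, `GET /orders/<id>`, follows. Treating the bearer token as the user id is a placeholder for real authentication, and `handle` and `ORDERS` are illustrative names.

```python
import json

# Hypothetical order data; a real service would query the orders database.
ORDERS = {"o1": {"id": "o1", "user": "u1", "status": "SHIPPED"}}


def handle(method, path, headers):
    """Stateless handler: no session is kept between requests."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return 401, json.dumps({"error": "missing token"})
    user = auth[len("Bearer "):]  # placeholder: token == user id
    parts = path.strip("/").split("/")
    if method == "GET" and len(parts) == 2 and parts[0] == "orders":
        order = ORDERS.get(parts[1])
        if order is None or order["user"] != user:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(order)
    return 404, json.dumps({"error": "no such route"})


print(handle("GET", "/orders/o1", {"Authorization": "Bearer u1"}))
```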

Load balancer

In computing, load balancing refers to the process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient.
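Two common strategies a load balancer can use to pick a backend are sketched below: round robin, which rotates through the servers, and least connections, which favours the server with the fewest in-flight requests. The backend names are hypothetical, and a production balancer would also perform health checks.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each new request to the backend with the fewest active requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the request finishes so the count stays accurate.
        self.active[backend] -= 1


rr = RoundRobinBalancer(["s1", "s2", "s3"])
print([rr.pick() for _ in range(4)])  # ['s1', 's2', 's3', 's1']
```

Round robin is enough when requests cost roughly the same; least connections adapts better when some requests (for example, checkout) take much longer than others.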

All of this is based on the “CODE KARLE” system design series.

https://www.codekarle.com/
