Netflix system design and architecture

Kethan Pothula
4 min read · Aug 7, 2020


This article is based on the “CODEKARLE” system design interview series.

Introduction

In this article, we discuss a common system design interview question: how to design a video streaming platform such as Netflix, Amazon Prime, or YouTube.

Requirements

Functional requirements

1. Upload videos

Production houses should be able to upload videos.

2. User home page

Users should be able to see the available content.

3. Search

Customers should be able to search for the content they want to watch.

4. Support all devices

Customers should be able to watch the content on all kinds of devices, such as phones, laptops, and TVs, which may need different file formats (extensions).

Non-functional requirements

No buffering

Playback without buffering is the basic requirement for a video player application. Without it, customers get a bad user experience.

The main engineering challenges that we face when uploading videos are:

First, we need to encode each video at several quality levels, such as high, medium, and low. The compression scheme used to encode a video is known as a codec.

Second, Netflix plays videos in different resolutions, such as 1080p, 720p, and 480p.

Netflix takes both parameters, format and resolution, and serves the matching version of the video.

To do this, Netflix splits each video into chunks and stores a separate set of chunks for every possible combination of format and resolution.
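
To make the chunking idea concrete, here is a minimal sketch in Python. The key layout (video id / quality / resolution / chunk index), the function name, and the 10-second chunk length are all assumptions made up for illustration, not Netflix's actual scheme. The point is only that every combination of format and resolution ends up as its own series of chunks.

```python
import math
from itertools import product

QUALITIES = ["high", "medium", "low"]      # quality levels named in the text
RESOLUTIONS = ["1080p", "720p", "480p"]    # resolutions named in the text
CHUNK_SECONDS = 10                         # assumed chunk length, purely illustrative


def chunk_keys(video_id: str, duration_seconds: int) -> list[str]:
    """Return one storage key per (quality, resolution, chunk) combination."""
    chunk_count = math.ceil(duration_seconds / CHUNK_SECONDS)
    return [
        f"{video_id}/{quality}/{resolution}/chunk-{index:05d}"
        for quality, resolution in product(QUALITIES, RESOLUTIONS)
        for index in range(chunk_count)
    ]


keys = chunk_keys("movie-42", duration_seconds=95)
print(len(keys))   # qualities x resolutions x chunks per rendition
print(keys[0])
```

A player would then ask for the key that matches the format and resolution it needs, one chunk at a time.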

The Netflix algorithm also detects whether we are watching a movie continuously or skipping through it. If we are skipping, it loads only the chunk we jumped to. If we are watching continuously, it loads the next chunks in advance.
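
The behaviour described above can be sketched with a simple heuristic. This is not Netflix's algorithm; the function name, the `lookahead` parameter, and the rule "the last few chunks were consecutive" are assumptions chosen to keep the example small.

```python
def chunks_to_fetch(history: list[int], requested: int, lookahead: int = 3) -> list[int]:
    """Decide which chunk indexes to load for the current request.

    history holds the chunk indexes the viewer played most recently, oldest first.
    """
    recent = history[-2:] + [requested]
    # Continuous viewing: every recent chunk directly follows the previous one.
    sequential = len(recent) >= 2 and all(
        later - earlier == 1 for earlier, later in zip(recent, recent[1:])
    )
    if sequential:
        # Load the requested chunk plus a few upcoming ones in advance.
        return list(range(requested, requested + 1 + lookahead))
    # The viewer is skipping around: load only the chunk that was asked for.
    return [requested]


print(chunks_to_fetch(history=[4, 5], requested=6))    # continuous viewing
print(chunks_to_fetch(history=[4, 5], requested=40))   # viewer skipped ahead
```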

Netflix stores the data in Amazon S3. It is a cloud object storage service that suits static data, and it is relatively cheap compared to the alternatives.

As for the servers, the Netflix servers are mainly located in the USA. When a request comes from far away, for example from India, the round trip takes a long time, especially for videos. So Netflix caches the content closer to its users, and this cache is known as Open Connect. Open Connect is where regional movies are stored, which makes access to the videos faster.
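
As a rough illustration of why a regional cache helps, here is a small least-recently-used cache that sits in front of a slow origin. The class name `EdgeCache`, the `origin` function, and the eviction policy are assumptions for this sketch. Open Connect itself is far more involved, and the article does not describe how it chooses what to store.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """A tiny LRU cache standing in for a regional cache location."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int = 2):
        self._fetch = fetch_from_origin
        self._capacity = capacity
        self._store = OrderedDict()
        self.origin_requests = 0          # how often we had to go the long way

    def get(self, key: str) -> bytes:
        if key in self._store:
            self._store.move_to_end(key)  # mark as recently used
            return self._store[key]
        self.origin_requests += 1
        data = self._fetch(key)
        self._store[key] = data
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return data


def origin(key: str) -> bytes:
    # Stand-in for a slow request to the distant origin servers.
    return f"video bytes for {key}".encode()


cache = EdgeCache(origin, capacity=2)
cache.get("movie-42/chunk-00000")   # miss: fetched from the origin
cache.get("movie-42/chunk-00000")   # hit: served from the regional cache
print(cache.origin_requests)
```

Only the first request for a chunk pays the long-distance cost; later viewers in the same region are served locally.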

System design and architecture diagram of Netflix

Database

We use several clusters to maintain the data in our model. Some of them are described below.

ELASTICSEARCH CLUSTER

An Elasticsearch cluster is a group of nodes that have the same cluster.name attribute. As nodes join or leave a cluster, the cluster automatically reorganizes itself to evenly distribute the data across the available nodes.
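
The rebalancing idea can be sketched as follows. This is a toy model, not Elasticsearch's real shard allocator: the `rebalance` function and the rule "move one shard at a time from the busiest node to the idlest node" are assumptions used only to show what "evenly distribute" means when a node joins.

```python
def rebalance(allocation: dict[str, list[int]]) -> dict[str, list[int]]:
    """Move shards from the busiest node to the idlest until counts differ by at most 1."""
    result = {node: list(shards) for node, shards in allocation.items()}
    while True:
        busiest = max(result, key=lambda node: len(result[node]))
        idlest = min(result, key=lambda node: len(result[node]))
        if len(result[busiest]) - len(result[idlest]) <= 1:
            return result
        result[idlest].append(result[busiest].pop())


# Two nodes hold six shards; an empty third node joins the cluster.
before = {"node-a": [0, 1, 2], "node-b": [3, 4, 5], "node-c": []}
after = rebalance(before)
print({node: len(shards) for node, shards in after.items()})
```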

CASSANDRA CLUSTER

Cassandra is a peer-to-peer distributed system made up of a cluster of nodes in which any node can accept a read or write request.
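
A small sketch may help show how "any node can accept a request" works in a peer-to-peer ring. It assumes a simplified consistent-hash ring with one token per node and SHA-256 as the hash; the `Ring` class and its method names are hypothetical, and real Cassandra uses its own partitioner, virtual nodes, and replication.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 is an assumption for this sketch)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes: list[str]):
        pairs = sorted((token(node), node) for node in nodes)
        self._tokens = [t for t, _ in pairs]
        self._nodes = [n for _, n in pairs]

    def owner(self, partition_key: str) -> str:
        """Return the first node clockwise from the key's token."""
        index = bisect.bisect_right(self._tokens, token(partition_key))
        return self._nodes[index % len(self._nodes)]

    def handle_request(self, coordinator: str, partition_key: str) -> tuple[str, str]:
        """Any node may act as coordinator; it forwards the request to the owner."""
        if coordinator not in self._nodes:
            raise ValueError(f"unknown node: {coordinator}")
        return coordinator, self.owner(partition_key)


ring = Ring(["node-a", "node-b", "node-c"])
for coordinator in ["node-a", "node-b", "node-c"]:
    print(ring.handle_request(coordinator, "user:1001"))
```

Whichever node the client contacts, the key resolves to the same owner, which is why there is no single master node to route through.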

REDIS CLUSTER

Redis is an open-source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, HyperLogLogs, geospatial indexes with radius queries, and streams.

ITEM DB MONGO CLUSTER

There are two different distributed configurations of MongoDB. The first is a “replica set”, where several servers carry the same data, to protect against failure. The second is a “sharded cluster”, where several servers each carry only a fragment of the whole data, to achieve higher performance and carry larger data sets.
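
The difference between the two configurations can be sketched in a few lines. This is an in-memory toy, not the MongoDB driver API: `ReplicaSet`, `ShardedCluster`, and the hash-based routing are assumptions for illustration, and real MongoDB replicates from a primary to secondaries rather than writing to every member at once.

```python
import hashlib


class ReplicaSet:
    """Several members that each hold a full copy of the same data."""

    def __init__(self, name: str, members: int = 3):
        self.name = name
        self.members = [{} for _ in range(members)]

    def write(self, key: str, document: dict) -> None:
        # Simplification: copy the document to every member immediately.
        for member in self.members:
            member[key] = document

    def read(self, key: str, member: int = 0):
        return self.members[member].get(key)


class ShardedCluster:
    """Routes each key to exactly one shard, so each shard holds only a fragment."""

    def __init__(self, shards: list[ReplicaSet]):
        self.shards = shards

    def shard_for(self, key: str) -> ReplicaSet:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def write(self, key: str, document: dict) -> None:
        self.shard_for(key).write(key, document)

    def read(self, key: str):
        return self.shard_for(key).read(key)


cluster = ShardedCluster([ReplicaSet("shard-0"), ReplicaSet("shard-1")])
cluster.write("item:1", {"title": "Example Movie"})
home = cluster.shard_for("item:1")
print(home.name, [member.get("item:1") for member in home.members])
```

Here each shard is itself a replica set, so the cluster gets both benefits: the data is split for capacity, and every fragment survives the loss of a member.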

KAFKA CLUSTER

Kafka is used for stream processing, website activity tracking, metrics collection and monitoring, log aggregation, real-time analytics, complex event processing (CEP), ingesting data into Spark, ingesting data into Hadoop, CQRS, message replay, error recovery, and as a guaranteed distributed commit log for in-memory computing (microservices).

Kafka consumer

In Kafka, each topic is divided into a set of logs known as partitions. Producers write to the tail of these logs and consumers read the logs at their own pace. Kafka scales topic consumption by distributing partitions among a consumer group, which is a set of consumers sharing a common group identifier.
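
Here is a minimal in-memory sketch of these ideas. It does not use a Kafka client library; `Partition`, `assign_partitions`, and the round-robin assignment are assumptions for illustration, and real Kafka supports several assignment strategies.

```python
class Partition:
    """An append-only log; producers write to the tail."""

    def __init__(self):
        self.records = []

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1      # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> list[str]:
        return self.records[offset:offset + max_records]


def assign_partitions(partition_ids, consumers) -> dict[str, list[int]]:
    """Spread partitions over the members of one consumer group, round-robin."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for position, partition_id in enumerate(sorted(partition_ids)):
        assignment[members[position % len(members)]].append(partition_id)
    return assignment


topic = [Partition() for _ in range(4)]
for n in range(8):
    topic[n % 4].append(f"event-{n}")     # producer writes to the tail

print(assign_partitions(range(4), ["consumer-1", "consumer-2"]))
print(assign_partitions(range(4), ["consumer-1", "consumer-2", "consumer-3"]))

# Each consumer tracks its own offset, so it reads at its own pace.
offset = 0
batch = topic[0].read(offset)
offset += len(batch)
print(batch, offset)
```

Adding a consumer to the group spreads the same partitions over more readers, which is how consumption scales.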

Spark jobs

Spark has its own definition of “job”, taken directly from its glossary:

Job: A parallel computation consisting of multiple tasks that get spawned in response to a Spark action (e.g. save, collect); you’ll see this term used in the driver’s logs.

So in this context, let’s say you need to do the following:

  1. Load a file with people names and addresses into RDD1
  2. Load a file with people names and phones into RDD2
  3. Join RDD1 and RDD2 by name, to get RDD3
  4. Map on RDD3 to get a nice HTML presentation card for each person as RDD4
  5. Save RDD4 to file.
  6. Map RDD1 to extract zip codes from the addresses to get RDD5
  7. Aggregate on RDD5 to get a count of how many people live on each zip code as RDD6
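  8. Collect RDD6 and print these stats to stdout.

Only the save in step 5 and the collect in step 8 are actions, so this program runs as two jobs: the first covers steps 1 to 5, and the second covers steps 1 and 6 to 8.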
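
To see how actions, rather than transformations, trigger jobs, here is a toy model of lazy evaluation in plain Python. It is not the Spark API: `Engine`, `LazyDataset`, `aggregate_counts`, and the sample data are all made up for this sketch. Transformations only build up a recipe, and the job counter goes up only when an action runs.

```python
from collections import Counter


class Engine:
    """Counts how many jobs have been triggered."""

    def __init__(self):
        self.jobs = 0


class LazyDataset:
    def __init__(self, engine: Engine, compute):
        self._engine = engine
        self._compute = compute           # zero-argument function returning a list

    # Transformations: return a new dataset, run nothing yet.
    def map(self, fn):
        return LazyDataset(self._engine, lambda: [fn(row) for row in self._compute()])

    def join(self, other):
        def compute():
            right = dict(other._compute())
            return [(k, (v, right[k])) for k, v in self._compute() if k in right]
        return LazyDataset(self._engine, compute)

    def aggregate_counts(self):
        return LazyDataset(self._engine, lambda: sorted(Counter(self._compute()).items()))

    # Actions: each one spawns a job and actually computes the data.
    def save(self, sink: list) -> None:
        self._engine.jobs += 1
        sink.extend(self._compute())

    def collect(self) -> list:
        self._engine.jobs += 1
        return self._compute()


engine = Engine()
addresses = [("Ann", "1 Main St 12345"), ("Bob", "2 Oak Ave 12345"), ("Cy", "3 Elm Rd 67890")]
phones = [("Ann", "555-0100"), ("Bob", "555-0101"), ("Cy", "555-0102")]

rdd1 = LazyDataset(engine, lambda: list(addresses))                  # step 1
rdd2 = LazyDataset(engine, lambda: list(phones))                     # step 2
rdd3 = rdd1.join(rdd2)                                               # step 3
rdd4 = rdd3.map(lambda r: f"<div>{r[0]}: {r[1][0]}, {r[1][1]}</div>")  # step 4
jobs_before_actions = engine.jobs                                    # nothing has run yet
saved = []
rdd4.save(saved)                                                     # step 5: first job
rdd5 = rdd1.map(lambda r: r[1].split()[-1])                          # step 6
rdd6 = rdd5.aggregate_counts()                                       # step 7
stats = rdd6.collect()                                               # step 8: second job
print(stats, engine.jobs)
```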

REST service

A RESTful web service is a lightweight, maintainable, and scalable service that is built on the REST architecture. It exposes an API from your application to the calling client in a secure, uniform, and stateless manner. The calling client can perform predefined operations using the RESTful service.

Load balancer

In computing, load balancing refers to the process of distributing a set of tasks over a set of resources, with the aim of making their overall processing more efficient.
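
Two common ways to distribute the tasks are round-robin and least-connections. The sketch below shows both; the class names and server names are hypothetical, and the article does not say which strategy the design uses.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Sends each new request to the server with the fewest active requests."""

    def __init__(self, servers: list[str]):
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1


round_robin = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [round_robin.pick() for _ in range(4)]
print(picks)

least = LeastConnectionsBalancer(["app-1", "app-2", "app-3"])
first = least.acquire()
second = least.acquire()
least.release(first)          # the first request finished
third = least.acquire()
print(first, second, third)
```

Round-robin is simple and works well when requests cost about the same; least-connections adapts better when some requests, such as long video uploads, keep a server busy for longer.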
