Understanding the design of microservices architecture at Netflix

Nov 29, 2022

In 2019, Netflix, a video streaming service, consumed 15% of the Internet’s bandwidth worldwide.

With a record 213 million paid subscribers, who spend an average of 3.2 hours per day watching movies and web series, Netflix is the world’s #1 on-demand streaming service. With over 6 billion hours of collective viewership per day, and 5,000 movies and 50,000+ episodes being watched every 24 hours, Netflix has set new benchmarks in the scalability and availability of video streaming.

They have stunned the world with their technical ability to manage such a huge user base and to provide an uninterrupted video streaming experience with almost no points of failure.

A technological marvel, an unprecedented case of designing a system architecture that refuses to fail.

In this blog series, we will decode the backend technology that makes this level of service possible, and find out why Netflix is able to deliver an amazing experience day after day, week after week, year after year.

In the first part, we will understand the system design of its architecture: The backbone of Netflix’s world-class streaming services.

In the second part, we will understand the various components of this architecture, and in the concluding part, we will decode the system as a whole, with respect to its architecture and design.

Let’s start!

This is how Netflix Architecture works

Although Netflix was launched in 1997 as a DVD rental startup, it took 11 years to realize the power of the cloud. In 2008, a major database corruption in their in-house data center halted DVD shipments and business operations for about 3 days, and this forced them to embrace a scalable, robust approach to infrastructure management.

And then everything changed.

Since then, they have wholeheartedly adopted Amazon Web Services (AWS) for managing their IT infrastructure, and they replaced the monolithic programs hosted on their own servers with a microservices architecture hosted on the public cloud.

By embracing a microservices-based architecture on the AWS public cloud, they were able to eliminate single points of failure and build an extremely scalable IT infrastructure that supports millions of service requests without a hiccup.

Unprecedented scalability with microservices-based architecture

In the microservices-based architecture that Netflix deployed, larger software programs are broken down into smaller programs, or components, based on modularity, and every such component has its own data encapsulation.

For this reason, Netflix is able to scale its services rapidly, via horizontal scaling and workload partitioning.

If any of these smaller programs stops working or slows down requests, the engineers can quickly isolate that component and keep the service uninterrupted. A microservices-based architecture also makes it possible to track every individual software component.
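To make horizontal scaling, workload partitioning, and isolation a little more concrete, here is a minimal Python sketch. It is not Netflix's code: the ServicePool and Instance names, the hash-based partitioning, and the isolate step are all assumptions made purely for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One running copy of a microservice (hypothetical stand-in for a server)."""
    name: str
    healthy: bool = True
    handled: list = field(default_factory=list)


class ServicePool:
    """Spreads the requests for one microservice across its healthy instances."""

    def __init__(self, service, instances):
        self.service = service
        self.instances = list(instances)

    def scale_out(self, instance):
        # Horizontal scaling: add one more instance instead of a bigger machine.
        self.instances.append(instance)

    def isolate(self, name):
        # Take a misbehaving instance out of rotation; the others keep serving.
        for inst in self.instances:
            if inst.name == name:
                inst.healthy = False

    def route(self, partition_key):
        # Workload partitioning: a stable hash of the key picks the instance,
        # so the same key lands on the same instance while the set of
        # healthy instances stays unchanged.
        healthy = [inst for inst in self.instances if inst.healthy]
        if not healthy:
            raise RuntimeError(f"no healthy instance for {self.service}")
        digest = hashlib.sha256(partition_key.encode()).digest()
        chosen = healthy[int.from_bytes(digest[:8], "big") % len(healthy)]
        chosen.handled.append(partition_key)
        return chosen


if __name__ == "__main__":
    pool = ServicePool("billing", [Instance("billing-1"), Instance("billing-2")])
    pool.scale_out(Instance("billing-3"))   # demand grows
    pool.isolate("billing-2")               # one instance starts misbehaving
    print({pool.route(f"user-{n}").name for n in range(100)})
```

In this sketch, the isolated instance simply stops receiving requests, while the remaining instances absorb the load. A real deployment would use a load balancer and health checks rather than a manual isolate call.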

Decoding the system architecture of Netflix

As per available reports, and blogs by renowned architecture experts, Netflix’s system architecture has two main components:

AWS or Amazon Web Services for hosting the data
Open Connect: an in-house content delivery network for serving the requests

Both of these components must work together, and in sync, to ensure timely delivery of content and streaming services.

If we talk about software architecture, then the three most critical components of Netflix are: Client, Backend, and Content Delivery Network (CDN).

The Client can be any supported browser, or Netflix’s own mobile app, through which the service is accessed. The Backend comprises AWS-based services, databases, and storage, and it handles everything besides streaming videos.

Some critical parts of Netflix’s backend are:

AWS EC2, which provides scalable computing instances
AWS S3, which provides scalable storage
Business logic microservices, which are custom-built, task-oriented frameworks
AWS DynamoDB and Cassandra, which are scalable databases
AWS EMR, Hadoop, Spark, Flink, and other tools for big data processing
Video processing and transcoding tools, custom-built by Netflix

And finally, there is the Open Connect CDN, a powerful network of servers deployed for storing and streaming videos on a mass scale.

These servers are called Open Connect Appliances (OCAs), and they are optimized for seamless performance via fast streaming of videos, and fast retrieval of these videos, based on the service requests.

Architecture for video playback

So, what exactly happens when a user clicks or taps on the playback button?

Here is the diagram that details the exact video playback process, when a user clicks or taps on the playback button to stream a video:

A chain of events is triggered this way after the playback button is activated:

The Open Connect Appliances (OCAs) constantly share their health reports with the Cache-Control service, detailing their workload status, routability, and available videos. This way, the services running on AWS EC2 know which OCAs can be offered to clients when Playback Apps respond to a play request.
The play request is sent from the client to the Playback Apps that are running on AWS EC2, to fetch the URLs of the requested video.
Playback Apps services then validate the request, by checking the user’s subscription status, availability of the video, licensing of the video based on geographical locations, and more.
After the validations, Playback Apps ask the Steering service for the OCAs that are eligible for that specific video request. Both the Steering service and Playback Apps run on AWS EC2 instances, which speeds up this exchange. The Steering service uses the user’s IP address and ISP information to find the best OCAs for this request.
After this, Playback Apps send a minimum of 10 OCAs back to the Client for streaming that video. The Client chooses the best OCA, based on speed, past performance, and quality, to stream that video for the user.

And all these steps happen in less than 1 second.
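The chain of events above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names mirror the services named in the steps, but every field, method, and ranking rule here (for example, preferring OCAs on the client's ISP and then the least loaded ones, or picking by measured throughput on the client) is hypothetical and not taken from Netflix's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class OcaHealth:
    """Health report an OCA sends to Cache-Control (fields are assumed)."""
    oca_id: str
    isp: str            # ISP network the appliance sits in
    load: float         # 0.0 (idle) to 1.0 (saturated)
    routable: bool
    videos: frozenset   # IDs of the videos stored on this appliance


class CacheControl:
    def __init__(self):
        self.reports = {}

    def report(self, health):
        # Step 1: keep only the latest report per OCA.
        self.reports[health.oca_id] = health


class Steering:
    def __init__(self, cache_control, count=10):
        self.cache_control = cache_control
        self.count = count  # the article mentions 10 OCAs per response

    def pick(self, video_id, client_isp):
        # Step 4: eligible means routable and holding the video.
        eligible = [h for h in self.cache_control.reports.values()
                    if h.routable and video_id in h.videos]
        # Assumed ranking: same ISP first, then the least loaded.
        eligible.sort(key=lambda h: (h.isp != client_isp, h.load))
        return [h.oca_id for h in eligible[:self.count]]


class PlaybackApps:
    def __init__(self, steering, subscribers, licensed_regions):
        self.steering = steering
        self.subscribers = subscribers            # set of active user IDs
        self.licensed_regions = licensed_regions  # video ID -> set of regions

    def play(self, user_id, video_id, region, client_isp):
        # Steps 2 and 3: receive the play request and validate it.
        if user_id not in self.subscribers:
            raise PermissionError("no active subscription")
        if region not in self.licensed_regions.get(video_id, set()):
            raise PermissionError("video not licensed in this region")
        # Steps 4 and 5: ask Steering, return the OCA list to the client.
        return self.steering.pick(video_id, client_isp)


def client_choose(oca_ids, measured_throughput):
    # Client side: pick the OCA with the best measured throughput (assumed metric).
    return max(oca_ids, key=lambda oca: measured_throughput.get(oca, 0.0))
```

A caller would first feed a few OcaHealth reports into CacheControl, then call PlaybackApps.play and hand the returned list to client_choose. Keeping validation, steering, and the final choice in separate pieces mirrors how the steps above split the work between the backend and the client.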

Decoding the backend architecture of Netflix

As shared earlier, the backend comprises services, databases, storage, and everything else besides the actual streaming of video, which is handled by the Open Connect CDN.

Here too, microservices-based architecture is deployed for handling backend activities that include: user management, billing, subscription management, video transcoding, personalized recommendations to the users, and more.

Here is the possible backend architecture of Netflix, based on the available reports and blogs shared by experts in this domain:

Now, this is what happens during a typical service request placed for the backend architecture:

The AWS Elastic Load Balancer (ELB) handles the playback request sent by the Client to the Backend, running on AWS.
AWS ELB then forwards this request to the API Gateway Service, which runs on AWS EC2 instances. Netflix built a component named Zuul to handle the requests forwarded by AWS ELB; it provides advanced features such as dynamic routing, traffic monitoring, and security, and it improves resilience to failure.
From Zuul, the request is forwarded to the Application API component, which is the core business logic that powers Netflix. In our example, the request is for playback, so the request forwarded by the API Gateway Service is handled by the Play API. For other requests, such as user authentication or subscription checks, other corresponding APIs are deployed under the Application API component.
Now, the Play API will call either a single microservice or a sequence of microservices to fulfill this request. In our example, the Playback Apps service, the Steering service, and Cache-Control are the microservices that fulfill the request for the Play API.
Hystrix, a latency and fault-tolerance library developed by Netflix, isolates the calls made to each microservice from one another. This limits cascading failures and makes the system more resilient, which in turn raises the success rate of requests.
Microservices can also send the users’ activities, their history, and other data to the Stream Processing Pipeline, which powers real-time recommendations and suggestions that enhance engagement and user experience.
The processed data coming out of the Stream Processing Pipeline is then persisted to big data stores such as AWS S3, Hadoop HDFS, and Cassandra for the next action.
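Two of the ideas in the steps above, gateway routing and isolating a failing microservice, can be sketched in Python as follows. This is a simplified illustration and not the actual API of Zuul or Hystrix: the Gateway, CircuitBreaker, and make_play_api names, the failure threshold, and the fallback response are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was given."""


class CircuitBreaker:
    """Minimal circuit breaker in the spirit of Hystrix (not its real API)."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after   # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, func, *args, fallback=None):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            # Open: fail fast without touching the struggling service.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            # Closed, or cool-down elapsed (half-open): let the call through.
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open, or re-open after a failed trial
            if fallback is not None:
                return fallback()
            raise
        self.failures = 0
        self.opened_at = None
        return result


class Gateway:
    """Routes a request path to an API handler by prefix (Zuul-like role)."""

    def __init__(self):
        self.routes = {}

    def register(self, prefix, handler):
        self.routes[prefix] = handler

    def handle(self, path, request):
        for prefix, handler in self.routes.items():
            if path.startswith(prefix):
                return handler(request)
        return {"status": 404}


def make_play_api(playback_service, breaker):
    """Builds a Play API handler that calls a microservice through the breaker."""
    def play_api(request):
        return breaker.call(
            playback_service, request,
            fallback=lambda: {"status": 503, "error": "playback unavailable"},
        )
    return play_api
```

With this sketch, a handler registered under a prefix such as "/play" keeps answering with the fallback response once the breaker opens, and the failing microservice is left alone until the cool-down has passed. The threshold and cool-down values are arbitrary and would need tuning in any real system.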

This is how the Netflix system architecture works, serves the requests made by clients, and delivers powerful performance.

In the next part, we will decode the components of Netflix architecture and more.
