Netflix — A timeless story of perseverance, innovation, and transformation

Ajmal Majeed
7 min read · Jan 25, 2024


I’m quite positive you’ve got a Netflix subscription on the television in your living room. Yep, most of us do, at least those of us with some time to spare for a bit of binge-watching. Lately, though, I’ve had trouble finding something worth watching apart from the epic documentaries Netflix has been publishing. I guess that’s something the filmmaking and storytelling industry will have to work on, but that’s a story for another time.

Before we dive into how Netflix overcame its many challenges to reach a valuation of 212.5 billion USD, here’s a fun fact: in 2000, Netflix faced financial difficulties that led it to offer to sell the company to Blockbuster Inc. for $50 million, an offer that was turned down. And here we are today, Netflix and chilling, with Blockbuster nowhere to be heard of.

Alright, without further ado, let’s dive into their story and how they leveraged what was, at the time, primitive technology to reshape their entire business.

Netflix started off as a DVD rental company that charged customers per rental, and eventually moved to offering monthly subscriptions for its movie rentals.

Down the line, Netflix was providing online DVD rental services to millions of customers. At the time, Netflix ran its entire operations on-premises and hadn’t yet begun moving to the Cloud. In 2008, the company suffered a major outage when the database in its on-premises data center was corrupted, leaving it unable to ship DVDs for three days, and that incident had a major impact on its strategy moving forward. The outage convinced Netflix to explore the Cloud as a way to eliminate such failures in the future, a decision that transformed the entire trajectory of the business.

During its Cloud adoption journey, Netflix faced several challenges that required unconventional thinking. In this article, we will discuss some of the major ones and how Netflix overcame them.

Key challenges in Netflix’s Cloud adoption journey

  • Legacy architecture

Netflix was running its applications and services on a monolithic architecture, which became a major obstacle during its cloud adoption journey. In a monolith, every component is built, deployed and scaled as a single unit, so one faulty change can affect the whole application. To take full advantage of Cloud computing, Netflix had to re-architect its entire fleet of applications and services around a micro-services architecture. Executing that migration successfully demanded tremendous knowledge and expertise.

  • Serving customers around the globe

Secondly, as users from all over the globe signed up for Netflix’s entertainment services in growing numbers, the company had to ensure a seamless experience with uninterrupted service everywhere. This too was a major challenge in its cloud adoption journey. Netflix explored cloud-native solutions and turned to a CDN to address it, which we will look at shortly.

  • Extremely high loads of traffic

Finally, Netflix had to deal with ever-growing traffic hitting its servers. The company needed strategies to handle that traffic with high throughput and low latency. Otherwise, customers would run into problems while streaming and become frustrated, and that would have a major impact on Netflix’s business.

How did Netflix resolve these challenges?

  • Firstly, let’s look at how Netflix overcame the challenge of having a monolithic architecture for its applications and services. At the time, most of us had barely heard the term micro-services, because the idea was still in its early stages. Yet that is exactly the approach Netflix chose to move away from its monolithic application. Netflix decided to adopt a micro-service architecture because of the benefits it would bring to the reliability and scalability of its application. The company re-architected the entire application and gained the speed and agility that came with it. This involved breaking down every component of the application and working out how each one could be re-architected to stand alone and function independently without impacting the other components. This journey lasted from 2008 to 2016 and was a success.
  • The next challenge was serving millions of users around the globe without any disruption. Netflix decided to use a CDN (Content Delivery Network) to resolve this. A CDN caches content on servers at edge locations around the world and serves each user from the location closest to them, keeping streaming latency as low as possible. However, Netflix didn’t stop there. They built their own CDN, Open Connect, tailored to their needs. They also seized the opportunity to partner with local ISPs (Internet Service Providers), installing Open Connect directly within ISP networks and minimizing the traffic served over transit providers. This was a game changer for Netflix, as it cut buffering by an order of magnitude.
  • Netflix was dealing with enormous amounts of traffic entering its systems: users were pressing the play button on movies and TV shows thousands of times a second. The organization had to make sure its systems could handle this load without failing, and that if they did fail, they would recover automatically. Netflix became one of the best-known names in Information Technology for the practice it developed to tackle this issue: “Chaos Engineering”. Netflix built an automated chaos platform designed to test the problems it might encounter with its infrastructure. One strategy ensured the company could survive the failure of any one of the three regions Netflix was using for its infrastructure. Every month, the team would turn off one of those regions and test whether its systems automatically shifted traffic from the disrupted region to the others, keeping users served within six minutes. Exercises like this became part of the chaos engineering practice that helped bring Netflix to where it is today. Netflix also built its own “Chaos Monkey” tool, as they call it, a vital component of its Simian Army suite. The tool plays a crucial role in testing and ensuring the fault tolerance of Netflix’s production environment. It randomly terminates instances to simulate failures that could occur in real-world scenarios.
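The monthly region exercise described in the last point can be pictured as a simple test. You switch off one of three regions and confirm that every request still lands on a healthy region. The Python below is a minimal sketch under stated assumptions. The region names, the round-robin `RegionRouter` and `region_failover_test` are all hypothetical and are not Netflix’s actual tooling. The six-minute recovery target is also not modeled.

```python
from dataclasses import dataclass, field
from itertools import cycle

# Illustrative region names (an assumption, not a statement of Netflix's topology).
REGIONS = ["us-east-1", "us-west-2", "eu-west-1"]


@dataclass
class RegionRouter:
    """Hypothetical router that spreads requests across healthy regions."""

    regions: list[str]
    down: set[str] = field(default_factory=set)

    def healthy(self) -> list[str]:
        return [r for r in self.regions if r not in self.down]

    def evacuate(self, region: str) -> None:
        # Simulate "turning off" a region, as in the monthly exercise.
        self.down.add(region)

    def route(self, n_requests: int) -> dict[str, int]:
        healthy = self.healthy()
        if not healthy:
            raise RuntimeError("no healthy region left to serve traffic")
        counts = {r: 0 for r in healthy}
        # Round-robin the requests over whichever regions are still up.
        for _, region in zip(range(n_requests), cycle(healthy)):
            counts[region] += 1
        return counts


def region_failover_test(router: RegionRouter, victim: str, n_requests: int = 900) -> bool:
    """Turn one region off, then check every request was still served elsewhere."""
    router.evacuate(victim)
    served = router.route(n_requests)
    return victim not in served and sum(served.values()) == n_requests


router = RegionRouter(REGIONS)
print(region_failover_test(router, "us-east-1"))  # True
print(router.route(900))  # {'us-west-2': 450, 'eu-west-1': 450}
```

Round-robin over the surviving regions is the simplest possible policy and was chosen only for readability. A real failover would also need to account for how much extra load each remaining region can absorb.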

You might be wondering what the “Simian Army suite” is exactly. In short, it is Netflix’s collection of tools that deliberately inject failures into its environment, with Chaos Monkey as its best-known member. As for the name, here’s the explanation straight from Netflix, quoted from an article that describes it in detail: “The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables — all the while we continue serving our customers without interruption.”
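To make the Chaos Monkey idea concrete, here is a minimal Python sketch of random instance termination against a hypothetical in-memory fleet. The service names, the instance IDs, and the policy of killing one random instance per service are assumptions made for illustration. Netflix’s real Chaos Monkey acts on live cloud instances rather than on a dictionary.

```python
import random


def chaos_monkey(fleet: dict[str, list[str]], rng: random.Random) -> dict[str, str]:
    """Terminate one randomly chosen instance in every service group (assumed policy)."""
    killed = {}
    for service, instances in fleet.items():
        if instances:
            victim = rng.choice(instances)
            instances.remove(victim)
            killed[service] = victim
    return killed


def still_serving(fleet: dict[str, list[str]]) -> dict[str, bool]:
    """A service keeps serving as long as at least one replica survives."""
    return {service: len(instances) > 0 for service, instances in fleet.items()}


# Hypothetical fleet: service name -> running instance IDs.
fleet = {
    "playback": ["i-01", "i-02", "i-03"],
    "recommendations": ["i-11", "i-12"],
    "billing": ["i-21"],  # single replica: the weakness this test should expose
}

rng = random.Random(42)  # seeded so runs are repeatable
print(chaos_monkey(fleet, rng))
print(still_serving(fleet))  # {'playback': True, 'recommendations': True, 'billing': False}
```

The output shows the point of the exercise. A service running a single replica stops serving, so the test exposes the weakness before a real failure does.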

Key lessons from Netflix’s journey

The Netflix story is truly an inspiring one for all engineers, and it is still studied today to solve current challenges. Why is that the case? The Netflix Cloud adoption story is aging by the day, yet engineers still turn to it for lessons on the problems they face now. I believe that is because the story is timeless: the problem-solving fundamentals Netflix used to overcome its challenges still apply. Here are some of the key lessons I took away from studying Netflix’s Cloud adoption journey.

  • Progressive mindset: This story illustrates one of the fundamentals of the tech industry, which is cultivating a progressive mindset in a company’s culture. Netflix did not take the relational database crash lightly, because they understood it could recur and, if it did, it would have to be addressed. The team didn’t see Cloud adoption as a tedious chore for fixing a problem. Instead, they saw it as an opportunity to grow their business exponentially.
  • Innovative measures, along with precise planning: As we have seen, migrating to the Cloud brought the team several challenges, such as re-architecting the entire application from a monolithic to a micro-service architecture. However, the team was forward-thinking enough to overcome this by planning the migration and re-architecting journey one step at a time. Netflix moved one application at a time from its on-premises data centers to the Cloud, addressing one problem at a time with a step-by-step plan. This shows the importance of breaking a problem into smaller chunks and coming up with a precise plan for execution.
  • Creating opportunities along the way: Netflix treated the challenge of serving users around the globe as both a problem to solve and an opportunity to seize. They saw the business opportunity in partnering with ISPs and installing their custom-made CDN, which benefited both Netflix and the ISPs. Without it, traffic would have to go through a peering point and possibly transit four or five other networks before reaching the place that holds the content. That could slow content delivery and add expenses for ISPs. Installing Open Connect within ISPs gave Netflix users ultra-low-latency streaming and minimized the traffic served over transit providers.
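The advantage described in the last point comes down to a lookup order. A title is served from a cache inside the ISP when possible, and only otherwise by crossing peering points and transit networks. The Python below is a minimal sketch under stated assumptions. `ContentSource`, `pick_source`, the hop counts and the title names are all hypothetical. Real Open Connect steering weighs far more than proximity.

```python
from dataclasses import dataclass, field


@dataclass
class ContentSource:
    """A place a title can be streamed from (hypothetical model)."""

    name: str
    network_hops: int  # rough stand-in for distance and cost, an assumption
    titles: set[str] = field(default_factory=set)


def pick_source(title: str, sources: list[ContentSource]) -> ContentSource:
    """Serve from the nearest source that holds the title, falling back
    to farther sources (and ultimately transit) only when necessary."""
    for source in sorted(sources, key=lambda s: s.network_hops):
        if title in source.titles:
            return source
    raise LookupError(f"{title!r} is not available from any source")


# Hypothetical sources, from nearest to farthest from the viewer.
isp_cache = ContentSource("cache inside the ISP", 1, {"popular-show"})
exchange_cache = ContentSource("cache at a peering point", 3, {"popular-show", "niche-film"})
origin = ContentSource(
    "origin over transit networks", 6, {"popular-show", "niche-film", "rare-documentary"}
)

sources = [origin, exchange_cache, isp_cache]
for title in ["popular-show", "niche-film", "rare-documentary"]:
    print(f"{title} -> {pick_source(title, sources).name}")
```

In this model, popular titles resolve to the ISP-embedded cache. Rarely watched ones fall back to farther sources, which is where the extra hops and transit costs come in.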

Today, Netflix serves over 247.15 million paying subscribers all over the globe.

Follow me for more on #AllThingsCloud

LinkedIn

Happy reading! Stay curious, and have a great day!
