Author: Andrew McVeigh, Principal Architect
As the chief architect at Hulu, one of my key responsibilities is to understand our systems architecture and how the pieces fit together. We are breathing new life into our tech blog, so expect more deep technical dives from us in the coming months. Before we do that, however, I figured it would be helpful to map our technical landscape for you. We want the blog to be a place where we can share our work and learnings with fellow members of the global tech community, and this overview will set the context for the upcoming articles.
A little about me, for context: I have worked across many domains, including investment banking, video games and digital broadcasting to name a few. I have a PhD in computer science focusing on extensible systems. In my last role before Hulu I worked on League of Legends — arguably the world’s largest game. I created the underpinnings of much of the service landscape and most recently re-architected the game client. I care deeply about how systems are put together and the way those systems are evolved to support new functionality.
I’ve been at Hulu for just under a year now, and my deep industry experience coupled with my newness to this environment gives me a rather unique perspective. I see amazing things at Hulu that others might take for granted due to familiarity. In this post, I’ll give you a 1,000-foot technical view from the perspective of a relative newcomer.
Let’s start with who we are: Hulu is a video streaming service with an extensive catalog of TV episodes, movies and original programs. In addition, we recently added a live TV offering with 50+ channels and a cloud DVR, supported by an innovative and hyper-personalized user interface.
On the backend, we have a rather extraordinary microservice architecture, hosted primarily on our PaaS system called Donki, which can target both on-premises and cloud environments. You can think of it a bit like an in-house variant of Heroku: it simplifies and streamlines our deployment and management workflows. Donki itself is written in Python and supports many languages and services. In addition, we provide a set of managed services such as databases and queue systems via our DSI (infrastructure) team.
Donki allows us to target AWS or our own data center based on drivers like agility and elasticity (typically AWS), or baseline cost minimization and workload (data center). Using the same deployment artifacts, we can move services fairly quickly. During our live launch, for example, we came close to running out of capacity on one of our key service groups and were able to spin up AWS quickly as a backup. We run our live video ingestion architecture on the cloud, provisioned by Donki.
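To make those drivers concrete, here is a minimal sketch of what a placement decision might look like. This is not Donki's actual logic or API: the `Workload` fields, the `choose_target` function and the target names are all hypothetical, chosen only to illustrate the trade-off described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a service to be placed."""
    name: str
    needs_elasticity: bool  # bursty or hard-to-predict demand
    steady_baseline: bool   # predictable, always-on load


def choose_target(workload: Workload, datacenter_has_capacity: bool) -> str:
    """Pick a deployment target from the drivers named in the text.

    Assumption: elasticity favors AWS, a steady baseline favors the data
    center, and AWS acts as the backup when the data center is full.
    """
    if workload.needs_elasticity:
        return "aws"
    if workload.steady_baseline and datacenter_has_capacity:
        return "datacenter"
    return "aws"
```

Because the deployment artifact is the same in both cases, a decision like this only changes where the service runs, not how it is built.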
We run an extensive battery of smoke tests against our system at all times, using another in-house system called Homez. These tests can raise alerts that page our on-call teams. We show the real-time status of these tests on a dashboard that maps the results onto diagrams of the system architecture.
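The general pattern of running checks, paging a team on failure and rolling results up per component can be sketched in a few lines. This is an illustration only and assumes nothing about how Homez is actually built: `SmokeTest`, `run_smoke_tests` and the `page` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SmokeTest:
    """Hypothetical smoke test definition."""
    name: str
    component: str             # architecture component the test exercises
    check: Callable[[], bool]  # returns True when the check passes
    on_call_team: str          # team to page when the check fails


def run_smoke_tests(
    tests: List[SmokeTest],
    page: Callable[[str, str], None],
) -> Dict[str, bool]:
    """Run every test once and return a per-component health map.

    A component is reported healthy only if all of its tests pass.
    A check that raises an exception is treated as a failure.
    """
    status: Dict[str, bool] = {}
    for test in tests:
        try:
            passed = bool(test.check())
        except Exception:
            passed = False
        if not passed:
            page(test.on_call_team, f"smoke test failed: {test.name}")
        status[test.component] = status.get(test.component, True) and passed
    return status
```

The returned map is the kind of data a dashboard could overlay on an architecture diagram, coloring each component by its current status.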
Hulu has always had an extensive set of algorithmic models at the very heart of its business model. We use advanced machine learning technologies in many areas, such as video understanding, personalized content recommendations, ad optimization and targeting, spam detection and data science. We have built scalable data pipelines to process the massive volume of first- and third-party data (over 13 petabytes) that powers our analytics. We are constantly experimenting with new ML methods like real-time influence and deep learning.
On the functional side, we have the existing VOD architecture, which has recently been augmented by a live streaming architecture. To do this, we recreated a substantial part of our system to add the concepts of live assets and program availability, in addition to rewriting all of our clients. This was a significant company-wide effort which resulted in a major new product for us. A key part of this effort was establishing a new metadata catalog with all live and VOD shows, populated by various sources and augmented by the out-of-band (SCTE 224) and in-band (SCTE 35) markers for program start and end. This is a significant and complex effort, as data quality is paramount. Further, our services run at great scale due to the large number of subscribers.
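One way to picture how out-of-band and in-band signals might combine is sketched below. It is a deliberate simplification under stated assumptions: the scheduled window stands in for out-of-band data, the optional marker times stand in for in-band signals, and no real SCTE parsing is shown. `ProgramWindow` and `refine_window` are hypothetical names, not part of our catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProgramWindow:
    """Hypothetical availability window for one live program."""
    program_id: str
    start: float  # seconds since epoch
    end: float    # seconds since epoch


def refine_window(
    scheduled: ProgramWindow,
    inband_start: Optional[float] = None,
    inband_end: Optional[float] = None,
) -> ProgramWindow:
    """Prefer observed in-band marker times over the scheduled ones.

    If the markers would produce an empty or negative window, keep the
    scheduled window rather than publish bad data.
    """
    start = scheduled.start if inband_start is None else inband_start
    end = scheduled.end if inband_end is None else inband_end
    if end <= start:
        return scheduled
    return ProgramWindow(scheduled.program_id, start, end)
```

The guard at the end reflects the point about data quality: a bad marker should not be allowed to corrupt an otherwise valid catalog entry.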
Hulu has two options in its VOD offering: a lower-priced, ad-supported option, and a higher-priced option with no ads. Our ad server and supporting CMS allow us to predict and control our own ad inventory, in line with our media partner contracts. These systems are impressive in their own right, and give us some unique future capabilities in the live streaming space.
Our services are written in a variety of languages (Python, Java, Scala, Go, C++, etc.), all communicating via REST and described via Swagger. We have a fledgling architectural process called HOOP, which aims to increase visibility of technical decisions across the company, whilst giving engineers a self-service way to achieve consensus. Our philosophy is to empower engineers with a toolbox of techniques in order to help them grow. This is vital to us, particularly given our extremely rapid growth: we have doubled our engineering staff in the last two years.
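For readers who have not seen a Swagger description, here is a minimal sketch of one, written as a Python dictionary in the Swagger 2.0 layout, plus a small helper that lists the operations it declares. The service, its paths and the `list_operations` helper are hypothetical and do not describe any real Hulu API.

```python
from typing import Any, Dict, List, Tuple

# Operation keys allowed on a Swagger 2.0 path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}

# Hypothetical service description in Swagger 2.0 form.
EXAMPLE_SPEC: Dict[str, Any] = {
    "swagger": "2.0",
    "info": {"title": "Hypothetical Show Service", "version": "1.0.0"},
    "paths": {
        "/shows": {
            "get": {
                "summary": "List shows",
                "responses": {"200": {"description": "A list of shows"}},
            },
        },
        "/shows/{show_id}": {
            "parameters": [
                {"name": "show_id", "in": "path", "required": True, "type": "string"}
            ],
            "get": {
                "summary": "Fetch one show",
                "responses": {"200": {"description": "The requested show"}},
            },
        },
    },
}


def list_operations(spec: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs declared in a Swagger document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Skip non-operation keys such as path-level "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)
```

Because the description is plain data, the same document can drive documentation, client generation and contract checks regardless of the language a service is written in.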
This work is done across four primary offices: Santa Monica, Beijing, Marin and Seattle. Santa Monica is our HQ, and the team there tends to work mainly on services. Beijing does extensive research and data science work, and owns the recommendations machinery as well as other areas. Seattle is the primary office where client work is done. In addition, we have a center of excellence in Marin County, staffed by an outstanding set of game engineering veterans, which creates the base infrastructure that most of the clients run on. We support 20+ client variants from iOS to Roku through to different TV variants. We have VR players (see the Oculus store), and support devices as humble as the venerable Nintendo DS.
Even this small description of Hulu has left me feeling a bit breathless. There is so much more, and I have barely begun to scratch the surface: payment systems that deal with complex business rules around partner payments and billing, ad measurement systems for revenue sharing, a 24x7 network operations center (NOC) for monitoring, detecting and alerting on system issues, systems for managing our VOD pipeline and storage, and metrics systems that collect, store and analyze petabytes of events per day directly from devices. The list goes on and on.
Before finishing up, I wanted to briefly share some of our challenges. It’s certainly not all beer and skittles — we want and need to learn, improve and grow on many fronts. Our microservice architecture has many challenges around data encapsulation, for instance. Many companies are heading towards the fabled land of microservices — we are fully there and grappling with advanced (and some not so advanced!) data propagation issues that we will describe in later posts. We have greatly increased our engineering team over the last few years and are working out how we can deal with this scale more efficiently on both the product and tech axes. We are a lean engineering organization, and we spend much time working out how we can empower engineers to work more effectively, and how to help them grow.
I hope this has given you a flavor of Hulu engineering, whilst also setting the context for the articles to follow! Stay tuned for articles on many technical subjects including personalization and content discovery, our infrastructure for managing microservices, and much more. In addition to this general software engineering track, we will also publish research articles, showing how we approach some of our most difficult engineering problems.