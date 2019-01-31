Detecting Performance Anomalies in External Firmware Deployments

Netflix Technology Blog, Jan 31

by Richard Cool

Netflix has over 139M members streaming on more than half a billion devices spanning over 1,700 different types of devices from hundreds of brands. This diverse device ecosystem results in a high-dimensional feature space, often with sparse data, which can make identifying device performance issues challenging. Identifying ways to scale solutions in this space is vital as the ecosystem continues to grow in both volume and diversity.

Streaming devices are also used on a wide range of networks, which directly affect the delivered user experience. The video quality and app performance that can be delivered to a limited-memory mobile phone with a spotty cellular connection is quite different from what can be achieved on a cable set top box with high speed broadband; understanding how device characteristics and network behavior interact adds a layer of complexity in triaging potential device performance issues.

We strive to ensure that when a member opens the Netflix app and presses play, they are presented with a high-quality experience every step of the way. Encountering an error page, waiting a very long time for video to begin playing, or having the video pause during playback are all poor experiences, and we strive to minimize them. Previous blog posts have detailed the efforts of the Device Reliability Team (part 1, part 2) to identify issues and troubleshoot them and have given examples of the uses of machine learning to improve streaming quality.

Device-related issues typically occur in one of two scenarios: (1) Netflix introduces a change to the app or backend servers that interacts badly with some devices or (2) a consumer electronics partner, browser developer, or operating system developer pushes a change (e.g. a firmware change or browser/OS change) that interacts poorly with our app. While we have tools for dealing with the first scenario (for example, automated canary analysis using Kayenta), the second was previously detected only once the update had reached enough devices to shift core performance metrics. Being able to quickly identify firmware updates that result in a poorer member experience allows us to minimize the impact of these issues and work with device partners to root-cause problems.

Figure 1 — Monthly number of firmware releases seen on consumer electronics devices streaming Netflix.

Figure 1 shows that the rate at which our consumer electronics device partners are pushing new firmware is growing rapidly. In 2018, our partners pushed over 500 firmware updates a month; this value will likely pass 1,000 firmware upgrades per month by 2020. Firmware rollouts often begin slowly, with a fraction of all devices receiving the new firmware for several days before the rest of the devices are upgraded. These rollouts are not random; often a specific subset of devices is targeted for new firmware, and sometimes rollouts target specific geographic regions. Naive analysis of metric changes between devices on new firmware and devices on older firmware can be confounded by the non-random rollout, so it’s important to control for this when asking if a new firmware has negatively impacted the Netflix member experience.

Putting the Pieces Together

Consider the case of a metric which follows the grey distribution (with a mean value of ~4,570) shown in Figure 2. We see a new firmware deployed in the field (red distribution) whose metric follows an approximately normal distribution with a noticeably higher mean of 5,600, indicating that devices using the new firmware have a poorer experience than the mean of the full device population. Should we be concerned that the new firmware has resulted in lower performance than prior versions?

Figure 2 — Left: Hypothetical distribution of a device performance metric between the control sample of devices (grey) and a population of the same devices which have been upgraded to a new firmware (red). Right: The control sample (red) has been broken into multiple sub-components (grey) based on geographic region.
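
The confounding effect of a non-random rollout can be illustrated with a small simulation. The sketch below is only an illustration under invented assumptions: the region names, device counts, means, and standard deviations are hypothetical, chosen so that the overall control mean lands near 4,570 and the new-firmware mean near 5,600. The simulated firmware is rolled out to a single region and, by construction, does not change the metric at all. The stratified comparison shown here is one simple way to control for the rollout pattern and is not necessarily the approach used in production.

```python
import random
import statistics

random.seed(0)

# Hypothetical control population, split by region: (mean, std dev, device count).
# All numbers are invented for illustration; the weighted mean is about 4,575.
REGIONS = {
    "region_a": (4000.0, 300.0, 5000),
    "region_b": (4850.0, 300.0, 3000),
    "region_c": (5600.0, 300.0, 2000),
}


def sample(mean, std, n):
    """Draw n metric values from a normal distribution."""
    return [random.gauss(mean, std) for _ in range(n)]


control = {name: sample(m, s, n) for name, (m, s, n) in REGIONS.items()}

# Assumption: the new firmware is rolled out only in region_c, and it has
# no real effect, so it is drawn from the same distribution as region_c.
new_firmware = {"region_c": sample(5600.0, 300.0, 500)}


def naive_difference(control, treated):
    """Compare upgraded devices against the whole control population."""
    all_control = [x for values in control.values() for x in values]
    all_treated = [x for values in treated.values() for x in values]
    return statistics.mean(all_treated) - statistics.mean(all_control)


def stratified_difference(control, treated):
    """Compare within each region, weighting by upgraded device count."""
    total = sum(len(values) for values in treated.values())
    diff = 0.0
    for region, values in treated.items():
        region_diff = statistics.mean(values) - statistics.mean(control[region])
        diff += region_diff * len(values) / total
    return diff


naive = naive_difference(control, new_firmware)
stratified = stratified_difference(control, new_firmware)
print(f"naive difference:      {naive:8.1f}")
print(f"stratified difference: {stratified:8.1f}")
```

In this sketch the naive difference is roughly 1,000, which would look like a serious regression, while the stratified difference is close to zero because the upgraded devices are compared only against control devices from the same region.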