Fenzo: OSS Scheduler for Apache Mesos Frameworks
Bringing Netflix to our millions of subscribers is no easy task. The product comprises dozens of services in our distributed environment, each of which is operating a critical component to the experience while constantly evolving with new functionality. Optimizing the launch of these services is essential for both the stability of the customer experience as well as overall performance and costs. To that end, we are happy to introduce Fenzo, an open source scheduler for Apache Mesos frameworks. Fenzo tightly manages the scheduling and resource assignments of these deployments.
Two main motivations for developing a new framework, as opposed to leveraging one of the many frameworks in the community, were to achieve scheduling optimizations and to be able to autoscale the cluster based on usage, both of which will be discussed in greater detail below. Fenzo enables frameworks to better manage ephemerality aspects that are unique to the cloud. Our use cases include a reactive stream processing system for real time operational insights and managing deployments of container based applications.
At Netflix, we see a large variation in the amount of data that our jobs process over the course of a day. Provisioning the cluster for peak usage, as is typical in data center environments, is wasteful. Also, systems may occasionally be inundated with interactive jobs from users responding to certain anomalous operational events. We need to take advantage of the cloud’s elasticity and scale the cluster up and down based on dynamic loads.
Although scaling up a cluster may seem relatively easy by watching, for example, the amount of available resources falling below a threshold, scaling down presents additional challenges. If the tasks are long lived and cannot be terminated without negative consequences, such as time consuming reconfiguration of stateful stream processing topologies, the scheduler will have to assign them such that all tasks on a host terminate at about the same time so the host can be terminated for scale down.
Scheduling tasks requires optimization of resource assignments to maximize the intended goals. When there are multiple resource assignments possible, picking one versus another can lead to significantly different outcomes in terms of scalability, performance, etc. As such, efficient assignment selection is a crucial aspect of a scheduler library. For example, picking assignments by evaluating every pending task with every available resource is computationally prohibitive.
Our design focused on large scale deployments with a heterogeneous mix of tasks and resources that have multiple constraints and optimizations needs. If evaluating the most optimal assignments takes a long time, it could create two problems:
- resources become idle, waiting for new assignments
- task launches experience increased latency
Fenzo adopts an approach that moves us quickly in the right direction as opposed to coming up with the most optimal set of scheduling assignments every time.
Conceptually, we think of tasks as having an urgency factor that determines how soon it needs an assignment, and a fitness factor that determines how well it fits on a given host.
If the task is very urgent or if it fits very well on a given resource, we go ahead and assign that resource to the task. Otherwise, we keep the task pending until either urgency increases or we find another host with a larger fitness value.
Trading Off Scheduling Speed with Optimizations
Fenzo has knobs for you to choose speed and optimal assignments dynamically. Fenzo employs a strategy of evaluating optimal assignments across multiple hosts, but, only until a fitness value deemed “good enough” is obtained. While a user defined threshold for fitness being good enough controls the speed, a fitness evaluation plugin represents the optimality of assignments and the high level scheduling objectives for the cluster. A fitness calculator can be composed from multiple other fitness calculators, representing a multi-faceted objective.
Fenzo tasks can use optional soft or hard constraints to influence assignments to achieve locality with other tasks and/or affinity to resources. Soft constraints are satisfied on a best efforts basis and combine with the fitness calculator for scoring hosts for possible assignment. Hard constraints must be satisfied and act as a resource selection filter.
Fenzo provides all relevant cluster state information to the fitness calculators and constraints plugins so you can optimize assignments based on various aspects of jobs, resources, and time.
Bin Packing and Constraints Plugins
Fenzo currently has built-in fitness calculators for bin packing based on CPU, memory, or network bandwidth resources, or a combination of them.
Some of the built-in constraints address common use cases of locality with respect to resource types, assigning distinct hosts to a set of tasks, balancing tasks across a given host attribute, such as the availability zone, rack location, etc.
You can customize fitness calculators and constraints by providing new plugins.
Fenzo supports cluster autoscaling using two complementary strategies:
- Thresholds based
- Resource shortfall analysis based
Thresholds based autoscaling lets users specify rules per host group (e.g., EC2 Auto Scaling Group, ASG) being used in the cluster. For example, there may be one ASG created for compute intensive workloads using one EC2 instance type, and another for network intensive workloads. Each rule helps maintain a configured number of idle hosts available for launching new jobs quickly.
The resource shortfall analysis attempts to estimate the number of hosts required to satisfy the pending workload. This complements the rules based scale up during demand surges. Fenzo’s autoscaling also complements predictive autoscaling systems, such as Netflix’s Scryer.
Usage at Netflix
Fenzo is currently being used in two Mesos frameworks at Netflix for a variety of use cases including long running services and batch jobs. We have observed that the scheduler is fast at allocating resources with multiple constraints and custom fitness calculators. Also, Fenzo has allowed us to scale the cluster based on current demand instead of provisioning it for peak demand.
The table below shows the average and maximum times we have observed for each scheduling run in one of our clusters. Each scheduling run may attempt to assign resources to more than one task. The run time can vary depending on the number of tasks that need assignments, the number and types of constraints used by the tasks, and the number of hosts to choose resources from.
Scheduler run time in milliseconds
Average: 2 mS
Maximum: 38 mS (occasional spikes of about 30 mS)
The image below shows the number of Mesos slaves in the cluster going up and down as a result of Fenzo’s autoscaler actions over several days, representing about 3X difference in the maximum and minimum counts.
Fenzo Usage in Mesos Frameworks
A simplified diagram above shows how Fenzo is used by an Apache Mesos framework. Fenzo’s task scheduler provides the scheduling core without interaction with Mesos itself. The framework, interfaces with Mesos to get callbacks on new resource offers and task status updates. As well, it calls Mesos driver to launch tasks based on Fenzo’s assignments.
Fenzo has been a great addition to our cloud platform. It gives us a high degree of control over work scheduling on Mesos, and has enabled us to strike a balance between machine efficiency and getting jobs running quickly. Out of the box Fenzo supports cluster autoscaling and bin packing. Custom schedulers can be implemented by writing your own plugins.
Source code is available on Netflix Github. The repository contains a sample framework that shows how to use Fenzo. Also, the JUnit tests show examples of various features including writing custom fitness calculators and constraints. Fenzo wiki contains detailed documentation on getting you started.
Originally published at techblog.netflix.com on August 20, 2015.