Journey to Scalable and Efficient Sportsbook Operations

Ivan Atanasov
DraftKings Engineering
6 min read · Dec 18, 2023

Introduction

Providing the best player experience on a Sportsbook platform requires flexible and agile infrastructure: the number of concurrent players during big events can multiply several times over in a matter of minutes.
This article explores DraftKings’ strategic initiative to fine-tune its infrastructure, ensuring it operates at peak performance while being cost-effective at the same time.

Our services infrastructure consists mainly, though not exclusively, of API-based microservices, deployed in containers and orchestrated by Kubernetes. It is wrapped by service discovery and load-balancing infrastructure that ensures and validates that all services are available, accessible, and healthy.

Goals:

  • Utilize cloud-allocated resources efficiently and continuously adapt resource allocation to accommodate our expanding business.
  • Ensure that the engineers have detailed analytical data to support their decision-making process.

Before You Autoscale: Mastering Visibility for Effective Horizontal Pod Scaling

We needed visibility on a few things before doing any horizontal or vertical scaling, up or down.

  • High-level summary of operational assessed cost and average resource utilization per system/department. Stakeholders can base their additional investigations and improvements on those initial reports.
  • Managers should know how their team/unit/department handles resources from a high-level perspective.
  • Engineers should be able to view the resource usage of every component they are responsible for.
  • Engineers should be able to check the resource usage history to identify usage patterns during high and low loads of the entire system.

We also needed proper tools so that engineers can adjust and optimize the resource usage of their services without affecting performance and resilience.

Monitoring

Having data and infrastructure visibility is the first and most essential part.
Knowing the utilization of a Kubernetes cluster is a good start, but not much can be done with only this information. Engineering needs far more granular, service-level data to make any decisions.
There are a few things that we did to achieve this:

Active Kubernetes Clusters Monitoring with Historical Time-series data and graphs

A few dashboards and charts are used for analytical purposes and for identifying points of interest.

Active Monitoring on a Workload/Pod level with Historical Time-series data and graphs

A few dashboards and charts are meant for engineers to dive deep and investigate individual workloads/pods inside the cluster.

Reporting

Scheduled weekly and monthly reports for the domain owners and senior management so relevant stakeholders and engineering teams can track and address mismatches in costs and utilization. The reports include a high-level summary of operational assessed cost and average resource utilization per unit/department.

What do the numbers mean?

Overall

  • The report is a very high-level presentation of a few essential metrics.
  • The report displays aggregations per Unit grouped by namespace.

Columns

  • CPU Requested: Sum of the requested CPU for all deployments/pods in the namespace.
  • CPU Used (mean): Average total CPU used over the reporting period by all deployments/pods in the namespace.
  • CPU Used (max): Represents the namespace’s highest CPU usage (sum of all pods).
  • CPU Unused (mean): (CPU Requested) − (CPU Used (mean))
  • CPU Unused Cost per Month (XX$/CPU): ((CPU Requested) − (CPU Used (mean))) × XX. The number is an approximation based on a price of XX per CPU per month. For illustration, if 100 CPUs were requested, 60 were used on average, and the price were $30 per CPU per month, the unused cost would be (100 − 60) × $30 = $1,200 per month.
    The number is not 100% accurate; however, it gives a good cost representation and approximation.
    The price per CPU should be an approximation based on the Cloud subscription and its cost.
    The price per CPU can always be adjusted and changed.

Tools

Kubernetes Horizontal Pod Autoscaler

Kubernetes provides an autoscaling mechanism, Horizontal Pod Autoscaler (HPA), that can change the number of pods in a deployment based on a metric like CPU utilization. This works if the traffic increases gradually, but it can lead to issues during surges, as the HPA needs time to adjust the number of instances to the correct value.
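For reference, a standard HPA is declared as a manifest like the one below, using the stable autoscaling/v2 API. The deployment name, replica bounds, and utilization target are illustrative, not values from our production setup:

```yaml
# Scales "some-workload" between 2 and 20 replicas,
# targeting 70% average CPU utilization across pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: some-workload-hpa
  namespace: some-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: some-workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Because the HPA reacts to observed metrics, a surge must first be measured before new pods are scheduled and warmed up, which is exactly the lag described above.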

Kubernetes Scheduled Autoscaler with Scaling Profiles

We introduced a simple schedule-based scaling mechanism (the Scheduled Autoscaler) that allows service operators to “warm up” their deployments before expected traffic spikes. The tool consists of a simple Kubernetes controller and a set of CRDs (Custom Resource Definitions). The implementation controls the number of replicas in a deployment directly.

The Scaling Profile is the core of the Scheduled Autoscaler: it is the binding between a schedule and a service.

The example below shows a “Simple” scaling profile controlling a deployment called “some-workload.” The profile defines a single schedule that applies on Tuesdays and Thursdays, as well as on the exact date 2021-09-09, with two inline time frames: 13:00-15:00 and 18:00-20:00.
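Since the ScalingProfile CRD is internal, the manifest below is only a sketch of what such a profile might look like; the API group, field names, and replica counts are illustrative assumptions, not the actual schema:

```yaml
# Hypothetical ScalingProfile manifest; field names are illustrative.
apiVersion: autoscaling.draftkings.com/v1  # assumed API group
kind: ScalingProfile
metadata:
  name: simple
  namespace: some-namespace  # must match the controlled deployment
spec:
  targetDeployment: some-workload
  schedules:
    - weekdays: ["Tuesday", "Thursday"]
      dates: ["2021-09-09"]
      timeFrames:
        - from: "13:00"
          to: "15:00"
          replicas: 6
        - from: "18:00"
          to: "20:00"
          replicas: 6
```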

The ScalingProfile is a namespaced resource. Hence, it must be applied in the same namespace as the controlled deployment.

If two or more schedules or time frames overlap, the maximum number of replicas is taken.
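To illustrate the overlap rule (again with hypothetical field names), consider two time frames on 2021-10-05 whose intervals intersect between 14:30 and 14:45:

```yaml
# Overlapping time frames; the controller applies the larger replica count
# wherever the frames intersect.
spec:
  targetDeployment: some-workload
  schedules:
    - dates: ["2021-10-05"]
      timeFrames:
        - from: "13:00"
          to: "14:45"
          replicas: 10
        - from: "14:30"
          to: "20:00"
          replicas: 8
```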

In this case, the number of replicas between 14:30 and 14:45 on 2021-10-05 will be 10, and 8 between 14:45 and 20:00.

Summary

Optimizing the sportsbook platform infrastructure aims to provide the best possible experience while ensuring cost-effectiveness.

We monitor both the Kubernetes cluster level and the workload/pod level, offering historical time-series data and graphs for comprehensive insights.
Scheduled weekly and monthly reports further facilitate tracking, help address mismatches between costs and resource utilization, and provide stakeholders with essential information.

We utilize Kubernetes features such as the Horizontal Pod Autoscaler (HPA) to adjust pod counts automatically based on observed metrics like CPU utilization. Additionally, we introduced a Scheduled Autoscaler with Scaling Profiles, allowing service operators to proactively scale deployments before anticipated traffic spikes.

Glossary

  • Sportsbook Platform: A digital platform that allows users to place bets on various sports events.
  • API-based Microservices: Small, independent services that communicate over a network to fulfill one or more business capabilities, accessible through application programming interfaces (APIs).
  • Containers: Lightweight, standalone, executable software packages that include everything needed to run an application: code, runtime, system tools, system libraries, and settings.
  • Service Discovery: The automatic detection of devices and services offered by these devices on a computer network.
  • Load-Balancing: The distribution of network traffic across multiple servers to ensure that no single server bears too much demand.
  • Horizontal Pod Autoscaler (HPA): A Kubernetes feature that automatically adjusts the number of pods in a deployment based on observed CPU utilization or other selected metrics.
  • Scheduled Autoscaler with Scaling Profiles: A custom Kubernetes mechanism allowing for scheduled scaling of services based on predefined profiles.
  • Scaling Profile: A Kubernetes resource defining the relationship between a schedule and a service, dictating how service scaling should be performed at specific times.
  • Cloud-allocated Resources: Computing resources such as storage, networking, and processing power allocated from a cloud provider.
  • Operational Assessed Cost: The evaluated cost of running and maintaining IT operations, typically in a cloud computing environment.
  • Kubernetes Controller: A non-terminating loop that regulates the state of a system in Kubernetes.

Want to learn more about DraftKings’ global Engineering team and culture? Check out our Engineer Spotlights and current openings!
