Databricks Workflows and the Era of Serverless Job Compute

Ryan Chynoweth
Apr 19, 2024


Databricks Workflows is an orchestration product that enables engineers to create data pipelines and is available to all users at no additional cost. This means organizations using external solutions for scheduling Databricks workloads can eliminate orchestration costs by defining their jobs within Workflows, thus reducing the total cost of ownership.

Excitingly, Databricks has introduced serverless job compute! Alongside serverless SQL warehouses, this gives users a second serverless compute option within Workflows. Serverless jobs offer reduced configuration options, providing Databricks users with a simplified way to schedule jobs without feeling overwhelmed by settings, while Databricks manages the infrastructure and scaling.
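As an illustrative sketch (the job name and notebook path below are hypothetical), a Jobs API 2.1 payload for a serverless job looks like an ordinary job definition with the cluster settings simply left out; in workspaces where serverless jobs are enabled, a task with no `new_cluster`, `existing_cluster_id`, or `job_cluster_key` runs on serverless compute:

```json
{
  "name": "nightly-etl",
  "tasks": [
    {
      "task_key": "run_etl",
      "notebook_task": {
        "notebook_path": "/Workspace/pipelines/nightly_etl"
      }
    }
  ]
}
```

Compare this with a classic job definition, where a `new_cluster` block (Spark version, node type, worker count) carries most of the configuration burden.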

Databricks clusters are uniquely positioned with regard to their economic scaling. Many other data solutions scale in fixed tiers, where each step up in compute capacity roughly doubles the price. Clusters allow organizations to add a single node at a time, achieving linear cost scaling, and this model holds true for the serverless option as well.

Serverless compute allows customers to avoid provisioning cloud infrastructure in their accounts and consolidates billing under Databricks. There are several key benefits as it relates to infrastructure when using serverless compute with Databricks.

  • IP addresses from workspace subnets are not consumed — each node in a cluster is assigned two IP addresses, so, for example, a cluster with 5 workers and 1 driver needs 12 IPs. Many organizations choose to deploy clusters into their own VPC/VNet but need to conserve precious IP addresses. With serverless, customers can deploy smaller networks, reserving IPs for workloads that require private connectivity to external resources (e.g. ingestion) and using serverless everywhere else as permitted.
  • vCPU quotas are not drawn down — another common issue arises when customers hit their core limit on a particular VM type, causing cluster creation to fail until the quota is increased. Serverless compute does not draw from existing quotas, giving engineers the ability to keep deploying workloads without filing a request with the cloud infrastructure team.
  • Immediate cluster availability — it is no secret that traditional Databricks clusters can take several minutes to be ready; serverless compute is available almost instantaneously, allowing customers to reduce overhead and eliminate idle time.
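To make the first bullet concrete, here is a minimal sketch of how classic clusters consume workspace subnet addresses (the two-IPs-per-node figure is the one quoted above; the subnet sizes are illustrative):

```python
def ips_consumed(workers: int, drivers: int = 1, ips_per_node: int = 2) -> int:
    """IP addresses a classic cluster draws from the workspace subnet.

    Serverless jobs draw zero, since the compute lives in Databricks' account.
    """
    return (workers + drivers) * ips_per_node

# The example from the text: 5 workers + 1 driver -> 12 IPs.
print(ips_consumed(5))  # 12

# A modest /26 subnet has only 64 addresses (fewer usable after the
# cloud provider's reserved ones), so a handful of concurrent clusters
# can exhaust it quickly.
print(ips_consumed(20))  # 42 -- one large cluster eats most of a /26
```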

There are limitations to serverless compute: the Databricks Runtime for Machine Learning is not supported, and Python and SQL are the only available languages. Please review the full list of limitations when evaluating this new compute option.

Aside from the limitations, I would like to highlight the types of workloads I recommend moving to serverless jobs.

  • Short-running jobs — there is nothing more frustrating than waiting 5 minutes for a cluster to become available just to run a 1-minute job. While customers are only charged for the runtime of the cluster, it is still not ideal. Serverless solves this with near-instant compute, so I recommend evaluating short-running jobs on serverless to see if it improves the total time to completion (cluster creation time + job runtime).
  • Frequently running jobs — in situations where customers need to run a job often (e.g. every 5 minutes), the provisioning time is hard to absorb because it can vary and the next execution follows close behind. Customers may opt to run these jobs on all-purpose clusters, which then need to be restarted periodically. Serverless job clusters let engineers meet job SLAs and avoid managing an all-purpose cluster altogether.
  • Jobs using instance pools — instance pools allow organizations to pre-provision machines in their cloud account with dependencies installed. This allows for faster cluster creation (~45 seconds), and Databricks does not charge for pooled resources while they sit unused; however, customers still incur VM charges from the cloud provider for that idle time. If a solution uses instance pools, I highly suggest evaluating serverless job compute, as it should drastically decrease total cost of ownership (if you are an engineer reading this, make sure to take credit and present the cost savings to leadership).
  • IP address constraints — as mentioned previously, if a workspace is constrained on IP addresses, serverless can help alleviate the issue by taking over jobs that don't require workspace networking. Reserving IP addresses for tasks that need them and opting for serverless elsewhere allows organizations to keep workspace network ranges to a minimum.
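As a back-of-the-envelope illustration of the short-running-job point (the 5-minute startup and 1-minute runtime are the figures from the bullet above; the serverless startup time is an assumption, not a guarantee):

```python
def wall_clock_s(startup_s: float, runtime_s: float) -> float:
    # Time from job trigger to completion: provisioning wait plus actual work.
    return startup_s + runtime_s

# Classic cluster: ~5 min (300 s) provisioning before 60 s of work.
classic = wall_clock_s(300, 60)
# Serverless: assumed near-instant startup of ~5 s for the same 60 s of work.
serverless = wall_clock_s(5, 60)

print(f"classic: {classic:.0f}s, serverless: {serverless:.0f}s")  # 360s vs 65s
print(f"idle overhead on classic: {300 / classic:.0%}")  # 83% of wall clock is waiting
```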

Conclusion

Serverless job compute simplifies the job creation process and addresses common complaints about the traditional provisioning of Databricks resources. For additional information, please see the Databricks documentation on serverless compute security.

Disclaimer: these are my own thoughts and opinions and not a reflection of my employer


Ryan Chynoweth

Senior Solutions Architect Databricks — anything shared is my own thoughts and opinions