In my last post, I showed you how to provision a low-cost Apache Spark cluster on Microsoft Azure, with the help of the Azure Batch service, Low Priority Virtual Machines, and the Azure Distributed Data Engineering Toolkit (AZTK).
But have you tried to build a cluster that mixes Dedicated and Low Priority Virtual Machines?
If you did, you probably ran into an error…
I tried to provision a Spark cluster with the following command:
aztk spark cluster create --id mycluster --size 1 --size-low-priority 2 --vm-size standard_d12_v2
But all I got was the following error message:
But let me first start with the why. …
A few months ago, I found a nice little open-source tool on GitHub called AZTK, which provides a fast and easy way to provision low-cost Apache Spark clusters on Microsoft Azure.
In this blog post, I would like to show you how to install the Azure Distributed Data Engineering Toolkit (AZTK) on your Windows-, Linux-, or macOS-based system, and how to provision your first Apache Spark cluster with it.
The Azure Distributed Data Engineering Toolkit (AZTK) is a Python CLI application for provisioning on-demand Spark-on-Docker clusters in Azure. …