In my last post, I showed you how to provision a low-cost Apache Spark cluster on Microsoft Azure, with the help of the Azure Batch service, Low Priority Virtual Machines, and the Azure Distributed Data Engineering Toolkit (AZTK).

But have you tried to provision a cluster that mixes dedicated and low-priority virtual machines?

If you did, you probably ran into an error…

Mixing Dedicated and Low-Priority Virtual Machines

I tried to provision a Spark cluster with the following command:

aztk spark cluster create --id mycluster --size 1 --size-low-priority 2 --vm-size standard_d12_v2

But all I got was the following error message:

[Image: screenshot of the error message]
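To make the shape of that request explicit, here is a minimal Python sketch that assembles the same command from its four parameters and checks whether it asks for both node types at once. The helper names `build_create_command` and `is_mixed_mode` are hypothetical and not part of AZTK; only the flags are taken from the command above.

```python
import shlex


def build_create_command(cluster_id, dedicated, low_priority, vm_size):
    """Assemble the `aztk spark cluster create` call shown above as an argv list.

    Hypothetical helper: it only mirrors the flags used in this post.
    """
    return [
        "aztk", "spark", "cluster", "create",
        "--id", cluster_id,
        "--size", str(dedicated),
        "--size-low-priority", str(low_priority),
        "--vm-size", vm_size,
    ]


def is_mixed_mode(dedicated, low_priority):
    """A request is 'mixed' when it asks for both node types at once."""
    return dedicated > 0 and low_priority > 0


if __name__ == "__main__":
    argv = build_create_command("mycluster", 1, 2, "standard_d12_v2")
    print(shlex.join(argv))                    # the command line as typed above
    print("mixed mode:", is_mixed_mode(1, 2))  # both node counts are non-zero
```

Passing the list to `subprocess.run(argv)` would execute the command, assuming `aztk` is installed and on your `PATH`.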

What do I need a mixed-mode cluster for?

But let me first start with the why. …


A few months ago, I found a nice little open-source tool on GitHub called AZTK, which provides a fast and easy way to provision low-cost Apache Spark clusters on Microsoft Azure.

In this blog post, I would like to show you how to install the Azure Distributed Data Engineering Toolkit (AZTK) on your Windows-, Linux-, or macOS-based system, and how to provision your first Apache Spark cluster with it.

Azure Distributed Data Engineering Toolkit (AZTK)

The Azure Distributed Data Engineering Toolkit (AZTK) is a Python CLI application for provisioning on-demand Spark on Docker clusters in Azure. …

About

Sascha Dittmann

Data Solution Architect on Microsoft Azure, focusing on SQL Server, Big Data, IoT and Machine Learning. Public Speaker. Author.
