Data distribution
In a previous article, "Azure Data Explorer at the Azure business scale", I talked about how we use Azure Data Explorer as our main data store and how it supports our workloads. I ended that article with some of the scale considerations. This article covers our data distribution model and how we handle scale.
Azure Data Explorer features
First, let’s do a quick overview of the Azure Data Explorer infrastructure and the key features that are relevant here. Azure Data Explorer runs on a set of Azure Virtual Machines (VMs) and uses Azure Storage for “cold” data. As part of the cluster configuration, we can choose the VM SKU and the number of nodes based on expected workloads. Some VM SKUs are optimized for storage, others for compute. There is also an Autoscale feature that can automatically increase or decrease the number of nodes in the cluster based on load.
An Azure Data Explorer cluster contains one or more databases. Databases contain tables and functions. If you are unfamiliar with Azure Data Explorer, a function is roughly the equivalent of a SQL view. Permissions are set at the database level, granting users, groups, and services access to view and/or modify the database.
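To make the hierarchy concrete, here is a minimal Python sketch of the containment model described above: a cluster holds databases, each database holds tables and functions, and permissions hang off the database. This is only an illustration, not the Azure Data Explorer API. The class names, the "viewer" and "admin" roles, and the example cluster, database, table, function, and principal names are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy model of a database: tables, functions, and database-level permissions."""
    name: str
    tables: set[str] = field(default_factory=set)
    # Function name -> query text; a stored function behaves like a saved query.
    functions: dict[str, str] = field(default_factory=dict)
    # Principal -> role. The two roles here are simplified placeholders.
    permissions: dict[str, str] = field(default_factory=dict)

    def can_view(self, principal: str) -> bool:
        # Both roles can read; unknown principals have no access.
        return self.permissions.get(principal) in {"viewer", "admin"}

    def can_modify(self, principal: str) -> bool:
        # Only the admin role can change the database.
        return self.permissions.get(principal) == "admin"


@dataclass
class Cluster:
    """Toy model of a cluster: a named collection of databases."""
    name: str
    databases: dict[str, Database] = field(default_factory=dict)

    def add_database(self, db: Database) -> None:
        self.databases[db.name] = db


# Hypothetical example data, used only to exercise the model.
cluster = Cluster("example-cluster")
telemetry = Database(
    name="Telemetry",
    tables={"Requests"},
    functions={"RecentRequests": "Requests | where Timestamp > ago(1d)"},
    permissions={"analysts-group": "viewer", "ingestion-service": "admin"},
)
cluster.add_database(telemetry)

print(telemetry.can_view("analysts-group"))
print(telemetry.can_modify("analysts-group"))
```

Because access is attached to the database rather than to individual tables in this model, everything inside a database shares the same audience. That is worth keeping in mind when deciding which data belongs in which database.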
Another important concept is that of leader/follower. An Azure Data Explorer cluster can follow a database from another…