Synapse Serverless Spark — Bicep Approach!
Bicep introduces a new abstraction layer on top of the well-known ARM templates, as explained in depth in a previous article, “Bicep — Sustainable Coding!”. The learning curve is much gentler: we can start coding straight away with Bicep configuration files.
This is what we will be doing in this blog post. Even better, we’ll do it with the newly released Azure Synapse Serverless Spark Pool.
For details regarding Bicep & Synapse, please refer to earlier posts:
- Azure Synapse Analytics — A Game Changer (Coming Soon!)
- Bicep — Sustainable Coding!
Now let’s focus on the best approach to provisioning Synapse Spark, all while keeping code quality in mind.
Before moving forward, please note that the ‘Synapse Apache Spark Pool’ is also known as the ‘Serverless Spark Pool’, ‘Apache Spark Pool’, or even ‘BigData Pool’. (To avoid any possible confusion!)
Infra As Code — Bicep Approach
To follow along with the next steps, I recommend having the prerequisites below:
- An active Azure Subscription
- Bicep binary (Local or via Azure CloudShell)
- Azure PowerShell or Azure CLI (Local or via Azure CloudShell)
- A Resource Group and a Synapse Workspace already provisioned.
- Sufficient (least-privilege) permissions to deploy/destroy resources in the Resource Group and within the target Azure Synapse Workspace.
All settled? Let’s do it!
Synapse Spark Pool — Bicep Implementation
Azure Synapse Analytics is a complex service to deploy; even with Bicep, this can lead to challenges in code maintainability.
The best approach is to rely on Bicep modules: reusable elements for each entity, such as (but not limited to) the Synapse Workspace, Storage, Compute resources, and Networking.
We start by creating Bicep Configuration files for Synapse Serverless Spark. The configuration file is designed with a Modular approach, to be either called directly or from within a ‘main’ bicep module.
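To illustrate the modular approach, here is a minimal sketch of how a ‘main’ Bicep file could consume the Spark Pool module. The file path `./modules/sparkpool.bicep`, the module name, and the parameter names are assumptions for this sketch, not part of the actual repository layout:

```bicep
// main.bicep — consume the Spark Pool configuration as a reusable module
param synapseWorkspaceName string

module sparkPool './modules/sparkpool.bicep' = {
  name: 'sparkPoolDeployment'
  params: {
    synapseWorkspaceName: synapseWorkspaceName
    sparkPoolPurpose: 'train'
  }
}

// Re-expose a module output for callers of main.bicep
output sparkPoolResourceId string = sparkPool.outputs.sparkPoolResourceId
```

The same module file can also be deployed directly, which is what makes this layout reusable across environments.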
Write Your Code
As briefly mentioned earlier, we are authoring a Bicep template that creates a Serverless Spark Compute while assuming that a Synapse Workspace has already been provisioned.
I highly recommend relying on a clear naming convention from Day Zero, even for temporary resources, including proper tagging.
Please refer to the Microsoft Azure naming conventions & recommendations to set proper naming for Azure resources.
FIVE easy steps to follow for a well-written Bicep code:
- Step 1: Use Bicep decorators for parameter validation and description, as shown below (this applies to outputs, too):
@description('Required. Spark Pool purpose. The length must not exceed 10 characters.')
@maxLength(10)
param sparkPoolPurpose string = 'train'
..
- Step 2: Use variables when required. The example below is used mainly for formatting purposes (conditions can be applied to variables, too):
// Format the Spark Pool Name (Max length is 15 characters)
var sparkPoolName = 'synsp-${sparkPoolPurpose}'
..
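To illustrate the remark about conditions on variables, here is a hedged sketch: a ternary expression can derive a variable’s value from a parameter. The `environment` parameter and the size choice below are assumptions for illustration, not part of the original template:

```bicep
@description('Target environment. Assumed parameter for this sketch.')
@allowed([ 'dev', 'prd' ])
param environment string = 'dev'

// Condition applied to a variable: pick a smaller node size outside production
var sparkNodeSize = (environment == 'prd') ? 'Large' : 'Medium'
```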
- Step 3: Use the Bicep `existing` keyword (`resource ... existing = {}`) to read data from existing resources instead of hard-coding it into variables:
// Get the existing Synapse Workspace (Used mainly for Output purposes)
resource synapseWorkspace 'Microsoft.Synapse/workspaces@2021-06-01' existing = {
name: synapseWorkspaceName
}
- Step 4: Implement the resource definition for the Spark Pool deployment:
// Create Spark Pool Resource
resource sparkPool 'Microsoft.Synapse/workspaces/bigDataPools@2021-06-01' = {
parent: synapseWorkspace
name: sparkPoolName
...
}
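For orientation, the elided properties block might look roughly like the following, wired to the parameters that appear later in `default.parameters.json`. Treat this as a sketch of the `Microsoft.Synapse/workspaces/bigDataPools` shape, not the full template:

```bicep
// Sketch: Spark Pool resource with the commonly used properties filled in
resource sparkPool 'Microsoft.Synapse/workspaces/bigDataPools@2021-06-01' = {
  parent: synapseWorkspace
  name: sparkPoolName
  location: location
  properties: {
    sparkVersion: sparkVersion
    nodeSizeFamily: sparkNodeSizeFamily
    nodeSize: sparkNodeSize
    autoScale: {
      enabled: sparkAutoScaleEnabled
      minNodeCount: sparkAutoScaleMinNodeCount
      maxNodeCount: sparkAutoScaleMaxNodeCount
    }
    autoPause: {
      enabled: sparkAutoPauseEnabled
      delayInMinutes: sparkAutoPauseDelayInMinutes
    }
    dynamicExecutorAllocation: {
      enabled: sparkDynamicExecutorEnabled
      minExecutors: sparkMinExecutorCount
      maxExecutors: sparkMaxExecutorCount
    }
  }
}
```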
- Step 5: Provide sufficient outputs for further usage from modules (output decorators are still missing here :p):
output sparkPoolName string = sparkPool.name
output sparkPoolResourceId string = sparkPool.id
output synapseWorkspaceName string = synapseWorkspace.name
output devEndpoint string = synapseWorkspace.properties.connectivityEndpoints.dev
...
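Since Bicep supports decorators on outputs as well, a quick sketch of what the decorated versions could look like (the description texts are my own wording):

```bicep
@description('Name of the provisioned Spark Pool')
output sparkPoolName string = sparkPool.name

@description('Development endpoint of the parent Synapse Workspace')
output devEndpoint string = synapseWorkspace.properties.connectivityEndpoints.dev
```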
Please scroll down for the full ‘main.bicep’ file. ⬇⬇
Trigger a bicep deployment
While all parameter values can be passed inline when triggering the deployment, I recommend using a Bicep parameters file.
Step a: Write the Bicep parameters file:
Create the file default.parameters.json and adjust its content as below:
{
"$schema": https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#,
"contentVersion": "1.0.0.0",
"parameters": {
"synapseWorkspaceName": {
"value": "synw-demo-validation-dte"
},
"location": {
"value": "West Europe"
},
"sparkPoolPurpose": {
"value": "train"
},
"sparkNodeSizeFamily": {
"value": "MemoryOptimized"
},
"sparkNodeSize": {
"value": "Medium"
},
"sparkVersion": {
"value": "3.3"
},
"sparkAutoScaleEnabled": {
"value": true
},
"sparkAutoScaleMaxNodeCount": {
"value": 6
},
"sparkAutoScaleMinNodeCount": {
"value": 3
},
"sparkNodeCount": {
"value": 0
},
"sparkAutoPauseEnabled": {
"value": true
},
"sparkAutoPauseDelayInMinutes": {
"value": 5
},
"sparkDynamicExecutorEnabled": {
"value": true
},
"sparkMaxExecutorCount": {
"value": 3
},
"sparkMinExecutorCount": {
"value": 1
},
"sparkCacheSize": {
"value": 20
}
}
}
Step b: Trigger Synapse Spark deployment using Bicep configuration files:
$deploymentGenName = "ASASparkDeployment-$(((Get-Date).ToUniversalTime()).ToString('MMdd-HHmmssffff'))"
# From an Azure DevOps pipeline use:
# $deploymentGenName = "ASASparkDeployment-$(Build.BuildID)"
New-AzResourceGroupDeployment -Name $deploymentGenName `
-ResourceGroupName rg-xyz `
-TemplateFile ./main.bicep `
-TemplateParameterFile ./default.parameters.json
Code Quality
Code Quality — Git Pre-Commit
I recommend going through an earlier blog post, ‘Git Pre-Commit — Zero-Trust My Code!’, for how to implement and use the Git pre-commit capability to enforce regular, automatic code linting.
Code Quality — Bicep Linting & Validation
To be continued!
Source Code — GitHub Gist/Repo
Below, I am sharing the full Gist file and the GitHub repo for the complete example.
To Conclude
It is unfortunate that Bicep support for Synapse role assignments does not exist yet.
You’re good to go! Thank you for reading!