Breaking out of a ForEach activity in Azure Data Factory

Akash Gupta
Plumbers Of Data Science
4 min read · Sep 27, 2020

Problem

A ForEach activity in ADF keeps iterating regardless of whether an inner activity succeeds or fails. Imagine a scenario in which you need to break out of the loop as soon as any inner activity fails.

Prerequisites

To break out of the loop, we need:

  1. An Azure Data Factory with rights to publish pipelines.
  2. The Subscription Id and Resource Group Name of your Data Factory.
  3. The Data Factory's managed identity assigned to the Contributor role.

Solution

The ForEach activity defines a repeating control flow in your pipeline. It iterates over a collection and executes the specified activities in a loop, much like the foreach looping structure in programming languages.

But what if we want to break out of the loop when an inner activity fails?

One way to achieve this is to cancel the pipeline execution as soon as any inner activity fails.

Implementation

Our pipeline consists of two main activities: a Lookup activity followed by a ForEach activity.

Inside the ForEach activity, an If Condition activity checks each value returned by the Lookup activity.

Step 1. Create a Lookup activity, along with a pipeline variable named PipelineRunId.

Step 2. Create a ForEach activity with an If Condition activity inside it.

Step 3. Point the source dataset of the Lookup activity at any sample or test dataset.

The query we used returns the values 1, 2 and 3. We use these values in the If Condition activity inside the ForEach activity to decide whether to break out of the loop. When 2 is encountered, the pipeline run is cancelled, which breaks out of the ForEach activity. For the remaining values, the pipeline continues executing.

Step 4. Connect the Lookup activity to the ForEach activity, and set Items in the ForEach activity to the output value of the Lookup activity. The loop does not have to run sequentially; you can also run the iterations in parallel with a batch count, depending on your needs.

Step 5. Set the expression of the If Condition activity inside the ForEach activity to @equals(item().PersonID, 2). This checks whether PersonID equals 2. In our case, if 2 is encountered we cancel the pipeline; otherwise no activity is performed.
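As a rough analogy, the control flow that Steps 4 and 5 set up can be sketched in plain Python. This is not ADF code: the function name run_foreach and the dictionary layout of the Lookup output are assumptions made for illustration, and the sketch models a sequential loop only. In ADF the cancellation is a separate REST call, so in parallel mode other iterations may already be running when it takes effect.

```python
# Plain-Python analogy of the pipeline logic; not ADF code.
# run_foreach and the item layout are hypothetical names for illustration.

def run_foreach(items, break_value=2):
    """Iterate like a sequential ForEach and stop when the If condition is true."""
    processed = []
    for item in items:
        # Counterpart of the expression @equals(item().PersonID, 2)
        if item["PersonID"] == break_value:
            # True branch: the pipeline run is cancelled, which ends the loop
            return processed, "Cancelled"
        # False branch: no activity is performed, the loop simply continues
        processed.append(item["PersonID"])
    return processed, "Succeeded"


# Stand-in for the Lookup output described above (values 1, 2 and 3)
lookup_output = [{"PersonID": 1}, {"PersonID": 2}, {"PersonID": 3}]
print(run_foreach(lookup_output))  # ([1], 'Cancelled')
```

With the sample values, the loop handles 1, stops at 2, and never reaches 3.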

To cancel the pipeline, we need two activities inside the True branch of the If Condition activity: a Set Variable activity followed by a Web activity.

Step 6. Create a Set Variable activity. In the Variables tab, select the pipeline variable PipelineRunId created in Step 1 as the Name.

Update the Value in the Variables tab with the following expression:

@concat('https://management.azure.com/subscriptions/<My Subscription Id>/resourceGroups/<My Resource Group Name>/providers/Microsoft.DataFactory/factories/', pipeline().DataFactory, '/pipelineruns/', pipeline().RunId, '/cancel?api-version=2018-06-01')

Replace <My Subscription Id> with your actual Subscription Id and <My Resource Group Name> with your actual Resource Group Name.

The above expression builds the URL for the Cancel method of the Azure Data Factory REST API (Pipeline Runs), using the name of the current Data Factory and the ID of the current pipeline run.
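To sanity-check the string before pasting the expression into ADF, the same concatenation can be reproduced in Python. This is a minimal sketch: the function name build_cancel_url and the sample argument values are placeholders for illustration, not real identifiers.

```python
# Mirrors the @concat(...) expression from Step 6.
# build_cancel_url and the sample values below are hypothetical.

def build_cancel_url(subscription_id, resource_group, factory_name, run_id):
    """Return the URL of the Pipeline Runs Cancel method for one pipeline run."""
    return (
        "https://management.azure.com/subscriptions/" + subscription_id
        + "/resourceGroups/" + resource_group
        + "/providers/Microsoft.DataFactory/factories/" + factory_name
        + "/pipelineruns/" + run_id
        + "/cancel?api-version=2018-06-01"
    )


url = build_cancel_url("my-sub-id", "my-rg", "my-factory", "my-run-id")
print(url)
```

Note that the api-version value uses plain hyphens; typographic dashes or curly quotes copied from a web page will break the call.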

Step 7. Create a Web activity. Set the URL to the value of the PipelineRunId variable, that is @variables('PipelineRunId'). Set Method to POST, Integration Runtime to AutoResolveIntegrationRuntime, and Authentication to MSI. Set Resource to https://management.azure.com and Body to any valid JSON message, for example {"message":"Cancelling the pipeline"}.
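For readers who want to see what the Web activity sends, the request can be approximated with Python's standard library. This is a sketch under assumptions: build_cancel_request is a hypothetical helper, the URL and token values are placeholders, and obtaining a real access token for https://management.azure.com (which the MSI authentication does for you inside ADF) is left out. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical helper: builds (but does not send) a request resembling
# the one issued by the Web activity in Step 7.

def build_cancel_request(cancel_url, access_token):
    """Create a POST request with a JSON body and a bearer token."""
    body = json.dumps({"message": "Cancelling the pipeline"}).encode("utf-8")
    return urllib.request.Request(
        cancel_url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )


# Placeholder values; a real call needs the URL from Step 6 and a valid token.
request = build_cancel_request(
    "https://management.azure.com/<cancel url from Step 6>",
    "<access token>",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```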

Step 8. Publish the pipeline.

Step 9. Click Add Trigger > Trigger Now to run the last-published version of the pipeline. If all goes as planned, the pipeline run cancels itself. On the ADF Monitor page, the run then shows up with the status Cancelled.
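The outcome can also be checked outside the Monitor page through the Get method of the same Pipeline Runs REST API, which returns the run as JSON with a status field. The sketch below only builds the URL and interprets a response; the helper names and the sample payload are hypothetical, and the payload is trimmed to the one field the check needs.

```python
import json

# Hypothetical helpers for verifying the result of Step 9.

def build_run_url(subscription_id, resource_group, factory_name, run_id):
    """Return the URL of the Pipeline Runs Get method for one pipeline run."""
    return (
        "https://management.azure.com/subscriptions/" + subscription_id
        + "/resourceGroups/" + resource_group
        + "/providers/Microsoft.DataFactory/factories/" + factory_name
        + "/pipelineruns/" + run_id
        + "?api-version=2018-06-01"
    )


def is_cancelled(pipeline_run_json):
    """Return True when the pipeline run JSON reports the status Cancelled."""
    run = json.loads(pipeline_run_json)
    return run.get("status") == "Cancelled"


# Trimmed, made-up response body used only to exercise the check
sample_response = '{"runId": "my-run-id", "status": "Cancelled"}'
print(is_cancelled(sample_response))  # True
```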

Note

  1. Azure Data Factory isn't allowed to execute ADF REST API methods by default. The Data Factory's managed identity must first be added to the Contributor role.
  2. Pipeline runs started in Debug mode may not be cancelled this way. A pipeline must be triggered for its run to be available to the REST API's Pipeline Runs Cancel method. Both manual and scheduled triggers work.
