How to Connect to Azure Data Lake Blob Storage Using Mule 4

--

Azure Data Lake is a scalable data storage and analytics service hosted in Azure — Microsoft’s Public Cloud.

Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages. It can be used to store any kind of data: unstructured, semi-structured, or structured.

This guide walks through creating an Azure trial account, a resource group, and a storage account; accessing the blob storage in Microsoft Azure Storage Explorer; and performing different operations on Azure from MuleSoft.

Create an Azure Trial Account:

  • Create a new Hotmail/Outlook account.
  • Go to the Azure Portal (https://portal.azure.com) and log in with your Hotmail or Outlook credentials.
  • Click Start to begin an Azure free trial account.
  • Fill in your details; card details are also required to verify the free trial account.
  • Click Create a resource, search for "storage account", and create it.
  • Select the Free Trial subscription, create a new resource group, provide the storage account name, then review and create the new storage account.
  • Go to Home, where the new storage account and resource group should now be visible.
  • Click on the azuredbstorageuds storage account and click Open in Explorer.
  • Download Azure Storage Explorer if it is not already installed, then open it. This is another way to view the storage container data.
  • Click Sign in with Azure, provide your credentials, and log in.
  • The storage account's containers and folders are accessible once they have been created.
  • See below for how to connect to and access the Azure Data Lake Blob Storage account.

Prerequisites:

  • Microsoft Azure Trial Account
  • MuleSoft Anypoint Studio
  • Azure Data Lake Storage Connector (downloaded from Anypoint Exchange)

Installation:

To use the Azure Data Lake Storage Connector, search for it in Anypoint Exchange, add it to the project modules, and click Finish. This automatically downloads the connector and adds its dependency to the project.

The Azure Data Lake Storage module then appears in the Mule Palette, ready to perform different kinds of operations on Azure.

Connector Configuration:

  1. Storage Account Name is the name of the storage account created above.
  2. Provide the DNS Suffix as dfs.core.windows.net.
  3. To generate a SAS token, go to the Azure Portal and click Shared access signature under Security + networking.
  4. Check all the Allowed resource types (Service, Container, and Object), scroll down, and click Generate SAS and connection string to generate the SAS token.
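A generated SAS token is just a URL query string, so its fields are easy to inspect. A minimal sketch in Python (with the token values illustrative and the signature redacted; treat a real token as a secret):

```python
from urllib.parse import parse_qs

# Illustrative SAS token of the shape the portal generates (sig redacted)
sas_token = ("?sv=2021-06-08&ss=bfqt&srt=sco&sp=rwdlacupyx"
             "&se=2023-03-31T13:17:14Z&st=2023-02-20T05:17:14Z&spr=https&sig=REDACTED")

fields = parse_qs(sas_token.lstrip("?"))
# sv  = storage service version
# ss  = services (b=blob, f=file, q=queue, t=table)
# srt = resource types (s=service, c=container, o=object)
# sp  = permissions, st/se = validity window, spr = allowed protocol
# sig = HMAC signature over the other fields
print(fields["srt"][0])  # -> sco (Service, Container, and Object all allowed)
```

Note the se (expiry) field: once that timestamp passes, the connector configuration stops working and a fresh token must be generated.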

Demo:

  1. Create File System:

This operation creates a new file system (container) in the Azure Data Lake storage account.

Go to the Azure Storage Account, click Containers, and verify that the new file system test-data-load has been created.
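Once the application is deployed, the createFileSystem flow can be triggered with a plain HTTP call. A minimal sketch using only the Python standard library, assuming the app runs locally on the listener's port 8081; the fileSystem header name matches what the flow's set-variable step reads:

```python
import urllib.request

# Build the request the createFileSystem flow expects; the flow reads the
# file-system (container) name from the "fileSystem" request header.
req = urllib.request.Request(
    "http://localhost:8081/createFileSystem",
    headers={"fileSystem": "test-data-load"},
)
# urllib.request.urlopen(req)  # uncomment once the Mule app is running
```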

2. Create/Rename Path:

This operation creates or renames a file or directory inside the specified file system.

The Resource attribute is a drop-down that selects whether to create a file or a directory at the specified path.

Go to the Azure Storage Account, navigate into the test-data-load file system, and verify that a new file named sample_test.json has been created.

Change the fileSystem and path headers in Postman to rename the folder and file accordingly.
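The createPath flow takes two headers, one for the file system and one for the path to create. A sketch of the call, again assuming the app is running locally; whether a file or a directory is created is decided by the resource attribute in the connector operation (set to file in the configuration below):

```python
import urllib.request

# The createPath flow reads both headers and creates a file at the given
# path inside the file system (resource="file" in the connector operation).
req = urllib.request.Request(
    "http://localhost:8081/createPath",
    headers={"fileSystem": "test-data-load", "path": "sample_test.json"},
)
# urllib.request.urlopen(req)  # requires the running Mule app
```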

3. Delete File System:

This operation deletes a file system (container) from the Azure storage account.

Go to the Azure Storage Account and verify that the file system test-data-load has been removed.

4. Read Path:

This operation reads the contents of a file at the specified path in the Azure storage account.

Here, the test-data-load file system contains a file with data in CSV format.
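Reading a file back works the same way, with the response body carrying the transformed JSON. A sketch assuming a hypothetical records.csv exists under test-data-load (the actual file name will depend on what was uploaded):

```python
import urllib.request

# Ask the readPath flow for a file; the flow returns its contents,
# transformed to JSON by the DataWeave step in the flow.
req = urllib.request.Request(
    "http://localhost:8081/readPath",
    headers={"fileSystem": "test-data-load", "path": "records.csv"},
)
# with urllib.request.urlopen(req) as resp:        # requires the running app
#     print(resp.read().decode("utf-8"))           # the JSON-converted contents
```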

5. List File Systems:

This operation lists all the file systems (containers) in the Azure storage account.

6. List File Paths:

This operation lists all the files and directories within the specified file system.
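The two list operations differ only in scope: listFileSystems needs no headers, while listPaths needs the file system to inspect. A sketch of both calls against the local listener:

```python
import urllib.request

# List every file system in the account (no headers required) ...
list_fs = urllib.request.Request("http://localhost:8081/listFileSystems")

# ... then list the paths inside one particular file system.
list_paths = urllib.request.Request(
    "http://localhost:8081/listPaths",
    headers={"fileSystem": "test-data-load"},
)
# urllib.request.urlopen(list_fs)     # both require the running Mule app
# urllib.request.urlopen(list_paths)
```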

<?xml version="1.0" encoding="UTF-8"?>

<mule xmlns:ee="http://www.mulesoft.org/schema/mule/ee/core" xmlns:azure-data-lake-storage="http://www.mulesoft.org/schema/mule/azure-data-lake-storage"
xmlns:http="http://www.mulesoft.org/schema/mule/http"
xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:doc="http://www.mulesoft.org/schema/mule/documentation" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
http://www.mulesoft.org/schema/mule/azure-data-lake-storage http://www.mulesoft.org/schema/mule/azure-data-lake-storage/current/mule-azure-data-lake-storage.xsd
http://www.mulesoft.org/schema/mule/ee/core http://www.mulesoft.org/schema/mule/ee/core/current/mule-ee.xsd">
<http:listener-config name="HTTP_Listener_config" doc:name="HTTP Listener config" doc:id="76d1f96c-f432-4645-9899-4ec62d8ceea7" >
<http:listener-connection host="0.0.0.0" port="8081" />
</http:listener-config>
<azure-data-lake-storage:config name="Azure_Data_Lake_Storage_Connector_Config" doc:name="Azure Data Lake Storage Connector Config" doc:id="4992883c-ae1e-48d6-a38d-f3e1cd98e7ac" >
<azure-data-lake-storage:shared-access-signature-connection accountName="azuredbstorageuds" dnsSuffix="dfs.core.windows.net" sasToken="?sv=2021-06-08&amp;ss=bfqt&amp;srt=sco&amp;sp=rwdlacupyx&amp;se=2023-03-31T13:17:14Z&amp;st=2023-02-20T05:17:14Z&amp;spr=https&amp;sig=sqrZurzj9BFFKGka8LDRWDEB4N9AgmGgCRBtm8kEakE%3D" />
</azure-data-lake-storage:config>
<azure-data-lake-storage:config name="Azure_Latest" doc:name="Azure Data Lake Storage Connector Config" doc:id="1d1d06bf-6d9d-4379-9de5-5a0741be6976" >
<azure-data-lake-storage:shared-access-signature-connection accountName="dbstorageda22d902492adls" dnsSuffix="dfs.core.windows.net" sasToken="?sp=rl&amp;st=2023-02-16T06:12:55Z&amp;se=2023-05-16T14:12:55Z&amp;spr=https&amp;sv=2021-06-08&amp;sr=d&amp;sig=TypnRZXLHmJ0DCu4tZXAWrwDNUhWSYLEhwAHQr%2FYonA%3D&amp;sdd=3" />
</azure-data-lake-storage:config>
<azure-data-lake-storage:config name="Azure_Data_Lake_Storage_Connector_Folder" doc:name="Azure Data Lake Storage Connector Config" doc:id="6cab5933-1024-4d32-b358-4e9b094063f0" >
<azure-data-lake-storage:shared-access-signature-connection accountName="azuredbstorageuds" dnsSuffix="dfs.core.windows.net" sasToken="?sp=r&amp;st=2023-02-20T14:16:25Z&amp;se=2023-02-20T22:16:25Z&amp;spr=https&amp;sv=2021-06-08&amp;sr=b&amp;sig=usSjGVkX5999i62XGTnnkqd9cmWjxONqs0nuuBeY0%2Bw%3D" />
</azure-data-lake-storage:config>
<azure-data-lake-storage:config name="Azure_Data_Lake_Storage_Connector_Container" doc:name="Azure Data Lake Storage Connector Config" doc:id="c8346acf-357f-4163-99ca-f0f5b5503284" >
<azure-data-lake-storage:shared-access-signature-connection accountName="azuredbstorageuds" dnsSuffix="dfs.core.windows.net" sasToken="sp=rawd&amp;st=2023-02-22T10:41:19Z&amp;se=2023-02-22T18:41:19Z&amp;spr=https&amp;sv=2021-06-08&amp;sr=c&amp;sig=ccVVqAs%2B3jFh4J%2B9fkk%2BUcCMDv61xUc9qMT7rKBp5cA%3D" />
</azure-data-lake-storage:config>
<flow name="azure-data-lake-createFileSystem" doc:id="f6d8ee85-4ead-4753-a099-42d6e1020797" >
<http:listener doc:name="Listener" doc:id="35f272b8-2fdc-4f96-9b86-5e65638840a7" config-ref="HTTP_Listener_config" path="/createFileSystem"/>
<set-variable value="#[attributes.headers.'fileSystem']" doc:name="Create File System" doc:id="5de26414-2972-494f-ba8f-1be26d3c76b9" variableName="fileSystem"/>
<azure-data-lake-storage:create-file-system doc:name="Create File System" doc:id="80204456-c038-43f8-8875-bf8ee2f12e75" filesystem="#[vars.fileSystem]" timeout="60" config-ref="Azure_Data_Lake_Storage_Connector_Config"/>
<ee:transform doc:name="Response" doc:id="736afd32-02f8-44d8-a59d-00ef867b6e20" >
<ee:message >
<ee:set-payload ><![CDATA[%dw 2.0
output application/json
---
payload]]></ee:set-payload>
</ee:message>
</ee:transform>
</flow>
<flow name="azure-data-lake-deleteFileSystem" doc:id="4e0f080e-f1a8-4ed3-96f0-57a66b55514e" >
<http:listener doc:name="Listener" doc:id="57b72c8f-a7d5-453a-a1ec-c2dd1f77c517" config-ref="HTTP_Listener_config" path="/deleteFileSystem"/>
<set-variable value="#[attributes.headers.'fileSystem']" doc:name="Delete File System" doc:id="29bda77b-fe65-4902-bf8c-d98b14bcd90c" variableName="fileSystem"/>
<azure-data-lake-storage:delete-file-system doc:name="Delete File System" doc:id="34892c27-7c24-483d-a274-800e2e2f58b8" config-ref="Azure_Data_Lake_Storage_Connector_Config" filesystem="#[vars.fileSystem]" timeout="60"/>
<ee:transform doc:name="Response" doc:id="61df898a-e43d-4a14-968b-7ad8dddace66" >
<ee:message >
<ee:set-payload ><![CDATA[%dw 2.0
output application/json
---
payload]]></ee:set-payload>
</ee:message>
</ee:transform>
</flow>
<flow name="azure-data-lake-createPath-DirectoryorFile" doc:id="8b813fa7-7bde-477a-b936-1e0410e435ff" >
<http:listener doc:name="Listener" doc:id="d563184f-515c-49dd-bef4-991e06e7ae53" config-ref="HTTP_Listener_config" path="/createPath"/>
<set-variable value="#[attributes.headers.'fileSystem']" doc:name="Set File System" doc:id="f9c1fa2f-0b05-483b-9458-4926cf9a8b33" variableName="fileSystem"/>
<set-variable value="#[attributes.headers.'path']" doc:name="Set Path" doc:id="29c0b628-2c11-4d5d-bd3e-a4150bb658bd" variableName="path"/>
<azure-data-lake-storage:create-or-rename doc:name="Create/Rename Path" doc:id="6eff359f-7c72-4abd-b0a1-f0cd6fd63398" config-ref="Azure_Data_Lake_Storage_Connector_Config" resource="file" fileSystem="#[vars.fileSystem]" path="#[vars.path]"/>
<ee:transform doc:name="Response" doc:id="9fe88957-de84-4cd0-815a-cf347cb86960" >
<ee:message >
<ee:set-payload ><![CDATA[%dw 2.0
output application/json skipNullOn="everywhere"
---
payload]]></ee:set-payload>
</ee:message>
</ee:transform>
</flow>
<flow name="azure-data-lake-readPath" doc:id="e5e9c37c-646b-4bcd-aaab-b1c8c6d081a4" >
<http:listener doc:name="Listener" doc:id="24329b53-a2f9-4eda-83fe-f405ed1c05bf" config-ref="HTTP_Listener_config" path="/readPath"/>
<set-variable value="#[attributes.headers.'fileSystem']" doc:name="Set File System" doc:id="5bcc1cf0-167b-41d0-8809-62a9db1b41e0" variableName="fileSystem"/>
<set-variable value="#[attributes.headers.'path']" doc:name="Set Path" doc:id="e75026c0-2633-49f0-91d2-09ddbec19ab2" variableName="path"/>
<azure-data-lake-storage:read-path doc:name="Read Path" doc:id="69d1c4e4-8429-486a-ac4b-3925d96eee28" config-ref="Azure_Data_Lake_Storage_Connector_Config" fileSystem="#[vars.fileSystem]" path="#[vars.path]"/>
<ee:transform doc:name="Response" doc:id="704b650d-8154-4e1e-8999-e51b7b18f77b" >
<ee:message >
<ee:set-payload ><![CDATA[%dw 2.0
output application/json skipNullOn="everywhere"
---
payload filter ((item, index) -> item.StoreCode != "")]]></ee:set-payload>
</ee:message>
</ee:transform>

</flow>
<flow name="azure-data-lake-listFileSystems" doc:id="02ef3cd3-9eaf-4666-8480-9e825cc59e47" >
<http:listener doc:name="Listener" doc:id="8abf46d1-4cec-42a7-9b26-1e55bd5f76d8" config-ref="HTTP_Listener_config" path="/listFileSystems"/>
<azure-data-lake-storage:list-file-systems doc:name="List File Systems" doc:id="bfbcbabc-ed4a-4f2e-be02-ab9e40bf18c1" config-ref="Azure_Data_Lake_Storage_Connector_Config"/>
<ee:transform doc:name="Response" doc:id="2e448f96-e3bb-458b-a046-1147b63c5a21" >
<ee:message >
<ee:set-payload ><![CDATA[%dw 2.0
output application/json
---
payload]]></ee:set-payload>
</ee:message>
</ee:transform>
</flow>
<flow name="azure-data-lake-listPath" doc:id="d7ea5214-0b2a-4b01-9e6b-e98a11c2080d" >
<http:listener doc:name="Listener" doc:id="890047a3-85f6-4a08-bbff-c7c80b860152" config-ref="HTTP_Listener_config" path="/listPaths"/>
<set-variable value="#[attributes.headers.'fileSystem']" doc:name="Set File System" doc:id="4fae0de6-46e4-41c2-9983-444cac245646" variableName="fileSystem"/>
<azure-data-lake-storage:list-paths doc:name="List Paths" doc:id="e15fc5ec-849f-4c61-b45c-84b66280be26" config-ref="Azure_Data_Lake_Storage_Connector_Config" filesystem="#[vars.fileSystem]"/>
<ee:transform doc:name="Response" doc:id="8a5cc87f-958d-499a-8524-81486d2ed7a3" >
<ee:message >
<ee:set-payload ><![CDATA[%dw 2.0
output application/json
---
payload]]></ee:set-payload>
</ee:message>
</ee:transform>
</flow>
</mule>

Conclusion:

In this article, we learned how to:

  • Create an Azure trial account, and create a resource group and storage account inside it.
  • Connect to the Azure storage account from MuleSoft using the Azure Data Lake Storage connector and perform different operations on the storage container.

Thanks for reading my post, and I hope it is helpful.

— Pradeep Kumar Reddy Yerramreddy
