Let’s Azure: Creating Azure Data Lake Storage Gen2 with Azure Portal

TechFarm by Shahz
Published in Let’s Azure
5 min read · Sep 3, 2021

Azure Data Lake Storage Gen2 is quite unlike its predecessor, Azure Data Lake Storage Gen1. ADLS Gen2 is built on top of Azure Blob Storage and thus comes with the inherent benefits of Azure Blob Storage, such as low cost and reliability. ADLS Gen2 is designed for enterprise big data analytics, and it is equipped with the features of the Gen1 version, such as data organization, security, file system semantics and scalability.

ADLS Gen2 is designed to manage massive amounts of data: it can store and serve multiple exabytes of data while sustaining hundreds of gigabits of throughput.

Creating ADLS Gen2 in Azure Portal

First of all, log in to your Azure Portal. On the landing page, click the + (plus) sign of the Create a resource link.

This will take you to the Azure marketplace to select the resource type. In the marketplace, search for Azure Data Lake Storage Gen2. Interestingly, you will notice that the search suggestions and search results show only Data Lake Storage Gen1, and there is no option for Data Lake Storage Gen2.

Search suggestions and search results only show Data Lake Storage Gen1

So, where is Data Lake Storage Gen2? Because ADLS Gen2 is built on top of Azure Blob Storage, we don’t see it in the marketplace. Instead, we need to search for storage account here and select Storage account from the search results.

Click the Create button, which takes us to the storage account creation page. In the Basics tab, fill in everything as you would for Azure Blob Storage. Select a subscription (assuming you already have one).

Select an existing resource group or create a new one using the Create new link.

Give the ADLS account a unique name that follows your naming convention.
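Before typing a name into the portal, it can help to check it against the documented format rule for storage account names: 3 to 24 characters, lowercase letters and digits only. Here is a minimal sketch of such a check. The function name and the sample names are made up for illustration, and global uniqueness can only be confirmed by Azure itself.

```python
import re

# Documented format rule for storage account names:
# 3 to 24 characters, lowercase letters and digits only.
_ACCOUNT_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` satisfies the format rule for storage account names.

    Global uniqueness is NOT checked here; only Azure can confirm that.
    """
    return _ACCOUNT_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical candidate names, used only for illustration.
    for candidate in ["letsazuredatalake01", "Lets-Azure", "ab"]:
        print(candidate, is_valid_storage_account_name(candidate))
```

A name that passes this check can still be rejected by the portal if someone else already uses it.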

Select a region, the performance tier and the required redundancy.

For non-production systems, the Standard performance tier and locally-redundant storage (LRS) are usually sufficient.

Now move to the Advanced tab; this is where ADLS Gen2 differs from regular blob storage.

The first part here is related to security and access control. Depending on your security requirements, select the appropriate boxes.

Next is the Data Lake Storage Gen2 section. Select Enable hierarchical namespace here. This turns on ADLS Gen2, and it is the option that distinguishes ADLS Gen2 from regular blob storage.

Next, select the Hot or Cool access tier depending on how frequently this data will be accessed.

Click Next and configure the Networking tab. Then configure the Data protection tab.

Once all the configuration is done, review it in the Review + create tab and click the Create button. This starts the deployment, which may take a few minutes to complete.
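The portal choices made so far correspond to a handful of fields in the storage account definition that Azure Resource Manager accepts. The sketch below only builds that definition as a plain Python dictionary, so it is easy to see how each tab ends up in the request. It does not send anything to Azure. The function name and the region value are assumptions for illustration, and the field names should be checked against the current API reference before use.

```python
import json


def build_adls_gen2_account_body(location: str,
                                 sku_name: str = "Standard_LRS",
                                 access_tier: str = "Hot") -> dict:
    """Build a storage account definition with the hierarchical namespace enabled."""
    return {
        "location": location,             # region chosen in the Basics tab
        "kind": "StorageV2",              # general-purpose v2 account
        "sku": {"name": sku_name},        # performance + redundancy, e.g. Standard_LRS
        "properties": {
            "isHnsEnabled": True,         # the "Enable hierarchical namespace" checkbox
            "accessTier": access_tier,    # "Hot" or "Cool"
        },
    }


if __name__ == "__main__":
    # "westeurope" is only an example region.
    print(json.dumps(build_adls_gen2_account_body("westeurope"), indent=2))
```

The single field that makes this an ADLS Gen2 account rather than a regular blob storage account is isHnsEnabled.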

Click Go to resource and here we are! Our ADLS Gen2 storage is created and deployed!

But we aren’t done yet. We need to set up access grants as well as a directory structure for data organization. Before we go into the details of those, let’s quickly understand why the Hierarchical Namespace makes so much difference.

Hierarchical Namespace

Hierarchical Namespace is the key differentiator between data lake storage and blob storage. Enabling the Hierarchical Namespace turns blob storage into data lake storage.

hierarchical tree structure of folders, subfolders and files

Hierarchical Namespace essentially means that the collection of files (or objects) is organized in a hierarchical tree structure of folders, subfolders and files, much like how we organize them on our computers.

In contrast, blob storage has a flat structure. We often organize blobs to simulate a tree-like hierarchy that seems to include folders and subfolders. However, that is simply a naming convention that uses slashes in blob names; the blobs are really just files in a flat structure. When performing operations such as list, move, rename or delete on a folder, this slash-based convention doesn’t help: without real hierarchical directories, applications have to scan through potentially millions of individual blobs to complete the task.

This is where the hierarchical namespace feature significantly improves the performance of analytics jobs. To perform these operations, it directly targets a specific directory entry and updates it. This performance improvement means less computing power is needed to process the same amount of data, which in turn lowers the total cost of end-to-end analytics jobs.
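To make the difference concrete, here is a toy model in plain Python. It is not how Azure implements either storage type; it only illustrates why renaming a "folder" in a flat namespace touches every matching blob, while a real directory tree updates a single entry. All names and sample data are invented for the example.

```python
def rename_prefix_flat(blobs: dict, old: str, new: str) -> int:
    """Flat namespace: a 'folder' is only a name prefix, so every blob name is checked."""
    touched = 0
    for name in list(blobs):                      # scan all blob names
        if name.startswith(old + "/"):
            blobs[new + name[len(old):]] = blobs.pop(name)
            touched += 1
    return touched


def rename_dir_hierarchical(root: dict, parent_path: list, old: str, new: str) -> int:
    """Hierarchical namespace: the directory is a real entry, so one entry is updated."""
    parent = root
    for part in parent_path:                      # walk down to the parent directory
        parent = parent[part]
    parent[new] = parent.pop(old)                 # files inside are not touched
    return 1


if __name__ == "__main__":
    flat = {"raw/2021/a.csv": b"1", "raw/2021/b.csv": b"2", "curated/c.csv": b"3"}
    tree = {"raw": {"2021": {"a.csv": b"1", "b.csv": b"2"}}, "curated": {"c.csv": b"3"}}

    print("flat entries rewritten:", rename_prefix_flat(flat, "raw/2021", "raw/archive"))
    print("tree entries rewritten:", rename_dir_hierarchical(tree, ["raw"], "2021", "archive"))
```

In the flat case the work grows with the number of blobs under the prefix (and the scan grows with the total number of blobs), while in the tree case it stays at one entry no matter how many files the directory holds.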

Back to our ADLS Gen2 storage creation

Now that we have our ADLS Gen2 storage ready, the next step is providing access. As we will be using applications to manage the files and folders, we need to grant access to an Azure Active Directory (AAD) service principal. Applications can then connect to the storage using the service principal credentials.
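As a rough sketch of what connecting with service principal credentials can look like from Python, the snippet below uses the azure-identity and azure-storage-file-datalake packages. It assumes both packages are installed and that the service principal has already been granted a data role on the account, such as Storage Blob Data Contributor. The function names and the environment variable names in the usage comment are made up for illustration.

```python
def dfs_account_url(account_name: str) -> str:
    """Return the Data Lake (dfs) endpoint URL for a storage account."""
    return f"https://{account_name}.dfs.core.windows.net"


def connect_with_service_principal(account_name: str, tenant_id: str,
                                   client_id: str, client_secret: str):
    """Create a DataLakeServiceClient authenticated as a service principal."""
    # Imported here so the URL helper above works without the SDK installed.
    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    return DataLakeServiceClient(account_url=dfs_account_url(account_name),
                                 credential=credential)


# Example usage (hypothetical environment variable names):
#
#   import os
#   client = connect_with_service_principal(
#       account_name=os.environ["ADLS_ACCOUNT_NAME"],
#       tenant_id=os.environ["AZURE_TENANT_ID"],
#       client_id=os.environ["AZURE_CLIENT_ID"],
#       client_secret=os.environ["AZURE_CLIENT_SECRET"],
#   )
```

Reading the secret from the environment rather than hard-coding it keeps the credentials out of source control.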

Managing ADLS Gen2

Now that the storage is available, the next step is to organize and manage it. This includes creating, updating, copying, moving and organizing directories and files. These tasks can be done either through the Azure Portal or from an application.
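For the application route, here is a small sketch of a few of these operations using the azure-storage-file-datalake package. It assumes you already have a DataLakeServiceClient for the account. The function name and all container, directory and file names are placeholders chosen for the example.

```python
def organize_storage(service_client, file_system: str, directory: str,
                     file_name: str, data: bytes, renamed_directory: str) -> list:
    """Create a container, a directory and a file, then rename the directory.

    `service_client` is expected to be an
    azure.storage.filedatalake.DataLakeServiceClient.
    """
    fs_client = service_client.create_file_system(file_system=file_system)
    dir_client = fs_client.create_directory(directory)

    # Create the file inside the directory and write its content.
    file_client = dir_client.create_file(file_name)
    file_client.upload_data(data, overwrite=True)

    # rename_directory expects the new name as "{filesystem}/{directory}".
    dir_client.rename_directory(new_name=f"{file_system}/{renamed_directory}")

    # Return the paths the container now holds.
    return [path.name for path in fs_client.get_paths(recursive=True)]


# Example usage (placeholder names):
#
#   paths = organize_storage(client, "raw", "incoming", "sales.csv",
#                            b"id,amount\n1,10\n", "landing")
```

Because the account has the hierarchical namespace enabled, the rename here is a single directory operation rather than a copy of every file.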

Isn’t it interesting? If you like this and want to read more on Azure, follow Let’s Azure and this author.

If this story is helpful for you, forward it to your friends, and if you have suggestions, do let us know your thoughts in the comments.

Happy Azuring and Happy Coding!
