Azure Data Lake Storage — ACLs for enhanced security
Access Control Lists (ACLs) in ADLS provide fine-grained, scalable security for managing data access, complementing Azure RBAC with permission controls at the directory and file level.
Managing access to large datasets is a critical concern in today’s data-driven world. With data lakes becoming the backbone of many big data solutions, ensuring that only the right people have access to the right data is paramount. Azure Data Lake Storage (ADLS), a powerful data storage solution, adds a further layer of security through Access Control Lists (ACLs).
In this post, we’ll explore what ACLs are, the challenges they address, and how to manage them effectively in Azure Data Lake Storage using the Azure CLI and PowerShell.
Data Lake Access Management — Challenges
Data lakes, by design, are large repositories that store huge volumes of structured and unstructured data at any scale. While their flexibility and scalability are practical, they also pose noteworthy challenges when it comes to access control:
- Limits of Azure RBAC: Role assignments can be scoped no lower than a Blob Container (a.k.a. File System in a Data Lake context), so RBAC alone cannot allow or deny access to a specific file or directory inside it.
- Complex Permissions: Handling permissions across millions of files for many users can be overwhelming.
- Granular Access Requirements: Teams often need access to some datasets but not to others. For instance, Team A might need to process customer data but should not see internal financial records. Without proper controls, sensitive data can be inadvertently exposed.
- Dynamic Environments: Data lakes are dynamic, with data being ingested, processed, and modified constantly. Preserving consistent access controls becomes increasingly demanding as new data is added and existing data is updated.
How ACLs Address These Challenges
Access Control Lists (ACLs) in Azure Data Lake Storage offer solutions to the above-described challenges by providing fine-grained access control over files and directories.
Unlike traditional role-based access control (RBAC), which operates at a higher level, ACLs let you set permissions at the directory and file level, giving precise control over who can access specific data.
Here’s how ACLs can help:
- Fine-Grained Control: ACLs set specific permissions for users or groups at the file and directory level.
- Inheritance: A default ACL set on a directory is inherited by the subdirectories and files created under it afterwards, which simplifies access management as the data structure grows. Items that already exist are not changed by it.
- Scalability: ACLs are designed for large-scale environments. Assigning entries to groups rather than to individual users lets you manage access for thousands of users and services across massive datasets.
Note: ACLs apply to directories and files, including a container's root directory, but not to the container resource itself; access at the container level is managed through Azure RBAC and shared access signatures (SAS).
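To make the inheritance rule concrete, here is a minimal Python sketch of how a new child item picks up entries from its parent directory's default ACL. It is a simplified model rather than the service's implementation: the `Item` class, the `create_child` helper, and the `group:team-a` entry are hypothetical names, and the umask that ADLS applies to new items is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Simplified, hypothetical model of a file or directory and its ACLs."""
    name: str
    is_directory: bool
    access_acl: dict = field(default_factory=dict)   # e.g. {"group:team-a": "r-x"}
    default_acl: dict = field(default_factory=dict)  # only meaningful on directories


def create_child(parent: Item, name: str, is_directory: bool) -> Item:
    """Create a child item shaped by the parent's default ACL (umask ignored)."""
    child = Item(name=name, is_directory=is_directory)
    # The parent's default ACL becomes the child's access ACL.
    child.access_acl = dict(parent.default_acl)
    # Only directories carry the default ACL forward to their own children.
    if is_directory:
        child.default_acl = dict(parent.default_acl)
    return child


raw = Item("raw", True,
           access_acl={"group:team-a": "r-x"},
           default_acl={"group:team-a": "r-x"})
subdir = create_child(raw, "2024", is_directory=True)
data_file = create_child(subdir, "orders.csv", is_directory=False)
print(data_file.access_acl)
```

The file two levels down ends up with the same group entry as the top-level directory, without anyone setting it explicitly.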
Creating and Managing ACLs in ADLS
Creating and managing ACLs in ADLS is a straightforward process, but it’s essential to understand:
- An ACL is a set of ACEs (Access Control Entries).
- Each ACE grants an identity (a user, group, or service principal) a set of permissions on an item (file or directory).
- There are differences between creating a new ACL and updating an existing one.
- ACLs are set independently on every object (file/directory) inside the Data Lake, so each object has its own ACL.
The last two points can make managing ACLs a little more challenging than expected!
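The ACL/ACE relationship can be sketched as a small data model. The Python below parses an ACL string of the form `[default:]type:id:permissions` into individual entries; the `Ace` type and the `parse_acl` function are illustrative names introduced here, not part of any Azure SDK.

```python
from typing import List, NamedTuple


class Ace(NamedTuple):
    scope: str        # "access" or "default"
    kind: str         # "user", "group", "mask" or "other"
    entity_id: str    # object ID, or "" for the owning user/group, mask and other
    permissions: str  # three characters such as "r-x"


def parse_acl(acl: str) -> List[Ace]:
    """Split a comma-separated ACL string into its entries (ACEs)."""
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        scope = "access"
        # An optional "default:" prefix marks a default-ACL entry.
        if parts[0] == "default":
            scope, parts = "default", parts[1:]
        if len(parts) != 3:
            raise ValueError(f"malformed ACE: {raw!r}")
        kind, entity_id, permissions = parts
        entries.append(Ace(scope, kind, entity_id, permissions))
    return entries


acl = parse_acl("user::rwx,group::r-x,other::---,group:<group-oid>:r-x")
for ace in acl:
    print(ace)
```

Entries with an empty `entity_id` refer to the owning user, the owning group, or everyone else; entries with an ID target a specific identity.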
Creating a New ACL
Setting up an ACL on a file or directory defines who has access and at what level.
Here’s how to create a new ACL via the Azure CLI:
Set the ACL using the az storage fs access set command:
az storage fs access set --acl "user::rwx,group::r-x,other::---,user:<user-oid>:rwx,group:<group-oid>:r-x" --path <path-to-directory-or-file> --account-name <storage-account-name> --file-system <file-system-name>
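The above command sets Read, Write, and eXecute permissions for the owning user, Read and eXecute for the owning group, and no permissions for everyone else. Additionally, it grants rwx to the user identified by <user-oid> and r-x to the group identified by <group-oid>. Keep in mind that set replaces the item's entire ACL, so any entry left out of the string is removed.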
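When the same ACL has to be applied to several paths, it can help to assemble the command programmatically. The following Python sketch only builds the argument list for `az storage fs access set` and prints it; nothing is executed. The `build_set_command` helper is a hypothetical name, and the placeholder values must be replaced with real ones.

```python
import shlex


def build_set_command(acl, path, account_name, file_system):
    """Return the argument list for `az storage fs access set` (not executed)."""
    return [
        "az", "storage", "fs", "access", "set",
        "--acl", acl,
        "--path", path,
        "--account-name", account_name,
        "--file-system", file_system,
    ]


# Owning user, owning group, everyone else, then one named group.
entries = ["user::rwx", "group::r-x", "other::---", "group:<group-oid>:r-x"]
cmd = build_set_command(
    ",".join(entries),
    "<path-to-directory-or-file>",
    "<storage-account-name>",
    "<file-system-name>",
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where the Azure CLI is installed and signed in.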
Verify ACL settings: After setting the ACL, verify it with the az storage fs access show command.
Each ACL is limited to 32 entries per file or directory (32 access ACL entries, plus 32 default ACL entries on a directory), which should be considered during design.
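A design-time check for the 32-entry limit could look like the Python sketch below. It assumes that access entries and default entries (those prefixed with `default:`) are counted separately; `count_entries` and `check_acl_size` are hypothetical helper names.

```python
MAX_ENTRIES = 32  # limit stated above, assumed to apply per scope


def count_entries(acl: str) -> dict:
    """Count the access and default entries in an ACL string."""
    counts = {"access": 0, "default": 0}
    for entry in acl.split(","):
        entry = entry.strip()
        if not entry:
            continue
        scope = "default" if entry.startswith("default:") else "access"
        counts[scope] += 1
    return counts


def check_acl_size(acl: str) -> None:
    """Raise ValueError if either scope exceeds the entry limit."""
    for scope, count in count_entries(acl).items():
        if count > MAX_ENTRIES:
            raise ValueError(
                f"{scope} ACL has {count} entries, limit is {MAX_ENTRIES}"
            )


acl = "user::rwx,group::r-x,other::---,default:group:<group-oid>:r-x"
check_acl_size(acl)
print(count_entries(acl))
```

Running such a check before calling the CLI catches oversized ACLs early, and is a hint that entries for individual users should be consolidated into groups.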
Updating an Existing ACL
Updating an existing ACL involves modifying the permissions for users or groups that already have access or adding/removing permissions for new users or groups. The process is similar to creating a new ACL but focuses on adjusting existing settings:
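- Retrieve existing ACLs: Before making changes, check the current state of the ACL. Use the az storage fs access show command to list the current entries.
- Modify ACLs: Add, update, or remove specific entries by adjusting the ACL string. At the time of writing, the az storage fs access command group offers set, set-recursive, update-recursive, remove-recursive, and show: set replaces the whole ACL, whereas update-recursive merges the given entries into the existing ACL of the path and of everything beneath it. For example, to give a user read and write permissions without touching the other entries:
az storage fs access update-recursive --acl "user:<user-oid>:rw-" --path <path-to-directory-or-file> --account-name <storage-account-name> --file-system <file-system-name>
- Propagate changes: When the path is a directory, update-recursive also applies the entries to all existing subdirectories and files. Use set-recursive instead to replace the ACLs across the whole tree.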
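The difference between replacing an ACL and updating it can be modelled in a few lines of Python. This is an illustration of the two semantics on plain strings, not a call into Azure; `to_map`, `to_string`, `replace_acl`, and `merge_acl` are hypothetical helpers.

```python
def to_map(acl: str) -> dict:
    """Map ACE keys such as 'user:<oid>' to their permission strings."""
    result = {}
    for entry in acl.split(","):
        key, _, permissions = entry.strip().rpartition(":")
        result[key] = permissions
    return result


def to_string(entries: dict) -> str:
    """Turn the mapping back into a comma-separated ACL string."""
    return ",".join(f"{key}:{perm}" for key, perm in entries.items())


def replace_acl(existing: str, new: str) -> str:
    """Replace semantics: the new ACL discards the old one entirely."""
    return new


def merge_acl(existing: str, changes: str) -> str:
    """Update semantics: matching entries are overwritten, others are kept."""
    merged = to_map(existing)
    merged.update(to_map(changes))
    return to_string(merged)


current = "user::rwx,group::r-x,other::---,user:<user-oid>:r--"
print(merge_acl(current, "user:<user-oid>:rw-"))
print(replace_acl(current, "user:<user-oid>:rw-"))
```

With merge semantics only the named user's entry changes; with replace semantics the owner, group, and other entries would have to be supplied again.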
PowerShell in a nutshell!
The approach is quite different with PowerShell: you need a few more commands to update an existing ACL entry.
1. Initialize the variables:
$subscriptionId = "GUID of the AzSubscription"
$dataLakeStoreName = "stinferencedevsw1"
$fileSystemName = "inference"
$aclLevel = "file"
$itemPath = "maindir/subdir/pic.png"
$entraIdGroupId = "GUID of the Entra ID Group"
$aclPermissions = "rw-"
Set-AzContext -Subscription $subscriptionId
2. The processing script looks like this:
# Authenticate to the storage account with the signed-in identity
Write-Host "Retrieve Context for DataLake '$dataLakeStoreName'"
$dlContext = New-AzStorageContext -StorageAccountName $dataLakeStoreName -UseConnectedAccount
# Read the item's current ACL
Write-Host "Retrieve origin ACL for $aclLevel '$itemPath'"
$itemObject = Get-AzDataLakeGen2Item -FileSystem $fileSystemName -Path $itemPath -Context $dlContext
$originAcl = $itemObject.ACL
# Add the group entry, or overwrite its permissions if it already exists
Write-Host "Update ACL for $aclLevel '$itemPath' (Permissions: $aclPermissions)"
$updatedAcl = Set-AzDataLakeGen2ItemAclObject -AccessControlType group -EntityId $entraIdGroupId -Permission $aclPermissions -InputObject $originAcl
# Write the merged ACL back to the item
Update-AzDataLakeGen2Item -FileSystem $fileSystemName -Path $itemPath -ACL $updatedAcl -Context $dlContext
Conclusion
Azure Data Lake Storage, with its ACL functionality, offers robust, granular control over data access. By effectively managing ACLs, one can address the complex challenges of securing and organizing a data lake, ensuring the organization’s data remains protected and accessible to the right users.
Whether setting up ACLs for the first time or managing an existing environment, understanding how to create and update ACLs is key to maintaining a secure and efficient data lake. With these tools, one can confidently streamline access management and safeguard valuable data assets.