Custom Backup of Azure Data Lake Gen2 using Azure Data Factory

Haripraghash Subramaniam
Nov 22, 2019

Azure Data Lake Storage Gen2 (ADLS Gen2) is a highly scalable and secure analytics store on the Azure cloud. It can store both structured and unstructured data, and it forms a core part of analytics solutions on Azure.

ADLS Gen2 is built on Azure Storage, so storage capacity is virtually limitless. All the high-availability options supported by Azure Storage (GRS, RA-GRS, etc.) are readily available for ADLS Gen2. It also means ADLS Gen2 takes advantage of all the security lockdown features offered by Azure Storage. Azure Storage supports RBAC-based resource access control, and so does ADLS Gen2. On top of that, Access Control Lists (ACLs) offer fine-grained access control to files and directories.
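To make the ACL idea concrete, here is a minimal sketch that parses a POSIX-style ACL string of the kind ADLS Gen2 uses (comma-separated `type:qualifier:permissions` entries) and checks a single entry for a permission. The helper names `parse_acl` and `entry_allows`, and the object ID `1234`, are hypothetical and purely illustrative. The sketch does not reproduce the service's full access evaluation (masks, default ACLs and RBAC are ignored).

```python
def parse_acl(acl):
    """Split an ACL string into (type, qualifier, permissions) tuples.

    An optional leading "default:" scope is dropped for simplicity.
    """
    entries = []
    for item in acl.split(","):
        parts = item.strip().split(":")
        if parts[0] == "default":
            parts = parts[1:]
        entry_type, qualifier, perms = parts
        entries.append((entry_type, qualifier, perms))
    return entries


def entry_allows(perms, op):
    """Return True if a permission triple such as "r-x" grants op ("r", "w" or "x")."""
    return perms["rwx".index(op)] == op


# Hypothetical ACL: owner has full access, one named user can read and traverse.
acl = "user::rwx,user:1234:r-x,group::r-x,other::---"
entries = parse_acl(acl)
named_user = entries[1]
can_read = entry_allows(named_user[2], "r")
can_write = entry_allows(named_user[2], "w")
```

Here the named user may read and traverse the directory but not write to it, which is the kind of per-directory distinction that RBAC roles alone do not express.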

ADLS Gen2 is based on the hierarchical namespace feature of Azure Storage. Object stores such as Azure Blob Storage have historically offered virtual file paths without a physically implemented filesystem. This makes it harder to query, iterate over, or move files within a particular path, because doing so means iterating over all the blobs. At analytical workload scales, the latency of such operations becomes noticeable. The hierarchical namespace in ADLS Gen2 introduces directories and filesystems, which let data be organized within directories. It also makes it possible to grant or restrict access at the directory or file level.
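The difference can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how Azure Storage is implemented: `FlatStore` and `HierarchicalStore` are hypothetical classes, and the paths are made up. Renaming a "directory" in the flat store rewrites every blob key under the prefix, while the hierarchical store moves a single entry in the parent directory.

```python
class FlatStore:
    """Toy flat object store: each blob is keyed by its full path string."""

    def __init__(self):
        self.blobs = {}

    def put(self, path, data):
        self.blobs[path] = data

    def rename_dir(self, src, dst):
        """Rename a virtual directory; returns the number of blobs rewritten."""
        src_prefix = src.rstrip("/") + "/"
        dst_prefix = dst.rstrip("/") + "/"
        touched = 0
        for key in list(self.blobs):  # has to scan every blob in the store
            if key.startswith(src_prefix):
                new_key = dst_prefix + key[len(src_prefix):]
                self.blobs[new_key] = self.blobs.pop(key)
                touched += 1
        return touched


class HierarchicalStore:
    """Toy hierarchical store: directories are real nodes (nested dicts)."""

    def __init__(self):
        self.root = {}

    def _walk(self, parts, create=False):
        node = self.root
        for part in parts:
            node = node.setdefault(part, {}) if create else node[part]
        return node

    def put(self, path, data):
        *dirs, name = path.strip("/").split("/")
        self._walk(dirs, create=True)[name] = data

    def rename_dir(self, src, dst):
        """Rename a directory; returns the number of entries moved (always 1)."""
        *src_parent, src_name = src.strip("/").split("/")
        *dst_parent, dst_name = dst.strip("/").split("/")
        subtree = self._walk(src_parent).pop(src_name)
        self._walk(dst_parent, create=True)[dst_name] = subtree
        return 1

    def list_dir(self, path):
        return sorted(self._walk(path.strip("/").split("/")))


flat, hier = FlatStore(), HierarchicalStore()
for i in range(1000):
    flat.put(f"raw/2019/file{i}.csv", b"")
    hier.put(f"raw/2019/file{i}.csv", b"")

flat_touched = flat.rename_dir("raw/2019", "archive/2019")
hier_touched = hier.rename_dir("raw/2019", "archive/2019")
```

With 1000 files under `raw/2019`, the flat rename rewrites all 1000 keys, whereas the hierarchical rename moves one directory entry regardless of how many files sit beneath it.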

ADLS Gen2 does not yet support all the features offered by Azure Storage. The unsupported features are listed here.
