Azure Data Lake Command-Line access — Part 1

Mike K
Published in Version 1
6 min read · May 10, 2022

Emulate Unix or Cmd commands — ls, md, rm, mv on Azure Data Lake Storage G2 using PowerShell

Wouldn’t it be great to quickly explore data in ADLS from the command line Unix style?

In this post, I’ll run through some (fairly) simple PowerShell code that lets you do just that.

Example — “CD” to Azure Data Lake G2 and create a new folder.

Creating a new folder in ADLS for my cat pictures…

The commands are named lls (for “lake ls”) and lmd (for “lake md”) so that they don’t conflict with the local ls and md commands.

tl;dr

  • You can set up PowerShell code and functions that use Unix-like commands to quickly move around ADLS G2.
  • You do need to perform a little set-up for each Lake instance but the commands like ls and mv are re-usable.
  • Full code listing here.
  • There is a troubleshooting section at the end of this post dealing with common problems.

At the end of this blog series, I’ll show a complete code listing with quick setup instructions.

Assumptions

I’m assuming AAD authentication and that you have permissions to access your storage, though you can set up other authentication mechanisms.

I’m also assuming the ADLS G2 filesystem is already in place and that we are using the same Azure tenant for all ADLS instances. ADLS G1 is not supported.

Step by Step set up

In this part 1 post, we’ll set up the basic PowerShell plumbing and get a “CD” command working to connect to ADLS as well as an “LS” function to allow us to browse our ADLS filesystem.

Add Azure Storage PowerShell module

The scripts use PowerShell Azure storage commands in the Azure Storage module, so the first step is to install the module by running the following command in a PowerShell Window.

Install-Module Az.Storage -Repository PSGallery -Force

This will give us the PowerShell storage commands we need and only needs to be done once.
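
If you would rather not reinstall the module each time you run your setup steps, a guarded install is one option. This is a minimal sketch: the -Scope CurrentUser switch is my assumption (it avoids needing an elevated window) and is not part of the original command.

```powershell
# Install Az.Storage only when no copy of it is available on this machine.
# -Scope CurrentUser is an assumption; drop it to install for all users.
if (-not (Get-Module -ListAvailable -Name Az.Storage)) {
    Install-Module Az.Storage -Repository PSGallery -Scope CurrentUser -Force
}
```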

Edit your PowerShell profile

To make the commands available every time we start PowerShell, we add all the code in this post to our PowerShell profile (which is just a PowerShell script).

To see where your profile is just type $profile in a PowerShell Window

My profile path….

NOTE: the “Microsoft.PowerShell_profile.ps1” file may not exist if you have not set up a PowerShell profile before.

You can just run notepad $profile from PowerShell to edit or create if it doesn’t exist.

Or create\edit Microsoft.PowerShell_profile.ps1 in your favourite IDE.
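
If you prefer to script this step, something like the following should create the profile file when it is missing and then open it. It is only a sketch and assumes Notepad is available on your machine.

```powershell
# Create the profile file (and any missing parent folders) if it is not there yet.
if (-not (Test-Path -Path $PROFILE)) {
    New-Item -ItemType File -Path $PROFILE -Force | Out-Null
}

# Open the profile for editing.
notepad $PROFILE
```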

editing my PowerShell profile in VS Code

OK, that’s the pre-requisites done, now to add the code…

Azure Tenant and Connect function

First, we need to tell PowerShell which Azure tenant we are connecting to, so set the $tenantID variable on the first line of the code below.

We also create a connect function to handle checking & establishing an Azure connection.

Paste the code below into your PowerShell profile replacing xxxxxxx-xxx… with your tenantID. (This Microsoft Page explains how to get your tenantID)

$tenantID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

function connect {
    # Only sign in if there is no existing Azure context
    if (-not (Get-AzContext)) {
        Connect-AzAccount -TenantId $tenantID
    }
}

The code uses get-azcontext to check if we are connected to an Azure tenant & only calls Connect-AzAccount if we are not already connected.
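
One limitation worth knowing: Get-AzContext also returns a context when you are signed in to a different tenant, in which case connect will not sign you in again. If you switch tenants often, a variant along these lines might help. It is a sketch, not part of the original scripts: connect_tenant is a hypothetical name, and it assumes $tenantID is set as shown above.

```powershell
# Hypothetical variant of connect: sign in when there is no context at all,
# or when the current context belongs to a different tenant.
function connect_tenant {
    $context = Get-AzContext
    if (-not $context -or $context.Tenant.Id -ne $tenantID) {
        Connect-AzAccount -TenantId $tenantID | Out-Null
    }
}
```

If you use it, call connect_tenant instead of connect inside your “CD” functions.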

“CD” Commands

I’ve used PowerShell functions prefixed with “CD” to move between ADLS Storage Accounts\File Systems.

IMPORTANT — You will need to set up a “CD” function for every ADLS filesystem you want to work with.

e.g., to connect to ADLS G2 in a storage account called “test” & a file system called “myfilesystem” add a new function to your profile like this:

function cd_test { # set to a friendly name for the ADLS filesystem you want to connect to
    connect
    $storageAccount = 'test' # Set to your Storage Account Name
    $filesystem = 'myfilesystem' # Set to your ADLS File System Name

    $global:ctx = New-AzStorageContext -StorageAccountName $storageAccount -UseConnectedAccount
    $global:filesystemname = $filesystem

    write-host "Storage Account: $storageAccount `nFileSystem: $global:filesystemname`n"
}

The code in the CD… function sets a storage context in the $global:ctx variable using the New-AzStorageContext command. It is used in subsequent PowerShell commands to specify the correct Storage Account for the command.

We also set $global:filesystemname, which is used in the same way to specify the correct file system in PowerShell commands.

You will need to copy this function into your PowerShell profile:

  • Change “cd_test” to a name of your choice.
  • Change $storageAccount to the name of your ADLS G2 Storage Account.
  • Set $filesystem to the name of your ADLS G2 filesystem.
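
If you work with several filesystems, copying the whole function each time gets repetitive. One way to cut that down is to move the shared logic into a helper and keep each “CD” function to a single line. This is only a sketch: Set-LakeContext is a hypothetical name, and it relies on the connect function defined earlier in the profile.

```powershell
# Hypothetical helper: holds the logic shared by every "CD" function.
# Assumes the connect function from earlier in the profile is already defined.
function Set-LakeContext([string]$storageAccount, [string]$filesystem) {
    connect
    $global:ctx = New-AzStorageContext -StorageAccountName $storageAccount -UseConnectedAccount
    $global:filesystemname = $filesystem
    Write-Host "Storage Account: $storageAccount `nFileSystem: $global:filesystemname`n"
}

# Each filesystem then needs only a one-line wrapper.
function cd_test { Set-LakeContext 'test' 'myfilesystem' }
```

The behaviour should match the longer cd_test function above; only the layout changes.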

Below is a cd function I have used for a storage account “pshtest” and a filesystem called “pshcontainer”…

(See code here)

“LS” Command

Finally, we add a function for the ls functionality to our profile.

I’ve called it lls for “lake ls” as it needs a different name to the built-in ls command.

No changes here — just copy and paste into your PowerShell profile.

function lls([string]$path)
{
    if ($path) {
        Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $filesystemname -Path $path | Format-Table -AutoSize
    }
    else {
        Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $filesystemname | Format-Table -AutoSize
    }
}

Here we use Get-AzDataLakeGen2ChildItem to list files and folders in ADLS for the path passed in as $path. If no path is passed in we just list the contents of the root of the filesystem.

Note that we use the $ctx and $filesystemname global variables here, which we set in the “CD” function above.

Format-Table simply ensures the output listing is formatted nicely.
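
If you call lls before running a “CD” function, $ctx is empty and the Azure cmdlet fails with a less than obvious error. A possible variant, shown here as a sketch rather than the version used in this post, checks for that first and uses splatting to avoid repeating the command in two branches.

```powershell
# Sketch of an alternative lls: warns when no "CD" function has been run yet,
# and builds the parameter set once instead of duplicating the call.
function lls([string]$path) {
    if (-not $global:ctx) {
        Write-Warning "Not connected. Run one of your cd functions first."
        return
    }

    $params = @{ Context = $global:ctx; FileSystem = $global:filesystemname }
    if ($path) { $params.Path = $path }   # only pass -Path when one was given

    Get-AzDataLakeGen2ChildItem @params | Format-Table -AutoSize
}
```

Use either this version or the original in your profile, not both, since the later definition would replace the earlier one.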

Bringing it together and testing

You should now have code very similar to the listing below in your profile.

View\copy code here

To test you will need to close and re-open your PowerShell window to load your updated profile.

If you get any syntax errors go back and check the code carefully for errors.
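
As an alternative to restarting, dot-sourcing the profile should reload the function definitions into the current session. I still restart the window if anything behaves oddly, so treat this as a convenience rather than a guarantee.

```powershell
# Re-run the profile script in the current session (note the leading dot and space).
. $PROFILE
```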

Run CD Command

We can now type the name of our cd function to connect to ADLS. PowerShell will prompt us if we need to authenticate.

e.g., I called my cd function cd_pshtest:

Connected! Type the name of YOUR cd function

The CD Command will return the Storage Account\Filesystem if successful.

Run LS Command

Now that we are connected to an ADLS filesystem, let’s test the LS command, or LLS for “LakeLS”.

We can pass in a path to view sub folders, e.g. the “mike” subfolder:
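
In text form, a session might look like the sketch below. It assumes the cd_test and lls functions from this post are loaded in your profile; substitute the name of your own cd function and a folder that exists in your filesystem.

```powershell
# Connect to the filesystem (prompts for sign-in if needed).
cd_test

# List the root of the filesystem.
lls

# List the contents of the "mike" subfolder.
lls "mike"
```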

Troubleshooting

  • You must restart your PowerShell window to load your new profile.
  • Authorisation errors usually mean you need to manage the ACLs or RBAC role assignments on your ADLS filesystem to give yourself Read\Write\Execute permissions. A typical message is “Get-AzDataLakeGen2ChildItem : This request is not authorized to perform this operation using this permission.”
  • Running Clear-AzContext can be useful if you move between tenants a lot and are having issues connecting.
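
When connection problems come up, it can help to check which account and tenant the current session is using before clearing anything. The commands below are a sketch of that check and reset; they assume $tenantID is set in your profile as described earlier.

```powershell
# Show which account, tenant and subscription the current context points at.
Get-AzContext | Format-List Account, Tenant, Subscription

# If the wrong tenant is shown, clear the saved contexts and sign in again.
Clear-AzContext -Force
Connect-AzAccount -TenantId $tenantID
```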

Finally

NOTE: the commands have no concept of a “current directory”, so you must always pass the full path to every command.

Hopefully, this all works for you and is a nice, easy way to move around ADLS from a PowerShell command line.

In the next part of this post, I’ll expand functionality by adding commands for md, rm and mv as well as talking about other ways we might expand & improve these utilities.

About the Author:
Mike Knee is an Azure Data Developer here at Version 1.


I’m a computer nerd moving into the autumn of my career & keen to share the learnings, mistakes & triumphs of over 25 years in the technology industry.