Azure Data Lake Command-Line access — Part 1
Emulate Unix or Cmd commands — ls, md, rm, mv on Azure Data Lake Storage G2 using PowerShell
Wouldn’t it be great to quickly explore data in ADLS from the command line Unix style?
In this post, I’ll run through a (fairly) simple PowerShell code to allow you to do just that.
Example — “CD” to Azure Data Lake G2 and create a new folder.
lls for “lake ls” and lmd for “lake md” so we don’t conflict with the local ls & md commands.
tl;dr
- You can set up PowerShell code and functions to use Unix like commands to quickly move around ADLS G2.
- You do need to perform a little set-up for each Lake instance, but the commands like ls and mv are reusable.
- Full code listing here.
- There is a troubleshooting section at the end of this post dealing with common problems.
At the end of this blog series, I’ll show a complete code listing with quick setup instructions.
Assumptions
I’m assuming AAD authentication and that you have permissions to access your storage, though you can set up other authentication mechanisms.
I’m also assuming the ADLS G2 filesystem is already in place and that we are using the same Azure tenant for all ADLS instances. ADLS G1 is not supported.
Step by Step set up
In this part 1 post, we’ll set up the basic PowerShell plumbing and get a “CD” command working to connect to ADLS as well as an “LS” function to allow us to browse our ADLS filesystem.
Add Azure Storage PowerShell module
The scripts use the PowerShell Azure storage commands in the Az.Storage module, so the first step is to install the module by running the following command in a PowerShell window.
Install-Module Az.Storage -Repository PSGallery -Force
This will give us the PowerShell storage commands we need and only needs to be done once.
Edit your PowerShell profile
To make the commands available every time we start PowerShell, we add all the code in this post to our PowerShell profile (which is just a PowerShell script).
To see where your profile is, just type $profile in a PowerShell window.
NOTE: the "Microsoft.PowerShell_profile.ps1" file may not exist if you have not set up a PowerShell profile before.
You can run notepad $profile from PowerShell to edit the file, or to create it if it doesn't exist.
Or create\edit Microsoft.PowerShell_profile.ps1 in your favourite IDE.
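If you prefer to script it, a short snippet using only built-in cmdlets will create the profile file (including any missing parent folders) before opening it; this is a general PowerShell pattern, not specific to ADLS:

```powershell
# Create the profile file if it doesn't already exist, then open it for editing
if (-not (Test-Path $profile)) {
    New-Item -Path $profile -ItemType File -Force | Out-Null
}
notepad $profile
```

The -Force switch tells New-Item to create any missing directories in the profile path.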
OK, that’s the pre-requisites done, now to add the code…
Azure Tenant and Connect function
Firstly, we need to tell PowerShell which Azure tenant we are connecting to, so set the $tenantID variable on the first line of the code below.
We also create a connect function to handle checking & establishing an Azure connection.
Paste the code below into your PowerShell profile replacing xxxxxxx-xxx… with your tenantID. (This Microsoft Page explains how to get your tenantID)
$tenantID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

function connect {
    if (-Not (Get-AzContext))
    {
        Connect-AzAccount -TenantId $tenantID
    }
}
The code uses Get-AzContext to check whether we are already connected to an Azure tenant and only calls Connect-AzAccount if we are not.
“CD” Commands
I’ve used PowerShell functions prefixed with “CD” to move between ADLS Storage Accounts\File Systems.
IMPORTANT — You will need to set up a “CD” function for every ADLS filesystem you want to work with.
e.g., to connect to ADLS G2 in a storage account called “test” & a file system called “myfilesystem” add a new function to your profile like this:
function cd_test { # set to a friendly name for the ADLS filesystem you want to connect to
    connect
    $storageAccount = 'test'       # Set to your Storage Account Name
    $filesystem = 'myfilesystem'   # Set to your ADLS File System Name
    $global:ctx = New-AzStorageContext -StorageAccountName $storageAccount -UseConnectedAccount
    $global:filesystemname = $filesystem
    Write-Host "Storage Account: $storageAccount `nFileSystem: $global:filesystemname`n"
}
The code in the CD… function sets a storage context in the $global:ctx variable using the New-AzStorageContext command; this context is used by subsequent PowerShell commands to target the correct Storage Account.
We also set $global:filesystemname, which is used in the same way to target the correct file system in PowerShell commands.
You will need to copy this function into your PowerShell profile:
- Change “cd_test” to a name of your choice.
- Change $storageAccount to the name of your ADLS G2 Storage Account.
- Set $filesystem to the name of your ADLS G2 filesystem.
Below is a cd function I have used for a storage account “pshtest” and a filesystem called “pshcontainer”…
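As a concrete illustration using those names (only the function name and the two variables change from the template above):

```powershell
function cd_pshtest {
    connect
    $storageAccount = 'pshtest'      # Storage Account for this function
    $filesystem = 'pshcontainer'     # ADLS File System for this function
    $global:ctx = New-AzStorageContext -StorageAccountName $storageAccount -UseConnectedAccount
    $global:filesystemname = $filesystem
    Write-Host "Storage Account: $storageAccount `nFileSystem: $global:filesystemname`n"
}
```

You can add as many of these functions as you have filesystems; they all share the same connect function and global variables.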
“LS” Command
Finally, we add a function for the ls functionality to our profile.
I’ve called it lls for “lake ls” as it needs a different name to the built-in ls command.
No changes here — just copy and paste into your PowerShell profile.
function lls([string]$path)
{
    if ($path) {
        Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $filesystemname -Path $path | Format-Table -AutoSize
    }
    else {
        Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $filesystemname | Format-Table -AutoSize
    }
}
Here we use Get-AzDataLakeGen2ChildItem to list the files and folders in ADLS for the path passed in as $path. If no path is passed in, we simply list the contents of the root of the filesystem.
Note we use the $ctx and $filesystemname global variables here, which we set in the CD… function above.
Format-Table simply ensures the output listing is formatted nicely.
Bringing it together and testing
You should now have some code very similar to the below in your Profile.
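Assembled, the profile additions look like this (the tenant ID and the account/filesystem names are placeholders you must replace with your own):

```powershell
# --- ADLS G2 command-line helpers (Microsoft.PowerShell_profile.ps1) ---
$tenantID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"   # your Azure tenant ID

function connect {
    if (-Not (Get-AzContext)) {
        Connect-AzAccount -TenantId $tenantID
    }
}

function cd_test {                   # one of these per ADLS filesystem
    connect
    $storageAccount = 'test'         # your Storage Account name
    $filesystem = 'myfilesystem'     # your ADLS File System name
    $global:ctx = New-AzStorageContext -StorageAccountName $storageAccount -UseConnectedAccount
    $global:filesystemname = $filesystem
    Write-Host "Storage Account: $storageAccount `nFileSystem: $global:filesystemname`n"
}

function lls([string]$path) {
    if ($path) {
        Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $filesystemname -Path $path | Format-Table -AutoSize
    }
    else {
        Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem $filesystemname | Format-Table -AutoSize
    }
}
```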
To test you will need to close and re-open your PowerShell window to load your updated profile.
If you get any syntax errors go back and check the code carefully for errors.
Run CD Command
We can now type the name of our cd function to connect to ADLS; PowerShell will prompt us to authenticate if needed.
e.g., I called my cd function cd_pshtest:
The CD Command will return the Storage Account\Filesystem if successful.
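In a session that looks something like the following (the output comes from the Write-Host line in the function; your account and filesystem names will differ, and a browser sign-in prompt may appear first if you are not yet authenticated):

```powershell
PS C:\> cd_pshtest
Storage Account: pshtest
FileSystem: pshcontainer
```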
Run LS Command
Now we are connected to an ADLS filesystem, let's test the LS command (or LLS for "Lake LS").
We can pass in a path to view sub folders, e.g. the “mike” subfolder:
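A session might look like the following (the "mike" folder is just the example name used here; the output columns come from Get-AzDataLakeGen2ChildItem):

```powershell
PS C:\> lls            # list the root of the filesystem
PS C:\> lls mike       # list the contents of the "mike" subfolder
PS C:\> lls mike/sub   # paths are always relative to the filesystem root
```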
Troubleshooting
- You must restart your PowerShell window to load your new profile.
- Authorisation errors may need you to manage the ACLs on your ADLS filesystem to give yourself Read\Write\Execute permissions.
- Running Clear-AzContext can be useful if you move between Tenants a lot & are having issues connecting.
- If you see the error "Get-AzDataLakeGen2ChildItem : This request is not authorized to perform this operation using this permission.", you will need to manage the ACLs\RBAC on your ADLS filesystem to give yourself Read\Write\Execute permissions.
Finally
NOTE — the commands have no concept of a "current directory", so you must always pass the full path from the filesystem root to every command.
Hopefully, this all works for you and is a nice, easy way to move around ADLS from a PowerShell command line.
In the next part of this post, I’ll expand functionality by adding commands for md, rm and mv as well as talking about other ways we might expand & improve these utilities.
About the Author:
Mike Knee is an Azure Data Developer here at Version 1.