Azure Storage Blob Management

A simple CLI for those comfortable in the terminal

We have been using Azure to host several services and run data analysis for some time, and we are quite happy with what it offers (and having BizSpark credit doesn’t hurt). As a terminal power user, commands are my bread and butter, so a well-designed Command Line Interface (CLI) makes or breaks the flow. To me, this is where Azure really falls short: the general CLI is too bloated and doesn’t follow standard command-line conventions. This hindrance hits hardest when transferring and managing data.

Azure Storage is structured as follows: accounts sit at the top level, each account contains many containers, and each container is a key-value store of blobs. For those familiar with Amazon Web Services (AWS), this maps onto S3, where the bucket plays the role of the container.

To manage Azure Storage from the command line, you use the general-purpose `azure` CLI, which covers all Azure operations through sub-commands, one for each kind of resource. Under the storage sub-command (`azure storage ...`), you can manage storage accounts, handle containers separately, and manage the blobs themselves separately as well.

Let’s run through an example of how to list blobs inside a container:

  1. `azure storage account list` to list all the accounts.
  2. `azure storage account keys list account-name` to get the keys (this can also be done from the Azure web portal, as shown later in this article).
  3. Set the `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_ACCESS_KEY` environment variables; note that this limits you to working with one account at a time.
  4. `azure storage container list` to get a list of containers
  5. `azure storage blob list container-name` to get a list of the blobs
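Step 3 above boils down to two `export` lines. The values below are placeholders; substitute the account name and key returned by the commands in steps 1 and 2:

```shell
# Pin the azure CLI to a single storage account (placeholder values).
export AZURE_STORAGE_ACCOUNT="myaccount"
export AZURE_STORAGE_ACCESS_KEY="bXlzZWNyZXRrZXk="

# Subsequent `azure storage container list` / `azure storage blob list`
# invocations pick these up implicitly.
echo "Using account: $AZURE_STORAGE_ACCOUNT"
```

Because the credentials live in the environment, switching accounts means re-exporting both variables, which is exactly the one-account-at-a-time limitation mentioned above.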

To me, this is far too much work for the equivalent of a simple `ls`.

Coming from an AWS background, dealing with files stored on S3 from the command line is a breeze: a single command, `s3cmd`, provides all your `ls`, `mv`, and `cp` needs. What is really awesome about the S3 abstraction is that it hides the details and encapsulates everything into what looks like a file system, with files and folders.

Ideally in Azure blob storage, the file system scheme would look like this:

blob://storage account/container/blob path

This treats the account as the top-level directory, which contains containers, and uses “/”s in the blob path to fake further sub-directories. For example, a blob URI would look like this:

blob://kaggle-saher/springleaf/data/train.csv

Given this tree structure, we need a way to traverse it, and manage nodes in it.
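As a sketch of how such a URI decomposes, plain shell parameter expansion (not part of any Azure tooling) is enough to split the example above into its account, container, and blob-path parts:

```shell
# Split a blob:// URI into account / container / blob path.
uri="blob://kaggle-saher/springleaf/data/train.csv"

rest="${uri#blob://}"      # strip the scheme -> kaggle-saher/springleaf/data/train.csv
account="${rest%%/*}"      # first segment    -> kaggle-saher
rest="${rest#*/}"          # drop it          -> springleaf/data/train.csv
container="${rest%%/*}"    # second segment   -> springleaf
blob_path="${rest#*/}"     # the remainder    -> data/train.csv

echo "account=$account container=$container path=$blob_path"
```

Everything after the container is a single opaque blob name; the “directories” inside it exist only by convention.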

Azure Storage CLI

To handle actions on the proposed structure, we created a command dedicated to Azure Storage, which can be found at: https://www.npmjs.com/package/azure-storage-cmd

The purpose of the CLI is to offer standard file traversal actions compatible with Azure Blob storage. The commands look as follows:

```
Usage: blob-cmd [options] [command]

Path URI Schemes:
  remote blob -> blob://account/container/path
  local path  -> relative-path/subdirectory/file.txt OR /absolute-path/..

Commands:
  ls [URI]
  cp|copy <from-URI> <to-URI>
  rm|remove <file-URI>
  mv|move <from-URI> <to-URI>
  add-account <name> <key> [alias]
  rm-account|remove-account <id>

Options:
  -h, --help     output usage information
  -V, --version  output the version number
  -f, --force    force this action if <to> exists
  -v, --verbose  run in verbose mode
```

Before using the command, you first need to add each storage account you intend to use (the configuration is stored in `~/.azrblb.cfg`). To add a storage account, you need the account name and the key token used for authentication; both can be found in the storage account properties on the Azure portal.

When adding an account, use the “Storage Account Name” as the name and “Access Keys → Key1” as the key.

Once the account is added, you can start traversing its content.

blob-cmd ls

will show you all the containers under all the added accounts

blob-cmd ls blob://account/container

will show you all the files under that container.

When using copy, if the “from” parameter is a local path the file is uploaded to Azure, and if the “to” parameter is a local path the blob is downloaded to that location. So, for example, to upload a blob:

blob-cmd cp local-file.txt blob://account/container/local-file.txt

and to fetch a blob

blob-cmd cp blob://account/container/another-file.txt another-file.txt

The current implementation is a start; you can suggest more features, fixes, and improvements at: https://github.com/saherneklawy/azure-storage-cmd/issues


We are a team of data scientists and software engineers working to help enterprises make the most out of their data. Our projects range from data analysis to extract insights, to predictive analytics to support decision making, to scalable, production-ready data products. Our focus areas are (1) personalization and (2) operational efficiency.
