A Gentle Introduction to Azure Security Best Practices

Ahmed Hassen
8 min read · Oct 14, 2022


This blog summarizes some of the main security requirements for Microsoft Azure software developers, and especially for data engineers.

Introduction

Security remains one of the toughest subjects for an organization to handle these days. As cloud technology keeps expanding across the IT market, companies lean on the security policies defined by their cloud providers to help them face the upcoming challenges: managing the heavy workloads defined by the organization's users (services, applications and solutions) as well as attack risks and vulnerable deployments. And of course, there is a shortage of experienced cloud administrators who can implement the best security protocols.

As a software engineer, applying the best cloud security practices is a huge advantage and can bring many benefits to your workload deployments. That is why I'm sharing some of the security best practices I came across while preparing for the Azure Data Engineer certification. As a software engineer working with the Azure portal, you deploy your workloads across multiple Azure services that may include storage, databases, networking and many others. Linking all of these together requires setting up security rules to secure your authentication.

Authentication Methods

This blog will come in handy when you need to choose the best authentication method to access your Azure services in a secure way.

Azure Active Directory

What is Azure Active Directory? As described by Microsoft, Azure AD is a cloud-based identity and access management service; it is highly scalable and distributed across the world through the Azure cloud. It is a live directory, or database, that stores user accounts and their passwords, computers, file shares, security groups, permissions and much more.

Here is a list of hints regarding the use of Azure AD; a minimal authentication sketch follows the list:

  • Azure AD does not support Azure Files (REST) or Azure Tables.
  • Use Azure AD to authorize resource management operations, such as configuration.
  • Managed identities for Azure resources are a feature of Azure Active Directory.
  • Azure AD supports multi-factor authentication (MFA).
  • Azure AD authentication uses contained database users to authenticate identities at the database level.
  • Finally, Azure AD is supported for data operations on Blob and Queue storage.
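As a minimal sketch of Azure AD authorization for blob data operations, the snippet below uses the Python Azure SDK with DefaultAzureCredential. The storage account and container names are placeholders, and the caller is assumed to hold a data-plane role such as Storage Blob Data Reader on the account.

```python
# Minimal sketch: authorizing blob data operations with Azure AD instead of keys.
# The account and container names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()  # tries managed identity, CLI login, etc.
service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=credential,
)

# List blobs in a container using the Azure AD token; no account key involved.
container = service.get_container_client("raw-data")
for blob in container.list_blobs():
    print(blob.name)
```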

Shared Access Signature

A shared access signature (SAS) provides secure delegated access to resources in your storage account. As a best practice, you shouldn't share storage account keys with external third-party applications or untrusted clients; use a SAS instead. SAS comes in three types: user delegation SAS, service SAS and account SAS. A minimal example of issuing a service SAS follows the list below.

  • For untrusted clients, use a shared access signature (SAS).
  • A SAS is a string containing a security token that can be attached to a URI.
  • You use a SAS to delegate access to storage objects and to specify constraints such as the permissions and the time range of access.
  • By default, storage accounts accept connections from clients on any network.
  • SAS does not support Azure Files (SMB) and does not support RBAC.
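Here is a minimal sketch of issuing a service SAS with the Python SDK; the account name, key, container and blob path are placeholders, and the token is constrained to read-only access for one hour.

```python
# Minimal sketch: issuing a short-lived, read-only service SAS for one blob.
# Account name, account key, container and blob path are placeholders.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

sas_token = generate_blob_sas(
    account_name="<storage-account>",
    container_name="raw-data",
    blob_name="sales/2022/10/report.parquet",
    account_key="<storage-account-key>",
    permission=BlobSasPermissions(read=True),                 # permission constraint
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),   # time-range constraint
)

# The SAS is appended to the resource URI and handed to the untrusted client.
url = (
    "https://<storage-account>.blob.core.windows.net/"
    f"raw-data/sales/2022/10/report.parquet?{sas_token}"
)
print(url)
```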

Shared Key (Storage account key)

Another option for authorizing a request against a private storage service is a shared key (storage account key); a short sketch follows the list below.

  • Supports Azure Blob, Azure Files (SMB), Azure Files (REST), Azure Queue and Azure Tables.
  • RBAC is not supported.
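A minimal sketch of shared key authorization with the Python SDK, assuming the account key is made available through an environment variable (the variable and account names are placeholders). Because the key grants full access to the account, keep it out of source control, for example in Azure Key Vault.

```python
# Minimal sketch: shared-key (account key) authorization against Blob storage.
# STORAGE_ACCOUNT_KEY and the account name are placeholders set by the caller.
import os
from azure.storage.blob import BlobServiceClient

account_key = os.environ["STORAGE_ACCOUNT_KEY"]
service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=account_key,  # passing the key string selects shared-key auth
)
print([c.name for c in service.list_containers()])
```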

Azure Services

In this section we explore the do's and don'ts for some of the main services data engineers may use to deploy their workloads. To keep the content useful, I have limited the sections to data-related services such as storage and data processing.

Azure Data Lake Gen2

ADLS Gen2 supports Azure role-based access control (RBAC) and POSIX-like access control lists (ACLs). The Azure Data Lake Storage Gen2 connector supports the following authentication types (a minimal service principal example follows the lists below):

✑ Account key authentication

✑ Service principal authentication

✑ Managed identities for Azure resources authentication

  • Managed identity authentication is required when your storage account is attached to a VNet.
  • A shared access signature (SAS) provides secure delegated access to resources in your storage account, but ACLs are not supported with it.
  • With a SAS, you have granular control over how a client can access your data.
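As a minimal sketch of service principal authentication against ADLS Gen2, the snippet below uses the Python SDK. The tenant, application, secret, account and path names are placeholders, and the principal is assumed to be granted access through an RBAC role such as Storage Blob Data Contributor or through ACLs on the filesystem.

```python
# Minimal sketch: service principal authentication to ADLS Gen2.
# Tenant/client IDs, the secret, and account/path names are placeholders.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<app-client-id>",
    client_secret="<app-client-secret>",
)
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=credential,
)

# Browse a filesystem (container) in the hierarchical namespace.
fs = service.get_file_system_client("curated")
for path in fs.get_paths(path="sales"):
    print(path.name)
```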

Azure Data Factory

When it comes to Azure Data Factory, we can authenticate with managed identities. What are managed identities? They provide an automatically managed identity in Azure Active Directory for applications to use when connecting to resources that support Azure AD authentication. If you want to build an application on an Azure resource that accesses another service without you having to manage credentials, use a managed identity (a generic sketch of this pattern follows the list below).

  • A data factory can be associated with a managed identity for Azure resources, which represents this specific data factory.
  • You can directly use this managed identity for Data Lake Storage Gen2 authentication, just as you would use your own service principal. It allows the designated factory to access and copy data to or from your Data Lake Storage Gen2.
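Data Factory uses its managed identity internally once a linked service is configured for managed identity authentication, so there is nothing to code inside the pipeline itself. The sketch below only illustrates the general managed identity pattern from code running on an Azure-hosted resource (for example a VM, Function or Synapse workspace); the storage account name is a placeholder.

```python
# Minimal sketch of the managed-identity pattern: code running on an Azure
# resource obtains tokens from its automatically managed Azure AD identity,
# so no credential is stored in the application. Account name is a placeholder.
from azure.identity import ManagedIdentityCredential
from azure.storage.blob import BlobServiceClient

credential = ManagedIdentityCredential()  # system-assigned identity of the host resource
service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=credential,
)
for container in service.list_containers():
    print(container.name)
```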

Azure Synapse Analytics

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing (dedicated SQL pools) and Big Data analytics.

Azure Synapse Analytics combines authentication requests with the conditional access option provided by Azure AD Premium. This feature provides a policy-based mechanism for decisions such as blocking access, requiring MFA or a compliant device, or forcing a password change.

What is not obvious is that there are authentication processes acting in the background when services work together seamlessly, for example:

  • Azure Synapse Spark or SQL pools accessing data in an Azure Data Lake.
  • A user viewing reports in a Power BI dashboard that is served by a dedicated SQL pool. Here, multiple levels of authentication take place under the hood.

When it comes to authentication, consider the following guidance:

  • Azure AD authentication uses contained database users to authenticate identities at the database level.
  • You can use the managed identity capabilities to authenticate to any service that supports Azure Active Directory authentication.
  • The managed identity lifecycle is tied directly to the Azure Synapse workspace: if you delete the workspace, the managed identity is cleaned up as well.
  • For user accounts that are not part of an Azure Active Directory tenant, SQL authentication is the alternative.
  • If a user and an instance of Azure Synapse are part of the same Azure Active Directory tenant, the user can access Azure Synapse Analytics without an apparent login.
  • For bulk loading Parquet files into Synapse SQL from Azure Blob storage with the COPY statement, the supported authentication types are shared access signature (SAS) and storage account key (a sketch follows this list).
  • We can register the Synapse workspace with a system-assigned managed identity if the storage account is provisioned inside a VNet.
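The COPY statement itself is T-SQL; below is a hedged sketch of submitting it from Python with pyodbc against a dedicated SQL pool, loading Parquet files with a SAS credential. The server, database, login, table, container path and SAS token are all placeholders.

```python
# Minimal sketch: bulk-loading Parquet files from Blob storage into a dedicated
# SQL pool with COPY INTO, authorizing the storage access with a SAS token.
# Server, database, credentials, table and paths are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;"
    "DATABASE=<dedicated-pool>;"
    "UID=<sql-user>;PWD=<sql-password>",
    autocommit=True,
)

copy_sql = """
COPY INTO dbo.FactSales
FROM 'https://<storage-account>.blob.core.windows.net/raw-data/sales/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
)
"""
conn.cursor().execute(copy_sql)
```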

Azure Databricks

Users access an Azure Databricks workspace with an Azure Active Directory account. The account must be added to the Databricks workspace before the user can access it. The resource provider front end checks a user's authorization against an Azure Active Directory tenant. Azure Databricks also uses SCIM to manage user access controls.

Additionally, we can use Azure AD-generated tokens to automate the provisioning of an Azure Databricks workspace and to access the Databricks REST API, but we are limited to 600 tokens per workspace.
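As a minimal sketch of calling the Databricks REST API with an Azure AD token rather than a personal access token: the GUID scope below is the published Azure Databricks resource ID, and the workspace URL is a placeholder.

```python
# Minimal sketch: calling the Databricks REST API with an Azure AD token.
# The workspace URL is a placeholder; the GUID is the Azure Databricks resource ID.
import requests
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default").token

resp = requests.get(
    "https://<workspace>.azuredatabricks.net/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())
```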

The following features give a better understanding of the security requirements when using Azure Databricks:

  • RBAC: Control access to clusters, jobs, data tables, APIs, and workspace resources such as notebooks, folders, jobs, and registered models.
  • VNet injection: deploying an Azure Databricks workspace in your own virtual network brings extra benefits, such as connecting to other Azure services (like storage) over private endpoints, using a custom DNS and taking advantage of user-defined routes.
  • Secure cluster connectivity: enable secure cluster connectivity on the workspace so that cluster nodes use only private IP addresses (also known as “No Public IPs”).

One important thing to mention is the automatic authentication to Azure Data Lake Storage Gen1 and Gen2 from Azure Databricks when the credential passthrough feature is enabled.

Apart from an Azure Key Vault-backed secret scope, a Databricks-backed secret scope can also be used to store secrets. It is an encrypted database owned and managed by Azure Databricks. The secret scope name must be unique within a workspace.
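The notebook sketch below shows both ideas together: a secret is read from a secret scope and used to configure service principal (OAuth) access to ADLS Gen2 with the standard ABFS settings. With credential passthrough enabled on the cluster, this explicit configuration would not be needed. The scope, key, IDs and account names are placeholders, and spark and dbutils are assumed to be the notebook's predefined globals.

```python
# Minimal sketch inside a Databricks notebook: read a service principal secret
# from a secret scope and configure OAuth access to ADLS Gen2.
# Scope, key, tenant/client IDs and account names are placeholders.
client_secret = dbutils.secrets.get(scope="data-platform", key="sp-client-secret")

account = "<storage-account>"
spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", "<app-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", client_secret)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

# Read from the lake with the configured service principal credentials.
df = spark.read.parquet(f"abfss://curated@{account}.dfs.core.windows.net/sales/")
df.show(5)
```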

Conclusion

That was a lot to take in! Let's be honest, security is one of the heaviest topics to handle. But if you made it this far, you should now be familiar with the basic aspects of these security protocols and their requirements. Here are some of the takeaways:

  • An application that accesses a storage account when network rules are in effect still requires proper authorization for the request. Authorization is supported with Azure Active Directory (Azure AD) credentials for blobs and queues, with a valid account access key, or with a SAS token.
  • A shared access signature (SAS) is a URI that grants restricted access rights to Azure Storage resources. You can provide a shared access signature to clients who should not be trusted with your storage account key but to whom you wish to delegate access to certain storage account resources.
  • Microsoft Defender for Storage can be enabled for Blob, Files and ADLS Gen2 only when they are in GA.

As you may have noticed, I didn't cover all of the Azure services, such as Azure ML or Azure Web Apps. I wanted to focus on the services most used by data engineers.

I’m glad to share my second blog on Medium. The journey has just begun for me, and I feel grateful to share my expertise on the security subject. If you liked this article, follow me on LinkedIn and GitHub, and don't miss my first blog on “How to Ace your DP-100 Certification”.

References

Get the best from the following useful links for a better understanding of the security protocols defined by Microsoft Azure:

Azure Security Best Practices and Patterns

Azure Security Best Practices by Ryan Ewert

Top 10 Best Practices for Azure Security in 2022

