Azure Synapse Analytics — Security capabilities

Eleonora Fontana
Published in Betacom
May 3, 2021

Introduction

Developing an enterprise data warehouse solution requires careful security checks to prevent and address threats that may arise. Azure Synapse Analytics provides a comprehensive security model to protect data on multiple layers. In this article we will explore the security features offered by Azure Synapse.

Security overview

The Azure Synapse Analytics security system is built on four levels of security checks, shown in the picture below.

Source: Security Overview — Azure SQL Database & Azure SQL Managed Instance

Let’s see in detail what each level is responsible for.

The Network Security level verifies that only valid addresses access the Data Warehouse, using SQL firewall rules and Virtual Network rules. The former grant access based on the source IP address of each request, whereas the latter allow the database to accept only communications sent from selected subnets within a virtual network.
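The decision described above can be modeled in a few lines. This is only a simplified sketch of the logic, not how the service is implemented: the rule values, the names `FIREWALL_RANGES` and `ALLOWED_SUBNETS`, and the helper functions are all invented for illustration, and the real service identifies the originating subnet through its own mechanisms rather than a plain address check.

```python
import ipaddress

# Hypothetical rules, invented for illustration only.
FIREWALL_RANGES = [("203.0.113.10", "203.0.113.20")]  # (start IP, end IP), inclusive
ALLOWED_SUBNETS = ["10.1.2.0/24"]                      # subnets of a virtual network


def allowed_by_firewall(source_ip: str) -> bool:
    """Firewall-style rule: the source IP must fall inside a start/end range."""
    ip = ipaddress.ip_address(source_ip)
    return any(
        ipaddress.ip_address(start) <= ip <= ipaddress.ip_address(end)
        for start, end in FIREWALL_RANGES
    )


def allowed_by_vnet(source_ip: str) -> bool:
    """Virtual-network-style rule: the source must belong to a selected subnet."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(subnet) for subnet in ALLOWED_SUBNETS)


def is_allowed(source_ip: str) -> bool:
    """A request passes if either kind of rule admits it."""
    return allowed_by_firewall(source_ip) or allowed_by_vnet(source_ip)
```

For example, `is_allowed("203.0.113.15")` passes through the IP range, `is_allowed("10.1.2.77")` passes through the subnet, and an address matching neither rule is rejected.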

The Access Management level manages the authentication process, i.e. the process of proving the user is who they claim to be. Azure Synapse Analytics supports two types of authentication:

  • SQL authentication, i.e. connection using username and password;
  • Azure Active Directory authentication, using identities in Azure Active Directory. Such an approach allows administrators to centrally manage identities and permissions of all database users. This brings several advantages, such as the minimization of password storage and centralized password rotation policies.

The term Authorization refers to the permissions assigned to a user within a database and determines what the user is (not) allowed to do. Permissions are controlled by adding user accounts to database roles. Administrators can then assign database-level permissions to each role and grant each user specific object-level permissions. Azure Synapse Analytics also supports row-level security, which enables customers to control access to rows in a database table based on the characteristics of the user executing a query.
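To make the idea of row-level security more concrete, here is a toy model in Python. It is only an analogy for the behavior described above, assuming a hypothetical `SALES` table and a hypothetical rule where sales representatives see only their own rows while managers see everything. In the database itself this filtering is enforced by the engine, not by application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset  # names of the (hypothetical) database roles the user belongs to


# Hypothetical table contents, invented for illustration.
SALES = [
    {"order_id": 1, "sales_rep": "anna", "amount": 120},
    {"order_id": 2, "sales_rep": "marco", "amount": 75},
    {"order_id": 3, "sales_rep": "anna", "amount": 40},
]


def can_see(user: User, row: dict) -> bool:
    """Filter predicate: managers see every row, others only rows they own."""
    return "manager" in user.roles or row["sales_rep"] == user.name


def query_sales(user: User) -> list:
    """Return only the rows the predicate admits for the user running the query."""
    return [row for row in SALES if can_see(user, row)]
```

The same query therefore returns different rows depending on who executes it, which is the essence of the feature.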

Lastly, the Threat Protection level looks for suspicious patterns. When enabled, Advanced Threat Protection analyzes logs to detect unusual behavior and potentially harmful attempts to access or exploit databases. When suspicious activity is detected, alerts are created and the details can be viewed in the Azure Security Center, where you can also find recommendations for further investigation and actions to mitigate the threat. Advanced Threat Protection can be enabled per server for an additional fee.

You can find all the Microsoft recommendations and best practices at this link.

Data protection and encryption

Recall that in Azure Synapse Analytics data can be at rest or in motion. Indeed, as we discussed in a previous article (Dedicated SQL pools in Azure Synapse Analytics | Betacom), the Data Movement Service moves data between nodes to allow the execution of queries in parallel. Depending on the current state, data is encrypted by Transport Layer Security (for data in motion) or Transparent Data Encryption (for data at rest).

Transport Layer Security (TLS) allows Azure Synapse Analytics to secure customer data by encrypting it while in motion. This ensures all data is encrypted “in transit” between the client and the server.

As a best practice, the connection string used by your application should request an encrypted connection and should not trust the server certificate. This forces the application to verify the server certificate and thus protects it from man-in-the-middle attacks.
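As a minimal sketch of this recommendation, the helper below assembles an ODBC-style connection string with `Encrypt=yes` and `TrustServerCertificate=no`. The function name `build_connection_string`, the driver version, and all argument values are assumptions made for this example. Adapt them to the driver and endpoint you actually use.

```python
def build_connection_string(server: str, database: str, user: str, password: str) -> str:
    """Assemble an ODBC-style connection string that enforces encryption.

    The driver name below is an assumption; use the one installed on your machine.
    """
    parts = {
        "Driver": "{ODBC Driver 17 for SQL Server}",
        "Server": f"tcp:{server},1433",
        "Database": database,
        "Uid": user,
        "Pwd": password,
        "Encrypt": "yes",                # request an encrypted (TLS) connection
        "TrustServerCertificate": "no",  # force validation of the server certificate
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Placeholder values, not real endpoints or credentials.
conn_str = build_connection_string(
    "myworkspace.sql.azuresynapse.net", "mydatabase", "myuser", "mypassword"
)
```

The resulting string can then be passed to an ODBC client library. The two settings at the end are the ones that matter for the practice described above.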

Note that some non-Microsoft drivers may not use TLS by default, or may rely on an older version of TLS (<1.2) in order to function. In this case the server still allows you to connect to your database. However, Microsoft recommends that you evaluate the security risks of allowing such drivers and applications to connect to SQL Database, especially if you store sensitive data.

Transparent Data Encryption (TDE) can be manually enabled to help protect Azure Synapse Analytics against the threat of malicious offline activity by encrypting data at rest. It performs real-time encryption and decryption of the database, associated backups, and transaction log files at rest without requiring changes to the application.

TDE encrypts the storage of an entire database by using a symmetric key called the Database Encryption Key (DEK). On database startup, the encrypted DEK is decrypted and then used for decryption and re-encryption of the database files in the SQL Server database engine process. The DEK is protected by the TDE protector, which is either a service-managed certificate (service-managed TDE) or an asymmetric key stored in Azure Key Vault (customer-managed TDE). The TDE protector is set at server level and is inherited by all databases associated with that server.

Another way Azure Synapse Analytics protects your data is Dynamic Data Masking (DDM). It limits sensitive data exposure by masking it to non-privileged users. DDM automatically discovers potentially sensitive data and provides actionable recommendations to mask these fields, with minimal impact on the application layer. It works by obfuscating the sensitive data in the result set of a query over designated database fields, while the data stored in the database is not changed. Because masking is applied at the database level, the application itself does not need to change.

Source: Security Overview — Azure SQL Database & Azure SQL Managed Instance

Masking functions are a set of methods that control the exposure of data for different scenarios. Different kinds of masking are listed below.

  • The default masking is a full masking according to the data types of the designated fields.
  • Credit card masking exposes the last four digits of the designated field, e.g. 0000-0000-0000-1234 becomes XXXX-XXXX-XXXX-1234.
  • The email masking shows the first character and replaces the domain with XXXX.com, e.g. fake-email@betacom.it becomes fXXXXXXXXX@XXXX.com.
  • Random number masking replaces a numeric value with a random number within a specified range.
  • Custom text masking exposes a chosen number of leading and trailing characters and adds a custom padding string in the middle.
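The masking rules listed above can be imitated with a few small Python functions. This is an illustrative sketch only: the function names are invented, the email rule follows the example given in this article (the exact output of the service may differ), and in Azure Synapse the masking is applied by the database engine to query results, not by client code.

```python
def mask_credit_card(number: str) -> str:
    """Keep the last four digits; replace every other digit with X."""
    total_digits = sum(ch.isdigit() for ch in number)
    seen = 0
    masked = []
    for ch in number:
        if ch.isdigit():
            seen += 1
            # Only the final four digits stay visible.
            masked.append(ch if seen > total_digits - 4 else "X")
        else:
            masked.append(ch)  # separators such as "-" are left untouched
    return "".join(masked)


def mask_email(address: str) -> str:
    """Keep the first character, mask the rest, and use a constant domain."""
    local, _, _domain = address.partition("@")
    return local[:1] + "X" * (len(local) - 1) + "@XXXX.com"


def mask_custom(text: str, prefix: int, padding: str, suffix: int) -> str:
    """Expose `prefix` leading and `suffix` trailing characters around `padding`."""
    return text[:prefix] + padding + text[len(text) - suffix:]
```

For instance, `mask_credit_card("0000-0000-0000-1234")` yields `XXXX-XXXX-XXXX-1234`, matching the credit card example in the list.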

Conclusions

In this article we discussed how Azure Synapse Analytics acts to protect your data. Please refer to the official Microsoft documentation for further details.

