Securing Your Snowflake Instance: Best Practices and Advanced Techniques
This document gives stakeholders a detailed overview of the most important security measures for protecting a Snowflake instance. It describes each component in turn, from network policies to advanced authentication techniques, to build a comprehensive understanding of how to address the security challenges of managing data stored in Snowflake.
List of security components to implement:
- Network Policies
- Private Link
- SAML2
- SCIM
- OAuth
- MFA
- Security, Storage, Notification, API and External Access Integrations
- Security Access Control
- Tagging
- Data Masking
Network Policies
Snowflake network policies restrict access to an instance based on specified IP addresses or IP address ranges. A policy defines a list of permitted IPs and, if desired, a list of blocked IPs. This matters because Snowflake is a SaaS service, so the web front end of an instance is publicly accessible on the internet to anyone who knows the URL.
If a login attempt comes from an IP address outside the permitted ranges, Snowflake rejects it and displays an access denial message.
For implementation details, see the Snowflake documentation on network policies.
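As a minimal sketch of the allow and block lists described above, the statements below create a policy and activate it. The policy name, user name, and IP ranges are hypothetical placeholders (the ranges come from the documentation-only 192.0.2.0/24 block), so replace them with your own values.

```sql
-- Hypothetical policy: allow one corporate range, block a single address inside it.
CREATE NETWORK POLICY corp_access_policy
  ALLOWED_IP_LIST = ('192.0.2.0/24')
  BLOCKED_IP_LIST = ('192.0.2.99');

-- Optionally try the policy on a single (hypothetical) user first.
ALTER USER jane_doe SET NETWORK_POLICY = corp_access_policy;

-- Then activate it for the whole account.
ALTER ACCOUNT SET NETWORK_POLICY = corp_access_policy;
```

Before activating a policy at the account level, check that the IP address you are connecting from is inside the allowed list.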
Private Link
Snowflake Private Link is a feature that provides a secure and private way to access your Snowflake account from a virtual private cloud (VPC) in a public cloud environment.
When you configure Snowflake Private Link, a private endpoint is created within your VPC that acts as a secure bridge between your network and Snowflake. This allows you to connect to Snowflake using private IP addresses instead of public IP addresses, enhancing security and privacy by avoiding traffic over the public internet.
Snowflake Private Link is especially useful for businesses that require a high level of security and control over access to their data in Snowflake. By using a private connection, exposure to security threats is reduced, and data confidentiality is improved, helping to meet compliance and data protection requirements.
For implementation details, see the Snowflake documentation on Private Link.
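Most of the Private Link setup happens on the cloud provider side, but the account-specific values needed to create the private endpoint can be read from Snowflake itself. The sketch below assumes private connectivity has already been enabled for the account and that the session uses the ACCOUNTADMIN role.

```sql
USE ROLE ACCOUNTADMIN;

-- Returns a JSON document with the private connectivity settings of the account.
SELECT SYSTEM$GET_PRIVATELINK_CONFIG();

-- Same information, one key/value pair per row for easier reading.
SELECT key, value
FROM TABLE(FLATTEN(INPUT => PARSE_JSON(SYSTEM$GET_PRIVATELINK_CONFIG())));
```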
SAML2
SAML2 in Snowflake refers to using the Security Assertion Markup Language (SAML) version 2.0 protocol to authenticate users through an external identity provider.
With this integration, users log in securely using security assertions exchanged between Snowflake and an identity provider such as Microsoft Entra ID (formerly Azure AD). This enables Single Sign-On (SSO): once users have authenticated with the identity provider, they can access Snowflake without providing additional credentials. The setup involves establishing mutual trust between Snowflake and the identity provider.
For implementation details, see the Snowflake documentation on SAML2.
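On the Snowflake side, the trust relationship is expressed as a SAML2 security integration. The following is a sketch for Microsoft Entra ID; the integration name is hypothetical, and the tenant ID and certificate are placeholders that must be taken from your own identity provider configuration.

```sql
-- Hypothetical integration name; <tenant_id> and the certificate are placeholders.
CREATE SECURITY INTEGRATION entra_sso
  TYPE = SAML2
  ENABLED = TRUE
  SAML2_ISSUER = 'https://sts.windows.net/<tenant_id>/'
  SAML2_SSO_URL = 'https://login.microsoftonline.com/<tenant_id>/saml2'
  SAML2_PROVIDER = 'CUSTOM'
  SAML2_X509_CERT = '<base64_encoded_certificate>';

-- Inspect the integration, including the values to register on the identity provider side.
DESC SECURITY INTEGRATION entra_sso;
```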
SCIM
Integrating Snowflake with Microsoft Entra ID (formerly Azure AD) through SCIM simplifies the management of users, groups, and their attributes. SCIM is a standard for automatically synchronizing identity information, such as users and groups, which removes the need to maintain this data manually on both platforms. With this integration, changes made in Microsoft Entra ID are automatically reflected in Snowflake, keeping identity management consistent, efficient, and centralized.
- Microsoft Entra ID members are synced to Snowflake as users.
- Microsoft Entra ID groups are synced to Snowflake as roles.
When configuring SCIM, an intermediate application called the Enterprise Application sits between Snowflake and Microsoft Entra ID. This application is the one that holds the permissions to provision Entra ID members and groups as users and roles in Snowflake.
For implementation details, see the Snowflake documentation on SCIM.
The automatic provisioning tutorial may also help you implement it.
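The Snowflake half of the SCIM setup can be sketched as follows: a dedicated role that owns the provisioned users and roles, a SCIM security integration that runs as that role, and an access token that is pasted into the Enterprise Application. The role and integration names are illustrative; adapt them to your own naming convention.

```sql
USE ROLE ACCOUNTADMIN;

-- Role that will own every user and role provisioned from Microsoft Entra ID.
CREATE ROLE IF NOT EXISTS aad_provisioner;
GRANT CREATE USER ON ACCOUNT TO ROLE aad_provisioner;
GRANT CREATE ROLE ON ACCOUNT TO ROLE aad_provisioner;
GRANT ROLE aad_provisioner TO ROLE accountadmin;

-- SCIM integration that executes provisioning requests as that role.
CREATE OR REPLACE SECURITY INTEGRATION aad_provisioning
  TYPE = SCIM
  SCIM_CLIENT = 'AZURE'
  RUN_AS_ROLE = 'AAD_PROVISIONER';

-- Token to configure as the secret in the Enterprise Application.
SELECT SYSTEM$GENERATE_SCIM_ACCESS_TOKEN('AAD_PROVISIONING');
```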
OAuth
Snowflake enables OAuth for clients through integrations. An integration is a Snowflake object that provides an interface between Snowflake and third-party services. Administrators configure OAuth through a security integration, which allows clients supporting OAuth to redirect users to an authorization page and generate access tokens (and optionally, refresh tokens) to access Snowflake.
The two most commonly used authentication flows from Azure for programmatic logins are the Password Grant Flow and the Client Credentials Grant Flow.
- Password Grant Flow: the application sends the user’s credentials (username and password), together with its client ID and client secret, directly to the authorization server to obtain an access token for Snowflake. The user’s permissions are delegated to an Azure App Registration, and the permissions granted are always those that the user has in their role matrix.
- Client Credentials Grant Flow: the application authenticates directly with Microsoft Entra ID using its own client ID and client secret to obtain an access token for Snowflake. The application does not require a user to delegate permissions.
Additionally, Snowflake’s External OAuth can be configured with services such as Microsoft Power BI, Tableau, and Microsoft Logic Apps, among others.
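For tokens issued by Microsoft Entra ID, the security integration mentioned above is of type External OAuth. The sketch below shows one possible configuration; the integration name is hypothetical, the tenant ID and audience are placeholders, and the mapping of the `upn` claim to the Snowflake login name is an assumption that depends on how your users are provisioned.

```sql
-- Hypothetical integration; <tenant_id> and <application_id_uri> are placeholders.
CREATE SECURITY INTEGRATION external_oauth_azure
  TYPE = EXTERNAL_OAUTH
  ENABLED = TRUE
  EXTERNAL_OAUTH_TYPE = AZURE
  EXTERNAL_OAUTH_ISSUER = 'https://sts.windows.net/<tenant_id>/'
  EXTERNAL_OAUTH_JWS_KEYS_URL = 'https://login.microsoftonline.com/<tenant_id>/discovery/v2.0/keys'
  EXTERNAL_OAUTH_AUDIENCE_LIST = ('<application_id_uri>')
  -- Assumption: the token claim 'upn' matches the Snowflake LOGIN_NAME of each user.
  EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM = 'upn'
  EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE = 'LOGIN_NAME';
```

For the Client Credentials Grant Flow there is no user in the token, so the mapping claim usually has to point at an identifier of the application instead; check which claims your tokens actually carry before choosing it.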
MFA
Snowflake offers multi-factor authentication (MFA) to enhance login security for users accessing Snowflake. This feature is seamlessly integrated within Snowflake and is powered by the Duo Security service, which Snowflake manages entirely.
Users don’t need to separately register with Duo or undertake any additional tasks apart from installing the Duo Mobile app, compatible with various smartphone platforms such as iOS, Android, and Windows.
MFA is activated on a per-user basis; however, users aren’t automatically enrolled in MFA. They need to enroll themselves to utilize MFA.
At a minimum, Snowflake strongly recommends that all users with the ACCOUNTADMIN role be required to use MFA.
For implementation details, see the Snowflake documentation on MFA.
See also the Duo User Guide for more information about using Duo.
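Because enrollment is left to each user, it helps to check periodically which administrators have not enrolled yet. The query below is a sketch that uses the SNOWFLAKE.ACCOUNT_USAGE views and assumes the session role can read them; these views are refreshed with some latency, so very recent changes may not appear.

```sql
-- Active users holding ACCOUNTADMIN who are not enrolled in Duo MFA.
SELECT u.name, u.ext_authn_duo
FROM snowflake.account_usage.users AS u
JOIN snowflake.account_usage.grants_to_users AS g
  ON g.grantee_name = u.name
WHERE g.role = 'ACCOUNTADMIN'
  AND g.deleted_on IS NULL   -- grant still in place
  AND u.deleted_on IS NULL   -- user still exists
  AND NOT COALESCE(u.ext_authn_duo, FALSE);
```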
Security, Storage, Notification, API and External Access Integrations
Security integrations enable Snowflake to connect with external identity providers such as Active Directory, Google Identity Platform, and Okta.
These integrations provide an additional layer of security by allowing Snowflake to authenticate users using their credentials from the external organization, rather than storing them in Snowflake.
The SAML2, SCIM, and OAuth components described in the previous sections are all configured through security integrations.
Storage integrations allow Snowflake to connect with external storage systems such as Amazon S3, Azure Blob Storage, and Google Cloud Storage, creating a trusted link between Snowflake and the cloud storage.
In the case of Azure, for example, the trust link is established through an app registration in Microsoft Entra ID.
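A storage integration for Azure can be sketched as follows. The integration name is hypothetical, and the tenant ID, storage account, container, and path are placeholders.

```sql
-- Hypothetical integration restricted to a single container path.
CREATE STORAGE INTEGRATION azure_landing_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<tenant_id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://<storage_account>.blob.core.windows.net/<container>/<path>/');

-- The output includes AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME,
-- which are used on the Azure side to grant Snowflake access to the storage.
DESC STORAGE INTEGRATION azure_landing_int;
```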
Notification integrations allow Snowflake to send notifications to external messaging services, such as Azure Event Grid, or to consume incoming notifications in order to process information unattended (auto-ingestion in near real time).
As with storage integrations, the trust link on Azure is established through an app registration in Microsoft Entra ID.
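For the inbound case (auto-ingestion), a notification integration on Azure points at the storage queue that receives the Event Grid messages. This is a sketch under that assumption; the integration name is hypothetical and the queue URI and tenant ID are placeholders.

```sql
-- Hypothetical inbound integration reading Event Grid messages from a storage queue.
CREATE NOTIFICATION INTEGRATION azure_ingest_events
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
  AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://<storage_account>.queue.core.windows.net/<queue_name>'
  AZURE_TENANT_ID = '<tenant_id>';

-- Shows the consent URL and application name needed on the Azure side.
DESC NOTIFICATION INTEGRATION azure_ingest_events;
```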
API Integration focuses on enabling the integration of Snowflake with external services through APIs (Application Programming Interfaces). This allows Snowflake to interact with other systems, such as third-party applications, data analytics tools, cloud services, etc., using the APIs provided by those systems.
The primary focus is to facilitate communication and integration between Snowflake and other systems by leveraging APIs.
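As an illustration, an API integration that lets Snowflake call endpoints published through Azure API Management might look like the sketch below. The integration name is hypothetical, and the tenant ID, application ID, and URL prefix are placeholders.

```sql
-- Hypothetical integration; only URLs under the allowed prefix can be called.
CREATE API INTEGRATION azure_api_int
  API_PROVIDER = azure_api_management
  AZURE_TENANT_ID = '<tenant_id>'
  AZURE_AD_APPLICATION_ID = '<application_id>'
  API_ALLOWED_PREFIXES = ('https://<apim_service>.azure-api.net/<path>')
  ENABLED = TRUE;
```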
External access integrations allow code running inside Snowflake, such as the handlers of UDFs and stored procedures, to reach resources outside the Snowflake environment, including cloud services and other external endpoints.
This is useful for operations such as calling an external service from within a procedure or exchanging data with external systems.
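A minimal sketch of an external access integration is shown below. It assumes a single outbound HTTPS endpoint; the host `api.example.com` and both object names are hypothetical.

```sql
-- Network rule listing the only external endpoint that may be reached.
CREATE OR REPLACE NETWORK RULE example_api_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('api.example.com:443');

-- Integration that bundles the allowed rules; it is then referenced
-- from the UDF or stored procedure that needs outbound access.
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION example_api_access
  ALLOWED_NETWORK_RULES = (example_api_rule)
  ENABLED = TRUE;
```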
Security Access Control
Access control in Snowflake is based on the DAC and RBAC models.
- Discretionary Access Control (DAC)
Each object has an owner, who can grant access to that object. When a database object is created, it is owned by the role that was active when it was created, and that role holds the authority to grant access to it. For this reason, in this approach top-level objects such as databases and schemas are created solely by the ACCOUNTADMIN role, which then grants access to the objects contained within them.
- Role-Based Access Control (RBAC)
Role-Based Access Control (RBAC) is an access control model that revolves around assigning access permissions to roles rather than individual users. In RBAC, users are assigned to specific roles based on their function within the organization or system. Subsequently, access permissions are assigned to these roles, determining which actions or resources can be accessed by users occupying those roles.
The fundamental concept behind RBAC is the idea of abstraction and simplification in permission management. Instead of having to assign permissions individually to each user, permissions are grouped into roles that represent common sets of responsibilities within the organization. This facilitates permission management as users transition between roles or join different teams, as they only need to be assigned to the corresponding roles.
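The RBAC idea above can be sketched with a handful of grants: privileges go to a role, and users receive the role rather than the privileges. All object names here (`sales_db`, `sales_reader`, `jane_doe`) are hypothetical.

```sql
-- A functional role that can only read the sales data.
CREATE ROLE IF NOT EXISTS sales_reader;

-- Privileges are granted to the role, never directly to a user.
GRANT USAGE ON DATABASE sales_db TO ROLE sales_reader;
GRANT USAGE ON SCHEMA sales_db.public TO ROLE sales_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE sales_reader;

-- Users obtain the privileges by being assigned the role.
GRANT ROLE sales_reader TO USER jane_doe;

-- Attach the role to the hierarchy so administrators inherit it.
GRANT ROLE sales_reader TO ROLE sysadmin;
```

When a user changes teams, only the role assignment changes; the privileges attached to each role stay as they are.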
In this Medium document, I explain and propose a standardization and normalization of RBAC (Role-Based Access Control) through roles, permissions, and hierarchy.
Tagging
Tags enable data administrators to monitor sensitive data for compliance, discovery, protection, and resource usage through a centralized or decentralized data governance management approach. Essentially, we can add metadata to columns, tables, and database views to classify information or add an additional layer of governance-level information.
It is possible to add new tags, for example, to identify the sensitivity of the data being stored, such as a tag SENSITIVITY_DATA with a value of protected.
For implementation details, see the Snowflake documentation on tagging.
In this Medium document, I explain and propose a standardization and normalization of Tagging and how to implement it.
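The SENSITIVITY_DATA example mentioned above could be created and applied roughly as follows. The table `customers`, its column `email`, and the list of allowed values are assumptions made for illustration.

```sql
-- Tag with a closed list of values (the value list is illustrative).
CREATE TAG IF NOT EXISTS sensitivity_data
  ALLOWED_VALUES 'public', 'protected';

-- Classify one column of a hypothetical table as protected.
ALTER TABLE customers MODIFY COLUMN email
  SET TAG sensitivity_data = 'protected';

-- Read the tag value back for that column.
SELECT SYSTEM$GET_TAG('sensitivity_data', 'customers.email', 'column');
```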
Data Masking
In Snowflake, data is masked not at the storage level but at query time. Building on the tagging described in the previous section, it is possible to specify that columns carrying a given tag value, for example the tag SENSITIVITY_DATA with a value of protected, are not shown in clear text when a specific table or view is queried.
Masking can also depend on conditions other than tags, for example the role of the person querying the information. A simple example:
    -- Show the real value to the ANALYST role; every other role sees asterisks.
    CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('ANALYST') THEN val
        ELSE '*********'
      END;
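A masking policy has no effect until it is attached to something. The sketch below shows two alternative ways of doing that, assuming the `email_mask` policy exists; the table `customers`, its column `email`, and the tag name `sensitivity_data` are hypothetical.

```sql
-- Option 1: attach the policy directly to one column.
ALTER TABLE customers MODIFY COLUMN email
  SET MASKING POLICY email_mask;

-- Option 2 (tag-based masking): attach the policy to the tag, so that
-- every string column carrying the tag is protected by it.
ALTER TAG sensitivity_data SET MASKING POLICY email_mask;
```

With the second option, classifying a new column with the tag is enough to protect it, which keeps the masking rules in one place.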
For implementation details, see the Snowflake documentation on data masking.
In Conclusion
Our journey into the realm of Snowflake Guard has revealed a robust fortress of security, equipped with industry-leading features and capabilities. From intricate network policies to advanced authentication methods like SAML2, SCIM, and OAuth, every aspect of data protection is meticulously addressed.
The incorporation of Multi-Factor Authentication (MFA) and Security Access Control, including DAC & RBAC, fortifies access control and ensures only authorized personnel can access sensitive data. Furthermore, features like tagging and data masking enhance data governance and privacy, aligning with regulatory standards and best practices.