Practical Implementation of Federated Architecture with Snowflake Features
I am a Senior Solutions Architect at Snowflake with 20 years of strategy, architecture, and engineering experience. The views expressed here are mine alone and do not necessarily reflect the views of my current, former, or future employers. This blog utilises a Jinja2 template to demonstrate provisioning of environments and mentions Git integration, which at the time of writing is still a preview feature of the Snowflake platform.
Now that the legalities are out of the way, I aim to provide a fresh perspective on practical implementation patterns and offer insights into effectively leveraging Snowflake features for a successful rollout. This is the first in a series of posts exploring how to use Snowflake features to tune the architectural components and find the right “level” of federation that your organisation can handle while allowing for future growth. I hope you are excited!
Federated architecture is a major leap forward in addressing the challenges of managing distributed data systems in our interconnected world. As businesses handle the complexities of global operations and rapid data growth, federated architecture offers the flexibility needed to maximise data potential, improve decision-making, and enhance operational agility in a data-driven environment.
Successfully implementing a federated architecture can benefit a large organisation in several ways:
- Accelerating development by enabling agile teams
- Avoiding silos and encouraging product thinking
- Enhancing data sharing and collaboration
- Implementing scalable solutions
- Supporting distributed governance
Federation is primarily an organisational approach that defines responsibilities and coordination between separate domain teams and a central services team. It requires enabling multiple domains with varying expertise levels, ensuring that governance policies can be easily applied and maintained, and promoting product thinking within the platform to prevent silos and ensure interoperability.
However, the right technology can help accelerate adoption of federation. The goal is to use technology to abstract operational complexity by providing a repeatable and scalable framework with built-in governance, to improve parity among domains of differing technical capabilities, and to allow domain teams to focus on their expertise: building data products.
The Snowflake platform offers several features to simplify and accelerate the implementation of a federated architecture.
Architectural Levers
Achieving the right balance between creating domain-autonomous and scalable architecture and maintaining high quality integrated data with the necessary governance is essential for the success of any federation journey. This balance should align with the organisation’s capabilities, i.e., how widely security, ingestion, integration, modelling, quality, performance, and other architectural design considerations are understood across the organisation.
Each team’s responsibilities should be based on their current functional capabilities to foster trust and not force a change that the organisation is not ready for. On the other hand, responsibilities should be delegated to those parts of the organisation that have the capability to support them. The technical implementation must be flexible enough to accommodate varying capabilities among teams within the organisation and allow for a modular setup.
To influence the degree of federation, identify the optimal segregation of duties (SoD) based on what works best for the organisation and each domain, and use architectural “levers” to support the technical implementation and future growth. The goal is to set these “levers” correctly to achieve the desired level of federation. The chosen level of federation will depend on the organisation’s maturity in each of these areas.
The Architectural levers are:
- Deployment Models — A deployment model or provisioning model supports the current team structure and their responsibilities. This involves deciding how teams are organised and share responsibilities and at what level data context needs to be maintained for consumption purposes. For example, one team might onboard data sources, another might handle integration, and a third might create data products. Alternatively, a single team could handle all data integration while individual domain teams create their own data products. Additionally, two domains might need to integrate a lot of their data to create useful data products but might not need data from other domains. Even though this is the key first step in deciding your architecture, this does not limit your choices for the other levers.
- Monitoring & Quality — can either be globally controlled, use a single global framework with local application, or be locally controlled. In many companies where a legacy structure is still in place, central support teams monitor data pipelines and ensure data quality. In addition, there is usually an IT or infrastructure team responsible for allocating budget for cloud or SaaS tools, so cost attribution needs to be set up in a way these teams can track effectively. In a cloud world where cost is driven by consumption, the IT/infrastructure/platform team is responsible for ensuring that cost is visible and attributed accurately. Similarly, data quality tools are often owned by the central team. More mature organisations might have distributed governance teams set standards and frameworks, delegating monitoring and quality assurance to individual teams.
- Sharing architecture — sharing controlled centrally (a hub model) vs a common framework for sharing with local application vs a local model. Where data ownership and stewardship are yet to be established, and again where there is a central-ownership legacy, data from individual domains is shared to a central hub, which then distributes it across the rest of the organisation. A more mature organisation might create a common framework (for example: tag all objects for an automation process to pick up and share to the target), while the responsibility for what is shared, and with whom, falls on the data owners. On the other end of the spectrum, no central governance is defined, and all ownership, automation, process, and responsibility are handled at the domain level by the teams who know their data best.
- Governance Policies — global policies vs global standardisation with local application vs local policies. In most companies, governance remains firmly a central capability, whether because of a lack of knowledge of regulations, an unwillingness to delegate a responsibility with serious implications, or a lack of trust in the domain teams' ability to secure data, especially PII. Governance policies can be applied centrally while also building in the capability to create and apply additional local policies on data for which central policies are not defined.
- Discovery — centrally controlled vs frameworks. Several customers try to create a single catalog for all data in the organisation. How well this works largely depends on how quickly the global, distributed teams are given enough knowledge of the tool to operate it easily, how well it integrates with the systems that hold the data, and how much the data catalog tool allows for distributed operations.
Lever 1: Deployment Models
The deployment model is generally determined based on how responsibilities are divided between the centrally managed service/team and distributed domain teams. In most cases, when an organisation is just starting on the federation journey, based on the historical capabilities of the teams, certain services might still be run centrally. For example, while most organisations easily establish consumer-aligned domain teams, they often struggle with aligning data source ownership to the domains. It is common practice to establish a “central” ingestion team responsible for ensuring data sources are onboarded to the platform and stored in a central area for domain teams to provision on an as-needed basis.
Furthermore, there is a need to contextually map shared assets between domains, which often gives rise to a coarse domain. A coarse domain consists of multiple domains aligned either by a value stream or other criteria.
From an implementation perspective, there are four major types of implementations, as shown in the picture above, depending on:
- Data pipeline layer responsibility based on capabilities (ingest, integrate, or curate), governance requirements, etc.
- Separation of domains: Fine-grained, coarse-grained, etc.
- Team organisation supporting domains: Central teams, shared service teams, domain-level teams
- Data product dependencies: Dependent on a single domain context, multiple domain contexts requiring integration across multiple domains.
In some organisations, more than one of these models may need to exist in parallel based on the requirements and maturity of individual domains. For example, some domains can run their own end-to-end pipeline, while others depend on the expertise of a shared services team. These shared services teams can be a single team across a data layer or one team across multiple domains.
The table below provides the key differences between the deployment model types and guidance on when to choose each model.
Type 1 — Now consider that you want to create a PROD environment with a type 1 deployment model. One team is responsible for onboarding sources and integrating and curating the data. There is also a domain team (Logistics in this example) that wants to create two data products, Fleet Management and RouteOptimisation. The RouteOptimisation data product will be created and owned by a sub-team within the domain and needs to be isolated from the rest of the domain. The source data needed are delivery schedules, cost, and delivery timing details from the SAP source system, plus traffic and weather information from the marketplace.
Type 2 — Consider the same scenario as Type 1, but assume that in this organisation the data is integrated at the Supply Chain domain level by the supply chain teams rather than at the organisational level, and the logistics team, as a sub-domain, creates logistics data products based on the data integrated by the supply chain teams.
Type 3 — Let's switch up the industry a little for this one. Consider an organisation with a manufacturing division. The manufacturing domain wants to create a workflow optimisation data product using OEE source data as well as bottleneck data from their source system. However, the manufacturing team lacks experience in integrating with source systems. Here the central organisation has a team of experts in integrating and ingesting SAP systems, in addition to engineers well versed in data ingestion from all sources within the organisation, who are responsible for ingesting the raw source data and making it available to the manufacturing and other domains. The manufacturing team, being experts on the business logic of their data, integrate the data from the different source systems into a meaningful curated layer and then create the workflow optimisation data product.
Type 4 — Consider a similar organisation to the one described for Type 3. Now the manufacturing domain wants to create a predictive maintenance data product that needs IoT data. Since this is a specialised data source used far more extensively in the manufacturing domain than in the rest of the organisation, the knowledge of working with IoT data and ingesting it optimally exists solely within the manufacturing domain. This data is also not globally relevant to other domains in its raw format. So it makes sense for the manufacturing domain to ingest, integrate, and transform the sensor data into a meaningful curated layer themselves before going on to build the predictive maintenance data product.
One starter template to rule all the deployment models
With the introduction of Jinja support for Snowflake SQL (a preview feature at the time of writing), a single Jinja template can accommodate all the aforementioned models and their variations. Jinja, a Python package, is widely used for creating templates, making it a powerful tool for generating SQL files. Snowflake can render and execute Jinja2 templates with the EXECUTE IMMEDIATE FROM command. A template can include variables and expressions, enabling the use of loops, conditionals, variable substitution, macros, and more.
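For illustration, a staged script opts in to Jinja rendering with a --!jinja2 directive on its first line (the same directive used by the full template later in this post), and variables are supplied at execution time. The stage name, file name, and variable below are hypothetical:

```sql
-- Contents of a hypothetical staged file, demo.sql;
-- the first line opts the script in to Jinja2 rendering.
--!jinja2
CREATE DATABASE IF NOT EXISTS {{ env_name }}_DEMO_DB;
```

It would then be executed with something like `EXECUTE IMMEDIATE FROM @my_stage/demo.sql USING (env_name => 'DEV');`, with Snowflake substituting the supplied value for the variable before running the statement.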
Since Snowflake already allows SQL to create all required objects, including warehouses, databases, schemas, roles, etc., the template’s primary function is to identify the various objects that need to be created for each model type. The script will utilise object access roles and functional roles as outlined in the Snowflake documentation.
The table below describes the common objects that will be created in all deployment models. All of these common objects can be templated in Jinja2.
As a first step, consider the administrative setup described above and create the ADMIN_DB database, with schemas for tags, policies, and operations. Additionally, the delegated admin roles per environment are created with the appropriate privileges.
{% set environment = environments.split(';') %}
{% for item in environment %}
USE ROLE ACCOUNTADMIN;
CREATE DATABASE IF NOT EXISTS ADMIN_DB;
USE DATABASE ADMIN_DB;
CREATE SCHEMA IF NOT EXISTS ADMIN_OPS;
CREATE SCHEMA IF NOT EXISTS ADMIN_TAGS;
USE SCHEMA ADMIN_TAGS;
CREATE TAG IF NOT EXISTS ENVIRONMENT ALLOWED_VALUES 'DEV','QA','PROD1','PROD' COMMENT = 'Environments tag';
CREATE WAREHOUSE IF NOT EXISTS ADMIN_WH COMMENT = 'VIRTUAL WAREHOUSE TO PERFORM ADMINISTRATION' WAREHOUSE_SIZE = 'X-Small' AUTO_RESUME = true AUTO_SUSPEND = 300 ENABLE_QUERY_ACCELERATION = false WAREHOUSE_TYPE = 'STANDARD' MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 1 SCALING_POLICY = 'STANDARD';
CREATE SCHEMA IF NOT EXISTS ADMIN_POLICY;
USE ROLE USERADMIN;
CREATE ROLE IF NOT EXISTS {{ item }}_SYSADMIN;
USE ROLE ACCOUNTADMIN;
GRANT ROLE {{ item }}_SYSADMIN TO ROLE SYSADMIN;
GRANT CREATE DATABASE, CREATE WAREHOUSE,CREATE INTEGRATION,CREATE SHARE,EXECUTE ALERT,EXECUTE DATA METRIC FUNCTION,EXECUTE MANAGED ALERT,EXECUTE MANAGED TASK,EXECUTE TASK,IMPORT SHARE,MONITOR EXECUTION, MONITOR USAGE,APPLY MASKING POLICY,APPLY ROW ACCESS POLICY,APPLY AGGREGATION POLICY,APPLY PROJECTION POLICY,APPLY TAG ON ACCOUNT TO ROLE {{ item }}_SYSADMIN ;
USE ROLE USERADMIN;
CREATE ROLE IF NOT EXISTS {{ item }}_USERADMIN;
GRANT ROLE {{ item }}_USERADMIN TO ROLE USERADMIN;
USE ROLE ACCOUNTADMIN;
GRANT CREATE ROLE ON ACCOUNT TO ROLE {{ item }}_USERADMIN;
{% endfor %}
The database-level read, write, and control access roles are created using the macro below. Creating and utilising macros for repeatable objects is considered best practice in Jinja. Note that when a schema is later created within this database, schema-level read, write, and control access roles over the schema-level objects are created and granted to these database roles.
{% macro create_dbs(env, dbname) %}
USE ROLE {{ env }}_SYSADMIN;
CREATE DATABASE IF NOT EXISTS {{ env }}_{{ dbname }}_db;
USE DATABASE {{ env }}_{{ dbname }}_db;
ALTER DATABASE {{ env }}_{{ dbname }}_db SET TAG ADMIN_DB.ADMIN_TAGS.ENVIRONMENT = '{{ env }}';
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_db_r;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_db_w;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_db_c;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_db_r TO DATABASE ROLE {{ env }}_{{ dbname }}_db_w;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_db_w TO DATABASE ROLE {{ env }}_{{ dbname }}_db_c;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_db_c TO ROLE {{ env }}_SYSADMIN;
GRANT ALL ON DATABASE {{ env }}_{{ dbname }}_db TO DATABASE ROLE {{ env }}_{{ dbname }}_db_c;
GRANT MODIFY, MONITOR ON DATABASE {{ env }}_{{ dbname }}_db TO DATABASE ROLE {{ env }}_{{ dbname }}_db_w;
GRANT MONITOR ON DATABASE {{ env }}_{{ dbname }}_db TO DATABASE ROLE {{ env }}_{{ dbname }}_db_r;
{% endmacro %}
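Once defined, the macro is invoked wherever a database and its access roles are needed. A hypothetical one-off call for a DEV environment and a SOURCE database would look like this:

```sql
-- Illustrative invocation; the full template later in this post drives
-- such calls from the databases variable inside the environment loop.
{{ create_dbs('DEV', 'SOURCE') }}
```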
Next, create the schema-level roles and connect them to the corresponding roles at the database level.
{% macro create_schema(env,dbname, schema) %}
USE ROLE {{ env }}_SYSADMIN;
USE DATABASE {{ env }}_{{ dbname }}_db;
CREATE SCHEMA IF NOT EXISTS {{ schema }}_sch WITH MANAGED ACCESS;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_{{ schema }}_r;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_{{ schema }}_w;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_{{ schema }}_c;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_r TO DATABASE ROLE {{ env }}_{{ dbname }}_db_r;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w TO DATABASE ROLE {{ env }}_{{ dbname }}_db_w;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c TO DATABASE ROLE {{ env }}_{{ dbname }}_db_c;
GRANT OWNERSHIP ON SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c REVOKE CURRENT GRANTS;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_r TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w;
GRANT MONITOR, USAGE, ADD SEARCH OPTIMIZATION ON SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w;
GRANT ALL ON FUTURE TABLES IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c;
GRANT ALL ON FUTURE VIEWS IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c;
GRANT SELECT,INSERT,UPDATE,DELETE,TRUNCATE ON FUTURE TABLES IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w;
GRANT SELECT, REFERENCES ON FUTURE VIEWS IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_r;
GRANT SELECT ON FUTURE TABLES IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_r;
{% endmacro %}
The next step is to create the functional roles and grant the relevant database roles to these functional roles.
{% macro create_functional_roles(env, databaseroleprefix, functionroleprefix,dbname) %}
USE ROLE {{ env }}_USERADMIN;
CREATE ROLE IF NOT EXISTS {{ functionroleprefix }}_CICD;
GRANT ROLE {{ functionroleprefix }}_CICD TO ROLE {{ env }}_SYSADMIN;
CREATE ROLE IF NOT EXISTS {{ functionroleprefix }}_ETL;
GRANT ROLE {{ functionroleprefix }}_ETL TO ROLE {{ env }}_SYSADMIN;
CREATE ROLE IF NOT EXISTS {{ functionroleprefix }}_ANALYST;
GRANT ROLE {{ functionroleprefix }}_ANALYST TO ROLE {{ env }}_SYSADMIN;
USE ROLE {{ env }}_SYSADMIN;
USE DATABASE {{ env }}_{{ dbname }}_db;
GRANT DATABASE ROLE {{ env }}_{{ databaseroleprefix }}_c TO ROLE {{ functionroleprefix }}_CICD;
GRANT DATABASE ROLE {{ env }}_{{ databaseroleprefix }}_w TO ROLE {{ functionroleprefix }}_ETL;
GRANT DATABASE ROLE {{ env }}_{{ databaseroleprefix }}_r TO ROLE {{ functionroleprefix }}_ANALYST;
{% endmacro %}
When warehouses are created, the corresponding warehouse roles are also created and granted to the functional roles.
{% macro create_warehouse(env, warehousenameprefix,functionroleprefix) %}
USE ROLE {{ env }}_USERADMIN;
CREATE ROLE IF NOT EXISTS {{ warehousenameprefix }}_wh_usg;
CREATE ROLE IF NOT EXISTS {{ warehousenameprefix }}_wh_opr;
CREATE ROLE IF NOT EXISTS {{ warehousenameprefix }}_wh_adm;
GRANT ROLE {{ warehousenameprefix }}_wh_usg TO ROLE {{ warehousenameprefix }}_wh_opr;
GRANT ROLE {{ warehousenameprefix }}_wh_opr TO ROLE {{ warehousenameprefix }}_wh_adm;
GRANT ROLE {{ warehousenameprefix }}_wh_adm TO ROLE {{ env }}_SYSADMIN;
USE ROLE {{ env }}_SYSADMIN;
CREATE WAREHOUSE IF NOT EXISTS {{ warehousenameprefix }}_wh WAREHOUSE_SIZE = 'X-Small' AUTO_RESUME = true AUTO_SUSPEND = 300 ENABLE_QUERY_ACCELERATION = false WAREHOUSE_TYPE = 'STANDARD' MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 1 SCALING_POLICY = 'STANDARD' INITIALLY_SUSPENDED = true;
ALTER WAREHOUSE {{ warehousenameprefix }}_wh SET TAG ADMIN_DB.ADMIN_TAGS.ENVIRONMENT = '{{ env }}';
GRANT USAGE, MONITOR ON WAREHOUSE {{ warehousenameprefix }}_wh to role {{ warehousenameprefix }}_wh_usg;
GRANT OPERATE, MODIFY ON WAREHOUSE {{ warehousenameprefix }}_wh to role {{ warehousenameprefix }}_wh_opr;
GRANT ALL ON WAREHOUSE {{ warehousenameprefix }}_wh to role {{ warehousenameprefix }}_wh_adm;
USE ROLE {{ env }}_USERADMIN;
GRANT ROLE {{ warehousenameprefix }}_wh_usg TO ROLE {{ functionroleprefix }}_ANALYST;
GRANT ROLE {{ warehousenameprefix }}_wh_opr TO ROLE {{ functionroleprefix }}_ETL;
GRANT ROLE {{ warehousenameprefix }}_wh_adm TO ROLE {{ functionroleprefix }}_CICD;
{% endmacro %}
The type-specific differences in the list of databases, schemas, roles, and warehouses needed are described in the table below.
For example, for type 1, the setup below creates the Source, Integration, and Curated databases (with a schema per source in the Source database and one schema each in the Integration and Curated databases), a database for each domain with a schema per data product, a warehouse per team, and optionally a warehouse and roles for a source or a data product.
{% if model == 'type1' %}
{% set databases = "SOURCE;INTEGRATION;CURATED;"+domains %}
{% set intnames = "INTEGRATION" %}
{% set curnames = "CURATED" %}
{% set dbschgroups = "SOURCE^"+sourcenames+"*INTEGRATION^"+intnames+"*CURATED^"+curnames+"*"+domains+"^"+dpnames %}
{% set dbteams = "SOURCE^"+ item+"*INTEGRATION^"+ item +"*CURATED^"+ item %}
{% set dbteams =dbteams+"*"+ domains +"^"+item+"_"+domains %}
{% if createwarehousesforsources != '' %}
{% set schteams ="SOURCE^"+"SOURCE_"+createwarehousesforsources%}
{% endif %}
{% if createwarehousesfordp != '' %}
{% if schteams == '' %}
{% set schteams =domains +"^"+domains+"_"+createwarehousesfordp %}
{% elif schteams != '' %}
{% set schteams =schteams+"*"+ domains +"^"+domains+"_"+createwarehousesfordp %}
{% endif %}
{% endif %}
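The variable assignments above pack structure into delimited strings: `;` separates list items, `^` separates a database from its list of schemas or teams, and `*` separates database groups. As a minimal illustrative sketch of how such a string unpacks (the values are hypothetical, not taken from a real deployment):

```python
# Sketch of the delimiter convention used by the template variables:
# '*' separates database groups, '^' splits a database from its item list,
# ';' splits the item list itself.
dbschgroups = "SOURCE^SAP;MKT*INTEGRATION^INTEGRATION*LOGISTICS^FLEET;ROUTE"

parsed = {}
for group in dbschgroups.split('*'):
    db, schemas = group.split('^')
    parsed[db] = schemas.split(';')

print(parsed)
# A schema is then created per entry; here SOURCE would get SAP and MKT schemas.
```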
To execute the macros established earlier with the type 1 variable variations shown above, we use the full template below.
--!jinja2
{% set environment = environments.split(';') %}
{% for item in environment %}
{% macro create_dbs(env, dbname) %}
USE ROLE {{ env }}_SYSADMIN;
CREATE DATABASE IF NOT EXISTS {{ env }}_{{ dbname }}_db;
USE DATABASE {{ env }}_{{ dbname }}_db;
ALTER DATABASE {{ env }}_{{ dbname }}_db SET TAG ADMIN_DB.ADMIN_TAGS.ENVIRONMENT = '{{ env }}';
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_db_r;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_db_w;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_db_c;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_db_r TO DATABASE ROLE {{ env }}_{{ dbname }}_db_w;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_db_w TO DATABASE ROLE {{ env }}_{{ dbname }}_db_c;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_db_c TO ROLE {{ env }}_SYSADMIN;
GRANT ALL ON DATABASE {{ env }}_{{ dbname }}_db TO DATABASE ROLE {{ env }}_{{ dbname }}_db_c;
GRANT MODIFY, MONITOR ON DATABASE {{ env }}_{{ dbname }}_db TO DATABASE ROLE {{ env }}_{{ dbname }}_db_w;
GRANT MONITOR ON DATABASE {{ env }}_{{ dbname }}_db TO DATABASE ROLE {{ env }}_{{ dbname }}_db_r;
{% endmacro %}
{% macro grant_functional_roles_to_functional_roles(env, functionroleprefix1,tofunctionroleprefix) %}
USE ROLE {{ env }}_USERADMIN;
GRANT ROLE {{ functionroleprefix1 }}_CICD TO ROLE {{ tofunctionroleprefix }}_CICD;
GRANT ROLE {{ functionroleprefix1 }}_ETL TO ROLE {{ tofunctionroleprefix }}_ETL;
GRANT ROLE {{ functionroleprefix1 }}_ANALYST TO ROLE {{ tofunctionroleprefix }}_ANALYST;
{% endmacro %}
{% macro create_functional_roles(env, databaseroleprefix, functionroleprefix,dbname) %}
USE ROLE {{ env }}_USERADMIN;
CREATE ROLE IF NOT EXISTS {{ functionroleprefix }}_CICD;
GRANT ROLE {{ functionroleprefix }}_CICD TO ROLE {{ env }}_SYSADMIN;
CREATE ROLE IF NOT EXISTS {{ functionroleprefix }}_ETL;
GRANT ROLE {{ functionroleprefix }}_ETL TO ROLE {{ env }}_SYSADMIN;
CREATE ROLE IF NOT EXISTS {{ functionroleprefix }}_ANALYST;
GRANT ROLE {{ functionroleprefix }}_ANALYST TO ROLE {{ env }}_SYSADMIN;
USE ROLE {{ env }}_SYSADMIN;
USE DATABASE {{ env }}_{{ dbname }}_db;
GRANT DATABASE ROLE {{ env }}_{{ databaseroleprefix }}_c TO ROLE {{ functionroleprefix }}_CICD;
GRANT DATABASE ROLE {{ env }}_{{ databaseroleprefix }}_w TO ROLE {{ functionroleprefix }}_ETL;
GRANT DATABASE ROLE {{ env }}_{{ databaseroleprefix }}_r TO ROLE {{ functionroleprefix }}_ANALYST;
{% endmacro %}
{% macro create_warehouse(env, warehousenameprefix,functionroleprefix) %}
USE ROLE {{ env }}_USERADMIN;
CREATE ROLE IF NOT EXISTS {{ warehousenameprefix }}_wh_usg;
CREATE ROLE IF NOT EXISTS {{ warehousenameprefix }}_wh_opr;
CREATE ROLE IF NOT EXISTS {{ warehousenameprefix }}_wh_adm;
GRANT ROLE {{ warehousenameprefix }}_wh_usg TO ROLE {{ warehousenameprefix }}_wh_opr;
GRANT ROLE {{ warehousenameprefix }}_wh_opr TO ROLE {{ warehousenameprefix }}_wh_adm;
GRANT ROLE {{ warehousenameprefix }}_wh_adm TO ROLE {{ env }}_SYSADMIN;
USE ROLE {{ env }}_SYSADMIN;
CREATE WAREHOUSE IF NOT EXISTS {{ warehousenameprefix }}_wh WAREHOUSE_SIZE = 'X-Small' AUTO_RESUME = true AUTO_SUSPEND = 300 ENABLE_QUERY_ACCELERATION = false WAREHOUSE_TYPE = 'STANDARD' MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 1 SCALING_POLICY = 'STANDARD' INITIALLY_SUSPENDED = true;
ALTER WAREHOUSE {{ warehousenameprefix }}_wh SET TAG ADMIN_DB.ADMIN_TAGS.ENVIRONMENT = '{{ env }}';
GRANT USAGE, MONITOR ON WAREHOUSE {{ warehousenameprefix }}_wh to role {{ warehousenameprefix }}_wh_usg;
GRANT OPERATE, MODIFY ON WAREHOUSE {{ warehousenameprefix }}_wh to role {{ warehousenameprefix }}_wh_opr;
GRANT ALL ON WAREHOUSE {{ warehousenameprefix }}_wh to role {{ warehousenameprefix }}_wh_adm;
USE ROLE {{ env }}_USERADMIN;
GRANT ROLE {{ warehousenameprefix }}_wh_usg TO ROLE {{ functionroleprefix }}_ANALYST;
GRANT ROLE {{ warehousenameprefix }}_wh_opr TO ROLE {{ functionroleprefix }}_ETL;
GRANT ROLE {{ warehousenameprefix }}_wh_adm TO ROLE {{ functionroleprefix }}_CICD;
{% endmacro %}
{% macro create_schema(env,dbname, schema) %}
USE ROLE {{ env }}_SYSADMIN;
USE DATABASE {{ env }}_{{ dbname }}_db;
CREATE SCHEMA IF NOT EXISTS {{ schema }}_sch WITH MANAGED ACCESS;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_{{ schema }}_r;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_{{ schema }}_w;
CREATE DATABASE ROLE IF NOT EXISTS {{ env }}_{{ dbname }}_{{ schema }}_c;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_r TO DATABASE ROLE {{ env }}_{{ dbname }}_db_r;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w TO DATABASE ROLE {{ env }}_{{ dbname }}_db_w;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c TO DATABASE ROLE {{ env }}_{{ dbname }}_db_c;
GRANT OWNERSHIP ON SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c REVOKE CURRENT GRANTS;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c;
GRANT DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_r TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w;
GRANT MONITOR, USAGE, ADD SEARCH OPTIMIZATION ON SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w;
GRANT ALL ON FUTURE TABLES IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c;
GRANT ALL ON FUTURE VIEWS IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_c;
GRANT SELECT,INSERT,UPDATE,DELETE,TRUNCATE ON FUTURE TABLES IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w;
GRANT SELECT, REFERENCES ON FUTURE VIEWS IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_w;
GRANT SELECT ON FUTURE VIEWS IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_r;
GRANT SELECT ON FUTURE TABLES IN SCHEMA {{ env }}_{{ dbname }}_db.{{ schema }}_sch TO DATABASE ROLE {{ env }}_{{ dbname }}_{{ schema }}_r;
{% endmacro %}
{% set ev ="'"+ item +"'" %}
USE ROLE ACCOUNTADMIN;
CREATE DATABASE IF NOT EXISTS ADMIN_DB;
USE DATABASE ADMIN_DB;
CREATE SCHEMA IF NOT EXISTS ADMIN_OPS;
CREATE SCHEMA IF NOT EXISTS ADMIN_TAGS;
USE SCHEMA ADMIN_TAGS;
CREATE TAG IF NOT EXISTS ENVIRONMENT ALLOWED_VALUES 'DEV','QA','PROD1','PROD' COMMENT = 'Environments tag';
CREATE SCHEMA IF NOT EXISTS ADMIN_POLICY;
USE ROLE USERADMIN;
CREATE ROLE IF NOT EXISTS {{ item }}_SYSADMIN;
USE ROLE ACCOUNTADMIN;
GRANT ROLE {{ item }}_SYSADMIN TO ROLE SYSADMIN;
GRANT CREATE DATABASE, CREATE WAREHOUSE,CREATE INTEGRATION,CREATE SHARE,EXECUTE ALERT,EXECUTE DATA METRIC FUNCTION,EXECUTE MANAGED ALERT,EXECUTE MANAGED TASK,EXECUTE TASK,IMPORT SHARE,MONITOR EXECUTION, MONITOR USAGE,APPLY MASKING POLICY,APPLY ROW ACCESS POLICY,APPLY AGGREGATION POLICY,APPLY PROJECTION POLICY,APPLY TAG ON ACCOUNT TO ROLE {{ item }}_SYSADMIN ;
USE ROLE {{ item }}_SYSADMIN;
CREATE WAREHOUSE IF NOT EXISTS {{ item }}_ADMIN_WH COMMENT = 'VIRTUAL WAREHOUSE TO PERFORM ADMINISTRATION' WAREHOUSE_SIZE = 'X-Small' AUTO_RESUME = true AUTO_SUSPEND = 300 ENABLE_QUERY_ACCELERATION = false WAREHOUSE_TYPE = 'STANDARD' MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 1 SCALING_POLICY = 'STANDARD' INITIALLY_SUSPENDED = true;
USE ROLE USERADMIN;
CREATE ROLE IF NOT EXISTS {{ item }}_USERADMIN;
GRANT ROLE {{ item }}_USERADMIN TO ROLE USERADMIN;
USE ROLE ACCOUNTADMIN;
GRANT CREATE ROLE ON ACCOUNT TO ROLE {{ item }}_USERADMIN;
{% set whsch = createwarehousesforsources + ";" + createwarehousesfordp %}
{% set schteams = "" %}
{% if model == 'type1' %}
{# Type 1: central SOURCE, INTEGRATION and CURATED databases plus one database per domain #}
{% set databases = "SOURCE;INTEGRATION;CURATED;" + domains %}
{% set intnames = "INTEGRATION" %}
{% set curnames = "CURATED" %}
{% set dbschgroups = "SOURCE^" + sourcenames + "*INTEGRATION^" + intnames + "*CURATED^" + curnames + "*" + domains + "^" + dpnames %}
{% set dbteams = "SOURCE^" + item + "*INTEGRATION^" + item + "*CURATED^" + item %}
{% set dbteams = dbteams + "*" + domains + "^" + item + "_" + domains %}
{% if createwarehousesforsources != '' %}
{% set schteams = "SOURCE^SOURCE_" + createwarehousesforsources %}
{% endif %}
{% if createwarehousesfordp != '' %}
{% if schteams == '' %}
{% set schteams = domains + "^" + domains + "_" + createwarehousesfordp %}
{% elif schteams != '' %}
{% set schteams = schteams + "*" + domains + "^" + domains + "_" + createwarehousesfordp %}
{% endif %}
{% endif %}
{% elif model == 'type2' %}
{# Type 2: integration and curation live inside a coarse-grained domain database #}
{% set coarsegraindom = coarsegraindom %}
{% set intnames = "INTEGRATION;CURATED" %}
{% set databases = "SOURCE;" + coarsegraindom + ";" + domains %}
{% set dbschgroups = "SOURCE^" + sourcenames + "*" + coarsegraindom + "^" + intnames + "*" + domains + "^" + dpnames %}
{% set dbteams = "SOURCE^" + item + "*" + coarsegraindom + "^" + item %}
{% set dbteams = dbteams + "*" + domains + "^" + item + "_" + domains %}
{% if createwarehousesforsources != '' %}
{% set schteams = "SOURCE^SOURCE_" + createwarehousesforsources %}
{% endif %}
{% if createwarehousesfordp != '' %}
{% if schteams == '' %}
{% set schteams = domains + "^" + domains + "_" + createwarehousesfordp %}
{% elif schteams != '' %}
{% set schteams = schteams + "*" + domains + "^" + domains + "_" + createwarehousesfordp %}
{% endif %}
{% endif %}
{% elif model == 'type3' %}
{# Type 3: central SOURCE database; staging, integration, curation and data products in the domain database #}
{% set domains = domains %}
{% set intnames = "STAGE;INTEGRATION;CURATED;" + dpnames %}
{% set databases = "SOURCE;" + domains %}
{% set dbschgroups = "SOURCE^" + sourcenames + "*" + domains + "^" + intnames %}
{% set dbteams = "SOURCE^" + item %}
{% set dbteams = dbteams + "*" + domains + "^" + item + "_" + domains %}
{% if createwarehousesforsources != '' %}
{% set schteams = "SOURCE^SOURCE_" + createwarehousesforsources %}
{% endif %}
{% if createwarehousesfordp != '' %}
{% if schteams == '' %}
{% set schteams = domains + "^" + domains + "_" + createwarehousesfordp %}
{% elif schteams != '' %}
{% set schteams = schteams + "*" + domains + "^" + domains + "_" + createwarehousesfordp %}
{% endif %}
{% endif %}
{% elif model == 'type4' %}
{# Type 4: everything, including the source schemas, lives in the domain database #}
{% set intnames = "INTEGRATION;CURATED;" + sourcenames + ";" + dpnames %}
{% set databases = domains %}
{% set dbschgroups = domains + "^" + intnames %}
{% set dbteams = domains + "^" + item + "_" + domains %}
{% if createwarehousesforsources != '' %}
{% set schteams = domains + "^" + domains + "_" + createwarehousesforsources %}
{% endif %}
{% if createwarehousesfordp != '' %}
{% if schteams == '' %}
{% set schteams = domains + "^" + domains + "_" + createwarehousesfordp %}
{% elif schteams != '' %}
{% set schteams = schteams + "*" + domains + "^" + domains + "_" + createwarehousesfordp %}
{% endif %}
{% endif %}
{% endif %}
{# Create the databases for this environment #}
{% set databases = databases.split(';') %}
{% for database in databases %}
{% set ev = "'" + item + "'" %}
{{ create_dbs(item, database) }}
{% endfor %}
{# Create a functional role and warehouse per database^team pair #}
{% set dbteams = dbteams.split('*') %}
{% for dbteam in dbteams %}
{% set db, team = dbteam.split('^') %}
{% set ev = "'" + item + "'" %}
{{ create_functional_roles(item, db + "_db", team, db) }}
{{ create_warehouse(item, team, team) }}
{% endfor %}
{# Create the schemas within each database #}
{% set dbschgroups = dbschgroups.split('*') %}
{% for group in dbschgroups %}
{% set dbs, schemas = group.split('^') %}
{% set schemas = schemas.split(';') %}
{% for schema in schemas %}
{{ create_schema(item, dbs, schema) }}
{% endfor %}
{% endfor %}
{# Isolated roles and warehouses for specific source or data-product teams #}
{% if schteams != '' %}
{% set schtms = schteams.split('*') %}
{% for schteam in schtms %}
{% set db, team = schteam.split('^') %}
{% set ev = "'" + item + "'" %}
{{ create_functional_roles(item, team, item + "_" + team, db) }}
{{ create_warehouse(item, item + "_" + team, item + "_" + team) }}
{% endfor %}
{% endif %}
{% endfor %}
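Before moving on, it helps to spell out the encoding convention the template relies on: `*` separates entries, `^` separates a database from its value list, and `;` separates values within a list. Since JINJA expressions follow Python string semantics, the decoding that the template's loops perform can be sketched in plain Python (the function name below is mine, purely for illustration):

```python
def decode_groups(encoded: str) -> dict:
    """Decode the template's 'DB1^A;B*DB2^C' convention into {database: [values]},
    mirroring the split('*') / split('^') / split(';') calls in the template loops."""
    out = {}
    for entry in encoded.split("*"):
        db, values = entry.split("^")
        out[db] = values.split(";")
    return out

print(decode_groups("SOURCE^SAP;TRAFFICWEATHER*CURATED^CURATED"))
# {'SOURCE': ['SAP', 'TRAFFICWEATHER'], 'CURATED': ['CURATED']}
```

Each decoded pair then drives one `create_schema`, `create_functional_roles`, or `create_warehouse` call in the template.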
The JINJA template is then saved as a file (template.sql) and uploaded to a stage in the Snowflake account.
Type 1 — Now consider that you want to create a PROD environment using the type 1 deployment model. A single central team is responsible for onboarding sources and for integrating and curating the data. There is also a domain team (Logistics in this example) that wants to create two data products, FleetManagement and RouteOptimisation. The RouteOptimisation data product will be created and owned by a sub-team within the domain and needs to be isolated from the rest of the domain. The source data required are delivery schedules, costs, and delivery timings from the SAP source system, plus traffic and weather information from the marketplace.
To create this from the JINJA template in your stage, you can use a simple (self-explanatory) execution statement as shown below.
use role accountadmin;
EXECUTE IMMEDIATE FROM '@"STAGE"/template.sql'
USING (model => 'type1', environments => 'PROD', sourcenames => 'SAP;TRAFFICWEATHER', domains => 'LOGISTICS', dpnames => 'FLEETMANAGEMENT;ROUTEOPTIMISATION', createwarehousesforsources => '', createwarehousesfordp => 'ROUTEOPTIMISATION');
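If you are curious about what this invocation expands to, the type1 branch can be traced by hand. A small Python sketch (JINJA string expressions behave like Python's) reproduces the intermediate variables for the parameters above; `item` is the environment name the template loops over:

```python
# Trace of the template's type1 branch for the PROD invocation above.
item = "PROD"
sourcenames = "SAP;TRAFFICWEATHER"
domains = "LOGISTICS"
dpnames = "FLEETMANAGEMENT;ROUTEOPTIMISATION"
createwarehousesfordp = "ROUTEOPTIMISATION"

# Databases created for this environment
databases = ("SOURCE;INTEGRATION;CURATED;" + domains).split(";")

# One functional role and warehouse per database^team pair
dbteams = ("SOURCE^" + item + "*INTEGRATION^" + item + "*CURATED^" + item
           + "*" + domains + "^" + item + "_" + domains)

# Schemas per database: source schemas under SOURCE, data products under the domain
dbschgroups = ("SOURCE^" + sourcenames + "*INTEGRATION^INTEGRATION"
               + "*CURATED^CURATED*" + domains + "^" + dpnames)

# Isolated warehouse/role entry for the RouteOptimisation sub-team
schteams = domains + "^" + domains + "_" + createwarehousesfordp

print(databases)   # ['SOURCE', 'INTEGRATION', 'CURATED', 'LOGISTICS']
print(schteams)    # LOGISTICS^LOGISTICS_ROUTEOPTIMISATION
```

The `schteams` entry is what gives the RouteOptimisation sub-team its own role and warehouse (presumably named PROD_LOGISTICS_ROUTEOPTIMISATION, following the `item + "_" + team` pattern in the template), isolated from the rest of the domain.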
The role structure created by the script is shown below.
Type 2 — Consider the same scenario as Type 1 above, but assume that in this organisation the data is integrated for the Supply Chain domain by the supply chain teams, rather than at the organisational level, and that the logistics team, as a sub-domain, creates logistics data products based on the data integrated by the Supply Chain teams.
The script to create the above deployment model is given below.
use role accountadmin;
EXECUTE IMMEDIATE FROM '@"STAGE"/template.sql'
USING (model => 'type2', environments => 'QA', sourcenames => 'SAP', coarsegraindom => 'SUPPLYCHAIN', domains => 'LOGISTICS', dpnames => 'ROUTEOPTIMISATION', createwarehousesforsources => '', createwarehousesfordp => '');
Type 3 — Let's switch industries a little for this one. Consider an organisation with a manufacturing division. The manufacturing domain wants to create a workflow optimisation data product using OEE source data as well as bottleneck data from their source systems. However, the manufacturing team lacks experience in integrating with source systems. The central organisation has a team of experts in integrating and ingesting from SAP systems, in addition to engineers well versed in data ingestion from all other sources within the organisation, who are responsible for ingesting the raw source data and making it available to the manufacturing and other domains. The manufacturing team, being the experts on the business logic of their data, integrate the data from the different source systems into a meaningful curated layer and then create the WorkflowOptimisation data product.
The script to create the above deployment model is given below.
use role accountadmin;
EXECUTE IMMEDIATE FROM '@"STAGE"/template.sql'
USING (model => 'type3', environments => 'PROD1', sourcenames => 'OEE;SAP', domains => 'MANUFACTURING', dpnames => 'WORKFLOWOPT', createwarehousesforsources => 'SAP', createwarehousesfordp => '');
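As before, the type3 branch can be traced in a few lines of Python to see what this invocation creates; note how the non-empty createwarehousesforsources parameter carves out a dedicated role and warehouse for the SAP ingestion experts:

```python
# Trace of the template's type3 branch for the PROD1 invocation above.
item = "PROD1"
sourcenames = "OEE;SAP"
domains = "MANUFACTURING"
dpnames = "WORKFLOWOPT"
createwarehousesforsources = "SAP"

# Only two databases: the shared SOURCE database plus the domain's own
# database, which holds the STAGE, INTEGRATION, CURATED and data-product schemas
databases = ("SOURCE;" + domains).split(";")
intnames = "STAGE;INTEGRATION;CURATED;" + dpnames
dbschgroups = "SOURCE^" + sourcenames + "*" + domains + "^" + intnames

# Dedicated role/warehouse entry for the SAP ingestion team
schteams = ""
if createwarehousesforsources != "":
    schteams = "SOURCE^SOURCE_" + createwarehousesforsources

for schteam in schteams.split("*"):
    db, team = schteam.split("^")
    print("isolated role/warehouse:", item + "_" + team, "on", db)
# isolated role/warehouse: PROD1_SOURCE_SAP on SOURCE
```

This matches the scenario: the SAP experts get their own compute and role on the central SOURCE database, while the manufacturing team works inside its own domain database.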
Type 4 — Consider a similar organisation to the one described for Type 3. Now the manufacturing domain wants to create a predictive maintenance data product, which needs IoT data. Since this is a specialised data source used far more extensively in the Manufacturing domain than in the rest of the organisation, the knowledge of working with IoT data and ingesting it optimally exists solely within the Manufacturing domain. This data is also not globally relevant to other domains in its raw form. So it makes sense for the manufacturing domain to ingest, integrate, and transform the sensor data into a meaningful curated layer itself, before going on to build the predictive maintenance data product. The script to create the above deployment model is given below.
use role accountadmin;
EXECUTE IMMEDIATE FROM '@"STAGE"/template.sql'
USING (model => 'type4', environments => 'DEV', sourcenames => 'IOT', domains => 'MANUFACTURING', dpnames => 'PREDICTIVEMAINTENANCE', createwarehousesforsources => 'IOT', createwarehousesfordp => '');
Additionally, Snowflake's support for Git repositories provides the unique advantage of easily viewing the template's contents, keeping an audit trail, and making modifications through a structured process.
Summary
Federated architecture is a significant development in data architecture, crucial for managing distributed data systems in today's interconnected world. It gives businesses the flexibility to handle vast amounts of data while enhancing decision-making and operational agility.
The deployment model is one of many architectural components that influence the level of federation in your organisation. This article explored how Snowflake features such as JINJA templating and Git integration make it easier to quickly implement and support varying degrees of federation.
I hope this gave you some insights into implementing a data mesh or federated architecture in your organisation, and into the Snowflake features that can help you get there.