DevOps On Snowflake Using DCM Tools — Saama Analytics

Saama Analytics
Mar 26, 2021

Database change management is not an easy task. It is even harder at the beginning of a project, when the ER model is constantly changing. As a team, we have to be extra careful about deployment automation, because without it we cannot have true continuous integration and delivery. How do you manage Snowflake objects (tables, views, stored procedures, and so on) today? How does the team get database schema changes live on the production system? As your team works on changes to the database that support your business-critical applications, how do updates move through dev and QA environments, ideally through pre-production, and eventually to production in a controlled, reliable and repeatable way? Are you using any tools to help you with automation and DevOps? If your answer is no, you’re not alone. How to perform DevOps on Snowflake is one of the most frequently asked questions once an organization reaches the evaluation or implementation stage.

There are several reasons why a managed database change management process is crucial for organizations. Let’s look at a few of the most important ones:

  1. Frequency of releases: In today’s world, getting there first is crucial to success, and that means shipping new features and bug fixes as soon as possible. Moreover, there is a product roadmap you and your team cannot miss.
  2. Auditing and compliance: Alongside SOX and HIPAA regulations, GDPR and CCPA are two compliance frameworks that product teams now have to adopt and adhere to as part of their roadmap. This means putting the required database change management processes in place, including checks for managing schema changes and other mechanisms that provide an audit trail of changes.

How Database DevOps Helps

DevOps today means automating the operational and security aspects of a solution so that engineers can use the same methodologies as developers. DevOps is key to enabling agile software delivery and, therefore, the following business results:

  • Faster delivery of software
  • Greater efficiency
  • Repeatable, measurable processes
  • Improved quality of software
  • Faster innovation
  • Appropriately matching solution to a problem statement
  • Economical software delivery
  • Greater ROI

However, DevOps for data warehousing is just as critical, and it poses some unique challenges compared with DevOps for applications:

Source: Continuous Deployment Of Databases: Part 1

SCHEMA CHANGE MANAGEMENT

Foundational to any release automation for databases is a schema change management (or schema migration) tool. Achieving database DevOps requires managing database objects in code and automating their deployment.

Declarative and imperative are the two main approaches to schema change management. Here’s a breakdown of the two:
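To make the contrast concrete, here is a toy sketch (not taken from any real tool; the table, column names and helper are hypothetical). In the imperative style you write each change as an ordered migration script; in the declarative style you state only the desired end state and a comparison tool computes the ALTER statements needed to get there:

```python
# Imperative: each change is an ordered, hand-written migration script.
imperative_migrations = [
    "V1.1: CREATE TABLE customers (id INT, name STRING);",
    "V1.2: ALTER TABLE customers ADD COLUMN email STRING;",
]

# Declarative: you declare only the desired end state; a comparison tool
# derives the ALTER statements by diffing it against the current state.
current_columns = {"id": "INT", "name": "STRING"}
desired_columns = {"id": "INT", "name": "STRING", "email": "STRING"}

def diff_columns(current, desired):
    """Return ALTER statements that bring `current` in line with `desired`."""
    statements = []
    for col, col_type in desired.items():
        if col not in current:
            statements.append(f"ALTER TABLE customers ADD COLUMN {col} {col_type};")
    for col in current:
        if col not in desired:
            statements.append(f"ALTER TABLE customers DROP COLUMN {col};")
    return statements

print(diff_columns(current_columns, desired_columns))
# → ['ALTER TABLE customers ADD COLUMN email STRING;']
```

A real comparison tool must of course also handle type changes, constraints, grants, data preservation, and so on — which is exactly why the declarative style needs a sophisticated comparison engine.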

The ideal solution

After working with both approaches, I prefer the declarative style, but it requires a sophisticated schema comparison tool. The current marketplace for declarative-style tools for Snowflake is small but growing quickly. Snowflake objects such as tables, views and stored procedures must all be managed throughout the development lifecycle.

I think we should stick to the same rules we use when managing code changes. This essentially implies that we should always follow these practices:

Everything should be in source control

Every developer/support engineer should be able to run migrations — Every developer should have a local database in their own environment and should always be able to upgrade that database after pulling down the latest changes.

Easy conflict resolution — Conflicts on the same database objects should be easy to resolve and should not require a lot of manual labour.

Versioning — It is imperative to have a clear view of the database’s current version, with a status check on which migrations have been deployed and which are still due.

Deployment automation — Our mechanism should enable us to automate the deployment of changes to test and production environments. We should use the same mechanism during development to ensure everything works as expected.

The possibility of manual deployment — From time to time, changes can only be deployed manually, with human intervention, due to procedures or regulations in an organisation. This is why we should be able to generate a set of scripts for manual deployment.

The ability to run “pre” and “post” deployment scripts — It is common to execute some logic before and/or after deployment. Sometimes we have to regenerate data; on other occasions we need to verify particulars (constraint integrity, for example). This feature comes in very handy.

IDE support — It would be great to have full IDE support. Quick access to object schemas without a database connection, prompts, compile-time checking — these would greatly improve the team’s productivity during database development.
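Several of the practices above — deployment automation, pre/post deployment scripts, and the option to fall back to manual deployment — can be sketched as one ordered deployment plan. This is a hypothetical harness, not snowchange’s implementation; all names here are illustrative:

```python
def plan_deployment(pre_scripts, versioned_scripts, post_scripts):
    """Return the full, ordered list of scripts for one deployment.

    `versioned_scripts` maps a (major, minor) version tuple to a script
    name; sorting the tuples numerically gives the migration order.
    Pre-scripts always run first and post-scripts always run last.
    """
    ordered_migrations = [name for _, name in sorted(versioned_scripts.items())]
    return list(pre_scripts) + ordered_migrations + list(post_scripts)

plan = plan_deployment(
    pre_scripts=["pre__backup_critical_tables.sql"],
    versioned_scripts={
        (1, 2): "V1.2__add_email_column.sql",
        (1, 1): "V1.1__initial_database_objects.sql",
    },
    post_scripts=["post__check_constraint_integrity.sql"],
)
print(plan)
```

Because the harness only produces an ordered plan rather than executing anything, the same output can either be fed to an automated pipeline or written out as a script bundle for manual, human-approved deployment.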

Introducing snowchange

Let’s take a look at one of the DCM tools for Snowflake.

Snowchange is a lightweight Python-based tool for managing all Snowflake objects. It follows the imperative-style approach to database change management (DCM) and is inspired by the Flyway database migration tool. When snowchange is combined with a version control system and a CI/CD tool, database changes can be approved and deployed through a pipeline using modern delivery practices.

Here is a very high-level overview of how snowchange works. For a detailed look, check out the snowchange GitHub repository.

A snowchange change script is a file that contains one or more SQL statements. Each change script name must have a unique version number (for example, V1.1__initial_database_objects.sql) to ensure the scripts run in the correct order against a target Snowflake database. Change scripts are stored in a flexible directory structure.
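The version prefix is what gives the scripts their ordering. As a sketch (the regex and helper below are illustrative, not snowchange’s actual parser), the version can be extracted from the file name and compared numerically, so that V1.10 correctly sorts after V1.9 — which a naive string sort would get wrong:

```python
import re

# Assumed naming convention from the example above: V<version>__<description>.sql
VERSION_PATTERN = re.compile(r"^V(?P<version>[0-9]+(?:\.[0-9]+)*)__(?P<description>.+)\.sql$")

def parse_version(filename):
    """Return the version as a tuple of ints, e.g. 'V1.10__x.sql' -> (1, 10)."""
    match = VERSION_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a valid change script name: {filename}")
    return tuple(int(part) for part in match.group("version").split("."))

scripts = ["V1.10__add_index.sql", "V1.2__add_email.sql", "V1.1__init.sql"]
print(sorted(scripts, key=parse_version))
# → ['V1.1__init.sql', 'V1.2__add_email.sql', 'V1.10__add_index.sql']
```

Comparing version tuples rather than raw strings is the usual way migration tools avoid the “1.10 before 1.9” ordering trap.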

When it runs, snowchange loops through each database folder and applies any missing changes to the target Snowflake database. Each target database’s state is tracked in a special change history table named SNOWCHANGE.CHANGE_HISTORY, which snowchange automatically creates and maintains. Against the target database, it runs only those scripts whose version number is greater than the maximum version stored in the change history table. This is how only “new” scripts are applied each time snowchange runs.
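The “apply only newer scripts” rule described above can be sketched in a few lines. This is an illustrative reimplementation, not snowchange’s own code; the real logic lives in its change history handling:

```python
def scripts_to_apply(all_scripts, max_deployed_version):
    """Keep scripts whose version tuple exceeds the last deployed version.

    `all_scripts` maps version tuples to file names; `max_deployed_version`
    is the highest version recorded in the change history table, or None
    when the table is empty (the very first run).
    """
    pending = {
        version: name
        for version, name in all_scripts.items()
        if max_deployed_version is None or version > max_deployed_version
    }
    # Apply the remaining scripts in version order.
    return [name for _, name in sorted(pending.items())]

all_scripts = {
    (1, 1): "V1.1__initial_database_objects.sql",
    (1, 2): "V1.2__add_email_column.sql",
    (1, 3): "V1.3__add_orders_table.sql",
}
# The change history table says V1.2 was the last script applied:
print(scripts_to_apply(all_scripts, (1, 2)))
# → ['V1.3__add_orders_table.sql']
```

Repeated runs are therefore idempotent: once a script’s version is recorded in the change history table, it is never applied again.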

Since snowchange is written in Python, it can easily be included in a CI/CD pipeline. All major CI/CD tools provide hosted agents (or runners) with Python installed, so there is no need to create a self-hosted agent or container to run snowchange.

Conclusion

A DCM tool is essential for managing Snowflake objects and implementing CI/CD pipelines.

Author: Koustubh Dhopade
Data profiler and enthusiastic learner with over 16 years of IT industry experience, including extensive design, development and business knowledge across multiple technologies in the data management space. Results-oriented, with strong skills in data modelling, data design and data architecture implementations, and broad exposure to the banking, logistics, insurance and hi-tech domains.

Originally published at https://saamaanalytics.com on March 26, 2021.
