
How to Automate Data Pipelines with Snowflake Streams and Tasks

David Oyegoke
Slalom Data & AI
Oct 5, 2020


This is the fifth blog in Slalom’s Snowflake series.

In our last blog, we looked at how Snowflake can help organisations to secure their data.

In this post, we explore how to automate data pipelines with Snowflake Streams and Tasks.

Snowflake’s Cloud Data Platform provides an excellent, scalable data processing environment.

Snowflake Streams and Tasks can be used to create and automate data transformation pipelines.

Streams

A Snowflake table stream builds on the same table versioning that powers the Time Travel feature. It can be queried like a table and returns the latest version of each row that has changed since the last time the stream was consumed by a transaction.

Once a transaction consumes the captured changes, the stream offset moves forward automatically. Each row in a stream also carries metadata describing the change.
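As a minimal sketch, the statements below create a stream on a source table and query it. The table and stream names (raw_orders, raw_orders_stream) are hypothetical, chosen for illustration:

    -- Create a stream on a source table (names are hypothetical).
    CREATE OR REPLACE STREAM raw_orders_stream ON TABLE raw_orders;

    -- A stream is queried like a table. Alongside the source columns it
    -- returns change metadata: METADATA$ACTION (INSERT or DELETE),
    -- METADATA$ISUPDATE (TRUE for rows that are part of an update), and
    -- METADATA$ROW_ID (a unique, immutable row identifier).
    SELECT * FROM raw_orders_stream;

    -- Note: a plain SELECT does not advance the offset; the offset moves
    -- forward only when the stream is consumed by DML in a transaction.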

Tasks

Snowflake Tasks are like cron jobs: each task can execute a SQL statement or call a stored procedure. Tasks automate time-triggered processing steps, and dependent tasks can be chained together into a simple tree structure.
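For illustration, here is a hedged sketch of a scheduled root task and a dependent child task; the warehouse, task, table and stored procedure names are assumptions made for the example:

    -- A root task that runs every five minutes on a named warehouse.
    CREATE OR REPLACE TASK refresh_orders_agg
      WAREHOUSE = transform_wh
      SCHEDULE  = '5 MINUTE'
    AS
      INSERT INTO orders_agg
      SELECT order_date, COUNT(*) AS order_count
      FROM raw_orders
      GROUP BY order_date;

    -- A child task runs after its parent completes, forming a tree.
    CREATE OR REPLACE TASK publish_orders_agg
      WAREHOUSE = transform_wh
      AFTER refresh_orders_agg
    AS
      CALL publish_dashboard_data();  -- hypothetical stored procedure

    -- Tasks are created suspended; resume children before the root.
    ALTER TASK publish_orders_agg RESUME;
    ALTER TASK refresh_orders_agg RESUME;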

Bringing them together

Streams and Tasks can be used together to create ELT pipelines.

Example use cases include:

  • Snowpipe ingests real-time data into a source table. A Snowflake stream defined on the source table keeps track of the changes. A Snowflake task reads the stream every few minutes to update an aggregate table, which is read by a real-time dashboard.
  • A batch job makes changes to a ‘Customer’ table in the raw layer. The changed rows captured by the stream are then processed by a task to update the customer dimension in the data warehouse (see the sketch after this list).
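To make the second use case concrete, here is a hedged sketch wiring a stream on a raw customer table to a task that merges the captured changes into a customer dimension. All object and column names are hypothetical; the WHEN clause uses SYSTEM$STREAM_HAS_DATA so that scheduled runs are skipped when the stream is empty:

    CREATE OR REPLACE STREAM customer_stream ON TABLE raw_customer;

    CREATE OR REPLACE TASK merge_customer_dim
      WAREHOUSE = transform_wh
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('customer_stream')
    AS
      MERGE INTO dim_customer d
      USING (
        -- An update appears in the stream as a DELETE/INSERT pair;
        -- keep only true deletes and the INSERT half of each change.
        SELECT * FROM customer_stream
        WHERE NOT (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE)
      ) s
      ON d.customer_id = s.customer_id
      WHEN MATCHED AND s.METADATA$ACTION = 'DELETE' THEN DELETE
      WHEN MATCHED AND s.METADATA$ACTION = 'INSERT' THEN
        UPDATE SET d.name = s.name, d.email = s.email
      WHEN NOT MATCHED AND s.METADATA$ACTION = 'INSERT' THEN
        INSERT (customer_id, name, email)
        VALUES (s.customer_id, s.name, s.email);

    ALTER TASK merge_customer_dim RESUME;

Because the MERGE consumes the stream inside the task’s transaction, the stream offset advances automatically after each successful run, so the next run only sees new changes.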

The main benefit of using Streams and Tasks together is that no external ELT tool is required for data processing and orchestration, thereby reducing licence and infrastructure costs.

The challenge of data change capture, scheduling and orchestration

Implementing continuous data pipelines is challenging. It involves tracking and capturing changes to tables (Change Data Capture) as well as orchestrating data transformations on a recurring schedule.

Snowflake Solution with Streams and Tasks

Snowflake provides Table Streams and Tasks out of the box.

  • Table Streams capture changes made to tables, including inserts, updates and deletes, together with metadata about each change.
  • Tasks schedule and orchestrate the data transformation steps.

These in-built tools can bring about significant cost savings by reducing reliance on third-party tools.

Both streams and tasks can be used independently for other use cases.

For further information, refer to this Snowflake webinar, which demonstrates these features in more detail.

How Slalom can help

Slalom has the experience to help your business make the most of Snowflake and set the stage for long-term growth and sustainability.

Our Snowflake credentials include:

  • Snowflake Partner of the Year for three consecutive years — 2018, 2019 and 2020
  • Over 200 Snowflake certified consultants
  • Over 300 Snowflake projects delivered

In our next blog, we’ll explore data transformation in Snowflake with dbt (Data Build Tool).

David Oyegoke is a Data & Analytics Consultant based in Slalom’s London, UK office. Ashish Billore is a Data & Analytics Solution Architect, also based in Slalom’s London, UK office.

Slalom is a modern consulting firm focused on strategy, technology, and business transformation.
