Data Pipelines with Snowflake

John Ryan
4 min read · Jul 10, 2020

Building real-time data pipelines is hard. Data needs to be captured, transformed, and analyzed within minutes. The Snowflake database simplifies building a data pipeline down to just three SQL statements. This article explains how.
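As a preview, those three statements follow Snowflake's Snowpipe → Stream → Task pattern. The sketch below is illustrative rather than lifted from the article: the table, stage, and warehouse names are assumptions, and it presumes an external stage @sales_stage already exists (creating one is sketched later in this section).

```sql
-- Landing and reporting tables (names are illustrative)
create table raw_sales (v variant);
create table sales_summary (store_id number, amount number);

-- 1. Snowpipe: continuously copy new files from cloud storage as they arrive
create pipe raw_sales_pipe auto_ingest = true as
  copy into raw_sales
  from @sales_stage
  file_format = (type = 'JSON');

-- 2. Stream: records which rows have landed since the last transform ran
create stream raw_sales_stream on table raw_sales;

-- 3. Task: runs the transform every minute, but only when new rows exist
create task transform_sales
  warehouse = transform_wh
  schedule = '1 minute'
  when system$stream_has_data('RAW_SALES_STREAM')
as
  insert into sales_summary
  select v:store_id::number, v:amount::number
  from raw_sales_stream;

-- Tasks are created suspended; resume to start the pipeline
alter task transform_sales resume;
```

The surrounding DDL is scaffolding; the pipe, the stream, and the task are the three statements doing the ingest, change-tracking, and transform work.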

The Challenges

The diagram below highlights the main steps, which include ingesting, transforming, storing, and analyzing the results. One of the most common solutions involves Spark Streaming, but this requires in-depth technical expertise in Java or Scala.

The Architectural Requirements

A poorly designed solution can quickly be overloaded by a sudden increase in data velocity. Likewise, the solution must handle semi-structured data (e.g., JSON) and scale out across hardware with automatic fail-over. The main architectural requirements include:

  • High-velocity Capture: The solution must handle high-velocity data streams from multiple sources.
  • Message Queuing: To avoid being overloaded by a sudden increase in data volume, the solution needs a resilient, scale-out message queue to buffer incoming data (a sketch follows this list).
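To make the queuing requirement concrete: with an auto-ingest pipe, the cloud provider's own event queue (Amazon SQS, in the AWS case) buffers bursts of arriving files, so no hand-built queue infrastructure is needed. A minimal sketch of the stage behind the earlier pipe, with a hypothetical bucket name and storage integration:

```sql
-- External stage over the landing bucket; the URL and integration name
-- are illustrative assumptions
create stage sales_stage
  url = 's3://acme-sales-landing/'
  storage_integration = s3_int
  file_format = (type = 'JSON');

-- Each auto-ingest pipe exposes a notification_channel (an SQS queue ARN);
-- pointing the bucket's object-created events at that ARN completes the
-- wiring, and that queue is what absorbs sudden bursts of files
show pipes like 'raw_sales_pipe';
```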
