Live Syncs on Snowflake: Real-time data on your current infrastructure

Luke Ambrosetti
Published in Snowflake
5 min read · Mar 27, 2024

This post was co-authored with Sean Lynch from the Census team.

Late last year, Census introduced Live Syncs, which enabled real-time Reverse ETL on top of the data warehouse or data platform for the first time. We have since expanded that to support Live Syncs from Confluent, Apache Kafka, and Materialize, and our customers are now using Live Syncs to power a number of interesting business applications in real time. The feedback we heard again and again from our users was that they didn’t want to have to maintain these new real-time systems.

Today, we’re bringing real-time to everyone, releasing our Live Syncs on the Snowflake Data Cloud. They are powered by Snowflake Streams, but Census takes care of all that for you. If you’re on Snowflake, you now have everything you need to get started with Census Live Syncs. Just create a sync from any compatible Snowflake Table, View, or Dynamic Table, select the Live option, and start syncing data to over 200 business tools with sub-second latency.

Compared to a continuous sync, we’ve found that Live Syncs can be 100+ times faster at activating data and 100+ times cheaper to operate. Want to know more about how we got to those numbers? Read on.

Saying Hello in Real Time

We developed Live Syncs to enable customers who want to use Census for high-speed use cases such as next-page personalization or real-time recommendations while incorporating customer profile data available in their Snowflake Data Cloud.

As an example, one of our B2B customers planned to deliver personalized in-app content for new signups. They had considered using a continuous sync, but their SQL transformation was costly to run because it kept their warehouse online 24/7. The business also had a relatively low rate of new signups (~20 per day), so an always-on warehouse felt like the wrong solution.

We needed to solve three distinct problems:

  1. Decreasing the event ingestion latency
  2. Decreasing the data modeling latency
  3. Reducing the compute cost of identifying new signups quickly

Thankfully, Census and Snowflake have both been developing new innovations to help us get closer than ever to true streaming pipelines on the Data Cloud. Let’s dive in to see how Census Live Syncs and new features from Snowflake work together to reduce pipeline runtime.

Comparing data pipelines before and after Census Live Syncs on Snowflake

Near Real-Time: Live Syncs on Dynamic Tables

Previously, this customer had been using an hourly transformation process to create an output-ready table, but this would have introduced unacceptable lag to the pipeline. Instead, they replaced the separate build step with a Dynamic Table. Dynamic Tables are a native Snowflake feature for processing data incrementally within a target lag: Snowflake takes care of inserting or updating rows from a source table without rebuilding the entire target table. Since the volume of new events is so low, a Dynamic Table is an incredibly efficient solution, as it only recomputes when new data is detected upstream.
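To make the Dynamic Table step concrete, here is a minimal sketch of the kind of DDL that could replace an hourly build. It only renders the statement as a string. The table, column, and warehouse names (`new_signups`, `raw_events`, `transform_wh`) and the one-minute target lag are assumptions for illustration, not the customer’s actual setup.

```python
def dynamic_table_ddl(name: str, warehouse: str, target_lag: str, query: str) -> str:
    """Render a CREATE DYNAMIC TABLE statement from its main parts."""
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {name}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"AS\n{query}"
    )


# Hypothetical source table and columns for the new-signup example.
SIGNUP_QUERY = """SELECT user_id, email, plan, created_at
FROM raw_events
WHERE event_type = 'signup'"""

ddl = dynamic_table_ddl("new_signups", "transform_wh", "1 minute", SIGNUP_QUERY)
print(ddl)
```

The statement would be run once in Snowflake. After that, Snowflake keeps the table within the target lag, and no separate scheduler is needed for the transformation.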

Census adds a Snowflake Stream on top of the Dynamic Table to take transformed events and get them to their destination in near real-time.
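As a rough illustration of what reading from a stream looks like, the sketch below classifies change rows using the `METADATA$ACTION` and `METADATA$ISUPDATE` columns that Snowflake adds to stream output. An update appears as a DELETE and INSERT pair with `METADATA$ISUPDATE` set to true. The `user_id` and `plan` columns and the sample rows are hypothetical, and this is not a description of how Census processes stream rows internally.

```python
from typing import Optional


def classify_change(row: dict) -> Optional[str]:
    """Map one stream row to 'insert', 'update', 'delete', or None (skip)."""
    action = row["METADATA$ACTION"]
    is_update = row["METADATA$ISUPDATE"]
    if action == "INSERT":
        # The INSERT half of an update carries the new values.
        return "update" if is_update else "insert"
    if action == "DELETE":
        # The DELETE half of an update carries the old values; skip it.
        return None if is_update else "delete"
    raise ValueError(f"unexpected METADATA$ACTION: {action!r}")


# Hypothetical rows, shaped like the output of SELECT * FROM <stream>.
stream_rows = [
    {"user_id": 1, "plan": "free", "METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": False},
    {"user_id": 2, "plan": "free", "METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": True},
    {"user_id": 2, "plan": "pro", "METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": True},
    {"user_id": 3, "plan": "free", "METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": False},
]

operations = []
for row in stream_rows:
    kind = classify_change(row)
    if kind is not None:
        operations.append((kind, row["user_id"]))

print(operations)
```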

Real-Time: Live Syncs on Snowflake Streams

In a traditional Reverse ETL sync, detecting what needs to be sent to the destination is performed by executing a query against the source data platform and comparing the result of that query against a snapshot of the previous results. This takes some time to compute and is particularly expensive if the dataset doesn’t change much.
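The snapshot comparison described above can be sketched in a few lines. This is a simplified illustration, assuming each snapshot is a dict keyed by a primary key. The function name and sample records are hypothetical, and a production sync would diff far larger result sets.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare two query snapshots keyed by primary key."""
    added = {k: v for k, v in current.items() if k not in previous}
    changed = {k: v for k, v in current.items() if k in previous and previous[k] != v}
    removed = sorted(k for k in previous if k not in current)
    return {"added": added, "changed": changed, "removed": removed}


# Hypothetical snapshots from two consecutive sync runs.
previous_run = {1: {"plan": "free"}, 2: {"plan": "free"}, 3: {"plan": "pro"}}
current_run = {1: {"plan": "free"}, 2: {"plan": "pro"}, 4: {"plan": "free"}}

changes = diff_snapshots(previous_run, current_run)
print(changes)
```

Note that both snapshots are scanned in full even though only three records differ. That fixed cost is what makes this approach expensive when the dataset rarely changes.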

While a Dynamic Table now allows a customer to inject transformation steps in their activation process, Census can also operate off a Snowflake Stream directly. This can be incredibly powerful if you’re ingesting events from Snowpipe Streaming directly into Snowflake.

Snowflake Streams allow us to get a log of changes on a table and read from that log efficiently. When you create a Census Live Sync, we automatically create the corresponding Snowflake Stream under the covers and check it for new data every few seconds, using system functions that do not wake up your data platform or consume any Snowflake virtual warehouse credits. With our example of roughly one event per hour, this means a Live Sync would consume ½ of a Snowflake virtual warehouse credit per day, whereas a continuous sync would keep the data platform on at all times and consume a full 24 credits per day.
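The polling pattern can be simulated without touching Snowflake. The sketch below uses an in-memory stand-in for a stream (`FakeStream`, a hypothetical class) with a cheap `has_data` check, and counts how often the expensive read path would run. Snowflake documents `SYSTEM$STREAM_HAS_DATA` as a function for this kind of check; whether Census uses that exact function is an assumption here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStream:
    """In-memory stand-in for a change log with a read offset."""
    log: list = field(default_factory=list)
    offset: int = 0

    def has_data(self) -> bool:
        # Cheap check: compares offsets only, reads no rows.
        return self.offset < len(self.log)

    def consume(self) -> list:
        # Expensive path: read the pending changes and advance the offset.
        batch = self.log[self.offset:]
        self.offset = len(self.log)
        return batch


def run_polls(stream: FakeStream, arrivals: dict, num_polls: int):
    """Poll num_polls times; arrivals maps a poll index to events landing before it."""
    warehouse_resumes = 0
    delivered = []
    for tick in range(num_polls):
        stream.log.extend(arrivals.get(tick, []))
        if stream.has_data():
            warehouse_resumes += 1  # compute is needed only when data exists
            delivered.extend(stream.consume())
    return warehouse_resumes, delivered


# Hypothetical arrivals: one signup before poll 3, two before poll 7.
arrivals = {3: ["signup:ada"], 7: ["signup:lin", "signup:sam"]}
resumes, delivered = run_polls(FakeStream(), arrivals, num_polls=10)
print(resumes, delivered)
```

In this run, ten polls trigger only two reads. The other eight polls cost nothing beyond the offset check, which is the property that keeps a low-volume Live Sync cheap.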

Inside Census, setting up a Live Sync is simply a matter of using the new Live run mode. Streams are supported on both standard Snowflake tables and Dynamic Tables, so using either as a source gives you the option to select Live as the run mode. Census will then take a minute to set up the Snowflake Stream automatically, and your sync will immediately start activating data in real time to its destination!

What’s Next

Today, Live Syncs are a special mode of working with Snowflake and they do require a little additional thought about how data will flow end to end. But data velocity in the Data Cloud continues to increase, thanks to Dynamic Tables, Streams, and Snowpipe Streaming. Data teams can soon expect that the data powering their analytics, models, and activations will be truly up to date.

In fact, we imagine this will be how most syncs operate in the future. Why not? It’s cheaper to run and faster after all. These capabilities are available to every Snowflake customer where their data is already stored, without any additional need to deploy infrastructure. At Census, we’re excited to be working with Snowflake to make streaming data something that every business can expect.

Census Live Syncs on Snowflake are available today. Get a product onboarding or start a free 14-day trial now, and be on the lookout for a quickstart guide from Census + Snowflake.

Luke Ambrosetti
Snowflake

Partner Solutions Engineer @ Snowflake. data apps + martech. sweet tea and fried chicken connoisseur. drummer’s syndrome survivor.