Understanding the Power of Integrating with Snowflake via Query Push Down
This post was co-authored with Florian Delval (ActionIQ).
The Rise of Data Cloud
Gaining access to a 360-degree view of the customer has been a long-standing goal for marketers. In the past, organizations invested significant internal resources in standing up on-premise data warehouses to achieve this. However, those systems were extremely complex to set up and could not scale properly to serve the workloads required for marketers’ specific use cases.
During the initial era of customer data platforms (CDPs), vendors had no choice but to build their own infrastructure to host the customer 360. While this reduced friction and made it easier for marketers to access customer data, it also led to the creation of new data silos and data governance remained a challenge.
The advent of the Snowflake Data Cloud changed the landscape. Snowflake has significantly lowered the barrier for organizations to have ownership of their data and establish a centralized Customer 360 view. Through data collaboration and data applications, Snowflake seamlessly integrates with CDP partners. Storage and compute are separated, ensuring that both internal workloads and partner applications can operate on the same data without any infrastructure limitations.
The paradigm of “moving the data to the application first” is now shifting to “moving the application to the data,” enabling a true zero copy data architecture for the first time.
Why does “zero data copy” matter?
The move towards a zero data copy architecture is going to have a positive impact across the organization, with three main benefits:
- Control: As data gets copied, its control gets increasingly challenging. In a world where regulations — at the government, industry and technology levels — are evolving at a rapid pace, reducing copies is the only way to ensure proper governance and control.
- Scale: Snowflake Data Cloud provides an elastic scale where virtual compute clusters can be spun up and down based on the demand. Organizations no longer have to be limited by the robustness of a business application’s own infrastructure.
- Cost: Enterprise organizations are collecting vast volumes and data. While the cost of storing data could be considered reasonable, moving data is extremely expensive. Not only are there high operational costs to build and maintain data pipelines, but these movements are also compute-heavy.
Connected App — Integrating with Snowflake via Query Push Down
Snowflake has three main types of partner application integrations available today: dedicated partner connector, Secure Data Sharing (managed application), and direct push-down (connected application).
Snowflake is also working with partners to develop native applications — applications that run inside of Snowflake — as well as other exciting options to be announced at Snowflake Summit.
The managed app deployment model has received a lot of attention thanks to Snowflake’s data sharing capability. Today, we want to provide more information about an emerging and extremely powerful pattern: Connected Applications, built by enabling query pushdown to Snowflake. A connected application is when an application connects to a customer’s Snowflake account to process data, but doesn’t persist data outside of that account.
A Connected App with Snowflake: ActionIQ Example
Let’s take a look at how a Snowflake partner built a Connected App with Snowflake.
ActionIQ has powered customer experiences for enterprise organizations for the past few years, deployed as a bundled CDP. By bundled CDP, we mean that ActionIQ’s business applications could only run on its purpose-built infrastructure, after copying the data.
But the data copy requirement is no more. Thanks to ActionIQ’s HybridCompute technology, organizations can deploy a composable CDP in a data cloud native approach:
- Snowflake stores the data and is responsible for the compute.
- ActionIQ is offering a user-friendly application for marketing, marketing ops, advertisers and any other business team requiring access to customer data.
“The number of systems that have to keep up with each other and remain in sync goes down quite dramatically with a composable CDP.” — Nivy Swaminathan, Vice President of Advanced Analytics and Personalization, Saks
Five steps, that’s all you need to get started with ActionIQ’s composable CDP deployment on Snowflake:
- Data Readiness in Snowflake
- Connection Set Up
- SQL Generation
- Query Pushdown
- Surfacing Result in the App
Step 1 — Data Readiness in Snowflake
For a pushdown approach, you typically need data that is modeled and ready to query. There are different approaches to this, but the most common is to create one or many views, or even materialized views, that provide a wide, denormalized view of the data for use by the marketing team. Some organizations might be able to use just one view with all of the customer attributes, but as you increase complexity, performance may suffer with that approach. It’s important to understand what audiences the marketing team wants to create first, and then map out the best way to define the view(s) for them to use — any more than a handful and you’re probably doing something wrong!
For guidance on when to use a view or materialized view in Snowflake, the documentation on materialized views can help you get started.
Best Practice: Putting the work in up front to create a view for business users is crucial. It also can sometimes help to rename columns to be more descriptive, or even change dates into offsets (e.g. last_purchase_date becomes days_since_last_purchase).
Step 2 — Connection Set Up
ActionIQ provides a self-service interface to enter credentials to an organization’s Snowflake account. A Snowflake driver is used to discover metadata schemas which will be stored in ActionIQ. None of the actual data is going to be stored in ActionIQ, and only governed data can be discovered and accessed by the connected app.
Beyond the set up of credentials — the easy part we could say — a strong integration includes additional elements such as alerts and monitoring.
Best Practice: Monitor and test regularly multiple components of the integration, such as the schema integrity and the schema and data access. Shall an issue arise, it will accelerate the discovery and resolution to minimize any user experience disruption.
Step 3 — SQL Generation
A business user can take a multitude of actions via ActionIQ’s UI: generate insights on customer data, build an audience of customers, activate records to another system, etc.
Whichever action is taken, the connected app will translate the request into a SQL query to be executed in Snowflake.
Best Practice: Providing a no-code interface experience is necessary to democratize data access within an organization. The connected app should prefer the generation of a SQL query from a user action rather than forcing users to directly type in a Snowflake query.
Step 4 — Query Pushdown
Snowflake offers a wide range of options to pushdown queries to the Data Cloud, either directly via SQL, or through Snowpark using Python, Java, or Scala. ActionIQ has abstracted the SQL generation required to build audiences and pushes that down to Snowflake. After data is extracted, it then goes through optional further transformation steps (as designed by the customer) in-flight to the specified destination(s).
For any application that provides pushdown, it’s always a good idea for an administrator to use a dedicated virtual warehouse for that app during installation, as well as a resource monitor, to directly track and monitor consumption of the app. You may also want to limit the query timeout, but understand that this could introduce risk for job/process failure within the app.
Best Practice: For any connected app that is connecting into Snowflake, use a dedicated user, role, virtual warehouse, and database (if applicable). Pairing a resource monitor with the dedicated virtual warehouse will help make sure that a simple mistake doesn’t turn into a costly one.
Step 5 — Surfacing Result in the App
Once the query execution is completed in Snowflake, the result is returned to the connect app who can surface it in the UI to the user. It’s important to note that only the data necessary to answer a user action is being returned.
ActionIQ will only cache data temporarily in strategic use cases such as real-time where low latency and high concurrency is critical. However, in other scenarios — such as an audience count — the result won’t be stored for longer than the job execution, typically minutes.
Best Practice: Data returned to the connect app should only be cached for as long as it’s required to perform its purpose. Do not persist data beyond the necessary window.
Start integrating with Snowflake via Query Pushdown Today
While nascent at this time, the deployment of composable CDPs will continue to develop rapidly.
By pushing queries down to Snowflake Data Cloud, solutions like ActionIQ are able to eliminate the need for data copies and benefit from a proven scalable infrastructure to support marketing workloads.
Interested to learn more about a Composable CDP can help you scale your customer data operations? Reach out to the ActionIQ team.
Interested to learn more about how Snowflake can help with your marketing efforts? Read about our use cases.