Snowflake and Salesforce, a match made in the cloud

Published in

Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

10 min readFeb 26, 2019

Update July 2020: The prototype originally described in this article has been developed into a commercial offering on the Salesforce AppExchange, see https://omnata.com

The below article shows the original setup experience, which has now been replaced with a much faster and friendlier setup wizard.

Salesforce is arguably the most successful SaaS company around, having amassed over 150,000 customers as it approaches its twentieth year in business.

In addition to the huge amount of out-of-the-box Sales and Service features, it is heavily customizable and has a vast ecosystem of extensions and integrations on the Salesforce AppExchange.

For many businesses around the world, Salesforce is trusted to manage their core customer, accounts, and service-related data sets. However, as a CRM it’s not the ideal destination for all your other data. Big Objects are supported (at a cost), but thankfully Salesforce is designed to be able to access data from just about any other system.

Snowflake is a newer kid on the SaaS block, and is the first data warehouse properly designed for the cloud.

Its unique architecture (separate storage and compute running on public cloud infrastructure), enables:
- Cost optimization, paying commodity prices for storage and paying only for the compute you use
- Zero management, it truly does everything for you out of the box, including auto-pause and auto-resume
- Scalability, performing well on data sets both large and small (you don’t get the query overhead that the MPP databases incur)
- High concurrency, you can throw a large number of users at it, and also isolate workloads
- Data sharing, providing direct access to your data sets to other accounts, so that you aren’t sending files around

In this article, we will be leveraging many of these strengths of Snowflake to quickly, easily and cost effectively surface a large set of data to a Salesforce user.

Our Scenario

Imagine you have been collecting web traffic logs from your company’s website as JSON objects in S3 or Azure blob storage. Imagine also that you’ve discovered Snowflake, and in no time at all you have hoovered this data up into a table via an external stage (and maybe even set up continuous loading of new logs using Snowpipe).

Your analysts are happily connecting to Snowflake using R/Python/Tableau/whatever to analyse and visualize this data. But you’d also like your Salesforce users to be able to see what your customers have been doing on your website, in near-real time, without moving the data back out of Snowflake.

To do this, we will use a native connector I’ve put together in Apex: Snowflake Adapter for Salesforce Connect (hopefully listed soon on AppExchange). It’s free, but doesn’t include any support. You can use it to configure a Snowflake database as an external data source, making your data available throughout Salesforce. Too keep up, you’ll need some basic familiarity with Salesforce as there are too many mouse clicks to document in detail :)

The Snowflake Side

First, let’s take a quick count of the rows using a count statement.

As you can see, you’ve accumulated a modest 10M web hits.

Notice the 39ms it took to count the records — get used to being dazzled with low latency analytics in Snowflake. And to demonstrate that working with 10M records is effortless, during this entire exercise we’ll use an extra small sized warehouse, the cheapest one available.

The table consists of a single column of type VARIANT, which Snowflake uses to work with semi-structured data.

If you click on one of the rows, you can see the full JSON body.

So we have a CustomerID stored in a cookie, and this is what we’ll eventually use to link to an Account in Salesforce.

Then we have the basic elements of a web request — the elements of the URL accessed and information about the user’s device.

First, we want to extract these elements out into a table structure to work with. Here’s where Snowflake really pays for itself, by allowing you to easily drill down through the JSON in your SELECT statement.

Here’s the first Salesforce-specific requirement for external data. There must be a column named ExternalId, and it must contain a unique value for each record (basically acts as the primary key).

The requestId in the logs meets this requirement, so select it and a few other elements, and now we have a nice familiar looking table structure with new column names.

But Salesforce deals in Objects, not queries. So we will create a view using the above query. We’ll use the new materialized view feature, which can greatly improve the performance of querying semi-structured data.

Now we have our CUSTOMER_WEB_LOGS view that we can tell Salesforce about.

But first, let’s just see who our most web-heavy customers are:

The Salesforce Side Part 1: External Data Source

For this part, you’ll need to be bestowed with the necessary privileges in your Salesforce account — in particular the ability to create an External Data Source and modify object layouts.

We’ll start with the External Data Source, access this from the setup console.

We need to provide the full URL to our Snowflake instance, as well as the database, schema, role and warehouse as query string parameters.

In addition to this, we use the Named Principal Identity Type (this means every Salesforce user will connect via these credentials).

Currently, the connector only supports Snowflake authentication (so no SSO). We provide the username and password by selecting the Password Authentication protocol.

Salesforce wisely prevents extensions from accessing whatever external sites they please, so we must add our Snowflake instance to the Remote Sites list.

If Snowflake IP whitelisting was in effect, we’d also need to add the Salesforce source IP addresses for our region or the connection will be blocked at the Snowflake side.

Once the External Data Source is added, we click “Validate and Sync”, and wait.

Now we see our table, we select it and hit “Sync” to grab the metadata (all the columns and datatypes).

After landing at the Details page for the new Object, we select these two options so that we can view the data in reports.

Click on “Fields & Relationships” and voila! There are the columns from the Snowflake view, along with their data types.

Here we notice the second Salesforce-specific requirement for external data. There must be a column named “Display URL”, and it must be of type “URL”. The idea here was that Salesforce can offer a standard way for a user to navigate out to another SaaS application or website and land straight on the correct record.

Normally useful, but does not apply to our Snowflake scenario. So the Snowflake connector conveniently adds it in automatically with a dummy value, and we will just hide it from view in Salesforce.

At this point, our Snowflake view is now a first class citizen in our Salesforce organisation.

You can go straight to the data by using the Report Builder.

The Salesforce Side Part 2: Object Linking

Now that our web traffic is accessible in Salesforce, we want to link it up to the existing data so that it can be shown conveniently where it belongs, alongside everything else.

Back in our CUSTOMER_WEB_LOGS External Object, we need to change the CUSTOMER_ID from being just a boring text field, and promote it to being a link to another Object.

So we edit it…

… and change the field type…

…To an Indirect Lookup Relationship.

Relate it to the Account Object, and the Customer_ID__c field:

then accept the defaults in Step 4 and 5.

In Step 6, rename the Related List Label to something friendlier:

much less shouty than the default

Congratulations CUSTOMER_ID field, your data type is now “Indirect Lookup(Account)”

The Salesforce Side Part 3: Show The Users

The Object links have been established, now for the final part: Showing a customer’s web hits while viewing their account.

In the Object Manager (not the external one) for the Account object, we go to “Page Layouts” and click on the Account Layout.

Now we’re in one of the older parts of Salesforce town - the layout editor itself. Best to just do what you need to do, avoid direct eye contact and leave.

Under “Related Lists”, we grab the “Web Logs” list and drag it down somewhere amongst the existing lists.

Once that’s done, we click on the little wrench icon to edit the list.

All the useful fields are added in to the list.

Remember we said we’d remove the “Display URL” field as it’s not relevant? Here’s where we do that.

This will pop up, letting us choose whether to force the new view upon others.

Now we’re ready to view the data! Ideally in a real world scenario, there’s already some common identifier in both Salesforce and Snowflake to link on. But the Salesforce sample database doesn’t provide any Customer IDs, and the ones I generated in Snowflake are just random numbers.

So just to make this demo work, I’ll just edit the “Dickenson plc” account to have a customer number that exists in my web data. We’ll link it to that busy customer we found in our Snowflake query earlier.

Now when we open the Dickenson account, we see their web hits on the default “Related” tab:

If we click “View All”, we see a fuller list in its own tab:

And finally, by clicking on an individual web hit, we can view it alone:

And there we have it! Snowflake and Salesforce fully integrated. Both SaaS products with no infrastructure to manage, both have 30 day free trials, and both enable you to start out small and scale with your business. Don’t forget to configure auto-pausing your Snowflake warehouse, so that when Salesforce users are not viewing the data, you’re not paying for a running warehouse - it will even seamlessly auto-resume!

Postscript — less is more

We’ve demonstrated a simple integration, but the data is a bit busy in its raw form. Seeing a customer’s individual web hits is too much information as a starting point.

Lets give the user an aggregated view first, rolling the web hits up into sessions that they can optionally drill down on.

We’ll use another materialized view over the same source JSON table:

create or replace materialized view CUSTOMER_WEB_SESSIONS 
as(
select LOG_ENTRY:Cookies:SessionId::varchar(255) as "ExternalId",
 LOG_ENTRY:Cookies:CustomerId::number as CUSTOMER_ID,
 count(*) as PAGE_HITS,
 min(LOG_ENTRY:timestamp::timestamp) as FIRST_HIT,
 max(LOG_ENTRY:timestamp::timestamp) as LAST_HIT
from WEB_LOGS
group by "ExternalId",CUSTOMER_ID
);

resync our schema in Salesforce, and sync the new table schema:

Link the CUSTOMER_WEB_LOGS to CUSTOMER_WEB_SESSIONS by Session ID (for brevity, that’s an exercise for the reader).

Now, instead of a huge list of web hits, we have a much smaller list of web sessions:

Which, when you drill down into a single one,

it shows the web hits on the “Related” tab:

What else?

The possibilities are endless!

Even within the web logs scenario, you could extract product ids from the URL and link them to products in Salesforce. Then, when viewing each product in Salesforce you could see information about who’s browsing them on your site. You could even give journey mapping information and checkout conversion rates.

This looks great! How do I set this up in my Salesforce organisation?

As of July 2020, this is now available on the Salesforce AppExchange, with support/onboarding assistance available:

https://appexchange.salesforce.com/appxListingDetail?listingId=a0N3u00000MBcLmEAL