Transforming CSV Files During Data Loading in Snowflake

I was recently analyzing data from Salesforce to better understand our win/loss conversion rates by business size, lead source channel, rep assignment, and a variety of other factors. I wanted to be able to slice and dice the data as needed, combining information about opportunity history, opportunity details, and account data. Although I could natively use Salesforce for some of this analysis, I quickly lost track of my experiments as I explored the data. I resorted to creating aggregate reports in a Google Sheet to keep track of the analysis, which felt increasingly counterproductive and labor-intensive.

That’s when I switched to Snowflake.

I moved my analysis from Salesforce to Snowflake, importing the data into Snowflake tables. This allowed me to easily query the data and track my experiments through query history.

My data wasn’t large; I just needed a simple way to import CSV files from Salesforce into Snowflake. During this process, I encountered times when I had to adjust the CSV data to match my table structures.

This blog post outlines how I imported the data into Snowflake and used its COPY command and transformation functions to easily adjust the data during the import process.

Importing a CSV File into Snowflake

There are several ways to import a CSV file into Snowflake. You could write code to do this, or, if you need continuous loading, you might want to use Snowpipe.

For my project, I really just needed a one-time import of a relatively small dataset. I was looking for simplicity.

To support my workflow, I used an internal stage for the CSV file and then relied on the COPY INTO <table> SQL command to move the data from the CSV file into a table.

Copying a CSV File into Snowflake

A Snowflake stage is used for loading files into Snowflake tables or unloading data from tables into files. An internal stage stores data within Snowflake, while an external stage references data in a location outside of Snowflake, such as an S3 bucket.
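To make the distinction concrete, here is a minimal sketch of both kinds of stage; the stage names, bucket URL, and credentials are placeholders rather than anything from my project.

-- Internal stage: Snowflake stores the uploaded files for you.
CREATE OR REPLACE STAGE my_internal_stage;

-- External stage: points at files you already keep in S3 (placeholder bucket and keys).
CREATE OR REPLACE STAGE my_external_stage
  URL = 's3://my-bucket/salesforce-exports/'
  CREDENTIALS = (AWS_KEY_ID = '<aws_key_id>' AWS_SECRET_KEY = '<aws_secret_key>');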

For my use case, I used SnowSQL to move the file from my local machine into an internal Snowflake stage, copy the file into a table, and then execute SQL against the table. The code to perform these steps is shown below. In this example, I’m creating a table to track the Salesforce opportunity history.

-- File format describing the Salesforce CSV export.
CREATE OR REPLACE FILE FORMAT sfdc_format
  TYPE = 'CSV'
  FIELD_DELIMITER = ',';

-- Internal stage that uses the file format above.
CREATE OR REPLACE STAGE sfdc_stage
  FILE_FORMAT = sfdc_format;

-- Upload the local CSV export into the stage.
PUT file:///FILE_LOCATION/report1709435152856.csv @sfdc_stage
  AUTO_COMPRESS = TRUE PARALLEL = 20;

-- Target table for the opportunity history data.
CREATE OR REPLACE TABLE opportunity_history (
  opportunity_name VARCHAR(255),
  amount NUMBER,
  probability NUMBER,
  forecast_category VARCHAR(50),
  ready_to_close NUMBER,
  from_stage VARCHAR(50),
  to_stage VARCHAR(50),
  owner VARCHAR(50),
  lead_source VARCHAR(50),
  created_date DATETIME
);

-- Load the staged file into the table.
COPY INTO opportunity_history
  FROM @sfdc_stage
  FILE_FORMAT = (FORMAT_NAME = sfdc_format)
  ON_ERROR = 'skip_file';

This works well assuming the data in the CSV file maps perfectly to the table structure. However, as inevitably happens when working with data, reality doesn’t perfectly match the whiteboard solution. Within the Salesforce data, there are situations where a numeric field is missing a value, or where a numeric value is better represented as a string for clarity.

In the next section, I walk through how I addressed these issues.

Transforming the Data During Load

Although I could write Python to handle the data cleanup, it felt a little heavy-handed given the limited set of data transformations I needed. Additionally, I really wanted to focus on getting to the analysis work, not spending a lot of cycles on the data pipeline.

Fortunately, Snowflake supports a long list of transformation functions to parse, split, convert, and cast data as needed during the COPY INTO call.
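As a small illustration, here is a sketch of what an inline transformation can look like; the table name and column positions are hypothetical, not from my dataset.

-- Sketch of inline transformations during a load (hypothetical table and columns).
COPY INTO some_table
FROM (
  SELECT
    SUBSTR(t.$1, 1, 50),          -- truncate a long text field
    TO_NUMBER(t.$2),              -- cast a text column to a number
    TO_DATE(t.$3, 'MM/DD/YYYY')   -- parse a date stored as text
  FROM @sfdc_stage t)
FILE_FORMAT = (FORMAT_NAME = sfdc_format);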

Transforming CSV Data While Loading into Snowflake

The first issue I needed to fix was that my import was failing because some of the numeric values were empty strings in the CSV file (see image below).

Example CSV Import Error

I wanted to convert empty string values to zero.

To do this, I wrapped each of the numeric fields in an IFF expression. IFF is Snowflake’s single-level if-then-else function: it takes a condition, the value to return when the condition is true, and the value to return when it is false.
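As a standalone sketch, with a made-up condition and values, the expression looks like this:

-- IFF(<condition>, <value_if_true>, <value_if_false>)
SELECT IFF('' = '', 0, 99);   -- returns 0 because the condition is true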

Using the IFF function to convert empty strings into zeroes, my COPY INTO statement changed to the one below.

COPY INTO opportunity_history
FROM (
  SELECT
    t.$1,
    IFF(t.$2 = '', 0, t.$2),   -- amount: default empty strings to zero
    IFF(t.$3 = '', 0, t.$3),   -- probability: default empty strings to zero
    t.$4,
    IFF(t.$5 = '', 0, t.$5),   -- ready_to_close: default empty strings to zero
    t.$6,
    t.$7,
    t.$8,
    t.$9,
    t.$10
  FROM @sfdc_stage t)
FILE_FORMAT = (FORMAT_NAME = sfdc_format)
ON_ERROR = 'skip_file';

With this code in place, I was able to import the CSV file into the opportunity_history table.

Example CSV Import Success
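A quick way to sanity-check the result is to count the loaded rows and spot-check the numeric columns; this is just a minimal sketch of what such a check could look like, not part of the pipeline above.

SELECT COUNT(*) FROM opportunity_history;

SELECT opportunity_name, amount, probability, ready_to_close
FROM opportunity_history
LIMIT 10;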

However, I wanted to improve the structure a bit.

The ready_to_close value in Salesforce was a numeric value of either 0 or 1, where 1 meant the deal was closed and won, and zero meant closed and lost. I decided it would be nicer to convert these into VARCHAR values of Won and Lost.

To support this requirement as an in-query transformation, I used a CASE expression, as shown below. Since the values are now strings, this also means the ready_to_close column needs to be defined as a VARCHAR rather than a NUMBER.

COPY INTO opportunity_history
FROM (
  SELECT
    t.$1,
    IFF(t.$2 = '', 0, t.$2),
    IFF(t.$3 = '', 0, t.$3),
    t.$4,
    CASE
      WHEN t.$5 = '1.00' THEN 'Won'
      WHEN t.$5 = '0.00' THEN 'Lost'
      ELSE 'Other'
    END,
    t.$6,
    t.$7,
    t.$8,
    t.$9,
    t.$10
  FROM @sfdc_stage t)
FILE_FORMAT = (FORMAT_NAME = sfdc_format)
ON_ERROR = 'skip_file';

With these simple adjustments, I now had an easy-to-use ingestion pipeline for my Salesforce data, freeing me up to spend time on the analysis work.

Wrap Up

This article shows how easy it is to import and transform data in Snowflake without needing extra tools or writing much code.

Engineers often prefer the newest tools, but we sometimes make things more complicated than they need to be. Many data issues don’t require a complex set of tools and code. Thankfully, Snowflake offers features that handle both straightforward tasks and more challenging scenarios with intricate data transformations.

Our task is to choose the right tool for the job. Do you need a heavy-duty solution, or will a simpler approach do? Making the best choice depends on experience and evaluating the trade-offs, so choose wisely.
