How to Build and Manage Snowflake Advanced Functionality in Coalesce
Building and maintaining data transformations and data models has long been done via manual processes, often becoming a bottleneck in the analytics supply chain. Using Snowflake as your data platform can help alleviate some of this friction, but the challenges of building, managing, and scaling your data transformation layer don’t immediately disappear.
For that you can turn to a platform such as Coalesce, which provides a best-in-class solution for developing and managing your data transformation layer on Snowflake. Coalesce features an easy-to-use UI which employs a code-first approach, allowing any data practitioner to easily leverage all of the incredible functionality that Snowflake has to offer.
Coalesce amplifies your data transformation layer on Snowflake by helping you seamlessly build and manage data projects of any scale, using any Snowflake feature you like. In this post, we’ll show how to build a data pipeline that ingests files from Snowflake external stages and infers their schema, transforms the data in Coalesce, applies Snowflake Cortex LLM functions without writing code, and outputs the final table as a Snowflake-managed Iceberg table, all managed end to end through Coalesce.
The primary building block of the Coalesce platform is a Node, which is a user-defined standardization of an object or action in Snowflake. These Nodes can either be created by users, or can be installed into Coalesce as a Package, which you can download via Coalesce Marketplace. One of the available Packages is the External Data Package, which allows us to read in a CSV file for our pipeline and infer the schema of the file.
With the External Data Package installed, we can point to the Snowflake stage containing the file(s) we want to infer a schema for and load as a table. In this case, I have a file called call_transcripts that exists in an external stage. I can use the Inferschema Node to determine the schema of the entire file without manually inspecting it or writing cumbersome DDL (data definition language).
By adding the Inferschema Node to my Workspace, I can now configure the options for where the stage is located, the name of the stage, and the file I want to determine the schema for. And just like that, I can immediately create a blank table in Snowflake that contains the schema for this file.
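Behind the scenes, this maps to Snowflake’s INFER_SCHEMA table function and the CREATE TABLE … USING TEMPLATE pattern. Here is a minimal hand-written sketch of the equivalent SQL, assuming a hypothetical stage named @call_center_stage and a CSV file format my_csv_format defined with PARSE_HEADER = TRUE (Coalesce generates and runs this kind of statement for you):

```sql
-- Inspect the inferred column names and types for the staged file
SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION    => '@call_center_stage/call_transcripts.csv',
    FILE_FORMAT => 'my_csv_format'
  )
);

-- Create an empty table whose columns match the inferred schema
CREATE TABLE call_transcripts
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION    => '@call_center_stage/call_transcripts.csv',
        FILE_FORMAT => 'my_csv_format'
      )
    )
  );
```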
With the schema determined, I can now load the data from the stage into this new table with the schema. Using another Node from the External Data Package, I can use the Copy Into Node to seamlessly copy the data from the file in my Snowflake stage to a table in Snowflake. This is all done without writing a single line of code.
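For reference, the Copy Into Node produces roughly the following Snowflake SQL; the stage, table, and file format names below are the same hypothetical ones as above:

```sql
-- Load the staged CSV into the table created from the inferred schema,
-- matching columns by header name rather than position
COPY INTO call_transcripts
FROM '@call_center_stage/call_transcripts.csv'
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```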
With our table in Snowflake, we can now use additional Nodes in Coalesce to build out our data pipeline. As you can see, so far we’ve been able to work with data in a Snowflake stage, infer the schema, and populate the table without having to write complex code, and we’re able to seamlessly update and manage all of our objects. Next, we will use stage Nodes to build a small staging layer to help process the data we’ve loaded.
The data set we’ve loaded contains call transcripts from a fictional call center. The call transcripts are in multiple languages: French, German, and English. Additionally, the call center reps are required to ask each caller their first and last name. The goal of our data pipeline is to translate the transcripts into a common language — in this case, English — and extract the customer name from the call transcript while associating a sentiment score with each customer based on the conversation.
To do this, we will split out each language’s transcripts into their own staging Nodes or tables. This allows us to isolate each language to a table, and use Snowflake Cortex Nodes in Coalesce to translate the data. In this case, we’ll have Nodes for French and German data, as the English data in the data set does not need translating. We’ll include a filter within each stage Node for the language the Node contains.
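Each of these stage Nodes amounts to a filtered SELECT over the loaded table. A minimal sketch for the French Node, assuming a hypothetical LANGUAGE column in the call_transcripts table:

```sql
-- One staging table per language; the stage Node's filter becomes a WHERE clause
CREATE OR REPLACE TABLE stg_transcripts_french AS
SELECT *
FROM call_transcripts
WHERE language = 'French';
```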
With the staging layer complete, we can add in Snowflake Cortex Nodes from the Cortex package installed from Coalesce Marketplace. This package allows users to use Cortex LLM functions without having to write or understand the code or syntax of the functions. We’ll add a Cortex Functions Node to each stage Node. Once we’ve done this, in the configuration settings we can see all of the options available to use within the Node. We’ll toggle on translate and pass through the column we want to translate — in this case, call transcripts. We’ll also supply the language we want to translate from and to.
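Under the hood, the translate option calls the SNOWFLAKE.CORTEX.TRANSLATE function. A minimal sketch for the French Node, assuming a hypothetical TRANSCRIPT column:

```sql
-- Translate the French transcripts to English with Cortex
SELECT
  SNOWFLAKE.CORTEX.TRANSLATE(transcript, 'fr', 'en') AS transcript
FROM stg_transcripts_french;
```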
Once this is done for both of the stage Nodes, we need to unify the data sets so the German and French translated data can be in the same table. We’ll add one additional stage Node to do this. Through the Node interface, we can easily set the multi-source strategy as a Union and use drag and drop to union the tables together.
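The Union multi-source strategy corresponds to a simple UNION ALL over the two translated staging tables; a sketch with hypothetical table names:

```sql
-- Combine the translated French and German data into one table
CREATE OR REPLACE TABLE stg_transcripts_translated AS
SELECT * FROM stg_transcripts_french_translated
UNION ALL
SELECT * FROM stg_transcripts_german_translated;
```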
With the data sets unified, we can now extract the customer name and create a sentiment score based on the transcript of each call. We’ll add one more Cortex Function Node to the pipeline. This time, we’ll toggle on both Sentiment and Extract Answer. We’ll also duplicate the call transcripts column as we need to supply the column to two different functions.
For the sentiment function, we’ll simply pass through our first column. For the Extract Answer function, we’ll supply the second column and write in the question we want an answer to: “who is the customer?”
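These toggles map to the SNOWFLAKE.CORTEX.SENTIMENT and SNOWFLAKE.CORTEX.EXTRACT_ANSWER functions. A minimal sketch of the generated logic, again assuming a hypothetical TRANSCRIPT column:

```sql
-- Score sentiment and extract the customer name from each translated transcript
SELECT
  SNOWFLAKE.CORTEX.SENTIMENT(transcript)                              AS sentiment_score,      -- float between -1 and 1
  SNOWFLAKE.CORTEX.EXTRACT_ANSWER(transcript, 'Who is the customer?') AS transcript_customer   -- array of {answer, score} objects
FROM stg_transcripts_translated;
```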
We can create this table in Snowflake, and without writing a single line of code, we’ve been able to translate our data and also provide a sentiment score with the associated customer name as an array.
As the last step in this pipeline, we want to expose the data we’ve processed to any system within our organization that may need it. We can do that easily using the Iceberg Package from Coalesce Marketplace. We’ll add one last Node to this pipeline: the Snowflake Iceberg Table Node.
Within the Node, you’ll notice we still have our customer name as an array value. Coalesce has a one-click JSON parser that can instantly split out the values of the array. By using the parser, we can obtain just the text value for the customer name as its own column. We’ll also need to remove the transcript_customer column, as Iceberg tables don’t support Snowflake’s ARRAY data type.
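The parsed expression boils down to pulling the answer field out of the first element of the EXTRACT_ANSWER array; a sketch, with the source table name hypothetical:

```sql
-- Extract the plain-text customer name from the EXTRACT_ANSWER result array
SELECT
  transcript_customer[0]:answer::STRING AS customer_name
FROM transcripts_enriched;
```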
Now that we’ve done that, we can supply the name of the External Volume we’ve configured in Snowflake for our object store.
Finally, we need to provide a base location name. This will be the name of the base folder where the Iceberg table will exist in your object storage, which in this case is S3.
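Taken together, the Iceberg Node generates a statement along these lines; the external volume, base location, and table names below are hypothetical:

```sql
-- Create a Snowflake-managed Iceberg table in the configured external volume (S3)
CREATE OR REPLACE ICEBERG TABLE call_transcripts_iceberg
  CATALOG         = 'SNOWFLAKE'
  EXTERNAL_VOLUME = 'my_s3_volume'
  BASE_LOCATION   = 'call_transcripts_iceberg'
AS
SELECT * FROM transcripts_enriched;
```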
In a matter of minutes, we were able to ingest a file from a stage, infer the schema, process the data for analytic use cases, translate the data, extract information while providing a sentiment score, and output the data as an Iceberg table. This was all done with minimal code, while still giving users complete control to build, manage, and scale any data project.
Coalesce allows users of any technical skill level to take advantage of their data. I hope you enjoyed this step-by-step guide showing you how easy it is to leverage a wide range of Snowflake functionality, from ingestion to exposing results in Iceberg table format. If you’re ready to build and manage your data projects on Snowflake, try out Coalesce for yourself. For more information, check out these additional resources:
- What is Coalesce? [video]
- Learn the basics of Coalesce [virtual hands-on lab]