Maintaining a Mapping Table with a Snowflake Native App

Leverage the Snowflake Native Application Framework to build an application that allows users to maintain a mapping table

A group of cranes in the water
Photo by Omkar Jadhav on Unsplash

A Snowflake Native Application is first and foremost an application that provides some functionality. Because it is hosted on the Snowflake Data Cloud, it is inevitable that it interacts with data in one way or another. My exploration into the Native App Framework was to find out how the application works with data, where it gets its data and how it stores the resulting data.

A use case that comes up frequently in data warehousing is that users want the possibility to enter data manually, for example, to be able to map data from one value to another for grouping or reporting purposes. In many environments, users don’t have solutions in place where they would be able to enter data manually and so they maintain such data in Excel files. These files are then manually uploaded to the data warehouse as part of the ETL process.

I thought this would be a good use case for building a Snowflake Native App: to enable users to edit simple mapping data through a Streamlit user interface and write the resulting data to a table in Snowflake. Thus the data could be maintained by multiple users in a single Snowflake account.

Getting started with the Snowflake Native Apps Framework

The Snowflake documentation explains the Native Apps Framework in detail. It also provides a step-by-step tutorial for developing, installing, testing, sharing, and using an application.

I also found a few helpful blogs, particularly this one that includes diagrams for better understanding of the underlying architecture. Another blog that explains how to access consumer’s data can be found here. And I also referred to this blog that helped me to understand how to implement an editable data frame in Streamlit.

I used the tutorial from the Snowflake documentation as a starting point for my application and I customized it according to what I needed for my use case.

Building the application

I started with a Snowflake worksheet where I created an application package named SIMPLE_MAPPING_APP_PCK, a schema named STAGE_CONTENT and an internal stage named SIMPLE_MAPPING_APP_STAGE:

USE ROLE accountadmin;
GRANT CREATE APPLICATION PACKAGE ON ACCOUNT TO ROLE accountadmin;

CREATE APPLICATION PACKAGE simple_mapping_app_pck;
USE simple_mapping_app_pck;

CREATE SCHEMA stage_content;
USE SCHEMA stage_content;

CREATE OR REPLACE STAGE simple_mapping_app_stage
FILE_FORMAT = (TYPE = 'csv' FIELD_DELIMITER = '|' SKIP_HEADER = 1);

Then I opened Visual Studio Code where I created the code files. Following the tutorial from the Snowflake documentation, I created four files named manifest.yml, README.md, setup.sql, and simple_mapping_st.py:

Application code files

The manifest.yml file is more or less the same as in the Snowflake tutorial. The README.md file contains information about the application and instructions to the consumer who will be executing it.

Let’s look at the setup.sql file:

CREATE APPLICATION ROLE APP_PUBLIC;
CREATE SCHEMA IF NOT EXISTS CORE;
GRANT USAGE ON SCHEMA CORE TO APPLICATION ROLE APP_PUBLIC;

-- create table that will store the mappings
CREATE TABLE IF NOT EXISTS CORE.SIMPLE_MAP (
from_code varchar,
to_code varchar
);
-- grant select privilege on the mapping table to the consumer
GRANT SELECT ON TABLE CORE.SIMPLE_MAP TO APPLICATION ROLE APP_PUBLIC;
-- insert some sample data
INSERT INTO CORE.SIMPLE_MAP VALUES('101', 'Group A');

CREATE STREAMLIT CORE.SIMPLE_MAPPING_ST
FROM '/streamlit'
MAIN_FILE = '/simple_mapping_st.py';
GRANT USAGE ON STREAMLIT CORE.SIMPLE_MAPPING_ST TO APPLICATION ROLE APP_PUBLIC;

This setup script is executed when the application is installed in the consumer account. It creates an application role named APP_PUBLIC and a schema named CORE. It grants usage on the schema to the APP_PUBLIC role.

Then it creates a table named SIMPLE_MAP which will store the mappings that the user will enter when using the application. To allow the consumer account to view this table, the SELECT privilege is granted to the APP_PUBLIC role. The application would work even if the grant was not executed, because the table is internal to the application and would be available to the application code. However, the consumer account would not be able to see the content of the table unless the grant is issued.

An initial row of data is inserted into the table, just so that the user has an initial data frame available when they first execute the application. Although this is not necessary and the user could start with a blank data frame, this would not be the typical scenario in a real situation. Most likely the initial codes would be populated in the FROM_CODE column and the TO_CODE column would be left blank for the user to update. The initial codes could be sourced from an existing table and included as provider data in the application.

Finally, a Streamlit application is created using the code in the simple_mapping_st.py file and usage granted to the APP_PUBLIC role.

Now let’s look at the simple_mapping_st.py file:

import streamlit as st
from snowflake.snowpark.context import get_active_session

st.title("Simple Mapping Editor")

# Get the current session context
session = get_active_session()

# Create a form
with st.form("mapping_form"):
st.caption("Add or update the mappings in the table below")
# Create a data frame from a table
df = session.sql('SELECT FROM_CODE, TO_CODE FROM SIMPLE_MAP')
df.collect()
# Convert it into a Pandas data frame
pdf = df.to_pandas()
edited_pdf = st.experimental_data_editor(pdf, use_container_width=True, num_rows="dynamic")
submit_button = st.form_submit_button("Submit")

if submit_button:
try:
session.write_pandas(edited_pdf, 'SIMPLE_MAP', overwrite = True)
session.sql('GRANT SELECT ON TABLE CORE.SIMPLE_MAP TO APPLICATION ROLE APP_PUBLIC').collect()
st.write('Data saved to table.')
except:
st.write('Error saving to table.')

This python code first retrieves the Snowflake session information so that it can issue commands to Snowflake. It then creates a Streamlit form. The form contains a data frame editor and a Submit button. The data frame editor is initially populated from the SIMPLE_MAP table that was created in the setup script.

The user can add and update data in the data frame editor and once they are done, they press the Submit button. The contents of the edited data frame are written back to the SIMPLE_MAP table using the overwrite = True option which means that the data in the table is overwritten with the new data.

This part of the code was not working initially, until I figured out (using the DEBUG mode and looking at Query History) that when a data frame is overwritten, the data is first written to a temporary table, then the target table is dropped and the temporary table renamed to the target table, causing the grant to the APP_PUBLIC role to disappear. For this reason, the grant must be issued again each time the data is written to the SIMPLE_MAP table.

Once all files were ready, I used the PUT command in SnowSQL to put the files into the SIMPLE_MAPPING_APP_STAGE internal Snowflake stage. Then I was able to create the application:

CREATE APPLICATION simple_mapping_app
FROM APPLICATION PACKAGE simple_mapping_app_pck
USING '@simple_mapping_app_pck.STAGE_CONTENT.simple_mapping_app_stage';

I tested the application in the same account where I developed it and after many retries, tweaks, and bug fixes, it was finally working! Here it is:

Simple Mapping Editor Streamlit App

The user can edit the data in the Streamlit data frame editor and save the data to a table in Snowflake. The data can then be accessed using a SQL SELECT statement:

SELECT FROM_CODE, TO_CODE
FROM SIMPLE_MAPPING_APP.CORE.SIMPLE_MAP;

Giving the following result:

Contents of the mapping table

In a real situation, this data could then be accessed by an ETL process and populated into the data warehouse. Of course this application is just a prototype and could use some tweaking before it is production ready, particularly in adding more functionality such as allowing a user to delete records (or mark as deleted), checking for uniqueness of codes so that users don’t enter duplicate data, possibly checking target codes against a preexisting table, and so on.

Recommendations for working with Native Apps

While developing my application, I found it most convenient to have four windows open:

  1. Visual Studio Code where I maintained all of my code files
  2. Command line window with SnowSQL running where I executed the PUT commands to the Snowflake internal stage (hopefully Snowflake Git integration will be available soon which should obliterate this step)
  3. A Snowsight window with a worksheet where I created the application package and the application as well as dropped and recreated the application each time I changed the code. I also used this window and additional worksheets where I debugged my application by viewing objects and Query History.
  4. Another Snowsight window where I tested the application by executing it each time I recreated it. It was useful to have this additional Snowsight window so that I didn’t have to jump from the application to the worksheets and back again within a single window.

Since a Native App has many moving parts and many possible points of failure, testing can be difficult. A very useful feature is the ability to run the application in debug mode. This can be activated using the following command:

ALTER APPLICATION simple_mapping_app SET DEBUG_MODE = TRUE;

When the application is executed in debug mode, it will show you all objects in the application, even those that would otherwise not be visible to the consumer. This is very helpful in troubleshooting grants on objects. Additionally, you will be able to see all the commands that were executed by the application in the Query History window (normally, an application hides these commands from the consumer), including any failed commands which is again helpful in troubleshooting your application’s code logic.

When developing my application I found it easier to not create any versions because this would just add complexity while I was still developing and debugging my code. Each time I changed my code, I just recreated the application. Once I had the application up and running, this was my initial version 1.0.

While testing the application, I executed it in the same account where I was developing. Just like I don’t deploy any untested code, I didn’t publish the application to the Marketplace to be shared with other accounts until I was sure it was completely done and working as expected.

--

--