My First Native Application on Snowflake

Brad Culberson
9 min read · Jul 20, 2022


Native Applications on Snowflake were announced at Summit 2022. This is a new way of building and delivering data applications. Luckily I’ve been able to get my hands on this new capability and will share my thoughts and an example application. This article is based on the capabilities and implementation as of publication. This feature is still in preview and is likely to change.

Native Applications are built on Snowflake’s powerful data sharing capabilities, which allow one party (the producer) to curate and share data with another party (the consumer) directly through Snowflake. After accepting the share, the shared data is available to use immediately and is always up-to-date, without any need to build pipelines. The Snowflake Marketplace offers many public data shares, and there is also a robust private data sharing ecosystem built on Snowflake.

While data is a fundamental part of data applications, there is also advanced code and logic that often runs on that data to provide value. Native Applications combine the power of data sharing with the ability to install code and use local data to meet the needs of data applications. These applications can be shared privately or via the Snowflake Marketplace. They are installed directly into the consumer’s account; all operational burden and costs are on the consumer. The application runs inside the consumer account; the data that is shared to the producer’s application is controlled by the consumer.

What new capabilities does Snowflake have that allow builders to create Native Applications?

  • Installer scripts can now be written by the producer and are run during installation into each consumer account. These installer scripts can create local objects that are needed by the application. While data sharing only allows read-only objects from the producer, the local objects created by the application can be read-write and are stored in the consumer’s account. The script can also set access controls on which objects consumers can see and use in the application.
  • There is a new APP_EXPORTER role, which the producer grants to in order to control what is visible to the consumer. By default, everything is hidden. A minimal installer sketch follows this list.
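Here is that sketch: a hypothetical installer for an application named MY_APP. The names MY_APP, visible_db_role, and app_state are illustrative only; the real installer for this article’s application appears later.

CREATE OR REPLACE PROCEDURE MY_APP.MY_APP.installer()
RETURNS STRING
LANGUAGE SQL
EXECUTE AS OWNER
AS $$
begin
  -- Create a local, read-write object in the consumer's account.
  CREATE TABLE IF NOT EXISTS app_state (k varchar, v variant);
  -- Expose only what the consumer should see; everything else stays hidden.
  GRANT DATABASE ROLE visible_db_role TO DATABASE ROLE APP_EXPORTER;
  return 'installed';
end;
$$;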

What existing capabilities do we expect providers to leverage when building Native Applications?

  • Data Sharing: Data applications often need to be provided some data and often that data is a core benefit of the application. Providers may also request data shares back to their account for monitoring and integration.
  • Streams and Tasks: Data applications often need access to compute and will run asynchronous pipelines. Consumers’ warehouses are used when application code is executed from an existing session; asynchronous work that needs to be completed on behalf of the application will leverage tasks.
  • Database Roles: Roles that are stored in and shared with the database allow for easier management of access. The application provider can create multiple roles, which the consumer can then grant to different roles in their account. A simple use case is one in which some users need read access to data in the application and others need read-write access. The provider creates two database roles and sets appropriate permissions on the objects, so the consumer can simply grant those database roles to the correct roles in their account (see the sketch after this list).
  • Data Access Control: Access to data can be easily controlled by both the consumer and the producer. This includes all data which is shared by the producer, provided by the consumer to the application, or data which is generated by the application. Consumers grant access, when needed, to the Native Application so it can access existing database objects. Native Applications can also create local objects. The Native Application can also grant access, when needed, to database roles and the APP_EXPORTER role which controls the visibility of those objects to the consumer. The robust access controls allow both the producer and the consumer to protect their data and intellectual property.
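As a sketch of that two-role pattern, with hypothetical names throughout (MY_APP, read_role, readwrite_role, analyst, data_engineer):

-- Provider side: two database roles with different levels of access.
CREATE DATABASE ROLE MY_APP.read_role;
CREATE DATABASE ROLE MY_APP.readwrite_role;
GRANT SELECT ON ALL TABLES IN SCHEMA MY_APP.PUBLIC TO DATABASE ROLE MY_APP.read_role;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA MY_APP.PUBLIC TO DATABASE ROLE MY_APP.readwrite_role;

-- Consumer side: map each database role to an account role.
GRANT DATABASE ROLE MY_APP.read_role TO ROLE analyst;
GRANT DATABASE ROLE MY_APP.readwrite_role TO ROLE data_engineer;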

After getting access to all these new capabilities, I started thinking about a simple use case I could implement as a Native Application. Internally we are using Native Apps to build a lot of clean room solutions, but those are pretty complicated. Native Apps are well equipped to be the foundation for clean rooms because they can restrict who has access to what data and allow trusted code to be deployed that joins the data sources across parties.

I wanted a simple application to get started and thought of a data enrichment use case which I’ve seen many customers implement. Almost every user of Snowflake has logs of some kind: access logs, SIEM output, or clickstream data. It is often useful to enrich these datasets with the country or time zone associated with an IP address. This application will share the data needed to provide location information for IP addresses, as well as a helper function to get location information by IP and a stored procedure to enrich data in place.

To start, I found the free lite dataset from https://www.ip2location.com/. They offer a CSV that is really easy to work with and that I could use in the Native App. Their documentation and a review of the CSV gave me the table structure I wanted. These are the steps I used to create the data to share from my provider account:

CREATE DATABASE IP2LOCATION;
CREATE SCHEMA IP2LOCATION;

CREATE TABLE LITEDB11 (
  ip_from INT,
  ip_to INT,
  country_code char(2),
  country_name varchar(64),
  region_name varchar(128),
  city_name varchar(128),
  latitude DOUBLE,
  longitude DOUBLE,
  zip_code varchar(30),
  time_zone varchar(8)
);

CREATE OR REPLACE FILE FORMAT CSV
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '\"'
  COMPRESSION = AUTO;

CREATE STAGE DATA_STAGE
  file_format = CSV;

After downloading the lite data CSV, I uploaded it to the stage with a PUT using the SnowSQL CLI.

PUT file:///tmp/IP2LOCATION-LITE-DB11.CSV @DATA_STAGE;

To import the data into the table, I copied it from the stage.

COPY INTO LITEDB11 FROM @DATA_STAGE;
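Before moving on, a quick sanity check on the load is worthwhile; these two queries (my addition, not part of the original steps) confirm the row count and the IP range coverage:

SELECT COUNT(*) FROM LITEDB11;
SELECT MIN(ip_from), MAX(ip_to) FROM LITEDB11;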

My goal was to hide the complexity of using this data. I wrote a user-defined function the consumer could use to enrich their data in existing or new pipelines, and I added a stored procedure to the share that allows the user to easily enrich data in place. By putting the data, the stored procedure, and the user-defined function in the share, I can easily update the data monthly when ip2location updates their dataset, and if I need to fix any of the code, I can update it once in my account and all users get the fix automatically.
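That monthly refresh in the provider account can be a sketch as simple as this (OVERWRITE replaces the previously staged file; consumers see the reloaded table immediately):

-- Stage the new file, clear the table, and reload it.
PUT file:///tmp/IP2LOCATION-LITE-DB11.CSV @DATA_STAGE OVERWRITE = TRUE;
TRUNCATE TABLE LITEDB11;
COPY INTO LITEDB11 FROM @DATA_STAGE;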

To use this data, I needed the ability to map an IP address to a number, as well as a query to find the location record for that number. I first wrote the IP-to-number mapping as a JavaScript function. This could also have been a SQL or Java function, since it only does basic math and IP parsing. I made this a regular (non-secure) function because it is invisible to the user and is only used by the ip2data function.

CREATE OR REPLACE FUNCTION ip2long(ip_address varchar(16))
RETURNS string
LANGUAGE JAVASCRIPT
AS
$$
var result = "";
var parts = [];
if (IP_ADDRESS.match(/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/)) {
  parts = IP_ADDRESS.split('.');
  result = (parts[0] * 16777216 +
    (parts[1] * 65536) +
    (parts[2] * 256) +
    (parts[3] * 1));
}
return result;
$$;
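For comparison, here is what the SQL version of the same conversion might look like (ip2long_sql is a hypothetical name; it is not part of the application):

CREATE OR REPLACE FUNCTION ip2long_sql(ip_address varchar(16))
RETURNS number
AS
$$
  -- Convert dotted-quad IPv4 notation to its integer representation.
  SPLIT_PART(ip_address, '.', 1)::int * 16777216 +
  SPLIT_PART(ip_address, '.', 2)::int * 65536 +
  SPLIT_PART(ip_address, '.', 3)::int * 256 +
  SPLIT_PART(ip_address, '.', 4)::int
$$;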

For the user-defined function that returns the location by IP, I wanted it available in the share, so I made it a SECURE function. Using a secure function allows me to share the function with the consumer while hiding all implementation details.

CREATE OR REPLACE SECURE FUNCTION ip2data(ip_address varchar(16))
returns object
as
$$
  select object_construct(
    'country_code', COUNTRY_CODE, 'country_name', COUNTRY_NAME,
    'region_name', REGION_NAME, 'city_name', CITY_NAME,
    'latitude', LATITUDE, 'longitude', LONGITUDE,
    'zip_code', ZIP_CODE, 'time_zone', TIME_ZONE)
  from litedb11
  where ip_from <= ip2long(ip_address)::int AND ip_to >= ip2long(ip_address)::int
$$;

At this point, I could run and test the solution directly in the provider account.

SELECT ip2data('8.8.8.8');

Query Results:

{
  "city_name": "Mountain View",
  "country_code": "US",
  "country_name": "United States of America",
  "latitude": 37.405992,
  "longitude": -122.078515,
  "region_name": "California",
  "time_zone": "-07:00",
  "zip_code": "94043"
}

I also wanted a really fast and easy way for a user of this application to enrich data in place. I created a stored procedure that would update user data.

CREATE OR REPLACE SECURE PROCEDURE enrich_data(db_name varchar, schema_name varchar, table_name varchar, field_name varchar, dest_name varchar, incremental boolean)
returns TABLE ()
language sql
EXECUTE AS OWNER
as
$$
declare
  update_statement varchar;
  res resultset;
begin
  update_statement := 'UPDATE ' || db_name || '.' || schema_name || '.' || table_name ||
    ' set ' || dest_name || ' = ip2data(' || field_name || ')';
  if (incremental) then
    update_statement := update_statement || ' where ' || dest_name || ' IS NULL';
  end if;
  res := (execute immediate :update_statement);
  return table(res);
end
$$;

To test this in the provider account, I created another database, schema, and table with some sample rows to enrich. This SQL would be useful again later when testing as a consumer.

create database TEST;
create schema TEST;
create or replace table test_data (ip varchar(16), ipdata variant);
insert into test_data values ('8.8.8.8', null), ('73.153.199.206', null);
call enrich_data('test', 'test', 'test_data', 'IP', 'ipdata', FALSE);
select * from test_data;

The procedure worked and both records were enriched with location data!

The next step is to package this code and data into a Native Application. During installation, I used a database role and granted it to the APP_EXPORTER role so that objects could be shared to the consumer.

CREATE DATABASE ROLE IP2LOCATION.shared_db_role;

CREATE OR REPLACE PROCEDURE IP2LOCATION.IP2LOCATION.installer()
RETURNS STRING
LANGUAGE SQL
EXECUTE AS OWNER
AS $$
begin
  GRANT DATABASE ROLE shared_db_role to DATABASE ROLE APP_EXPORTER;
  return 'installer script Done';
end;
$$;

The Native Application is then created as a Share with an installer.

CREATE OR REPLACE SHARE ip2location_share installer = IP2LOCATION.IP2LOCATION.installer();

To secure the data, a new role, hidden_db_role, was created that has access to the shared data. This is a role the consumer cannot access or grant. The shared_db_role is granted to the share and to APP_EXPORTER and provides the function that is the API for consumers to use.

GRANT USAGE ON DATABASE IP2LOCATION TO DATABASE ROLE IP2LOCATION.shared_db_role;
GRANT USAGE ON SCHEMA IP2LOCATION.IP2LOCATION TO DATABASE ROLE IP2LOCATION.shared_db_role;
GRANT USAGE ON DATABASE IP2LOCATION TO SHARE ip2location_share;
GRANT USAGE ON SCHEMA IP2LOCATION TO SHARE ip2location_share;
GRANT USAGE ON PROCEDURE IP2LOCATION.IP2LOCATION.installer() TO SHARE ip2location_share;
GRANT USAGE ON PROCEDURE IP2LOCATION.IP2LOCATION.enrich_data(varchar, varchar, varchar, varchar, varchar, boolean) TO SHARE ip2location_share;
GRANT DATABASE ROLE IP2LOCATION.shared_db_role TO SHARE ip2location_share;
GRANT ALL PRIVILEGES ON FUNCTION IP2LOCATION.IP2LOCATION.ip2data(varchar) TO SHARE ip2location_share;
GRANT ALL PRIVILEGES ON FUNCTION IP2LOCATION.IP2LOCATION.ip2data(varchar) TO DATABASE ROLE IP2LOCATION.shared_db_role;
GRANT USAGE ON PROCEDURE IP2LOCATION.IP2LOCATION.enrich_data(varchar, varchar, varchar, varchar, varchar, boolean) TO DATABASE ROLE IP2LOCATION.shared_db_role;

CREATE DATABASE ROLE IP2LOCATION.hidden_db_role;
GRANT SELECT ON TABLE IP2LOCATION.IP2LOCATION.LITEDB11 TO DATABASE ROLE IP2LOCATION.hidden_db_role;
GRANT USAGE ON DATABASE IP2LOCATION TO DATABASE ROLE IP2LOCATION.hidden_db_role;
GRANT USAGE ON SCHEMA IP2LOCATION TO DATABASE ROLE IP2LOCATION.hidden_db_role;
GRANT DATABASE ROLE IP2LOCATION.hidden_db_role TO SHARE ip2location_share;
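To double-check exactly what the share exposes before handing it to anyone, SHOW GRANTS is useful:

SHOW GRANTS TO SHARE ip2location_share;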

After this, the Native Application is complete. To share this for testing, I added the consumer account I wanted to test with directly to the share.

ALTER SHARE ip2location_share ADD ACCOUNTS = <redacted>;

I then accepted the share in the consumer account.

CREATE DATABASE IP2LOCATION FROM SHARE <redacted>.ip2location_share;

And ran the following query to test the application:

SELECT ip2data('8.8.8.8');

Query Results:

{
  "city_name": "Mountain View",
  "country_code": "US",
  "country_name": "United States of America",
  "latitude": 37.405992,
  "longitude": -122.078515,
  "region_name": "California",
  "time_zone": "-07:00",
  "zip_code": "94043"
}

When looking in the consumer account, the only objects visible in the shared database are the ip2data function and the enrich_data stored procedure.
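One way to verify this from the consumer side is to list the objects in the installed database:

SHOW FUNCTIONS IN DATABASE IP2LOCATION;
SHOW PROCEDURES IN DATABASE IP2LOCATION;
SHOW TABLES IN DATABASE IP2LOCATION; -- LITEDB11 should not appear here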

As a final test of the application, a test dataset was created in the consumer account to be enriched in place. Access had to be granted to the application so the procedure could read the data and update it in place.

create database TEST;
create schema TEST;
create or replace table test_data (ip varchar(16), ipdata variant);
insert into test_data values ('8.8.8.8', null), ('73.153.199.206', null);
select IP2LOCATION.IP2LOCATION.ip2data(ip) as i from test_data;

create role loc_app_role;
grant select on table test_data to role loc_app_role;
grant update on table test_data to role loc_app_role;
grant usage on database test to role loc_app_role;
grant usage on schema test to role loc_app_role;
grant role loc_app_role to database ip2location;

call IP2LOCATION.IP2LOCATION.enrich_data('test', 'test', 'test_data', 'IP', 'ipdata', TRUE);
select * from test_data;

The procedure worked and both records were enriched with location data!

This Native Application is just a really simple example of what could be built using the new Snowflake capabilities. It packages up some data and code which can be easily shared to other accounts directly or could be published in the Marketplace for others to use.

What I really love about Native Applications is how the providers can focus on the code and provide data that is useful for those applications but leave the operational complexity and data security to Snowflake. I don’t need to worry about how that data is shared with the customer or how to deploy my code. It’s all automatic with Native Applications.

Native Applications are in private preview. If you are a customer that wants to build a Native Application, reach out to your account team to request access. I can’t wait to see what you build!
