Data Engineering and Snowflake Series — Part 2 #App Development — NativeApp
Thinking of a storyline for this blog was really interesting; it reminded me of my first mobile phone, a Nokia 1100, back in my college days. I primarily used it for texting without looking at the physical keypad (mostly while my professors were teaching Engineering :)) and, trust me, with few or no mistakes. Now, though, I use my mobile the way I breathe. I wake up to its alarm and go to bed when it slips into sleep mode. Everything within the mobile is an app (like blood, it keeps running…), and that is where our story begins.
Long, long ago, the software industry was dominated by a few large companies without much competition. In July 2008, Apple introduced its App Store with 500 applications, and in October 2008 Google introduced the Play Store with a few dozen apps. This was a breakthrough revolution, as it paved the way for entrepreneurs and developers to reach the whole world. Barely more than a decade later, as of Q1 2021, Apple had 2.22 million apps and Google had 3.48 million. No wonder 90% of the time we spend on mobile phones is spent in apps :). According to Statista, mobile applications are predicted to generate 935.2 billion US dollars by 2023, up from 808.7 billion dollars in 2022.
All these millions of mobile apps fall into just three types at a high level.
Native Mobile Apps: → are designed to be “native” to one platform, whether it is Apple iOS, Google’s Android, or Windows Phone. A native app can be advantageous because it tends to optimize the user experience: developed specifically for the platform, it operates more quickly and intuitively.
Web Apps: → are responsive versions of websites that can work on any mobile device or OS because they are delivered through a mobile browser.
Hybrid Mobile Apps: → are combinations of native and web apps, wrapped within a native shell, giving them the ability to have their own icon and be downloaded from an app store.
Now, with all this context: Snowflake is bringing the most successful idea that revolutionised the mobile world to the data world with Native Apps. Snowflake is providing a platform for every developer, business, and entrepreneur to deploy data apps. It is a breakthrough idea (don’t you think?): the data world is trying to move the application as close as possible to the data, so irrespective of the region and cloud provider the consumer is on, the application can be deployed and run across clouds and regions. This gives the customer better data governance and also saves a lot of the I/O spent moving data back and forth across platforms.
For references on this idea, see “Moving Computation is Cheaper than Moving Data” (from the Apache Hadoop documentation) and, for healthcare, the article “Bringing Code to Data: Do Not Forget Governance”.
You probably know that Snowflake already supports data sharing, where you can share (and monetise) data with Snowflake accounts and even non-Snowflake users. But can we do the same with functions or algorithms? We will find out in this blog.
This is an extension of my previous blog on unstructured data. Just to reiterate, in part 1 of the series we created a function to extract text from images. I recommend reading part 1 first, as we will be re-using the functions created there.
What is a Native App?
Before getting to know Native Apps, let us look at how applications exist within Snowflake today. We observe two usage patterns: connected apps and traditional managed apps.
Traditional Managed Apps:
Connected Apps:
Last year Snowflake introduced connected apps, where the customer’s data is processed by the SaaS provider’s application while maintaining a single source of truth for the data (a unified data source).
Native Apps:
The Native Application Framework takes connected applications to the next level by allowing providers to bring their application code to their customers’ data.
The Native Application Framework makes it easy to build, sell, and deploy applications within the Data Cloud. You can build applications using Snowflake core functionalities such as UDFs and stored procedures, and even the Streamlit integration (currently in development), then sell them to customers across the Data Cloud via Snowflake Marketplace.
The framework also provides telemetry tools to make it easy to monitor and support your applications. What’s really powerful is that native apps are deployed in the customer’s account in a way that gives the customer control over their data, while still protecting the provider’s intellectual property.
It all started with data sharing via the Data Marketplace, and now Snowflake is going one level up by opening up the sharing of functions and procedures. There is a lot of opportunity in this space: most customers have built tools and algorithms for their business needs, so WHY NOT MONETISE THEM?? Capital One invested in and built a tool for managing the Snowflake platform, and is now trying to monetise it; read here for more details. Similarly, we are going to share our Lambda function (developed in part 1 of the series) with a consumer (another demo account) and see how the consumer benefits. Another thing to note is that the consumer has no clue how the app behaves, what packages we use, or how we scale (much like microservices). Various methods of payment will be possible, but we won’t cover those details in this blog; please read here for more.
Common use-cases:
The use-cases of native apps are still evolving; below are a few that we see customers using this feature for.
- Any business or ML/AI algorithms
- ETL connectors
- Data cleanrooms for secure multi-party data analysis
- Identity and transcoding solutions
- Sentiment analysis using NLTK libraries
- Standardised industry data models
- Data enrichment apps from Factset, Equifax, Experian
Text extraction from Images Native App:
For part 2 of the series, we will pick the first use-case and build a native application from our algorithm (text extraction from images). Below is the step-by-step process to create a function and share it with another Snowflake account. This feature is still in Private Preview (which means you cannot try it yet, but can request access for testing purposes in customer accounts) and will be in GA soon. For this demo, you need two Snowflake accounts (one as the provider and one as the consumer).
The diagram below shows how data flows between the provider and consumer accounts when the consumer invokes the API. We will see this in action below.
Provider Account Steps:
Run the below script in the provider account. It performs the following steps.
- Create a setup stored procedure that creates the external function with the API integration
- Create the stored procedure that calls the external function
- Create an installer script
- Create a role and grant it the necessary privileges on the procedures
- Create a share object and grant it the necessary privileges
-- #### Creating Database and Schema for Storing the functions ####
create or replace database ocr_processing;
use database ocr_processing;
create or replace schema ocr_image_extract;
use schema ocr_image_extract;

-- ############# SETUP #######################
CREATE or replace PROCEDURE ocr_processing.ocr_image_extract.setup(integration_name string)
RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS OWNER
AS $$
var endpoint = "https://XXXXX.execute-api.us-east-1.amazonaws.com/dev/ocr";
var create_ext_func_cmd = "create or replace external function ocr_text_extract.ext_ocr(imagelocation string)"
    + " returns variant"
    + " api_integration = " + INTEGRATION_NAME
    + " as '" + endpoint + "'";
snowflake.execute({ sqlText: create_ext_func_cmd });
return "SETUP COMPLETED";
$$;

-- #### PROCEDURE that calls external function ####
create or replace procedure ocr_processing.ocr_image_extract.extract_text_and_load(rel_path string)
returns variant
language javascript
as
$$
// Build the JSON payload {"image": "<rel_path>"} and pass it straight to the external function
// (the original nested "select concat(...)" was invalid SQL; concat can be the argument directly)
var call_ext_func_cmd = "select ocr_text_extract.ext_ocr(concat('{\"image\": \"','" + REL_PATH + "','\"}'))";
var res = snowflake.execute({ sqlText: call_ext_func_cmd });
res.next();
return res.getColumnValue(1);
$$;

-- ############# GRANTS #######################
CREATE or replace DATABASE ROLE shared_db_role;
GRANT USAGE ON DATABASE ocr_processing to DATABASE ROLE shared_db_role;
GRANT USAGE ON SCHEMA ocr_processing.ocr_image_extract TO DATABASE ROLE shared_db_role;
GRANT USAGE ON PROCEDURE ocr_processing.ocr_image_extract.setup(string) TO DATABASE ROLE shared_db_role;
GRANT USAGE ON PROCEDURE ocr_processing.ocr_image_extract.extract_text_and_load(string) TO DATABASE ROLE shared_db_role;

-- ############# INSTALLER #######################
CREATE or replace PROCEDURE ocr_processing.ocr_image_extract.installer()
RETURNS STRING
LANGUAGE SQL
EXECUTE AS OWNER
AS $$
begin
    CREATE SCHEMA ocr_text_extract;
    GRANT DATABASE ROLE shared_db_role TO DATABASE ROLE APP_EXPORTER;
    return 'installer script Done';
end;
$$;

-- ############# SHARE CREATION #######################
CREATE or replace SHARE ocr_share installer = ocr_processing.ocr_image_extract.installer();
GRANT USAGE ON DATABASE ocr_processing TO SHARE ocr_share;
GRANT USAGE ON SCHEMA ocr_processing.ocr_image_extract TO SHARE ocr_share;
GRANT DATABASE ROLE shared_db_role TO SHARE ocr_share;
GRANT USAGE ON PROCEDURE ocr_processing.ocr_image_extract.installer() TO SHARE ocr_share;

ALTER SHARE ocr_share ADD ACCOUNTS=XXXX; -- ** Consumer Account Name
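To make the contract between the external function and the API Gateway endpoint concrete, here is a minimal sketch of the Lambda side. The OCR logic itself lives in the part 1 Lambda and is stubbed out here with a placeholder; the function and variable names are illustrative. What is fixed by Snowflake is the envelope: external functions POST a JSON body of the form `{"data": [[row_number, arg1, ...], ...]}` and expect a response body with the same `{"data": [...]}` shape, one result row per input row.

```python
import json

def lambda_handler(event, context):
    """Sketch of the OCR Lambda's request/response contract (OCR stubbed out).

    Snowflake external functions POST a body like:
        {"data": [[0, "{\"image\": \"https://...presigned...\"}"], ...]}
    and expect back:
        {"data": [[0, <result for row 0>], ...]}
    """
    rows = json.loads(event["body"])["data"]
    out = []
    for row in rows:
        row_number, payload = row[0], row[1]
        # payload is the JSON string built by extract_text_and_load: {"image": "<url>"}
        image_url = json.loads(payload)["image"]  # pre-signed S3 URL from the consumer stage
        # Real implementation: fetch image_url and run OCR on it (see part 1).
        extracted_text = f"<ocr text for {image_url}>"  # placeholder result
        out.append([row_number, extracted_text])
    return {"statusCode": 200, "body": json.dumps({"data": out})}
```

Because results are matched back to input rows by row number, the Lambda must return exactly one output row per input row, in any order.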
With this, you have created an image extraction function and a share object within Snowflake.
Consumer Account steps:
The next step is to consume this share in the consumer account and use it for processing. Below are the steps performed.
- Create a storage integration object for accessing the S3 bucket where images are stored at the consumer end (similar to our storage steps in part 1)
- Create an external stage (similar to our storage steps in part 1)
- Create an API integration object
- Create a role and grant it the necessary privileges
- List the shares and create a database pointing to the share created in the provider account. This installs, in the consumer account, all the functions that were created in the provider account.
- Note that you can only see the parameters passed to the functions, not their bodies (these are secure functions in the Snowflake world, ensuring consumers cannot see the producer’s code or algorithms)
- List your stage object and test the function by passing a pre-signed URL from the consumer stage. You will notice the Lambda is invoked in the producer’s AWS account and the response is returned to the consumer account. (Obviously, you can create a stream, stored procedure, and task, similar to the PROCESSING step in part 1, to automate this in production.)
-- #### STORAGE INTEGRATION CREATION #### --
create database if not exists native_app_streamlit; -- ensure the demo database exists
use database native_app_streamlit;

-- ********* This establishes the connection between AWS S3 and Snowflake
CREATE OR REPLACE STORAGE INTEGRATION storage_integ
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = S3
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::XXXXX:role/s3fullaccess_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://snowpark-demo-python-naveen/', 's3://snowpark-demo-python-naveen/nativeapp');

DESC INTEGRATION storage_integ;

-- #### STAGE CREATION #### --
CREATE OR REPLACE STAGE s3_stage_native_app
  directory = (enable = true) -- ********* Enabling directory creates a directory table for unstructured data
  storage_integration = storage_integ
  url = 's3://snowpark-demo-unstruc-naveen/nativeapp';

list @s3_stage_native_app;

-- #### API INTEGRATION CREATION #### --
CREATE OR REPLACE api integration native_app_ocr_int
  api_provider = aws_api_gateway
  api_aws_role_arn = 'arn:aws:iam::XXXXXX:role/apifullaccess'
  api_allowed_prefixes = ('https://XXXXX.execute-api.us-east-1.amazonaws.com/dev')
  enabled = true;

DESCRIBE integration native_app_ocr_int;

-- #### NATIVE APP SCRIPT #### --
CREATE or replace ROLE ocr_app_role;
GRANT USAGE ON INTEGRATION native_app_ocr_int to ROLE ocr_app_role;
GRANT USAGE ON DATABASE native_app_streamlit to ROLE ocr_app_role;
GRANT USAGE ON SCHEMA native_app_streamlit.PUBLIC to ROLE ocr_app_role;
GRANT SELECT ON STREAM native_app_streamlit.PUBLIC.s3_stream to ROLE ocr_app_role;

-- #### Create Database from Share (installing the functions locally) ####
SHOW SHARES;

CREATE OR REPLACE DATABASE CONSUMER_ocr_processing FROM share "XXXX"."OCR_SHARE";
-- Shared databases are granted to roles via IMPORTED PRIVILEGES
GRANT IMPORTED PRIVILEGES ON DATABASE CONSUMER_ocr_processing TO ROLE ocr_app_role;

-- Installing (creating the external function with the API integration)
CALL CONSUMER_ocr_processing.ocr_image_extract.setup('native_app_ocr_int');

-- ## Listing the S3 stage and calling the image extraction function
list @s3_stage_native_app;
SELECT GET_PRESIGNED_URL(@s3_stage_native_app, 'snowflake.jpg', 3600);

-- Calling the external function at the producer end to retrieve the extracted text
call consumer_ocr_processing.ocr_image_extract.extract_text_and_load('pre_signed_url from previous statement');
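As noted in the steps above, the manual GET_PRESIGNED_URL + CALL sequence can be automated on the consumer side with a stream and a task, along the lines of the PROCESSING step from part 1. A hedged sketch (the warehouse name is assumed, and for simplicity the task processes one new file per run):

```sql
-- Assumed sketch: automate extraction for new files landing in the stage
CREATE OR REPLACE STREAM native_app_streamlit.PUBLIC.s3_stream
  ON STAGE s3_stage_native_app;  -- tracks changes in the stage's directory table

CREATE OR REPLACE TASK native_app_streamlit.PUBLIC.ocr_task
  WAREHOUSE = demo_wh            -- assumed warehouse name
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('native_app_streamlit.PUBLIC.s3_stream')
AS
  CALL consumer_ocr_processing.ocr_image_extract.extract_text_and_load(
        (SELECT GET_PRESIGNED_URL(@s3_stage_native_app, relative_path, 3600)
         FROM native_app_streamlit.PUBLIC.s3_stream
         LIMIT 1));              -- one file per run, for simplicity

ALTER TASK native_app_streamlit.PUBLIC.ocr_task RESUME;
```

Remember to refresh the directory table (`ALTER STAGE s3_stage_native_app REFRESH;`) or enable auto-refresh so the stream sees newly arrived files.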
Summary:
Isn’t it great? You can create functions and share them with other Snowflake accounts, list them on the Snowflake Marketplace, and generate a new revenue stream for your business. This is where the world is moving: data sharing and app sharing. It supports better data governance principles and a simplified architecture. With the Streamlit acquisition, Snowflake is clearly planning a deployment environment where you can host your data application in Snowflake and make it available in a Snowflake app store. Feel free to read more on this topic here.
Happy to be part of it! Stay tuned for more!
Opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.