Building a Real Estate App with Snowflake Native App Framework

A step-by-step guide on the what, why, and how

Photo by author at Grand Teton National Park

At Snowflake Summit 2023, Snowflake announced that the Snowflake Native App Framework is now available to developers on AWS. Let’s explore the Snowflake Native App Framework in this article and understand its what, why, and how.

Snowflake Native App Framework

Snowflake has revolutionized data management and analytics. The Snowflake Data Cloud has created an ecosystem of businesses and organizations that can share and consume data and data services.

At its core, the Snowflake Native App Framework brings apps to data, making apps native to their data and introducing an entirely new way to put data to work. Why is it such a brilliant idea to bring apps to data? Because it opens up a whole new world of possibilities to enrich, activate, enhance, visualize, and transform data — without the data ever leaving the consumer’s account.

The key benefits of this framework are:

  • Application providers do not have to manage infrastructure or worry about securing sensitive customer data.
  • Application consumers do not have to worry about sending sensitive data outside of their data cloud. They can discover and purchase Snowflake Native Apps on Snowflake Marketplace, then install and run them within their own Snowflake account.

This is a win-win for both application providers and consumers.

Real Estate App High Level Architecture

For our demo app, we are going to use the free Knoema Real Estate Data Atlas from Snowflake Marketplace as our source data. We will build a demo app to allow users to compare the average single-family residence values and average rent (for all homes plus multi-family) between two cities in the United States for the past 12 months.

From the Snowflake Native App Framework’s perspective, we can understand its building blocks through the following diagram (a triple-yolk egg :-):

Diagram by author
  • A native app is created from an application package.
  • An application package contains multiple schemas (for shared data and for application files).
  • Application files are uploaded to a named stage, which resides in a schema. We have the option to add application objects such as UDFs and stored procedures; for our demo app, we will instead develop a Streamlit app to include in the package.

Let’s lay out our demo app’s high level architecture, based on Snowflake Native App framework:

Diagram by author

Some key highlights from the above diagram:

  • We will install our source data from Snowflake Marketplace: the real_estate_data_atlas database from Knoema.
  • We need to create an interim database real_estate_data to import the Marketplace data, because we cannot share the shared data from Marketplace directly in our native app.
  • We need to create an application package real_estate_data_package to host the shared_data schema, which contains the source data from Marketplace, and the stage_content schema, which contains the application files and code.
  • We then publish our versioned native app as a private listing in Snowflake Marketplace. A public listing is also an option, but we will not use it for our demo app.
  • We log in from another Snowflake account as the consumer, then install and run our app. Notice the double-headed arrow from Snowflake Marketplace to the consumer account: it indicates that the consumer account can look for and install the app from the Marketplace, or the provider can push the app to the consumer account through the Marketplace.

Let’s dive into the implementation details.

Real Estate App Implementation Details

Snowflake provides great step-by-step documentation on how to implement an app using the Snowflake Native App Framework. I studied it and summarized the process I followed to develop our real estate app into the following seven steps.

Step 1: Lay out project structure

To adhere to the project structure standard set by Snowflake Native App framework, we start by adding the following files/folders to our demo app “real-estate-data-analysis”, as seen in the screenshot below:

  • scripts/setup.sql: a SQL script that contains all the setup logic your consumer accounts need in order to run your app. It runs automatically when a consumer installs the app in their account.
  • streamlit: home for your Streamlit app file(s); in our demo app’s case, we added real_estate.py.
  • manifest.yml: contains basic configuration information about the app. For our demo app, we simply have the following in this file. For details on each field, refer to the Snowflake documentation.
manifest_version: 1
artifacts:
  setup_script: scripts/setup.sql
  readme: readme.md
  • readme.md: provides a description of what your app does. Its content is displayed in Snowsight when you test your app, or in the private or public listing in Snowflake Marketplace, depending on where you decide to publish your app. Basically, this is the front door to your app, informing users what your app is about and what it does.

Check out the complete source code of this demo app in my GitHub repo.

Step 2: Create an application package

As depicted in our architectural diagram above, our application package real_estate_data_package contains two schemas: shared_data and stage_content (you can name your app’s schemas however you want). Think of an application package as an extension of a Snowflake database: it encompasses not just the shared data in the shared_data schema, but also the application files in the stage_content schema.

Let’s start a new SQL worksheet in Snowsight and create our application package and its schemas, along with a named stage where our application files will be uploaded:

GRANT CREATE APPLICATION PACKAGE ON ACCOUNT TO ROLE accountadmin;

CREATE APPLICATION PACKAGE real_estate_data_package;

USE APPLICATION PACKAGE real_estate_data_package;

CREATE OR REPLACE SCHEMA stage_content;

USE SCHEMA stage_content;

CREATE OR REPLACE STAGE real_estate_data_package.stage_content.real_estate_data_stage
FILE_FORMAT = (TYPE = 'csv' FIELD_DELIMITER = '|' SKIP_HEADER = 1);

Step 3: Get our data ready

Let’s now go to Snowflake Marketplace, look for Knoema Real Estate Data Atlas, install it, and confirm that we have a new database real_estate_data_atlas added to our Snowflake account.

Studying the sample queries and the data structure, we can conclude that we only need data from two views, ZRHVI2020JUL and ZRRENTVI2020JUL, for our comparison logic. Since we cannot re-share data shared from Snowflake Marketplace, let’s create a local database real_estate_data to hold the data from those two views.

Let’s also create a shared_data schema, where we will store our data to be shared with the consumers. This shared_data schema will be included in the application package.

Let’s continue on our worksheet in Snowsight:

-- need to create a local db and views so they can be shared with real_estate_data_package
CREATE DATABASE IF NOT EXISTS REAL_ESTATE_DATA;
CREATE VIEW IF NOT EXISTS ZRHVI2020JUL AS SELECT * FROM REAL_ESTATE_DATA_ATLAS.REALESTATE.ZRHVI2020JUL;
CREATE VIEW IF NOT EXISTS ZRRENTVI2020JUL AS SELECT * FROM REAL_ESTATE_DATA_ATLAS.REALESTATE.ZRRENTVI2020JUL;

-- give the app package reference access to a local db
GRANT REFERENCE_USAGE ON DATABASE REAL_ESTATE_DATA TO SHARE IN APPLICATION PACKAGE real_estate_data_package;

-- set the context to the application package
USE APPLICATION PACKAGE real_estate_data_package;

-- create schema shared_data and create tables for the data
CREATE SCHEMA IF NOT EXISTS shared_data;
USE SCHEMA shared_data;
CREATE TABLE IF NOT EXISTS ZRHVI2020JUL AS SELECT * FROM REAL_ESTATE_DATA.public.ZRHVI2020JUL;
CREATE TABLE IF NOT EXISTS ZRRENTVI2020JUL AS SELECT * FROM REAL_ESTATE_DATA.public.ZRRENTVI2020JUL;

-- grant app package usage access to shared_data (must have)
GRANT USAGE ON SCHEMA real_estate_data_package.shared_data TO SHARE IN APPLICATION PACKAGE real_estate_data_package;

-- grant select on tables to share in application package
GRANT SELECT ON TABLE real_estate_data_package.shared_data.ZRHVI2020JUL TO SHARE IN APPLICATION PACKAGE real_estate_data_package;
GRANT SELECT ON TABLE real_estate_data_package.shared_data.ZRRENTVI2020JUL TO SHARE IN APPLICATION PACKAGE real_estate_data_package;

Step 4: Add application logic to setup.sql

setup.sql runs when the app is installed by our consumers. The framework has already defined some standard logic for this file, such as creating an application role and creating a versioned schema to store public objects (stored procedures, user-defined functions (UDFs), the Streamlit app, and so on). The setup.sql for our demo app looks like the following:

-- create an application role; similar to database roles, but it can only be used within the context of an application
CREATE APPLICATION ROLE IF NOT EXISTS APP_PUBLIC;

-- create a new versioned schema to be used for public objects
CREATE OR ALTER VERSIONED SCHEMA app_schema;
GRANT USAGE ON SCHEMA app_schema TO APPLICATION ROLE APP_PUBLIC;

-- use tables from the application package and create views in the app
CREATE OR REPLACE VIEW app_schema.ZRHVI2020JUL AS SELECT * FROM shared_data.ZRHVI2020JUL;
CREATE OR REPLACE VIEW app_schema.ZRRENTVI2020JUL AS SELECT * FROM shared_data.ZRRENTVI2020JUL;

-- grant select on the views to APP_PUBLIC role so the consumer can access them
GRANT SELECT ON VIEW app_schema.ZRHVI2020JUL TO APPLICATION ROLE APP_PUBLIC;
GRANT SELECT ON VIEW app_schema.ZRRENTVI2020JUL TO APPLICATION ROLE APP_PUBLIC;

Step 5: Implement Streamlit UI

Our app compares the average single-family residence values and average rent (for all homes plus multi-family) between two cities in the United States for the past 12 months. For a side-by-side comparison, let’s split our UI into two columns, with two text input fields that allow users to enter the two cities. We then display the query results in two columns: one for the housing value comparison, the other for the rent comparison.

import streamlit as st
import altair as alt
from snowflake.snowpark.context import get_active_session

# use the full page instead of a narrow central column
st.set_page_config(layout="wide")

# title and instruction
st.title("Real Estate Data Analysis")
st.write(
"""Enter two cities (with two-letter states) in the United States to start, for example: "San Francisco, CA" and "New York, NY".
"""
)

# split UI into two columns
col1, col2 = st.columns(2)

# let user enter the two cities for comparison
city1 = col1.text_input("Enter the first city, such as 'San Francisco, CA':", value="San Francisco, CA", max_chars=30, key="city1")
city2 = col2.text_input("Enter the second city, such as 'New York, NY':", value="New York, NY", max_chars=30, key="city2")

if st.button("Compare"):
    if city1 and city2:

        # get the current credentials
        session = get_active_session()

        # query data for both housing and rental
        df_housing = session.sql(f'SELECT TOP 24 "Region Name", "Date", "Value" FROM app_schema.ZRHVI2020JUL WHERE (LOWER("Region Name") = LOWER(\'{city1}\') OR LOWER("Region Name") = LOWER(\'{city2}\')) AND "Indicator Name" = \'ZHVI - Single Family Residence\' ORDER BY "Date" DESC, "Region Name" DESC;')
        df_rental = session.sql(f'SELECT TOP 24 "Region Name", "Date", "Value" FROM app_schema.ZRRENTVI2020JUL WHERE (LOWER("Region Name") = LOWER(\'{city1}\') OR LOWER("Region Name") = LOWER(\'{city2}\')) AND "Indicator Name" = \'ZORI (Smoothed) : All Homes Plus MultiFamily\' ORDER BY "Date" DESC, "Region Name" DESC;')

        # subheader for housing
        col1.subheader("Average single-family residence values")

        # execute the query and convert it into a Pandas data frame
        queried_data_housing = df_housing.to_pandas()

        # add bar chart
        chart_housing = alt.Chart(queried_data_housing).mark_bar().encode(
            x=alt.X('Date:T'),
            y=alt.Y('Value:Q'),
            color=alt.Color("Region Name:N")
        )
        col1.altair_chart(chart_housing, use_container_width=True)

        # display the data frame
        col1.dataframe(queried_data_housing, use_container_width=True)

        # subheader for rental
        col2.subheader("Average rent (for all homes plus multiFamily)")

        # execute the query and convert it into a Pandas data frame
        queried_data_rental = df_rental.to_pandas()

        # add line chart
        chart_rental = alt.Chart(queried_data_rental).mark_line().encode(
            x=alt.X('Date:T'),
            y=alt.Y('Value:Q'),
            color=alt.Color("Region Name:N")
        )
        col2.altair_chart(chart_rental, use_container_width=True)

        # display the data frame
        col2.dataframe(queried_data_rental, use_container_width=True)

The more I use Streamlit, the more I love it! Such a neat low-code library, simply amazing. For example, to display two sets of data (San Francisco vs. New York) on the bar chart for the single-family housing value comparison, all I need are the few lines below using Altair.

# add bar chart
chart_housing = alt.Chart(queried_data_housing).mark_bar().encode(
    x=alt.X('Date:T'),
    y=alt.Y('Value:Q'),
    color=alt.Color("Region Name:N")
)
col1.altair_chart(chart_housing, use_container_width=True)

Even though Altair usually does a good job of choosing default data types for a Pandas DataFrame, we needed to tweak things a little by specifying the data types explicitly:

  • Date:T specifies the temporal data type for date/time values.
  • Value:Q specifies the quantitative data type for numerical (real-valued) quantities.
  • Region Name:N specifies the nominal data type for names/unordered categorical data.

Once the code is complete, we need to add this Streamlit app to our setup.sql so our consumers can run it:

-- add Streamlit object
CREATE OR REPLACE STREAMLIT app_schema.streamlit
FROM '/streamlit'
MAIN_FILE = '/real_estate.py'
;

-- grant APP_PUBLIC role to access the Streamlit object
GRANT USAGE ON STREAMLIT app_schema.streamlit TO APPLICATION ROLE APP_PUBLIC;

Step 6: Launch our demo app

For our demo app, we only have a few files: real_estate.py, setup.sql, manifest.yml, and readme.md. Let’s upload these files to the named stage real_estate_data_stage, which we created in step 2, either through the Snowsight UI or with the PUT command, as sketched below.
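Here is a minimal sketch of uploading the files with SnowSQL’s PUT command; the local path /path/to/real-estate-data-analysis is a placeholder for wherever your project lives. Note that the stage subfolders (scripts and streamlit) need to match the paths referenced in manifest.yml and in the CREATE STREAMLIT statement.

-- upload the application files from a SnowSQL session (local paths are placeholders)
PUT file:///path/to/real-estate-data-analysis/manifest.yml @real_estate_data_package.stage_content.real_estate_data_stage AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
PUT file:///path/to/real-estate-data-analysis/readme.md @real_estate_data_package.stage_content.real_estate_data_stage AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
PUT file:///path/to/real-estate-data-analysis/scripts/setup.sql @real_estate_data_package.stage_content.real_estate_data_stage/scripts AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
PUT file:///path/to/real-estate-data-analysis/streamlit/real_estate.py @real_estate_data_package.stage_content.real_estate_data_stage/streamlit AUTO_COMPRESS=FALSE OVERWRITE=TRUE;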

Now it’s time to launch our demo app! Add the following SQL statements to the worksheet we have been working in since step 2. These statements create our native app real_estate_data_app.

-- drop application if it already exists
-- DROP APPLICATION real_estate_data_app;

-- create application
CREATE APPLICATION real_estate_data_app
FROM APPLICATION PACKAGE real_estate_data_package
USING '@real_estate_data_package.stage_content.real_estate_data_stage';

If the above statement executes successfully, your Snowflake database structure for this demo app should look like the following:

Now let’s launch our app from the “Apps” section in the left navigation menu. You will see REAL_ESTATE_DATA_APP listed under your installed apps. Click on the app name, and the content from readme.md welcomes you. Click on the “Streamlit” top menu, and you will be prompted to enter the two cities to compare. Enter the two cities in the required format, and you get two charts plus the source data populating them, all in the two columns we designed in our Streamlit code.

It’s interesting to see that the average single-family residence values for the last 12 months in San Francisco are nearly double those in New York, while the average rent in New York is slightly higher than in San Francisco over the same period.

This step may require multiple iterations of fixes and testing through the UI. Keep in mind that whenever you change any of the application files mentioned at the top of this step, repeat the following steps:

  • Upload the changed file(s) to the named stage, either through Snowsight or using the PUT command.
  • In your worksheet, run the following SQL to drop and re-create the native app:
-- drop application if it already exists
DROP APPLICATION real_estate_data_app;

-- create application
CREATE APPLICATION real_estate_data_app
FROM APPLICATION PACKAGE real_estate_data_package
USING '@real_estate_data_package.stage_content.real_estate_data_stage';
  • Launch the app and verify your changes.

Step 7: Publish and install the app

When you are fully satisfied with your app, you are ready to publish it to Snowflake Marketplace. Let’s use a private listing, since this is only a demo app.

First, we need to add a version to our new app. This versions the application files that we uploaded to the named stage in step 6. Before you can create a listing for your application package, you must also set a release directive. A release directive specifies which version of your application is available to consumers.

Let’s run the following in our worksheet to add version v1 and set the release directive for our app.

-- add version to new app
ALTER APPLICATION PACKAGE real_estate_data_package
ADD VERSION V1
USING '@real_estate_data_package.stage_content.real_estate_data_stage';

-- set release directive
ALTER APPLICATION PACKAGE real_estate_data_package
SET DEFAULT RELEASE DIRECTIVE
VERSION = v1
PATCH = 0;
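If you want to confirm that the version was registered on the package before creating the listing, a quick check (a sketch using the SHOW VERSIONS command for application packages) is:

-- list the versions and patches registered on the application package
SHOW VERSIONS IN APPLICATION PACKAGE real_estate_data_package;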

Next, let’s publish our app by following the step-by-step instructions from this page.

Now log into Snowflake with your consumer account. I used a trial account, and I could see our newly published app!

Click the “Get” button to install the demo app. Once it is installed, launch the Streamlit UI to test it out. This time, let’s try comparing Philadelphia, PA with Billings, MT.

It’s informative to see that the average single-family residence values for the last 12 months in Philadelphia are slightly lower than those in Billings, while the average rent in Philadelphia is about $400 higher than in Billings over the same period.

Checking the data in the consumer account for REAL_ESTATE_APP, we see that only the versioned schema APP_SCHEMA resides in the app:

Great! We have just created our real estate Snowflake Native App and published it as a private listing, and the consumer account was able to successfully install and run it.

A Few Key Observations

While implementing our demo app, I observed a few points which I would like to share with you.

  • Limitation on re-sharing data shared from Snowflake Marketplace. As we saw above, there are limits on sharing data installed from Snowflake Marketplace, which is why we copied it into a local database first. For details, refer to General Limitations for Shared Databases.
  • Shared data refresh. Snowflake Marketplace data is periodically updated, so for our demo app, how do we make sure our consumer accounts receive the updated data without disrupting their usage of the app? The Snowflake Native App Framework handles data updates gracefully through version management. We can periodically run our worksheet to refresh the local real_estate_data database with the latest data from the real_estate_data_atlas database, update our shared_data schema from real_estate_data, and then publish a new version of our app (see the sketch after this list). New versions are automatically pushed to consumers, so they always have the latest changes without additional downtime or disruption.
  • Streamlit apps in the Snowflake Native App Framework currently lack support for certain third-party libraries. I planned to experiment with LlamaIndex in the Streamlit app within the Native App Framework, but unfortunately it didn’t work. I hope integration with third-party libraries will become easier and more seamless in the near future.
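To make the shared data refresh in the second bullet concrete, here is a minimal sketch of what such a periodic refresh worksheet could look like for our demo app. It assumes we simply rebuild the shared tables from the refreshed Marketplace data (the local views already point at the live real_estate_data_atlas database) and, if the application files changed as well, register a patch on the existing version; the object names match our demo, so adjust as needed.

-- rebuild the shared tables inside the application package from the refreshed Marketplace data
USE APPLICATION PACKAGE real_estate_data_package;
CREATE OR REPLACE TABLE shared_data.ZRHVI2020JUL AS SELECT * FROM REAL_ESTATE_DATA.public.ZRHVI2020JUL;
CREATE OR REPLACE TABLE shared_data.ZRRENTVI2020JUL AS SELECT * FROM REAL_ESTATE_DATA.public.ZRRENTVI2020JUL;

-- re-grant select, since CREATE OR REPLACE does not retain the existing grants
GRANT SELECT ON TABLE real_estate_data_package.shared_data.ZRHVI2020JUL TO SHARE IN APPLICATION PACKAGE real_estate_data_package;
GRANT SELECT ON TABLE real_estate_data_package.shared_data.ZRRENTVI2020JUL TO SHARE IN APPLICATION PACKAGE real_estate_data_package;

-- if the application files changed as well, register a patch for the released version
ALTER APPLICATION PACKAGE real_estate_data_package
ADD PATCH FOR VERSION v1
USING '@real_estate_data_package.stage_content.real_estate_data_stage';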

Summary

We explored the Snowflake Native App Framework in this article: we took a close look at what it is, why it’s a great framework, and how to implement an app with it. We dove into building a demo app for real estate data analysis, comparing the average single-family residence values and average rent between two cities in the United States for the past 12 months, and walked through seven steps for building, publishing, and installing a native app.

It’s a joy to see the app up and running without any infrastructure hassle and without our data ever leaving Snowflake. Bringing apps to data is the ultimate mission of the Snowflake Native App Framework, and it is a truly impressive framework. I hope you find this article helpful.

For the complete source code of our demo app, check out my GitHub repo.

Happy coding!
