Connecting to Snowflake from Streamlit

Introduction

Streamlit is a terrific tool for Python developers to quickly create applications for data manipulation, exploration, and analysis. Of course, data-intensive applications need access to data. A powerful way for Streamlit to manipulate data is to leverage a database and data processing engine such as Snowflake. In order to most effectively use Streamlit and Snowflake together there are some best practices that developers should follow.

The first best practice is to leverage the data processing power of Snowflake to manipulate the data as much as possible before pulling it back into the application itself. Snowflake provides two powerful methods to do this from your Python application, the Snowflake Python Connector and the Snowflake Snowpark Python package. The Snowflake Python Connector allows the application to submit SQL (as complex as you’d like) to execute inside Snowflake. The Snowpark Python package has a DataFrame API that allows for manipulation of data inside Snowflake without having to write any SQL.

In both cases, these packages need to connect to Snowflake, and here we need to take some care in managing the connection to Snowflake in the Streamlit environment. The best practice in Streamlit for connecting to databases, as with other operations that may take a while to calculate but can be reused, is to cache the connections to improve performance and user experience.

However, there are a few different ways we can do this, and the choice will depend on the type of application and data access that we require.

In this blog post we will explore the best practices for two main application scenarios and how to leverage the caching mechanisms of Streamlit to efficiently connect to Snowflake and manage that connection for a good user experience.

Types of applications

Some applications are designed to let a user explore a hosted data set. These applications are built with a single database user that all visitors will use to access the data. Typically, on the database side, you would create a database user for the application and grant it appropriate access to the data and database functionality that is appropriate for the application. This allows you to restrict the access of the application while at the same time ensuring that all users have the same experience. For these types of applications, the application has access to the credentials (e.g., in a credentials file, environment variables, or the st.secrets mechanism) and can make the connection on start and reuse it from then on.

Other applications are designed for users that themselves have their own database credentials. This may be an application that connects to a particular Snowflake account, but each application visitor has their own user in that account. It could also be that the application is designed to connect to any Snowflake account, so the application visitor can supply not only their login credentials, but also their Snowflake account locator. For these types of applications, we would want to have a login form, and the application would not open any further until the user provided valid credentials and a Snowflake connection was established. We do not want to hang onto the credentials themselves any longer than it takes to establish the connection, for safety reasons.

One thing we need to be cognizant of is that the connection can be closed. This can happen programmatically, or if the connection experiences an error of some sort and the connection is severed. One other way the connection can be closed is if the connection is idle for a period of time, in which case the server will close the connection. The default setting is to close the connection after 4 hours of inactivity. We will want our users to have a good user experience even during these events.

In the case of a single connection, since we have access to the credentials, we can reestablish the connection. To do this, we do need to be able to test that the cached connection is still open, or if it has been closed. In the case of a login form, if we detect that the cached connection is closed, since we do not still have the credentials, we would return the user to the login screen so they can provide the necessary credentials to reconnect.

Applications with their own database user

As we said, this is one of the simpler scenarios as we do not need anything from the user in order to make a connection to Snowflake. Typically, database credentials are stored in one of a few places (though, this is not an exhaustive list):

  • In a credentials file that the application can read. Please make sure that this file is not checked into your source code repository.
  • In environment variables that the application can access. From a security perspective, familiarize yourself with how these environment variables may be exposed to logging, administrators, and others.
  • In the secrets.toml file to leverage Streamlit’s st.secrets mechanism. Again, make sure that this file is not checked into your source code repository.

However the application accesses the credentials, the next step is to use the Snowflake Python Connector or Snowflake Snowpark package to connect. We will want to encapsulate this connection in a function that we can decorate with the @st.experimental_singleton decorator to cache in the Streamlit global cache that all visitors will share. The following is an example using Snowpark, but the analogous approach can be taken with the Snowflake Connector.

# Connect using the Snowflake Snowpark package
@st.experimental_singleton
def snowflake_sesion(fname: str) -> Session:
snowcreds = json.load(open(fname, “r”))
return Session.builder.configs(snowcreds).create()
session = snowflake_session(“/path/to/credentials.json”)

The @st.experimental_singleton decorator will cache the result of the function so that calls that have the same input arguments will use the cached result. If the function is called with different arguments (e.g., a different credentials file name, in this case), the function will be called and that result will also be cached.

The problem we may run into here is what happens if the connection is closed or severed. In that case, the cached object will be closed and the application will error when the closed object is used. We need a way to call the function anyway and replace the cached value with a new connection.

The approach we will take is to encapsulate making the connection and checking that it is still valid in a class, and the function will return an instance of that object. Then when we call our snowflake_session() function, it will return the open connection or will reconnect if the cached connection is closed:

class _SnowflakeSessionWrapper:
def __init__(self):
self._connection = None
def get_connection(self, creds) -> Session:
if not self._validate_connection():
self._connection = self._create_connection(creds)
return self._connection
def _validate_connection(self) -> bool:
if self._connection is None:
return False
if self._connection._conn._conn.is_closed():
return False
return True
def _create_connection(self, creds) -> Session:
return Session.builder.configs(creds).create()
def snowflake_sesion(fname: str) -> Session:
@st.experimental_singleton
def get_connection(creds) -> _SnowflakeSessionWrapper:
return _SnowflakeSessionWrapper()

snowcreds = json.load(open(fname, "r"))
return get_connection(snowcreds).get_connection(snowcreds)

session = snowflake_session("/path/to/credentials.json")

Here the session is stored internally in the _SnowflakeSessionWrapper class, and the instance of that class is cached in Streamlit’s global cache. Again, if different credentials are provided, a new object is created and the session is cached internally to that object. In this way we can establish multiple Snowflake sessions, should the need arise.

Applications where users provide their own credentials

Unlike the previous scenario, we don’t have credentials to open the connection, so we will need to gather that information from the user. The pattern is that before the connection is established we will present only a login form for the user to fill out. If we need to show something even before the user logs in, we can do this, as well, but the main application will not render until the user logs in.

Also, unlike the previous scenario, we will want to cache the connection in the session cache, as opposed to the global cache. This cache is only available to the one web session, so users can only access their own sessions, which is good. We will use the st.session_state cache for this purpose.

There are a number of options we could prompt the user to enter, but for illustrative purposes we will just show capturing the main options: account, user, password, and warehouse.

STKEY = 'STSNOW_CONNECTION' ## Key to use in st.session_state## Callback function to create session
## gets credentials from session state
def session_callback():
creds = {}
creds["account"] = st.session_state["SNOWCREDS_account"]
creds["user"] = st.session_state["SNOWCREDS_user"]
creds["password"] = st.session_state["SNOWCREDS_password"]
creds["warehouse"] = st.session_state["SNOWCREDS_warehouse"]
## Delete the info from st.session_state
del st.session_state["SNOWCREDS_account"]
del st.session_state["SNOWCREDS_user"]
del st.session_state["SNOWCREDS_password"]
del st.session_state["SNOWCREDS_warehouse"]
session = Session.builder.configs(c).create()
st.session_state[self.STKEY] = session
return session
## Test if the session is open
def is_open(self, s: Session) -> bool:
return not s._conn._conn.is_closed()
# Close the session (for the logout button)
def close():
st.session_state[STKEY].close()
del st.session_state[STKEY]
## Get the session, or present the login form
def snowflake_login() -> Session:
if STKEY in st.session_state: ## Check if connection in cache
if is_open(st.session_state[STKEY]): ## check if connection is open
## If open, put logout button in sidebar and return the connection
st.sidebar.button("Disconnect from Snowflake", on_click=close)
return st.session_state[impl.STKEY]
## Login form
with st.form("Snowflake Credentials"):
st.subheader("Snowflake Credentials")
st.text_input("Account", key="SNOWCREDS_account")
st.text_input("User", key="SNOWCREDS_user")
st.text_input("Password", key="SNOWCREDS_password", type="password")
st.text_input("Warehouse", key="SNOWCREDS_warehouse")
st.form_submit_button("Connect", on_click=session_callback)
## We do not want anything beyond this form to be presented
## So, we call st.stop()
st.stop()
## Everything to this point will be shown
session = snowflake_login()
## Nothing down here will be shown until the user logs in

There is a bit going on here that we should walk through. The main function of interest is the snowflake_login() function. It checks to see if there is a session state cached with a special key (i.e., STKEY = 'STSNOW_CONNECTION'). If it is present, then we call the is_open() function to check if the session is still open. If it is, then we simply return this cached session. If the session is open, we also put a logout button in the sidebar. That button is wired to call the close() function, which closes the cached session.

If the session is not open or there is no cached session, the snowflake_login() function then presents a form for the user to fill in, and then calls st.stop() to not show any more of the application. The form elements are simple text inputs. We need to set the key attribute for these elements so that they are available to the session_callback() function. So, we set some key names that we can pick up in session_callback(). Finally, we add an st.form_submit_button which is wired to call the session_callback() function.

The session_callback() function is pretty straightforward. It gets the form entry information from st.session_state using the keys we used in the form, creates a credentials dictionary, and calls the Snowpark Session creation function. It then stores the resulting session in st.session_state using the special key, so that subsequent runs will pick up the cached session.

Now, all we need to do is call the snowflake_login() function to get the session. Anything above that call will be run if the user is logged in or is not. Anything below that call will not be run if the user is not logged in, but will once they are.

Now, if the session is closed (e.g., if the user is idle for several hours), then when the session is retrieved the is_open() function will return False and the user will be presented with the login form as if they had not logged in yet. This is the behavior we would like.

Conclusion

We have shown two different approaches to connect to Snowflake from Streamlit in ways that will ensure a good user experience, in terms of efficiency, security, and fault tolerance. These approaches support two main use cases for Streamlit applications that connect to Snowflake

  • For applications using a single credential set to connect to Snowflake for all users, a single global session should be created and cached in such a way that you can automatically reconnect in case the cached connection is closed or severed.
  • For applications using user-specific credentials, users should be prompted to enter their credentials and a session be established per user and cached in the session state for that user; in the event that the session is closed or severed, the application should prompt for the credentials information and reconnect.

In order to make these approaches even simpler to use, we have created a Python package that encapsulates these approaches into a simple and effective API for connections to data sources in general, and also includes an implementation for Snowflake: st_connection. It implements the approach we discussed here, but has additional features to support more customizations, etc. However, the goal of simplicity is still retained for developers.

The singleton use case for Snowpark is as simple as:

import json
import streamlit as st
import st_connection
import st_connection.snowflake
creds = json.load(open("/path/to/json/credentials.json, "r"))
session = st.connection.snowflake.singleton(creds)

The login form use case for Snowpark is equally simple. With no arguments, the form will prompt for account, user, and password:

import st_snow
import st_connection
import st_connection.snowflake
session = st.connection.snowflake.login()

If you want to customize the form with additional arguments or defaults, you can do so via input arguments. This will ask for account, user, password, and warehouse, with some default values for the form, will hard-code the account and warehouse, and will name the form “Snowflake Login”:

import st_connection
import st_connection.snowflake
session = st.connection.snowflake.login({'user': '', 'password': None, 'database': 'PROJECT_DB'}, {'account': 'XXX', 'warehouse': 'PROJECT_WH'}, 'Snowflake Login')

There are additional options and support for both Snowpark and the Snowflake Connector. Visit the GitHub repository for more information. Additionally, for more information about Streamlit and the caching mechanisms for Streamlit, see the documentation.

--

--

Brian Hess
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

I’ve been in the data and analytics space for over 25 years in a variety of roles. Essentially, I like doing stuff with data and making data work.