Creating a Marketing Segmentation App with Streamlit & Snowpark

Traditionally, applications have been backed by transactional databases. Yet many applications don’t write much data at all; they simply present existing data in a particular way.

At the same time, they often process millions of rows. This usage pattern has given rise to a new paradigm: DWH-native apps, which run on top of a cloud data warehouse and take only a couple of clicks to set up and configure.

While this paradigm is still in its infancy, many vendors, like Census, have already jumped into this market, and they play nicely with a whole ecosystem of best-of-breed tools. But instead of onboarding yet another SaaS tool, Snowflake customers can also build their own DWH-native app using Streamlit and Snowpark.

In this article, you’ll learn how to create a customer segmentation tool that can sync its selection back to the data warehouse.

The customer segmentation tool

You can build an extremely simple application with just three elements:

  • A slider for filtering clients based on their income
  • A button that returns the number of customers in the segment when clicked
  • A button that saves the segment as a view into Snowflake

The application and the example code in the article are kept very simple, but can easily be extended to multiple customer dimensions, with a variety of Streamlit’s input widgets.

The interface of our segmentation tool in Streamlit

Requirements

On top of Snowflake and Streamlit, you need a GitHub account for deploying your app. You’ll also need a Python development environment and some data.

Snowflake

Setting up a Snowflake account is easy and free for the first 30 days. Just make sure to set up a warehouse where you can host your marketing data, and a user with the necessary role and privileges to read from and write to it.

Streamlit

To deploy your application, you’ll need a PaaS platform account (like Heroku or Google App Engine). Luckily, Streamlit also offers Streamlit Community Cloud, making it super easy for beginners since it only requires a single click to deploy a Streamlit application from a GitHub repository.

GitHub

The easiest way to deploy Streamlit applications is by storing your application’s code in a GitHub repository. Make sure to set up an account and create a repository that you’ll clone in your development environment (if you’re not sure how to do this, here’s a tutorial). Be sure to make it private, since you’ll store Snowflake credentials in it.

Python 3.8 & Packages

While you probably have your own favorite Python development environment, the Snowpark package is fairly strict about its requirements, so it’s recommended to create a new Anaconda environment.

The following commands will:

  • Create a new Anaconda environment
  • Activate the environment
  • Install snowpark and pandas with conda
  • Install Streamlit with pip (it’s more stable than the conda version)
conda create --name snowpark -c https://repo.anaconda.com/pkgs/snowflake python=3.8
conda activate snowpark
conda install -c https://repo.anaconda.com/pkgs/snowflake snowflake-snowpark-python pandas
pip install streamlit

If you are unfamiliar with Anaconda and would like to integrate it into Visual Studio Code, you can follow this guide.

Data

To make a segmentation tool, you’ll need some data which can be used for segmenting your customers. For this tutorial, we’re using the customer analysis data from Kaggle. For the sake of following along, you can download the CSV file and upload it to your Snowflake warehouse with SnowSQL. And if you’re a Windows user, you might also need this guide.
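
If you’d rather stay in Python instead of using SnowSQL, here’s a minimal alternative sketch that loads the CSV with pandas and writes it to Snowflake through Snowpark. It assumes your credentials are already in the pwd.json file described later in this article, that the example_data database exists, and the file, separator, and table names are placeholders you should adjust:

# Hypothetical alternative to the SnowSQL upload: push the CSV with Snowpark.
import json
import pandas as pd
from snowflake.snowpark.session import Session

connection_parameters = json.load(open('pwd.json'))  # credentials file, see below
session = Session.builder.configs(connection_parameters).create()
session.sql('USE DATABASE example_data').collect()   # placeholder database name

# The file name and separator depend on your download; adjust as needed.
df = pd.read_csv('marketing_campaign.csv', sep='\t')

# quote_identifiers=False stores unquoted (uppercase) column names such as INCOME,
# which matches how the columns are referenced later in this article.
session.write_pandas(
    df,
    'CUSTOMER_DATABASE',
    schema='PUBLIC',
    auto_create_table=True,
    quote_identifiers=False
)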

Build a customer segmentation app with Streamlit

Before you start developing, create a _.py_ file named ‘mkt-app.py’. Open it, and insert the following chunks of code, starting with importing the required packages.

# Import dependencies
from snowflake.snowpark.session import Session
from snowflake.snowpark.functions import col, min, max
import pandas
import streamlit as st
import json
import math

Next, configure the global settings of your Streamlit application: Its title, favicon, and the layout of the app.

# Configure page
st.set_page_config(
    page_title="Segmentation Tool",
    page_icon="🪄",
    layout="wide"
)

Throughout the application, you’ll use Snowpark (more on that later), so you should create a session that can be used to interface with Snowflake. First, store your credentials in the pwd.json file, in the same folder as your Streamlit app.

{
    "account": "<YOUR_ACCOUNT>",
    "user": "<YOUR_USERNAME>",
    "password": "<YOUR_PASSWORD>",
    "role": "<YOUR_ROLE>",
    "warehouse": "<YOUR_WAREHOUSE>"
}

Then go back to your application’s code and add the following snippet. It contains a function to create a Snowpark session object with the credentials from the json file.

# Create Session object
def create_session_object():
    connection_parameters = json.load(open('pwd.json'))
    session = Session.builder.configs(connection_parameters).create()
    return session
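
To confirm the connection works before building the rest of the app, a quick smoke test (just an illustrative snippet, not part of the final app) could be:

# Hypothetical smoke test: create a session and run a trivial query.
session = create_session_object()
print(session.sql('SELECT CURRENT_VERSION()').collect())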

You are now ready to add the custom code which connects the interface to Snowflake via Snowpark. 🙌

The inner workings

When you make a Streamlit app on top of a data warehouse (as is the case in this article), there are two possible approaches:

1. You take the data into the application’s memory and process it with Pandas OR
2. You process the data in the data warehouse. And when you’re using Snowflake, this requires Snowpark.

If your data volume is small (with a maximum of hundreds of thousands of rows), consider using Pandas as a more cost-effective option. When you go that route, Snowflake will not bill you for each transformation since it happens in the memory of the Streamlit app, not in the data warehouse. But if you’re working with millions (or billions) of rows, you should definitely go for Snowpark. It will be a lot faster — and you won’t run into the memory limits of your app.
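
To make the difference concrete, here’s a rough sketch of the two approaches for the same aggregation. It assumes a session object (called session here) plus the imports from earlier, and reuses the table and column names from later in this article; treat it as an illustration rather than the article’s code.

# Approach 1 (hypothetical): pull all rows into the app and aggregate with pandas.
pdf = session.table('public.customer_database').to_pandas()  # data moves into app memory
income_min_pandas = pdf['INCOME'].min()

# Approach 2 (hypothetical): let Snowflake do the work and fetch only the result.
income_min_snowpark = (
    session
    .table('public.customer_database')
    .agg(min(col('INCOME')).alias('income_min'))
    .to_pandas()['INCOME_MIN'][0]
)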

Although the example data used in this article only contains a couple of thousand rows, we’re using Snowpark for demonstration purposes.

Calculate minimum and maximum income

The first function returns the minimum and maximum income of all customers in our database. This range is used for setting the default values of the slider widget.

def get_aggregations():
    '''
    Returns the minimum and maximum values for the INCOME column.
    Can easily be extended to other columns.

    Returns:
        aggs (dict): Dictionary with the generated aggregations.
    '''
    sdf = (
        snow
        .table('public.customer_database')
        .agg(
            min(col('INCOME')).alias('income_min'),
            max(col('INCOME')).alias('income_max')
        )
    )
    pdf = sdf.to_pandas()
    income_min = int(pdf['INCOME_MIN'][0] - 1)
    income_max = int(pdf['INCOME_MAX'][0] + 1)

    aggs = {'income_min': income_min, 'income_max': income_max}
    return aggs

Calculate segment size

The next function calculates the size of the segment when customers are filtered on a minimum and maximum income. In the larger scheme of things, this will be used as a callback function when the user clicks the button to calculate the segment size.

def get_count(income_min, income_max):
    '''
    Returns the number of customers in the selected segment.
    Can easily be extended to other columns.

    Returns:
        segment_count (int): number of customers in the segment
    '''
    segment_count = (
        snow
        .table('public.customer_database')
        .filter(
            col('income').between(income_min, income_max)
        )
        .count()
    )

    return segment_count

Store segment in a Snowflake view

The final function of our application stores the income-filtered segment as a Snowflake view. This function is used as a callback from the button to store the segment.

def store_segment(income_min, income_max):
    '''
    Stores the selected segment in the public.sync_segment view.
    Can easily be extended to other columns.
    '''
    sdf = (
        snow
        .table('public.customer_database')
        .filter(
            col('income').between(income_min, income_max)
        )
    )
    sdf.create_or_replace_view('public.sync_segment')

The interface

The following lines of code lay out the interface and link its elements to the functions created in the previous sections.

First things first

First, you’ll create a Snowpark session object (snow), set the database you’ll use, and calculate the minimum and maximum income that define the default values of the income slider. You’ll also calculate the _income_step_ variable, which is used later on as the slider’s step size.

Finally, set the title and the header of the application.

snow = create_session_object()
snow.sql('USE DATABASE example_data').collect()
aggs = get_aggregations()


income_step = int(math.floor(math.log10((aggs['income_max'] - aggs['income_min']))))


st.title('Segmentation tool')


# Filters
st.header('Filters')
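
As a quick sanity check on the step size, here’s the arithmetic with made-up numbers (the actual values depend on your data):

# Hypothetical example: incomes ranging from 1,730 to 162,397.
# 162_397 - 1_730 = 160_667, log10(160_667) ≈ 5.2, floor -> 5
import math
print(int(math.floor(math.log10(162_397 - 1_730))))  # prints 5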

The form

While you could create the application in such a way that it responds in real time to the slider input, you don’t necessarily want that. Since you’re using Snowpark, you want to minimize the number of operations on the Snowflake data warehouse. For this reason, it’s better to use a form, which will trigger the calculations only after the submit button is clicked. In the code below, a form with two elements is created: The slider and the submit button.

with st.form(key = 'filter_form'):
    filt_income_min, filt_income_max = st.slider(
        label = 'Income range',
        min_value = aggs['income_min'],
        max_value = aggs['income_max'],
        value = (aggs['income_min'], aggs['income_max']),
        step = income_step
    )

    st.form_submit_button(
        label = 'Preview segment',
        on_click = get_count,
        kwargs = {
            'income_min': filt_income_min,
            'income_max': filt_income_max
        }
    )

The segment size

The following code calculates the segment size when the application loads (basically, the size of the unfiltered customer database). It also outputs this count as a text element. Every time the form submit button is pressed, this text element will be updated with the new segment size.

count_segment = get_count(filt_income_min, filt_income_max)
st.text(f'Customers in segment: {count_segment}.')

The sync button

Finally, when the end-user is happy with the size of their segment, they can choose to sync it back to Snowflake. For this, a button is added to the Streamlit application, which will call the store_segment function, defined earlier.

st.button(
    label = 'Store segment',
    on_click = store_segment,
    kwargs = {
        'income_min': filt_income_min,
        'income_max': filt_income_max
    }
)

This brings you to the final part of this article: Deploying your application to Streamlit Community Cloud.

First, run the following command in the terminal of your development environment (you may need to install it first with pip install pipreqs). It will create a requirements.txt file, which defines the dependencies of your Streamlit app.

pipreqs

As explained earlier, Snowpark is very picky with its dependencies. So you should also add pyarrow==10.0.1 to the file; otherwise, your app will be deployed with the wrong pyarrow version. Now, your requirements file should look like this:

pandas==1.4.4
snowflake==0.0.3
snowflake_snowpark_python==1.1.0
streamlit==1.19.0
pyarrow==10.0.1

Next, commit your changes, and push them to the GitHub repository. If you’ve never done this before, follow this tutorial.

Once your changes are pushed, you should head over to Streamlit Community Cloud and press New app. Select the repository for this app, the correct branch (if not main) and specify the file that contains the app.

Deploy to Streamlit Community Cloud

Don’t forget to change the Python version to 3.8 under the Advanced settings.

Finally, try your app and sync a segment back to Snowflake. You can now query this segment in a Snowflake worksheet and you can use this view for other purposes, such as syncing it to your marketing and sales tools with Census.
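
If you want to double-check from Python instead of a worksheet, a minimal sketch (reusing the session helper, database, and view name from earlier) could look like this:

# Hypothetical check: count the customers in the synced segment view.
snow = create_session_object()
snow.sql('USE DATABASE example_data').collect()
print(snow.table('public.sync_segment').count())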


Your Streamlit segment in Snowflake

Bringing it full-circle

Throughout this article, you learned to build your own customer segmentation application using Streamlit and Snowpark. Although this tutorial was fairly simple, it can easily be extended to more complex uses with the help of various other customer segment filtering widgets. This use case is especially relevant in small and medium-sized organizations in which data professionals want to provide their less-tech-savvy colleagues with the right tools to build their own marketing segments.

But building segments is one thing; operationalizing them for your business teams is another. So, if you want to make your Snowflake data self-service for business teams, invest in a best-in-breed data activation platform built for marketers.

Census data activation platform UI

Learn how Census can help you activate your existing customer data. Book a demo with a Census product specialist to see how you can build granular audiences and sync customer data to all your marketing and advertising tools without code.
