Using ClearScape Analytics™ to Understand Online Customer Behavior
In a post-pandemic world characterized by economic uncertainty and inflation, understanding customer behavior has become even more critical for online retailers. The impact of inflation on consumer behavior cannot be overlooked, with a striking 43% of US consumers expressing concerns about its effect on everyday expenses and savings. As a result, consumers are more cautious about how they spend their money, the products they buy, and the subscriptions and memberships they keep. Despite the challenges and changes in consumer behavior, data scientists can leverage Teradata’s machine learning and advanced analytics tools to help businesses build and adjust their strategies for business optimization and customer retention.
In this blog post, we will be exploring Teradata’s ClearScape Analytics™ functions via ClearScape Analytics Experience™. This is a convenient site that combines Teradata Vantage™ and Jupyter Notebook to provide data scientists with an immersive and interactive experience to analyze data, perform advanced analytics, and uncover actionable insights. Join me as we learn to analyze customer behavior using the Sessionize and nPath® functions of ClearScape Analytics to develop a targeted strategy for customer retention.
Table of Contents
1. The Fictional Challenge
2. Our Objectives
3. ClearScape Analytics Experience
4. Customer Behavior Analysis Demo
5. Applying Sessionize Function to Prepare Data
6. Applying nPath Functions to Analyze Data
7. Analysis and Visualization: Unveiling Common Patterns
8. Learn More Advanced Analytic Techniques
9. Additional Support
10. Resources
Watch the video tutorial
Prefer videos? Check out our YouTube tutorial instead!
1. The Fictional Challenge
Let’s suppose we manage a popular online retail membership store called “Home Sweet Home”. We specialize in selling stylish accessories for all home décor needs. Our store offers exclusive discounts, a rewards program, personalized recommendations, and free shipping with no minimum for our loyal members. However, Home Sweet Home is facing a big problem — many of its members are canceling their memberships, which is causing a decline in revenue and customer retention. Fortunately, our online store is tracking customer events such as web purchases, service inquiries, webchats, product browsing, calls, and more. We can use this data to perform behavioral analysis at a large scale and build a strategy to identify why our customers may be canceling their memberships.
The sample data we will be working with consists of 35,866 records regarding “Retail Events” retrieved from the corresponding view in the retailer’s database. Each of these records is composed of the following fields, entity ID, date stamp, and the action a visitor took. Manually examining this data to identify patterns would be a near-impossible task without the help of Sessionize and nPath functions:
Let’s dive into the step-by-step process of conducting behavioral analysis via ClearScape Analytics Experience.
2. Our Objectives
Our goal is to leverage ClearScape Analytics to identify the paths that “Home Sweet Home” customers take before canceling their memberships. This information will allow us to come up with a strategy to identify why these cancellations are happening and create a solution.
3. ClearScape Analytics Experience
To get started, head over to ClearScape Analytics Experience and create a free account or sign in: https://clearscape.teradata.com/sign-in.
ClearScape Analytics Experience is a site with all the tools we need to experience Teradata Vantage Analytics functions. The site enables us to configure free lightweight cloud environments with the full functionality of ClearScape Analytics on Vantage. We can create and modify limited data load databases within these environments. The same capabilities are available on all of Teradata’s platforms, supporting large-scale data processing with hundreds of nodes and petabytes of data.
- Once you have signed in, select Create an Environment.
- When naming your environment for this demonstration, I suggest something like `demo.`
- Create a password for your demo environment and note it, as you will need it later in the tutorial.
- Select your region and click on Create.
Congratulations on setting up your environment in Teradata Vantage on ClearScape Analytics Experience! With your environment up and running, let’s dive into the step-by-step process of conducting behavioral analysis.
4. Customer Behavior Analysis Demo
Select Run Demos Using Jupyter. A new tab will open your Jupyter environment.
Navigate to UseCases/ Customer_Behavior_Analysis_PY_SQL.ipynb.
Execute the first three code cell blocks by selecting the cells and pressing the Shift+ Enter keys to import the different client libraries and establish a connection to Vantage. The teradataml library executes in Vantage: enabling us to benefit from its massively parallel architecture. This architecture eliminates the need for data movement, as the teradataml library allows the execution of operations directly within Vantage.
#import libraries
import getpass
import warnings
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from teradataml import *
display.max_rows = 5
warnings.filterwarnings('ignore')%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
eng.execute('''SET query_band='DEMO=BehavioralAnalysis_PY_SQL.ipynb;' UPDATE FOR SESSION; ''')%run -i ../run_procedure.py "call get_data('DEMO_Retail_cloud');"Note that we are connected to the DEMO_Retail_cloud database provided on cloud storage. This demo defaults to using foreign tables to access the data without using any storage in your environment. If you are following along with the default code, you will notice a # of views created with no storage used for Demo_Retail database. If you would like, you can switch modes by changing the comment in the string. Downloading data may result in faster execution of some steps that perform the initial access to the source data, but it’s awesome to be able to minimize data transfer and work directly within Vantage.
Next, we create a pointer to our data source without having to move data by creating a teradataml dataframe. In the resulting dataframe we can see the sample data collected by the Home Sweet Home site.
#1. Create the teradataml dataframe from our source table - this creates a pointer to the location without moving data
tdf_retail_events = DataFrame(in_schema('DEMO_Retail', 'Retail_Events'))
tdf_retail_events.head()5. Applying Sessionize Function to Prepare Data
We prepare our data for further analysis via Sessionization, a time series Vantage function. Sessionization stitches together events within a defined time window to create a logical grouping, similar to a session. Sessionization is a vital component of data analytics that will happen downstream. This function orders the activities of each user into a logical sequence so that we can understand the path of events for each user. We then use this sequence of activities as inputs for other analytics functions like nPath.
Other examples where Sessionization can be leveraged include:
- Fraud Detection: Grouping user actions and interactions into sessions based on time, facilitates the detection of fraudulent patterns or behaviors via nPath.
- Network Security Threats: Grouping network connections into sessions, facilitates the identification of suspicious behavior or unusual patterns that may indicate a security breach or unauthorized access via nPath.
For this demo, we “sessionize” user activities such as product browsing, webchat, and purchases within a 24-hour time frame (86400 seconds) to track the user journey leading up to membership cancellations of Home Sweet Home.
The implementation is straightforward as we can simply call it within SQL using SELECT * FROM Sessionize () with the following input parameters: — -
- PARTITION: The column used to identify different sessions (groups of events belonging to different users).
- ORDER — The column(s) to use to order events within the sessions.
- TimeColumn– Specifies the column containing time for each event
- TimeOut — The time threshold in seconds to consider events as part of the same session.
qry = '''
SELECT * FROM Sessionize (
ON DEMO_Retail.Retail_Events
PARTITION BY entity_id
ORDER BY datestamp
USING
TimeColumn ('datestamp')
TimeOut (86400)
) ;
'''
sessionized_events = DataFrame.from_query(qry)
sessionized_eventsWe commit our sessionized results to a permanent table. Note that the `replace` keyword will overwrite any existing table with the same name.
sessionized_events.result.to_sql(table_name = 'demo_sessionized_events', if_exists = 'replace')In the result preview above, we can see an example from the demo_sessionized_events dataframe. The user with entity_id 1578 interacted with the Home Sweet Home website, specifically making a Return Policy Inquiry and a Neutral Call within a 24-hour period. As a result, the sessionize function assigned both events a SESSIONID of 0. The dataset consists of a total of 35,866 events tagged with a session ID.
Now that we have assigned session IDs to our dataset, we can utilize the nPath function to identify patterns that lead to membership cancellations. By analyzing the sequence of events within sessions, we can uncover patterns of interactions that are indicative of users canceling their membership. Once we have this information, we can begin to understand why many customers are churning.
6. Applying nPath Functions to Analyze Data
nPath is a behavioral analytic function that will scan through sessionized data to identify patterns of activities before certain outcomes occur. It is important to highlight that nPath is a technique that is not limited to people and behavioral analysis, but it can also be applied to inanimate objects including sensors to identify relationships between sensor readings and equipment performance.
In our case, our nPath function will scan all 35,866 sessionized rows in our database to identify the patterns that we define. Specifically, we want our function to output patterns or paths of at least two actions taken before membership cancellation and no more than five actions.
We will leverage our function to do all the heavy work of matching our complex pattern in the input data as well as defining the output values for each matched set of rows. Similar to calling our Sessionize function, we can execute nPath function via SQL query SELECT * FROM NPATH and input parameters.
Let’s explore the required parameters for our nPath function:
We specify the input data using:
- ON: Specifies the input data as `demo_sessionized_events` on which the NPath function will be applied.
- PARTITION: Identifies the column used to divide the data into separate groups or sessions (sessionid)
- ORDER: Declares the column that is used to order the events within each session as `datestamp`
- MODE: Defines the mode for nPath to evaluate our data as NONOVERLAPPING. This mode starts matching again at the row that follows the previous match, ensuring that matches are not overlapping.
- SYMBOLS: Creates column expression aliases that will be used to create a pattern for nPath to match against the input data. In this example, two symbols are created. The first, ‘True as A’: is setting A to match any row in the input data. The Second ‘EVENT in (\’Mem Cancel\’) as B’: This symbol B matches rows where the value in the EVENT column is equal to ‘Mem Cancel’.
- Pattern: Defines the pattern that nPath needs to search across the rows of events in the input data. This pattern is composed of the symbols we create in the previous parameter along with additional directives. In this example we have the pattern `’A{2,5}.B’ ` that declares a range of any row (A) between 2 and 5 times preceding ‘Mem Cancel’ (B) — A{2,5}.
Execute the code cell to view the paths identified by nPath.
qry = '''
SELECT * FROM NPATH
(ON demo_sessionized_events
PARTITION BY sessionid
ORDER BY datestamp
USING
MODE (NONOVERLAPPING)
SYMBOLS (True as A ,event IN ('Mem Cancel') AS B)
PATTERN ('A{2,5}.B')
RESULT (FIRST (entity_id OF A) AS entity_id,
FIRST (sessionid OF A) AS sessionid,
ACCUMULATE (cast(event as VARCHAR(50) CHARACTER SET UNICODE NOT CASESPECIFIC) OF ANY(A,B)) AS path,
COUNT (* OF ANY (A,B)) AS event_cnt)
);
'''
npath_mem_cancel = DataFrame.from_query(qry)
npath_mem_cancelWow! That was fast, right?
nPath runs inside Vantage using its full parallel processing power, that’s how it could return results for complex analytics in seconds. In the output, we can observe that the nPath function has created a path that each customer took before canceling their membership. This is helpful, but it can still be challenging to identify common patterns in our large dataset. To further analyze our data, we can obtain a grouped view of the paths and their corresponding occurrence counts, sorted in descending order based on the event count. This enables us to easily identify the most frequent paths leading to membership cancellations.
7. Analysis and Visualization: Unveiling Common Patterns
Execute the code to operate on the data as it lies in the database and retrieve the aggregated results.
npath_sessions.result.groupby(['path']).count().sort(['count_sessionid'], ascending = False).head()Here we can easily identify an occurrence of a complaint call happening before memberships are canceled. With this knowledge, we can suggest conducting sentiment and text analysis on the call’s transcripts to identify what the complaints are about. This is just one strategy we can execute to identify why some members are canceling their membership.
Additionally, the notebook demonstrates how we can operate on the data using the pandas dataframe to generate different visualizations. The histogram below visualizes the distribution of event counts in paths. For example, we can see that 1750 paths had a total of 6 events before memberships were canceled.
df_npath_pandas = npath_mem_cancel.to_pandas()
ax = df_npath_pandas['event_cnt'].value_counts().sort_values().plot(kind='bar', figsize=(7, 6), rot=0)
plt.xlabel("Event count in path")
plt.ylabel("Number of Path")
plt.title("Number of events in a Path", y = 1.02)We can also use a Sankey diagram to visualize the distribution of multiple paths that customers took before canceling their membership.
from tdnpathviz.visualizations import plot_first_main_paths%%time
plot_first_main_paths(npath_mem_cancel,path_column='path',id_column='entity_id')The Sankey diagram takes the nPath output as an input to display the main paths our customers took to cancel their memberships. To inspect details of the path or node we can move the mouse pointer over it and see the number/count of entities that followed that specific path from membership purchase to cancellation.
We have successfully achieved the two goals we set out to accomplish at the start of this demo. First, we have identified the paths that lead customers to cancel their memberships. Second, we have formulated a strategy to begin investigating the underlying causes of these cancellations through sentiment and text analysis. Notably, all of this was accomplished in a parallel, highly scalable environment without having to leave the Jupyter Notebook.
8. Learn More Advanced Analytic Techniques
Congratulations on completing this walk-through example!
ClearScape Analytics Experience is a convenient platform that introduces us to the vast capabilities of Teradata Vantage. In this step-by-step guide, we explored how to perform time series analysis with Sessionize and nPath functions, important tools that will help businesses discover actionable insights. We have only scratched the surface of the vast functionality offered by ClearScape Analytics. I invite you to explore the various demos on industry-leading in-database analytics that you can run on your own. Each notebook describes the business situation, attaches the required data from the cloud, and guides you through the step-by-step ClearScape Analytics functionality.
9. Additional Support
If you encounter any issues while running through the demo, we encourage you to join the community and post your questions: https://support.teradata.com/community. We also invite you to share your thoughts in the comments below and explore the helpful resources available on the Teradata Developer Portal.
10. Resources
Teradata Vantage
Teradata Python Package User Guide
Teradataml Python Reference
Teradata nPath Function Reference
Teradata Sessionize Function Reference
Sessionization: In this demo, we leveraged the Teradata Sessionization Function to assign a Session Identifier to a series of events within a specific time window. Learn more by watching this video:
nPath: nPath is a powerful in-database Teradata function that helps identify paths that lead to an outcome. After applying sessionization to our data, nPath works by going through rows of data and looking for the event patterns that we determine. For each group of rows that matches the pattern, nPath produces a single row, simplifying the analysis process and providing valuable insights.
Jupyter Notebooks – ClearScape Analytics Experience uses Jupyter Notebooks to run our Demo. Jupyter Notebooks are an open-source application that integrates documentation, code, visualizations, and narrative text within code execution cells. Jupyter also supports Python, R, Julia, and Teradata SQL enabling users to leverage a versatile environment for data analysis.
About the Author
Janeth Graziani is a Developer Advocate at Teradata who enjoys leveraging her expertise in technical writing to help developers discover new tools and technologies for their stack. Before joining Teradata, Janeth was a Developer Advocate at Autocode where she helped build a thriving developer community. Connect with Janeth on LinkedIn!

