Orchestrate a pipeline using AWS Step Function to perform Consumer Engagement Funnel Analysis & User Acquisition Optimization
In today’s competitive business landscape, attracting and retaining customers is essential for success. But how can you optimize your user acquisition strategies and improve customer retention? That’s where our recent project comes in — consumer engagement funnel analysis and user acquisition optimization. By analyzing each stage of the customer journey, we were able to identify key areas for improvement and help our client increase their customer base and revenue. Read on to discover the impactful results of our project and how your business can benefit from a similar analysis. In this post, we will explore what customer engagement funnel is, why it is important, how to measure it, data engineering workflow to set up a serverless pipeline to ingest, process, transform, storage and creating a dashboard to get insights.
Project Architecture:
Design Approach:
Step 1: Set up a Glue job to load data from Amazon RDS Aurora MySQL 5.6 and store it in an S3 raw bucket.
Follow Python script to create the Glue job:
import boto3
client = boto3.client('glue')
response = client.create_job(
Name='myspectrum_raw_to_s3',
Description='Job to load data from RDS Aurora and store in S3 raw bucket',
Role='aws-glue-service-role',
Command={
'Name': 'glueetl',
'ScriptLocation': 's3://myspectrum-scripts/glue/raw_to_s3.py'
},
DefaultArguments={
'--job-language': 'python',
'--job-bookmark-option': 'job-bookmark-enable',
'--enable-continuous-cloudwatch-log': 'true'
},
Connections={
'Connections': ['rds-connection']
},
MaxRetries=0,
Timeout=60,
WorkerType='Standard',
NumberOfWorkers=2
)
print(response)
Step 2: Trigger a Lambda function when data is uploaded to the S3 raw bucket.
Use an S3 event trigger to invoke a Lambda function that will start the Glue ETL job. Here is an example of how to create the Lambda function:
import boto3
def lambda_handler(event, context):
s3 = boto3.client('s3')
glue = boto3.client('glue')
# get the bucket name and key of the uploaded object
bucket_name = event['Records'][0]['s3']['bucket']['name']
object_key = event['Records'][0]['s3']['object']['key']
# create the glue job parameters
parameters = {
'source': 's3://' + bucket_name + '/' + object_key,
'destination': 's3://myspectrum-processed-bucket/',
'database': 'myspectrum_db',
'table': 'myspectrum_table'
}
# start the glue job
response = glue.start_job_run(JobName='myspectrum_raw_to_s3', Arguments=parameters)
print(response)
Step 3: Lambda function to trigger Glue script for processing data
Once the data is loaded to the S3 raw bucket, Trigger a Lambda function to perform data processing and transformation. We will write a Python script that uses the AWS boto3 library to invoke a Glue job that performs the ETL process. Here is some sample code:
import boto3
def lambda_handler(event, context):
# Extract S3 object key from S3 event notification
object_key = event['Records'][0]['s3']['object']['key']
# Create Glue client
glue_client = boto3.client('glue')
# Invoke Glue job to process the data
job_name = 'my-glue-job'
arguments = {
'--s3_input_bucket': 'my-raw-bucket',
'--s3_input_key': object_key,
'--s3_output_bucket': 'my-processed-bucket'
}
response = glue_client.start_job_run(JobName=job_name, Arguments=arguments)
# Log job run ID for troubleshooting
job_run_id = response['JobRunId']
print(f'Glue job {job_name} started with job run ID {job_run_id}')
Step 4: Glue job to process and transform data
The Glue job is responsible for performing the ETL process on the data. It reads the data from the S3 raw bucket, performs any necessary transformations, and writes the processed data to the S3 processed bucket. Here is some sample code for a Glue job:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import SparkSession
# Initialize GlueContext and SparkContext
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session
# Extract arguments from Glue job configuration
args = getResolvedOptions(sys.argv, ['s3_input_bucket', 's3_input_key', 's3_output_bucket'])
# Read data from S3 raw bucket
input_bucket = args['s3_input_bucket']
input_key = args['s3_input_key']
input_path = f's3://{input_bucket}/{input_key}'
df = spark.read.json(input_path)
# Perform data cleaning and transformation
clean_df = df.select(col('id'), col('name'), col('age'), col('gender'))
# Write processed data to S3 processed bucket
output_bucket = args['s3_output_bucket']
output_key = f'processed/{input_key}'
output_path = f's3://{output_bucket}/{output_key}'
clean_df.write.mode('overwrite').json(output_path)
# Create Glue catalog database and table
database_name = 'my-glue-database'
table_name = 'my-glue-table'
glueContext.create_dynamic_frame.from_options(
connection_type='s3',
connection_options={
'paths': [output_path]
},
format='json'
).toDF().write.saveAsTable(f'{database_name}.{table_name}')
Step 5: Load data from Glue catalog DB to AWS Redshift
To load data from the Glue catalog database to AWS Redshift, use the following syntax:
CREATE EXTERNAL SCHEMA myspectrum_schema
FROM DATA CATALOG
DATABASE 'myspectrum_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/myspectrum_role'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
Step 6: Create an AWS QuickSight data source and dashboard
Once the data is loaded into AWS Redshift, create an AWS QuickSight data source and dashboard to visualize the data. Here are the steps to create an AWS QuickSight data source:
- Open the AWS QuickSight console.
- Choose “Manage data” and then “New data set”.
- Choose “AWS Service” as the source and then select “Amazon Redshift”.
- Choose the tables you want to include in the data source.
- Choose “Visualize” to create a dashboard based on the data source.
Step 7: AWS Step Function
To automate the entire data engineering pipeline, we can bundle up all the above steps in an AWS Step Function. AWS Step Functions allows us to define a state machine that coordinates the execution of the various steps in our pipeline.
Here is a sample code for the AWS Step Function in Amazon State Language (ASL):
{
"Comment": "My Data Pipeline",
"StartAt": "LoadFromRDS",
"States": {
"LoadFromRDS": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "load-from-rds-to-s3-raw-bucket",
"Arguments": {
"--S3_RAW_BUCKET": "my-s3-bucket-raw",
"--RDS_HOST": "my-rds-host",
"--RDS_DB": "my-rds-db",
"--RDS_USER": "my-rds-user",
"--RDS_PASSWORD": "my-rds-password"
}
},
"Next": "ProcessData"
},
"ProcessData": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-west-2:123456789012:function:process-s3-raw-bucket",
"Retry": [
{
"ErrorEquals": ["Lambda.Unknown"],
"IntervalSeconds": 30,
"MaxAttempts": 2,
"BackoffRate": 2.0
}
],
"Catch": [
{
"ErrorEquals": ["Lambda.Unknown"],
"ResultPath": "$.error",
"Next": "FailState"
}
],
"Next": "LoadToRedshift"
},
"LoadToRedshift": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "load-from-glue-to-redshift",
"Arguments": {
"--GLUE_DB": "my-glue-db",
"--GLUE_TABLE": "my-glue-table",
"--REDSHIFT_CLUSTER": "my-redshift-cluster",
"--REDSHIFT_DB": "my-redshift-db",
"--REDSHIFT_USER": "my-redshift-user",
"--REDSHIFT_PASSWORD": "my-redshift-password"
}
},
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.error",
"Next": "FailState"
}
],
"Next": "Finish"
},
"FailState": {
"Type": "Fail"
},
"Finish": {
"Type": "Pass",
"End": true
}
}
}
Read more about consumer engagement funnel below..
What is the consumer engagement funnel?
The consumer engagement funnel is a model that represents the journey a user takes from initial interaction with a product or service to the point of making a purchase. It’s a visual representation of the various stages a user goes through before making a purchase, from the initial awareness of a product or service to the final purchase decision.
Let’s take an e-commerce company as an example to understand the consumer engagement funnel. The funnel stages include:
Awareness: A user first becomes aware of the e-commerce company through various marketing efforts such as social media, search engine ads, or email campaigns.
Signup: The user signs up to the company’s platform to explore products, add products to the cart and complete the order.
First Order: After signing up, the user makes their first purchase from the e-commerce platform.
Engaged: The user continues to make purchases on the platform, indicating higher engagement with the brand.
Churned: A user who has not made a purchase in a certain amount of time is considered churned. For example, if a user has not made a purchase in the past 30 days, they can be considered churned.
Resurrected: If a churned user makes another purchase after a certain period, they are considered resurrected. For example, if a user who has not made a purchase in the past 60 days makes a purchase, they can be considered resurrected.
Now let’s dive deeper into these stages using an example of an e-commerce company:
Awareness: In this stage, the e-commerce company runs a social media campaign promoting their products to reach out to new users. They target 10,000 potential customers, out of which 2,000 visit the company’s website.
Signup: Out of the 2,000 users, 1,000 sign up on the platform and add products to their cart.
First Order: Out of the 1,000 users who signed up, only 500 make their first purchase, indicating a 50% conversion rate.
Engaged: Among the 500 users who made their first purchase, only 100 continue to make purchases regularly. They are considered highly engaged users.
Churned: Out of the remaining 400 users who made their first purchase but didn’t continue to make purchases regularly, 200 haven’t made a purchase in the past 30 days and are considered churned.
Resurrected: If a churned user makes another purchase after 60 days, they are considered resurrected. Out of the 200 churned users, 50 made another purchase after 60 days, indicating a 25% resurrection rate.
By analyzing these numbers and identifying areas for improvement, the e-commerce company can optimize its user acquisition and retention strategies. For example, they can focus on improving their marketing efforts to increase the number of sign-ups or target personalized messaging to resurrected users to encourage them to make more purchases. By optimizing the consumer engagement funnel, the e-commerce company can improve its overall revenue growth and customer satisfaction.
Why is the Consumer Engagement Funnel Important?
Customer engagement is critical to the success of any business, as it has a direct impact on customer satisfaction, loyalty, and advocacy. The statistics speak for themselves:
84% of customers say that the experiences a company provides are equally as important as products and services, which is an increase from 80% the year before. Additionally, 9 out of 10 Americans are willing to pay more for a better Customer Experience.
Engaged customers not only spend more, but they are also less likely to churn and more likely to recommend your brand to their friends. This means that customer engagement can have a significant positive impact on your bottom line. Brand advocates spend more with your brand and also recommend your brand to others, increasing customer lifetime value and reducing churn.
Higher levels of customer engagement also result in more brand equity, which is the degree to which customers believe your brand can meet or exceed their expectations in your particular consumer category. When customers are engaged, they trust your brand more, and they are more likely to choose you over a competitor. This increases your brand equity and can lead to long-term success
Optimizing User Acquisition
User acquisition is the process of acquiring new customers through various marketing channels. This can include social media advertising, search engine optimization, email marketing, and more.
To optimize user acquisition, businesses need to have a clear understanding of their target audience and where they can be reached most effectively. For example, if a business is targeting a younger demographic, it may find success with Instagram advertising, whereas if they are targeting professionals, LinkedIn advertising may be more effective.
In addition to identifying the most effective channels for user acquisition, businesses should also focus on providing a seamless and consistent experience across all touch points. This includes ensuring that messaging is consistent, branding is cohesive, and that the user experience is optimized for each platform.
What is Cohort based user segmentation?
Cohort-based user segmentation is a technique used to group customers based on their behavior and demographics during a specific period of time. In this case, the segmentation is based on the user’s first order on the app and their subsequent behavior over time.
By grouping users into cohorts based on their behavior after their first order, businesses can better understand their customers and tailor their retention efforts to meet their specific needs. For example, businesses can analyze the behavior of users who placed their first order in a particular month and identify trends or patterns in their subsequent activity.
Cohort-based user segmentation is an effective way to track customer behavior and optimize retention efforts. By tracking customer behavior over time, businesses can identify key drivers of engagement and create targeted retention campaigns that speak directly to the needs of their customers. This approach can help businesses improve customer loyalty, reduce churn, and increase customer lifetime value.
The segmentation categories include:
High Engaged: These are customers who have placed more than three orders per month. They are the most engaged and loyal customers, and they can provide valuable insights into what drives customer loyalty.
Low Engaged: These are customers who have placed one, two, or three orders per month. They are less engaged than high-engaged customers but are still actively using the app. Businesses should focus on encouraging these customers to become more engaged by providing targeted promotions or incentives.
Inactive (recently active): These are customers who have not placed an order in 28 days to 3 months. They are still considered active because they have recently used the app, but they may need a nudge to come back and make another purchase.
Soft churned: These are customers who have not placed an order in the past 3, 4, or 5 months. They are at risk of churning, but there is still a chance to bring them back. Businesses should focus on re-engaging these customers with targeted promotions or personalized outreach.
Hard churned: These are customers who have not placed an order in over 6 months. They are considered lost customers, and it is unlikely that they will return to make another purchase. However, it is still important to understand why they churned and use that information to improve retention efforts for other customers.
Resurrected: These are customers who were hard-churned before but have returned to make another purchase. They are valuable customers because they have demonstrated that they are willing to give the business another chance. Businesses should focus on providing a personalized experience to these customers to ensure that they remain engaged and loyal.
Cohort-based user segmentation refers to dividing customers into groups based on a shared characteristic, such as the month in which they made their first purchase. Here are some examples of how this can be done:
The month of First Purchase Cohorts: In this type of segmentation, customers are grouped based on the month in which they made their first purchase. For example, customers who made their first purchase in January would be in one cohort, customers who made their first purchase in February would be in another cohort, and so on. This segmentation allows businesses to understand the behavior of customers who joined the platform at different times and tailor their marketing and retention strategies accordingly.
Order Frequency Cohorts: In this type of segmentation, customers are grouped based on their order frequency after their first purchase. For example, customers who placed more than three orders per month after their first purchase would be in the high-engaged cohort, customers who placed one, two, or three orders per month would be in the low-engaged cohort, and so on. This segmentation allows businesses to identify the most engaged customers and target them with personalized offers or promotions.
Time Since Last Purchase Cohorts: In this type of segmentation, customers are grouped based on the time since their last purchase. For example, customers who made a purchase within the last 28 days would be in the recently active cohort, customers who have not made a purchase in 3–5 months would be in the soft-churned cohort, and customers who have not made a purchase in over 6 months would be in the hard-churned cohort. This segmentation allows businesses to target customers who may need a nudge to make another purchase and re-engage customers who are at risk of churning.