How to find Returning Customers vs New Customers using SQL

Pushpinder Singh Arora
Nov 7 · 3 min read

In most of the marketing literature, one finds that the benefits of retaining a customer are more than acquiring a new customer. A lot of it has to do with the customer acquisition cost, which is usually substantial as compared to the cost of maintaining a customer.

Acquiring new customers can include activities such as marketing (online via Facebook, Google’s network, native advertising, email marketing etc. or offline through traditional means). Retaining a customer can include activities such as re-marketing, email marketing (using triggers) etc. But it is evident that a loyal customer will have more propensity to buy from you and lesser effort is needed to get a higher pocket share from the existing customer compared to hunting for a new customer.

Lets say you have done the re marketing and are now waiting the fruits of your labour. How would you know what percentage of your customers in a given time period are repeating customers, coming from previous months, and what percentage of total customers are newly acquired customers? In offline cases with low volume it’s manually feasible. What about huge ecommerce platforms? Lets say your shop exists in Shopify. And your customer volume per month is huge (say >1000). You’ll need to use reporting tools and analytics to arrive at such an analysis. And its fairly simple once you know what do you need.

Scope of this article is to outline the procedure using SQL (Amazon Redshift commands & functions used).

Methodology

All you need to perform this analysis is a transactions table from your POS (or ecommerce platform). This table needs to have the following fields:

· Order ID [order_id]

· Customer ID [customer_id]

· Purchase Date [date]

There might be more details available such as customer billing address, product name etc. which we don’t need. Lets assume the table name to be “pos_transactions”

Find the first purchase date per customer

Extract distinct customer_id, date and calculate the first purchase date.

first_purchase_table = SELECT DISTINCT

customer_id, date,

MIN(date) OVER (PARTITION BY customer_id) AS first_purchase_date

FROM pos_transactions;

What we have just done is, for each customer, appended a column which will contain their date of first purchase, irrespective of their occurrence.

Join this table back to the pos_transactions. Now you have against each order_id (transaction), the date of that transaction, the customer_id of customer who transacted, date on which that transaction happened and also the date on which that customer did her first transaction.

2+2 = 4

Comparing the date and first_purchase_date, you can get a flag (column using a CASE WHEN/ conditional statement). This flag tells whether on that day, the customer in the row was a returning customer or a new customer.

CASE WHEN ‘date’ = ‘first_purchase_date’ THEN 1 ELSE 0 END AS ‘isNewCustomer’

Create another flag, opposite of the earlier one. Hence you’ll end up with 2 flags: 1st — isNewCustomer, 2nd — IsReturningCustomer.

If you can pull this data into a visualization tool, then viola you have the required dimensions — date, and the 2 flags. Plot date along the x axis and you can use a stacked area chart for the sums of two flags which will show that on a particular date, how many of the transactions were repeating transactions and how many were first time transactions. The graph on the side is something you can derive.

Comments are welcome!

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade