# Demystifying Vanity KPIs

## Online store visitors vs sales

One of the most common KPIs used to measure online performance is the number of sessions. It is usually quite useful for getting a high-level view of incoming traffic. A typical scenario is for the business to stop at measuring sessions and use that number as an indicator of ‘success’. This is where a common assumption comes in: “more people in, more sales”.

Let’s verify this assumption using statistics. We will test it against a quantifiable business KPI: the number of transactions in eCommerce.

If the “more people in, more sales” assumption is correct, we would expect a clear correlation between sessions and sales. Something like this:

Let’s take a look at real data from an eCommerce store:
1. the horizontal axis represents sessions,
2. the vertical axis represents transactions,
3. each dot represents a transaction; its colour represents the transaction value.

There is no simple, intuitive relationship between the number of sessions and the number of transactions. A few takeaways, though:

1. Increasing the number of sessions beyond a certain point does not increase the number of transactions (right-hand side of the chart).
2. A low number of sessions, up to a certain level, seems to produce up to 10 transactions per session, but it is very inconsistent.
3. Another group (middle of the chart) has a higher number of sessions, but the outcome is again very inconsistent. Perhaps a number of other factors have an impact (acquisition channel, day of the week, user device)?

Is there a more scientific way to determine whether there is a linear relationship between sessions and transactions?

### Gather more data

Is there more information available? In our case, we quickly discover that we have not only sessions and transactions but also transaction values and the number of users visiting the online store. This looks like a promising start!

### Correlation analysis

Start small with correlation analysis. This is a method of statistical evaluation used to study the strength of a relationship between two numerically measured, continuous variables. It is important to remember that correlation analysis does not determine cause and effect.
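For reference, the coefficient most commonly used for this kind of analysis is Pearson's r. The article does not name the method, so reading the coefficients quoted here as Pearson coefficients is an assumption. For n paired observations, for example daily sessions $x_i$ and daily transactions $y_i$, it is defined as:

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Here $\bar{x}$ and $\bar{y}$ are the sample means. The value ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).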

How do we measure correlation? This diagram will be helpful:

Let’s take a look at our data using correlation methods in Python:

Note the strong positive association between the number of sessions and the number of users (+0.99) and the medium positive association between the number of sessions and the number of transactions (+0.48).
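The store's data and the original listing are not reproduced here, so the following is only a minimal sketch of how such a correlation matrix could be computed. The daily figures, the column names and the helper functions `pearson` and `correlation_matrix` are all invented for illustration, which means the coefficients it prints will differ from the +0.99 and +0.48 reported for the real store.

```python
from math import sqrt

# Hypothetical daily figures, invented for illustration only.
daily = {
    "sessions":     [120, 150, 180, 210, 260, 300, 340, 400],
    "users":        [100, 128, 150, 181, 220, 251, 290, 338],
    "transactions": [4, 9, 5, 12, 7, 14, 8, 11],
    "revenue":      [210.0, 480.0, 260.0, 590.0, 400.0, 650.0, 390.0, 610.0],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlate every column with every other column (and with itself)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


matrix = correlation_matrix(daily)

# Print one row per variable, coefficients rounded to two decimals.
for name, row in matrix.items():
    print(f"{name:>12}", "  ".join(f"{value:+.2f}" for value in row.values()))
```

In practice a data-frame library does this in one call; pandas' `DataFrame.corr()`, for instance, returns the same kind of matrix and uses Pearson's method by default.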

### How should we interpret the result?

This is the first step towards understanding what drives the performance of the online store. The positive relationship between sessions and transactions is worth further investigation! Another interesting point is the relationship between the number of transactions and the revenue.

### What questions should we ask based on the above findings?

Imagine a scenario in which a price-reduction campaign brings more visitors to the online store and the reduced prices result in more transactions. Sessions and transactions will be positively correlated, but the causal agent is the campaign.

Keep in mind that correlation does not imply causation.
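The campaign scenario can be made concrete with a small simulation. This is a sketch under stated assumptions, not the store's data: the means, spreads and the seed are made up, and within each group sessions and transactions are deliberately drawn independently of each other. The only thing linking them is whether a campaign is running.

```python
import random
from math import sqrt

rng = random.Random(42)  # fixed seed so the run is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_days(n, mean_sessions, mean_transactions):
    """Draw sessions and transactions independently of each other."""
    sessions = [rng.gauss(mean_sessions, 50) for _ in range(n)]
    transactions = [rng.gauss(mean_transactions, 3) for _ in range(n)]
    return sessions, transactions


# Hypothetical levels: the campaign lifts both metrics at the same time.
normal_s, normal_t = simulate_days(200, 1000, 20)
campaign_s, campaign_t = simulate_days(200, 1500, 35)

r_all = pearson(normal_s + campaign_s, normal_t + campaign_t)
r_normal = pearson(normal_s, normal_t)
r_campaign = pearson(campaign_s, campaign_t)

print(f"all days:      {r_all:+.2f}")
print(f"no campaign:   {r_normal:+.2f}")
print(f"campaign only: {r_campaign:+.2f}")
```

Pooled together, the days show a strong positive correlation, yet inside each group the coefficient stays close to zero. Extra sessions on their own do nothing in this simulation; the campaign moves both numbers.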

### One step further: pair plot

Another useful way of looking at the data is a pair plot. In our case, it shows every possible pair of the variables we have. It is similar to a correlation plot but with much more detail. We also get an additional piece of information for each variable: a histogram, which shows the distribution of its values.

Compare the strength of each relationship with its visual representation, in this case for the number of sessions, users and transactions.

Sessions and users show an almost perfect linear relationship. No surprise: their correlation value is +0.99. Transactions and sessions are more challenging to interpret. Could it be done? Absolutely!
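A pair plot is normally drawn with a plotting library (seaborn's `pairplot` is one common choice). As a rough, library-free sketch of what goes into it, the snippet below builds the two ingredients: the list of variable pairs that become scatter panels, and the histogram bin counts shown for each variable. The numbers and the `histogram` helper are hypothetical.

```python
from itertools import combinations

# Hypothetical daily figures, invented for illustration only.
daily = {
    "sessions":     [120, 150, 180, 210, 260, 300, 340, 400],
    "users":        [100, 128, 150, 181, 220, 251, 290, 338],
    "transactions": [4, 9, 5, 12, 7, 14, 8, 11],
}


def histogram(values, bins=4):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes the values are not all identical (the bin width must be > 0).
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for value in values:
        # The maximum value would land in bin number `bins`; clamp it to the last bin.
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Every unordered pair of variables corresponds to one scatter panel.
pairs = list(combinations(daily, 2))
for x_name, y_name in pairs:
    points = list(zip(daily[x_name], daily[y_name]))
    print(f"{x_name} vs {y_name}: {len(points)} points")

# The histogram for each variable shows how its values are distributed.
for name, values in daily.items():
    print(f"{name:>12}: {histogram(values)}")
```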

### Bonus question: do more transactions mean more revenue?

Intuitively, many of us would say yes. In our case, the short answer is: it depends. The pair plot below shows the relationship between transactions and revenue.

We can formulate the following hypothesis: between 1 and 15 transactions per day, the relationship is linear (an increase of 1 transaction leads to an increase of x units in revenue). Beyond 15 transactions per day, the relationship becomes much more volatile.

Why? Again, we can verify a couple of assumptions:
- A higher number of transactions may be driven by diverse marketing tactics that bring in a mixed set of clients. This results in more diverse transactions.
- Coupons and promotions are used to drive extra sales. The number of transactions increases, but transaction values are lower than usual.
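One simple way to check the 15-transactions hypothesis is to split the days at that threshold and compare how stable revenue per transaction is in each segment. The sketch below uses made-up days and a hypothetical helper, and it assumes the linear relationship passes through the origin (zero transactions means zero revenue), so that a steady revenue per transaction indicates a linear relationship.

```python
from statistics import mean, stdev

THRESHOLD = 15  # transactions per day, taken from the hypothesis in the text

# Hypothetical (transactions, revenue) pairs, one per day, invented for illustration.
days = [
    (3, 150.0), (5, 255.0), (8, 390.0), (10, 510.0), (12, 590.0), (15, 760.0),
    (18, 540.0), (20, 1400.0), (22, 700.0), (25, 1900.0), (28, 900.0), (30, 2100.0),
]


def revenue_per_transaction_spread(rows):
    """Coefficient of variation (stdev / mean) of revenue per transaction."""
    per_transaction = [revenue / transactions for transactions, revenue in rows]
    return stdev(per_transaction) / mean(per_transaction)


low = [day for day in days if day[0] <= THRESHOLD]
high = [day for day in days if day[0] > THRESHOLD]

low_spread = revenue_per_transaction_spread(low)
high_spread = revenue_per_transaction_spread(high)

print(f"up to {THRESHOLD} transactions: spread {low_spread:.2f}")
print(f"above {THRESHOLD} transactions:  spread {high_spread:.2f}")
```

A small spread in the lower segment and a much larger one above the threshold would be consistent with the hypothesis. Breaking the high-volume days down further by acquisition channel or coupon use would then help test the two explanations listed above.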