# Data Science in eCommerce — Part 3

### Summary Statistics

Let’s take a closer look at the summary statistics of our transformed data set. They reveal some interesting information:

In this article, we will focus on two variables: conversions and path length.

1. Conversions

The number of conversions has a mean of 9.8 but a standard deviation of 125.8, which looks quite suspicious. The box plot confirms how skewed the distribution is: the majority of observations have only 1 or 2 conversions (read more about quartiles and box plots).
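To make the gap between mean and standard deviation concrete, here is a minimal sketch of how such summary statistics could be computed. It uses only Python’s standard library, and the `conversions` list is a hypothetical toy sample, not the article’s data set; the helper name `summarize` is also just an illustration.

```python
import statistics

# Hypothetical toy sample of conversions per path (NOT the article's data).
# Most paths convert once or twice; a single path dominates.
conversions = [1, 1, 1, 1, 2, 2, 2, 3, 5, 400]


def summarize(values):
    """Return count, mean, standard deviation, quartiles and max."""
    # method="inclusive" treats the sample as including its own min and max
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


summary = summarize(conversions)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this toy sample the median is 2 while the mean is 41.8, and the standard deviation is larger than the mean. That is the same kind of warning sign as in the real data: a few very large values pull the mean and the spread far away from the typical observation.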

The number of observations (1,851) and the maximum value in the set (4,062) provide some further clues. Let’s visualise this data:

The distribution of the values resembles a Pareto distribution. This hints at the 80/20 rule, known as the Pareto principle. In this case, it translates into the following statement: *‘The majority of conversions come from a limited number of customer paths’.* This is definitely worth further exploration.
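A quick way to test the 80/20 intuition is to measure what share of all conversions comes from the top 20% of paths. The sketch below is one possible approach, not the method used in the article; `top_share` and `top_fraction` are illustrative names, and the data is again a hypothetical toy sample.

```python
# Hypothetical toy sample of conversions per path (NOT the article's data).
conversions = [1, 1, 1, 1, 2, 2, 2, 3, 5, 400]


def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the largest `top_fraction` of values."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must have a non-zero total")
    ordered = sorted(values, reverse=True)
    # Keep at least one observation, even for very small samples
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / total


share = top_share(conversions)
print(f"Top 20% of paths account for {share:.1%} of conversions")
```

If the resulting share is close to or above 80%, the Pareto principle is a reasonable working description of the data. A share near 20% would instead mean that conversions are spread evenly across paths.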

2. Path Length

Path length has a mean of 7.6 and a standard deviation of 6.1. The maximum value of 161 may indicate outliers. A box plot and a distribution plot will help us understand the distribution.

Again, the values appear to follow a Pareto distribution. More properties of this type of distribution will be shown in the next part.

Some business takeaways:

- There is a large disparity in the number of conversions: some paths had up to 4,062 conversions, while 75% of all paths had no more than two. We may want to take a closer look at those outliers to understand where the bulk of conversions takes place.
- 50% of observations have a path length of 6 touchpoints or fewer.
- 75% of observations have a path length of 9 touchpoints or fewer.
- There are some outliers skewing the summary statistics: observations with up to 161 touchpoints before conversion.
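The quartile and outlier statements above can be reproduced with the convention most box plots use: values beyond 1.5 times the interquartile range (IQR) from the quartiles are flagged as outliers. The sketch below assumes that convention, which the article does not state explicitly; `path_lengths` is a hypothetical toy sample and `iqr_outliers` is an illustrative helper name.

```python
import statistics

# Hypothetical toy sample of path lengths in touchpoints (NOT the article's data).
path_lengths = [2, 3, 4, 5, 6, 6, 7, 8, 9, 12, 161]


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers


lower, upper, outliers = iqr_outliers(path_lengths)
print(f"Fences: [{lower}, {upper}]")  # Fences: [-1.5, 14.5]
print(f"Outliers: {outliers}")        # Outliers: [161]
```

Flagging an observation this way does not mean it should be dropped. For conversion paths, the extreme values may be exactly the cases worth investigating.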