# Data Science in eCommerce — Part 3

### Summary Statistics

Let’s take a closer look at the summary statistics of our transformed data set. They reveal some interesting information.

In this article, we will focus on two variables: conversions and path length.

1. Conversions
Conversions have a mean of 9.8 but a standard deviation of 125.8. This looks quite suspicious: a spread that much larger than the mean suggests a heavily skewed distribution. The box plot confirms it, as the majority of observations have only 1 or 2 conversions (read more about quartiles and box plots).
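To make the "mean far below standard deviation" warning sign concrete, here is a minimal sketch that computes the same kind of summary statistics with Python's standard library. The `conversions` list is hypothetical toy data, not the article's data set, and `summarize` is an illustrative helper name.

```python
import statistics

# Hypothetical toy data: conversions per customer path (NOT the article's data set).
conversions = [1, 1, 1, 2, 1, 2, 1, 1, 3, 250]


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


summary = summarize(conversions)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

In this toy sample a single large value pulls the mean well above the median and pushes the standard deviation above the mean, which is the same pattern described above. With pandas, `DataFrame.describe()` reports a comparable set of statistics.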

The number of observations (1,851) and the maximum value in the set (4,062) provide some clues. Let’s visualise this data:

The distribution of the values resembles a Pareto distribution. This hints at the 80/20 rule, known as the Pareto principle. In our case, it translates into the following statement: ‘The majority of conversions come from a limited number of customer paths’.
Definitely worth further exploration.
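One simple way to test the 80/20 statement is to measure what share of all conversions comes from the top 20% of paths. The sketch below assumes one conversion count per customer path. The data is hypothetical, and `top_share` is an illustrative helper, not a function from any library.

```python
def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the largest `top_fraction` of values."""
    if not values:
        raise ValueError("values must not be empty")
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    ordered = sorted(values, reverse=True)
    # Always keep at least one observation in the top group.
    top_n = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / total


# Hypothetical toy data: conversions per customer path (NOT the article's data set).
conversions = [1, 1, 1, 2, 1, 2, 1, 1, 3, 250]
share = top_share(conversions)
print(f"Top 20% of paths produce {share:.1%} of conversions")
```

A share close to or above 80% would support the Pareto reading. A share near 20% would mean conversions are spread evenly across paths.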

2. Path Length
Path length has a mean of 7.6 and a standard deviation of 6.1. The maximum value of 161 may indicate outliers. A box plot and a distribution plot will help us understand the distribution.
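Box plots conventionally flag points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule directly. The `path_lengths` list is hypothetical toy data (only the value 161 echoes the maximum reported above), and `iqr_outliers` is an illustrative helper name.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the usual box plot fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical toy data: path lengths (NOT the article's data set).
path_lengths = [3, 4, 5, 5, 6, 7, 8, 9, 12, 161]
outliers = iqr_outliers(path_lengths)
print(outliers)
```

Note that on a heavily skewed distribution this rule can flag many legitimate observations, so the flagged values are candidates for inspection rather than automatic removal.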

Again, the values appear to follow a Pareto distribution. More properties of this type of distribution will be shown in the next part.