Data Science in eCommerce — Part 2

Maciej Piwoni
Published in DataShop
May 5, 2017

Data Preparation

In the previous article, we started with a customer journey dataset that looks like this:

Multi-Channel Funnels (MCF) report in Google Analytics

Google Analytics lets you export MCF data in CSV format.

Original three-column MCF dataset
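As a rough illustration of the loading step, here is a minimal sketch that reads such an export into Pandas. The column headers, the `" > "` path separator, the currency formatting, and the sample rows are all assumptions made for the example; they are not taken from the actual export, and `load_mcf` is a hypothetical helper name.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the exported file. The headers,
# the " > " separator and the values are assumptions for illustration.
SAMPLE_CSV = """MCF Channel Grouping Path,Conversions,Conversion Value
Direct,120,"$6,000.00"
Organic Search > Direct,45,"$2,700.00"
Social > Social > Direct,10,$450.00
"""


def load_mcf(source):
    """Read a three-column MCF export and return clean numeric columns."""
    df = pd.read_csv(source, thousands=",")
    # Rename by position: path, number of conversions, total value.
    df.columns = ["path", "conversions", "conversion_value"]
    # Strip currency symbols and thousands separators before converting.
    df["conversion_value"] = (
        df["conversion_value"]
        .astype(str)
        .str.replace(r"[^0-9.\-]", "", regex=True)
        .astype(float)
    )
    df["conversions"] = df["conversions"].astype(int)
    return df


mcf = load_mcf(io.StringIO(SAMPLE_CSV))
print(mcf.dtypes)
```

For a real export, pass the file path instead of the `io.StringIO` object. Exports may also carry extra header or footer lines that need skipping, which this sketch does not handle.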

There are a couple of problems with the above dataset:

  • The path is saved as text (a string), which is not the most useful format for analysis,
  • We are missing some additional information, such as path length.

We will transform the above data into a more useful format. Using Python and Pandas, we can expand our variables to make the analysis easier:

Adding additional variables

The new dataset contains a number of additional variables:

  • Path length: the number of touchpoints,
  • Count of touchpoints by channel (example: 1x direct, 2x social),
  • A flag showing whether a particular channel was the last touchpoint (example: paid media as the last touchpoint before the conversion),
  • Average conversion value (calculated as the sum of conversion values divided by the number of conversions).
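The variables listed above could be derived along the following lines. This is only a sketch under stated assumptions, not the script used for the article: the column names `path`, `conversions` and `conversion_value`, the `>` separator, the `_last` suffix, and the helper name `add_journey_features` are all hypothetical.

```python
import pandas as pd


def add_journey_features(df, channels):
    """Return a copy of df with path length, per-channel counts,
    last-touchpoint flags and average conversion value added."""
    out = df.copy()
    # Turn the path string into a list of touchpoints
    # (assumes ">" separates touchpoints in the export).
    steps = out["path"].apply(lambda p: [c.strip() for c in p.split(">")])
    out["path_length"] = steps.apply(len)
    for channel in channels:
        col = channel.replace(" ", "_")
        # How many times the channel appears in the journey.
        out[col] = steps.apply(lambda s, c=channel: s.count(c))
        # 1 if the channel was the last touchpoint before conversion.
        out[col + "_last"] = steps.apply(lambda s, c=channel: int(s[-1] == c))
    # Assumes every row has at least one conversion.
    out["avg_conversion_value"] = out["conversion_value"] / out["conversions"]
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "path": ["Direct", "Social > Social > Direct", "Email > Paid Search"],
        "conversions": [4, 2, 1],
        "conversion_value": [200.0, 90.0, 30.0],
    }
)
features = add_journey_features(
    sample, ["Direct", "Social", "Email", "Paid Search"]
)
print(features.filter(["path", "path_length", "Social", "Direct_last"]))
```

Each channel contributes two columns here (a count and a last-touchpoint flag), which is why the column count grows quickly as more channels appear in the data.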

With the above transformation, we increased the number of columns from three to 24.

The Python script automatically identified all unique marketing channels present in the customer journeys and created the appropriate columns:

'Direct', 'Affiliates', 'Email', 'Organic_Search', 'Paid_Search', 'Other', 'Social', 'Referral', 'Display'
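One way such channel detection might work is to scan every path, collect each distinct touchpoint name, and replace spaces with underscores to form column names. The sketch below assumes a `>` separator, and both helper names are hypothetical; the linked script may do this differently.

```python
def discover_channels(paths):
    """Collect unique channel names in order of first appearance."""
    seen = []
    for path in paths:
        for touchpoint in path.split(">"):
            name = touchpoint.strip()
            if name and name not in seen:
                seen.append(name)
    return seen


def to_column_name(channel):
    """Make a channel name usable as a column name."""
    return channel.replace(" ", "_")


# Made-up paths for illustration only.
paths = ["Direct", "Organic Search > Direct", "Social > Paid Search > Social"]
channels = discover_channels(paths)
columns = [to_column_name(c) for c in channels]
print(columns)
```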

Click here for the Python script used for data loading and transformation.

We can take a look at the summary statistics for our transformed data set:

Summary statistics
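A table like this can be produced with Pandas' built-in `describe()` method. Whether the article used exactly this call is an assumption, and the small frame below is made-up data with hypothetical column names.

```python
import pandas as pd

# Made-up transformed rows for illustration only.
transformed = pd.DataFrame(
    {
        "path_length": [1, 3, 2],
        "conversions": [4, 2, 1],
        "avg_conversion_value": [50.0, 45.0, 30.0],
    }
)

# count, mean, std, min, quartiles and max for every numeric column.
summary = transformed.describe()
print(summary)
```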

Learn how to interpret summary statistics in Part 3 — Summary Statistics

Maciej Piwoni
DataShop

Global Data Strategy Manager. Critical Thinker. Digital Evangelist. Data Geek.