What kind of web traffic generates the most money?

Wrangling my data

As soon as I began, I hit my first major Google Analytics limitation: you can only export .csv files that are five features and five metrics wide and five thousand rows long. I therefore had to export maximum-size .csv files quite a few times and then merge them in a Jupyter notebook. The Google Analytics API can produce a somewhat larger .csv, but given the size of my data, setting that up wasn't worth it.
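As a rough illustration of that merging step, here is a minimal pandas sketch. It assumes two kinds of exports: chunks that share columns but cover different rows, and chunks that cover the same rows but carry different metrics. The column names (`transaction_id`, `source`, `revenue`) and the tiny in-memory frames are hypothetical stand-ins for real exports, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd


def stack_row_chunks(frames):
    """Stack exports that share the same columns but cover different rows."""
    combined = pd.concat(frames, ignore_index=True)
    # Overlapping exports can repeat rows, so drop exact duplicates.
    return combined.drop_duplicates().reset_index(drop=True)


def join_column_chunks(left, right, keys):
    """Join exports that cover the same rows but carry different columns."""
    return left.merge(right, on=keys, how="outer")


# Hypothetical stand-ins for exported .csv files.
part_a = pd.DataFrame(
    {"transaction_id": ["t1", "t2"], "source": ["google", "email"]}
)
part_b = pd.DataFrame(
    {"transaction_id": ["t2", "t3"], "source": ["email", "direct"]}
)
revenue = pd.DataFrame(
    {"transaction_id": ["t1", "t2", "t3"], "revenue": [20.0, 35.5, 12.0]}
)

rows = stack_row_chunks([part_a, part_b])
merged = join_column_chunks(rows, revenue, keys=["transaction_id"])
```

Dropping exact duplicates is only safe when each row carries something unique, such as a transaction ID. Without such a key, two genuinely identical rows would be collapsed into one.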

Creating a model

Initially, my goal was to predict the type of product being purchased. However, this proved extremely difficult given my small dataset (~7000 rows) and the wide variety of products. Even after binning the product types into fewer categories, my model tended to just guess the majority class, barely beating the baseline in the best cases and underperforming it in the worst. I was also limited in which features I could use, because I had to avoid leakage. For example, using revenue to predict product type gave me 99%+ accuracy, so I dropped revenue.
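To make the baseline comparison concrete, here is a small sketch of a majority-class baseline in plain Python. It is not the model from this project. The product labels and predictions are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority_class, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_class)
    return majority_class, hits / len(test_labels)


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    hits = sum(1 for p, t in zip(predictions, truth) if p == t)
    return hits / len(truth)


# Made-up labels: one product type dominates the training data.
train_labels = ["shirt"] * 6 + ["mug"] * 3 + ["poster"]
test_labels = ["shirt", "shirt", "mug", "shirt", "poster"]

majority_class, baseline = majority_baseline_accuracy(train_labels, test_labels)

# A model that only ever guesses the majority class ties the baseline.
model_predictions = ["shirt"] * len(test_labels)
model_accuracy = accuracy(model_predictions, test_labels)
```

A model is only useful if its accuracy clearly exceeds this baseline. A score that merely matches it suggests the model has learned little beyond the class frequencies.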

Deploying my web app

Even though my model is imperfect, it still offers some valuable insights.


