A Pivot(Table) to Code

In my past life as a digital marketer, I worked with spreadsheets a lot. And I absolutely loved it. Along with figuring out technical challenges, analyzing data was one of my favorite responsibilities.

Recently I completed an analysis challenge for a job at a digital advertising startup. I received a CSV of fake ad campaign data and a handful of analysis questions. Having done analysis like this before for ad campaigns, I knew I could use Excel to roll-up and manipulate the data. I decided I would also try to accomplish the same thing with code — and so I learned to create a Ruby script that converts the original dataset into a new CSV with aggregated values and new calculated fields.

Here’s what I learned…

About PivotTables

Sometimes when I used to tell people that I work in online fundraising for nonprofits, I’d get a confused look — almost as if to say “Why do you do that?” I explained exactly why I loved it: in addition to powering causes I believe in, digital fundraising is like an ongoing experiment in human behavior. Every campaign produces more data based on the humans that interact with it (ex: email performance, landing page conversion, site visitor analytics, donation stats), every analysis provides more takeaways (ex: did any version win the A/B test?) and — hopefully, if you’re doing your job right — every subsequent campaign strategy produces better results because you conducted an experiment and learned from it.

One of my preferred ways to analyze large datasets was with Excel PivotTables. For example, I frequently downloaded a CSV report of online donations, one gift per row. Each gift had a value for sourcecode— indicating which marketing effort generated the donation — along with information about the gift amount, date, etc. PivotTables helped me easily roll-up or aggregate these transactions into different buckets. For example, here’s looking at totals by source code (number of gifts and amount donated):

Here’s a simple example of using PivotTables to bucket / aggregate data. Rows 1–9 are donations. Rows 11–16 are the PivotTable created from that donation data, bucketed by sourcecode.

Great. Now that you see how I used PivotTables, let’s pivot to talking about how I did the same thing with code.

See what I did there?

In this particular challenge, the CSV had columns for day, impressions, clicks, conversions, category, domain, and view_rate. I decided to roll-up all this data into categories, though I could have also chosen domain or day.

CSV’s and Ruby

I broke down this coding challenge into three parts:

  1. Import the data from the CSV
  2. Parse the data to aggregate (“bucket”) it by category and create calculated values
  3. Write that new data into a new CSV

Importing the Data

Ruby has a built in library called, you guessed it, csv. I used this helpful technicalpickles article and the Ruby docs to learn how that library works.

While you can load all the data at once before parsing it, I chose to go the “one row at a time” route because my dataset was quite large (13mb). CSV.foreach (line 9 in the gist below) takes two arguments — the location of the CSV file and a hash of options. In my case, I have options indicating there is a header row, that the header row should be converted into a symbol, and that values should be converted (numbers to integers and dates to dates, for example).

Parsing the Data

For each row of the CSV, I send its data to the bucket_category function along with the all_categories hash. When the CSV file is completely processed, the all_categories hash will consist of category names as keys and that category’s total values as a nested hash (e.g. :Travel => { :clicks => 5000 })

In bucket_category, I add the data from each row to the appropriate category in the all_categories hash:

Once that’s done, we have a hash representing the data that was in the original CSV — grouped by category! Because I’d like to eventually add a “totals” row to the bottom of the CSV we’ll create (total clicks, total impressions, etc), I need to go through this new hash and add up the values from every category:

Next up — calculated fields! In addition to the obvious calculations (ex: conversion rate = conversions / impressions ), I also want to use the total_values from the previous step to show each category’s relative contribution to overall campaign conversions.

One last thing before we create our new CSV — let’s sort our categories in descending order based on relative_performance:

Creating a New CSV

Now comes the easy part — using the freshly parsed and organized data to write a tidy new CSV. Using Ruby’s CSV library again, we can simply shovel every row into the CSV that we create:

And voila! When you run the Ruby script, our performance_data.csv is transformed into a CSV that looks — with a little formatting — like this:

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.