All Of Our Data In One Place

Thomas Vo
Making DonorsChoose
3 min read · Mar 14, 2018

What do marketers, designers, and data scientists have in common? They all rely on accurate data to get their jobs done. Here at DonorsChoose.org, we take data very seriously. When we need to make layout decisions for a redesign, we ask the data for guidance. When our project inventory takes a slight dip and we’re trying to figure out what to do next, we turn to the data. In seeking out growth opportunities, we don’t like to play the guessing game; we let the data lead the way.

But that’s easier said than done. In order to effectively make use of our data, it needs to be accessible. Our data is distributed over different pieces of software and owned by different teams. We store email addresses in ExactTarget, dollar amounts in Salesforce, and customer service inquiries in Zendesk. In the past, if we wanted to map out all users who received an email the night before, got stuck on a certain page in our checkout flow, and wrote into customer service for help with their donation, we’d have to piece it all together across these different systems.

Over the last couple of years, we’ve made huge strides in centralizing our data, so that everything sits in one place. This makes it easy for anyone on our team to get the data they need in a matter of seconds. Let’s walk through what our centralized data stack looks like today.

The center of it all: Amazon Redshift & Looker

First, we needed to find a home for all of our data. We decided on Amazon Redshift as our main data warehouse — all of our data is routed to and stored in this magical place in the cloud. We chose Redshift because of its scalability and fast performance. If we find ourselves needing more processing power from one week to the next, beefing up our data warehouse is as easy as hitting the resize button. This allows us to allocate more nodes that all run in parallel. Scaling down during low season is just as easy.
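As a sketch of what “hitting the resize button” looks like when scripted, here is a minimal wrapper around the AWS Redshift resize API (boto3’s `resize_cluster` call). The cluster name and node counts are hypothetical; the same operation can be performed from the AWS console.

```python
def resize_warehouse(client, cluster_id, num_nodes):
    """Request a resize of a Redshift cluster to `num_nodes` nodes.

    `client` is a boto3 Redshift client; injecting it keeps the
    function easy to test and reuse across environments.
    """
    return client.resize_cluster(
        ClusterIdentifier=cluster_id,
        NumberOfNodes=num_nodes,
    )

# Usage (assumes AWS credentials are configured):
# import boto3
# client = boto3.client("redshift")
# resize_warehouse(client, "analytics-cluster", 8)  # scale up for high season
# resize_warehouse(client, "analytics-cluster", 2)  # scale back down
```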

We’ve found a great home for our data, but how do we go about pulling the data we need, when we need it? Enter Looker. Gone are the days of writing long, custom SQL queries for every inquiry. Looker is a business intelligence tool that connects the data to a user-friendly interface and automatically generates SQL based on user selections, while simultaneously creating beautiful visualizations, such as the one below.

Donations to classrooms, per capita — March 14, 2018
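To make the idea concrete, here is an illustrative sketch of what a BI tool like Looker does under the hood: translate a user’s point-and-click selections (dimensions, measures, filters) into a SQL query. This is not Looker’s actual implementation, and the table and column names are made up.

```python
def build_query(table, dimensions, measures, filters=None):
    """Assemble a SQL query from user-selected fields.

    `dimensions` are plain columns to group by; `measures` are
    (aggregate_function, column) pairs; `filters` are raw predicates.
    """
    select = list(dimensions) + [
        f"{fn}({col}) AS {fn}_{col}" for fn, col in measures
    ]
    sql = f"SELECT {', '.join(select)}\nFROM {table}"
    if filters:
        sql += "\nWHERE " + " AND ".join(filters)
    if dimensions:
        sql += "\nGROUP BY " + ", ".join(dimensions)
    return sql

print(build_query(
    "donations",
    dimensions=["state"],
    measures=[("SUM", "amount"), ("COUNT", "id")],
    filters=["created_at >= '2018-01-01'"],
))
# SELECT state, SUM(amount) AS SUM_amount, COUNT(id) AS COUNT_id
# FROM donations
# WHERE created_at >= '2018-01-01'
# GROUP BY state
```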

Together, Redshift and Looker form the backbone and the front-end of our analytics infrastructure.

Piping data into Redshift: Fivetran & Heap SQL

All of our data sits in Redshift, and all of our staff has access to this data via Looker. But we still need to pipe all the data from our different applications into Redshift. Fivetran handles most of that for us. In a nutshell, Fivetran manages the hourly streaming of Salesforce, Zendesk, and transactional data into our Redshift warehouse. Fivetran also handles CSV uploads, which gives us the flexibility to upload data from anywhere.

When we run into a data source that Fivetran has yet to build an integration for, we have to get a little more creative. We’ve written a custom pipeline that gets data out of ExactTarget, using Amazon S3 for temporary storage and Amazon EC2 for processing. Until recently, we did something similar for our event data, which we track with Heap Analytics. For the past year, their Heap SQL offering has managed the streaming of that event data into our Redshift warehouse.
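The staging pattern behind a pipeline like this can be sketched in two steps: serialize the extracted records to CSV for staging in S3, then load the staged file into Redshift with a `COPY` statement. This is a simplified illustration, not our production code; the bucket, table, and IAM role names are hypothetical.

```python
import csv
import io

def records_to_csv(records, fieldnames):
    """Serialize extracted records (dicts) to a CSV string for staging."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def copy_statement(table, s3_path, iam_role):
    """Build the Redshift COPY command that loads the staged file."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV IGNOREHEADER 1;"
    )

# Usage sketch: upload the CSV blob to S3 (e.g. with boto3's
# put_object), then run the COPY statement against Redshift.
```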

What comes next?

So that’s it! Our analytics are powered by a handful of specialized services, and when they come together, our staff are better able to serve our donors and teachers. So what’s next? What does one do when given a gold mine of data? We munge, we analyze, and we build. Stay tuned to hear how we’re making use of this data to improve our product.
