Data, Data Everywhere

Barbara Cvenic
Making DonorsChoose
5 min read · Mar 1, 2018

Every morning, when I arrive at the office, the very first thing I see is a flat-screen television. Every few minutes it flashes as its contents refresh; the line graph in the middle of the screen tracking donations per hour rises; the pie chart showing the breakdown of project types shifts; the number of site visitors ticks up.

After glancing at the screen, I make my way to the kitchen, often passing colleagues meeting in the open space nearby, huddled around their computer screens, looking at different reports and data visualizations.

Coffee cup in hand, I head to my desk and fire up my computer. The first email I pull up is our daily dashboard. It lands in my inbox every single morning. It recaps yesterday’s stats, compares year-over-year growth, and tracks our progress toward our annual goals.

Since I’m a data scientist on the team, this routine might not surprise you. But here’s the thing: my morning routine isn’t unique. It’s the same for Ali on our partnerships team, or Zobaida on customer experience, or anyone else for that matter. At DonorsChoose.org, data is everywhere. It literally surrounds us in the office. It touches every member of our team and their work, every day, 365 days a year. But it wasn’t always this way.

Water, water everywhere / Nor any drop to drink

DonorsChoose.org started helping teachers get what they need for their classrooms in the year 2000, and from the get-go, collecting data on what teachers needed was central to our mission. By the end of 2010, we’d already logged over 275,000 submitted projects and nearly a million donations. In the years that followed, those numbers would grow exponentially.

The goal in collecting all of this data was simple: we wanted to have a high-resolution understanding of what teachers needed. We wanted to be able to chart and predict trends in classroom requests. We wanted to be able to quickly diagnose and address pain points with our product. We wanted to learn more about our donors and their giving preferences.

But much like the Ancient Mariner, we were surrounded by an ocean of data that was neither accessible nor interpretable, and therefore useless to us. So we set out to change that. In 2012 we hired our first data scientist and focused on investing in our infrastructure, tools, and people.

Infrastructure & Tools: Build (and Rebuild)

The result of our initial investments in our data infrastructure was a stable Postgres database that we could query. But, like the roadways of a growing city, it soon wasn’t performant enough to meet our needs.

So we migrated to Amazon Redshift to increase our capacity. But we quickly faced another challenge: our move to Redshift allowed us to query to our hearts’ content, and demand for data from the team was higher than ever. It was no longer enough to have yesterday’s data available; we needed the latest data to be accessible to our product, marketing, and operations teams. Eventually, we landed on using a third-party service to continuously sync data from our production database to Redshift.
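To make the value of that sync concrete, here’s a minimal sketch of the kind of query it enables, assuming a hypothetical donations table with created_at, amount, and project_category columns (our actual schema differs). Because the warehouse is continuously refreshed, a query like this reflects the last hour of activity rather than yesterday’s batch:

-- Hypothetical example; table and column names are placeholders, not our real schema.
SELECT project_category,
       COUNT(*)    AS donation_count,
       SUM(amount) AS dollars_raised
FROM donations
WHERE created_at >= DATEADD(hour, -1, GETDATE())  -- Redshift date arithmetic
GROUP BY project_category
ORDER BY dollars_raised DESC;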

Now we had the right infrastructure in place, but information was still centralized with the data team. We shifted our focus to layering a business intelligence tool, Looker, on top of our database. This tool let us easily share data with teams across the organization; it made data interpretable to those who couldn’t write raw SQL.

So we had up-to-the-hour transactional data available to everyone on staff, housed in a user-friendly tool that empowered anyone on the team to run their own analyses. Next, we had to get everyone comfortable using it.

People: Data, Democratized

To spread data literacy, we needed to spend as much time with our people as we did with our infrastructure. We ran in-person trainings throughout the year, held data office hours, and worked with colleagues one-on-one.

These efforts eventually culminated in a full-fledged data bootcamp program, where a cohort of data enthusiasts from teams across the organization learned to code, to think like a data analyst, and to apply what they learned to different areas of the business.

Fast forward to today, and all this work has paid off. We have self-service data onboarding tools, a streamlined system for teams across the org to seek data support, and “data masters” embedded in every team in the organization. This means our colleagues are building their own dashboards and running their own analyses. It also means the data science team can focus on exciting new applications of our data.

The Next Frontier

With a strong foundation in place, we see several strategic opportunities on the horizon. These include improving the way we collaborate with decision-makers, bringing more personalization to our product, doing more sophisticated forecasting and anomaly detection, and leveraging event data.

If our transactional data is the ocean, then our event data is the groundwater, the rivers, the water molecules in the atmosphere: there’s so much of it, and we’re still learning how to collect it (in the past six months alone, we’ve logged over 19 million events). And here again we find ourselves with untapped potential. The waters ahead likely look like the waters we’ve already sailed: rethinking our infrastructure, spreading a new gospel of the power of event data, and teaching people how to best leverage this new information.
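As a rough sketch of what exploring that volume might look like, assuming a hypothetical events table with an event_name and an occurred_at timestamp (again, not our real schema), a first pass could be as simple as:

-- Hypothetical example; event and column names are illustrative only.
SELECT event_name,
       DATE_TRUNC('week', occurred_at) AS week,
       COUNT(*) AS event_count
FROM events
WHERE occurred_at >= DATEADD(month, -6, GETDATE())
GROUP BY event_name, DATE_TRUNC('week', occurred_at)
ORDER BY week, event_count DESC;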

But if we’ve learned anything from this journey, it’s that this is an investment worth making.

Data, data everywhere
And not a row to waste
