What I Wish I’d Done Differently as a Data Science Leader: On Centralizing Siloed Data

In 2014, I joined Pebble (the once-glorious smartwatch maker later acquired by Fitbit) to lead their data science & analytics team. Part of the reason I joined was that I was interested in the challenges of managing a data organization at a hardware company. My last full-time gig had been at a video game developer on the Facebook platform, where analytics were the lifeblood of the company and we had centralized logging on everything from in-game mechanics to user acquisition. I knew that working at a hardware company would be different. We would have many of the same business concerns that I had encountered in my previous work — where do customers find our product, how do they use it, how can we make it better — but much more limited data to leverage in order to answer these questions. When a customer buys your product off the shelves in Best Buy, it’s hard to infer where they heard about you and to measure the impact of your marketing efforts. When a customer passively wears your product on their wrist, it’s hard to know which customers use the product every day and which ones mostly leave their watch lying in a drawer.

Modeling my favorite watch face.

Over the course of my two years at Pebble, we came up with a number of inventive ways to understand how people used the Pebble and to determine how we could improve it. But we struggled with data distributed across disparate systems: the database of logging from the watch, marketing info from Google Analytics, giant spreadsheets of sales by channel from our retail distributors, and customer support queries and returns info locked away in an unqueryable third-party tool. Many companies face the same issue: the centralized analytics team has access to a mere sliver of information across the diverse business functions of the company, but they can’t get the cross-functional buy-in to pull the siloed data into a central location.

It’s easy to pay lip service to the virtues of having all the data in one place — we sang the praises of centralization at Pebble! — but actually achieving that goal is a non-trivial task, and requires multiple people in multiple job functions to sit down and agree to make it happen. If the siloed data doesn’t have an immediate and well-defined value-add, everyone will struggle to prioritize centralization over all the other tasks they have on their plates. Even if you’re using a tool that should make data centralization “easy,” you’re going to have to find the right person to, say, get you the proper authentication token to connect the siloed data source to your central data store, and this “simple” task may drag on for days until you de-prioritize and effectively abandon it. There may also be legitimate reasons to keep the data siloed that different parts of the organization don’t fully understand: some companies have sensitive customer data that should not be accessible to the whole organization for privacy reasons, for example.

The fence between the data you have and the data you need.

“We don’t have the data” is a standard part of the data-worker’s lament. (I say “data-worker” to encompass data scientists and data analysts, since I managed both, and since in this instance they share a lot of the same concerns.) As a manager, I felt that part of my job was to let my team vent to me about the challenges of the minutiae in their day-to-day work. I know how frustrating data work can be: you spend hours cleaning up a bunch of messy data just so you can run a simple script over it, you discover holes and bugs in the logging that mess up your entire methodology, or you find yourself unable to answer the questions that C-level executives are asking because the data you need simply isn’t there. Working with data involves a lot of banging one’s head against the proverbial wall, and my job as a manager was to acknowledge that frustration and help the team work through it as effectively as possible. Sometimes, a manager can offer constructive solutions that actually solve these problems (implement an ETL script to keep the data clean, talk to engineering about fixing the bugs, work with another team to get access to the data you need), but often the best option is to encourage the team to find a workaround. It’s easy to get wrapped up in the details of a problem and start making the perfect the enemy of the good. As often as possible, I wanted to bring my team members back up to the bird’s-eye view so they could find a good-enough solution to resolve their problem efficiently.

When folks on my team made off-hand comments that we should have centralized access to Google Analytics, or Zendesk, or retail sales data, I often let it slide. The data usually wasn’t critical to solving their immediate problem, and as a manager I tended to de-prioritize anything that was a “nice-to-have.” I also had an intuition that our team probably didn’t understand all the complex organizational reasons why that data wound up siloed in the first place, so I didn’t want to spend valuable time and energy chasing after data that would wind up as a dead end. But I wish that instead, whenever my team complained that they didn’t have access to something, I had told them to go solve their own problem. Go talk to the e-commerce team about Google Analytics. Go talk to the support team about Zendesk. Go talk to the sales team about sales channel data. If you want the data, make it happen! No one is going to care about this data more than you, so you make it a priority. I will give you all the tools you need to get it into our central database. If you don’t have access to the data, that is your problem to solve. Go get it!

Collect ALL the data!

In truth, I did make some attempts to empower my team to chase after the data they needed. But in these instances, they would often come back with a new understanding of why what they thought would be easy was actually hard, and they would drop their concern and move on to their next task. I wish that I had encouraged them further, and told them to keep after it. While the immediate value of centralized data isn’t always apparent, I do believe there are residual and compounding effects of putting all the data you can into one place. But often, managers will not be able to prioritize the task of centralization. There are too many silos, too many teams to get on board, and too many hair-on-fire concerns that will always take priority. The top-down approach to centralizing data (the data science VP working with the marketing VP and the sales VP and the operations VP and the CEO) is too gargantuan. But doing it piecemeal (having one motivated data-worker find a collaborator in another department and move forward on centralizing whatever little bit of data they can) might be successful.

Pebble had the right culture for that kind of bottom-up approach. We were by and large a friendly, thoughtful, generous, and independent-minded group of people. Our organizational culture emphasized a sense of autonomy, and no one was going to be offended by a junior team member taking the initiative to solve a problem they cared about. The bottom-up approach to data centralization might not work in every company, but in general I think data science managers can benefit from passing responsibility for data acquisition off to whatever members of their team are invested in said data. They’re the ones that are going to be using it — why shouldn’t they have the responsibility of advocating to get the data they need?