I've spent much of the last decade building products and tools, from healthcare billing to finance and now advertising, all focused on extracting the valuable information contained in a dataset. This is a quick summary of some lessons learned; hopefully some readers find it useful.

Once you know the problem you want to solve and the customer you want to solve it for, it’s tempting to start building product. Don’t. You’re not ready yet.

Start with a manual analysis.

Before you build any infrastructure, define any schemas or even design an interface, you need to de-risk your investment. At the core of what you're building sits some data, so make sure you know:

What data is available to you: There’s no point in building a product around data you don’t have. Eliminate this risk by sourcing your data before you start designing. Great chefs start their meals at the market to see what is available and what is fresh — you’re cooking with data and should pursue the same policy.

How you'll get the data: I've been involved in a number of projects where a dataset looked available, but legal or technical issues turned accessing it with the query patterns we wanted into a huge headache. If you don't own the dataset, do some diligence on the licensing and technical capabilities of the provider. If you do own the dataset, figure out how you want to query it before you build a datastore and define a schema. Prototype your product in a scrappy way with ad-hoc scripts and validate with your customer that way first.
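As a rough illustration of what a scrappy, schema-free prototype can look like, here is a minimal Python sketch that tries out two query patterns on a handful of raw records. The records, field names and the `spend_by` helper are all made up for the example; the point is only that you can learn which groupings you need before committing to a datastore.

```python
from collections import defaultdict

# Hypothetical records, shaped the way a provider's export might be.
# Every field name and value here is invented for illustration.
records = [
    {"advertiser": "a1", "day": "2024-01-01", "spend": 120.0},
    {"advertiser": "a1", "day": "2024-01-02", "spend": 80.0},
    {"advertiser": "a2", "day": "2024-01-01", "spend": 45.5},
]

def spend_by(rows, key):
    """Total the 'spend' field, grouped by whichever field we want to query on."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["spend"]
    return dict(totals)

# Try the query patterns the customer seems to care about.
print(spend_by(records, "advertiser"))
print(spend_by(records, "day"))
```

If a grouping like this turns out to be slow or impossible against the real source, that is worth knowing before you design a schema around it.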

How clean and complete your data is: Almost every project I've worked on has featured some amazing dataset that purports to be accurate. It never is. There are always problems: missing values, wrong values, totally ambiguous schemas, you name it. Try to run into these problems as early as possible, since it’s cheap to change a hacky script, expensive to rebuild a product and devastating to ship something backed by faulty data. Doing so will help you avoid investing too deeply in a flawed dataset, or will allow you to design around the shortcomings of the data you have.
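One cheap way to run into these problems early is a throwaway profiling script. The sketch below counts missing values per column in a CSV sample using only the Python standard library. The sample data, the column names and the set of strings treated as "missing" are assumptions for the example; swap in whatever your source actually contains.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a real export; column names are invented.
SAMPLE = """claim_id,amount,provider
1001,250.00,Acme Clinic
1002,,Acme Clinic
1003,N/A,
1004,-40.00,Northside
"""

# Strings treated as "missing" (an assumption; adjust to your source).
MISSING = {"", "n/a", "na", "null", "none"}

def profile(rows):
    """Return the row count and a per-column count of missing values."""
    total = 0
    missing = Counter()
    for row in rows:
        total += 1
        for column, value in row.items():
            # DictReader fills short rows with None, so treat that as missing too.
            if value is None or value.strip().lower() in MISSING:
                missing[column] += 1
    return total, missing

total, missing = profile(csv.DictReader(io.StringIO(SAMPLE)))
for column, count in missing.most_common():
    print(f"{column}: {count}/{total} missing")
```

A count like this only catches absent values. Wrong values, such as the negative amount in the sample, need their own checks, and those are usually specific to the dataset.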

What the data can actually tell you: After you've worked through the shortcomings of the data you have, is there still a story worth telling? Does it still solve your customer's problem?

Design with real data

When you create interface mockups to show to potential customers or stakeholders, use real data. Dummy data is incredibly dangerous in the design phase of a data-driven product. You'll end up designing an amazing product you can never deliver. You'll also instantly lose the interest of any customer you try to validate results with. Validate using real data that tells real stories, so your customers can tell you how they will value the stories you're telling.

Keep the visuals simple

I'm a datavis geek: I love finding new, exciting ways to display data to people. Unfortunately, there have been times when I've let my zeal for visualization hurt products I've worked on. The most important thing is to keep the representation of the data simple in the finished product, so your results are easily consumed by the customer, who is almost certainly not a datavis geek. Visualizations that are 3D or that show too many segments of data at once are bound to confuse the viewer.

Representations such as streamgraphs or wordclouds are tempting to use because they seem simple on the surface but in reality it’s incredibly hard to extract meaningful insights from them.

Get to the point quickly

Focus on delivering value

Don't make your users do a bunch of work sifting through the data to find conclusions. It’s tempting to create things that are open-ended and allow for unbounded exploration. This is dangerous. Users are impatient, busy people who are using your product to solve a real need, so help them get to the finish line quickly. My friend David once described this problem as the “grey screen of death”: a user logs into an open-ended system, sees a blank space where they could do anything, and ends up doing nothing.

As much as possible, give users conclusions as soon as they drop into your interface. If you need input from them, make it as easy and guided as possible.

Even if you're planning to create powerful open-ended features for power users, you need to make sure there is a guided on-ramp.

Give them something to walk away with

Your users probably look at your data-driven product in order to make some decision or take some action. Figure out what your users want to do with the conclusion and make it easy to leave with something tangible. Perhaps you're showing them economic data that will end up in a presentation? Make sure there’s a chart export. You've got a tool to help classify medical scans? Think about how you'll integrate with EMR or billing systems. Helping advertisers scan across sets of users based on behavior to better understand their audience? Integrate with ad targeting systems so these results can be applied quickly.

When interviewing your customers, figure out where you sit in the lifecycle of their work and what other tools they use.

In summary:

  1. Know what customer problem you're solving

Thanks to @jessesmith, @chanian and @mbe for providing feedback on a draft of this post.

Written by

Now: Engineering and Product at @stripe. Earlier: Founder/CEO @luckysort (acq by Twitter).
