Seismic’s Guiding Principles for Data Science

Colin Jemmott
Seismic Innovation Labs
4 min read · Jul 16, 2018

As the data science team at Seismic has grown, it has become important to articulate some of the principles that we share. Feedback is welcome.

Honesty

With each other: Robustly and openly review others’ work, and have your work reviewed by them. This helps us not only to ship the best product, but also to learn from each other and improve. It is important both to exercise independent judgement and to communicate that feedback in a way that ensures it is carefully considered.

With yourself: We all have limits and make mistakes, and it is important to understand them. There is no shame in acknowledging your own ignorance, but shipping algorithms or code you don’t understand can be disastrous. Remember to take ownership of both your failures and your successes.

With users: Consumers of data science products are making data-driven decisions. If a user is misled, they may make important business or life decisions based on falsehoods, which can quickly break trust that you may never recover. To maintain that trust: never knowingly ship bad data or analysis, acknowledge and quickly fix mistakes that are reported, and check in with users to make sure they actually understand what is being presented.

Curiosity

About data: Good analysis depends on understanding your data. Not just knowing what the data set looks like, but deeply understanding how it reflects what happened in the real world. And when something seems off in the data, have the tenacity to really dig in and figure out what is going on. Good understanding of data beats good algorithms every time.

Use science: Good data science should be, ummm, science! Controlled experiments are how we know what works and how to do better. Form hypotheses, design a test, and measure results. New data should be able to change your mind, even about strongly held beliefs.
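
As a toy illustration of that loop, a controlled experiment can be measured in just a few lines of Python. The recommendation-ranking scenario and the counts below are made up for this sketch, not a real Seismic experiment:

```python
# Hypothesis: the new ranking increases click-through rate.
# (Illustrative numbers only.)
from scipy.stats import fisher_exact

control = {"clicks": 130, "impressions": 2400}    # existing ranking
treatment = {"clicks": 171, "impressions": 2350}  # new ranking

# 2x2 table of clicks vs. non-clicks for each group.
table = [
    [treatment["clicks"], treatment["impressions"] - treatment["clicks"]],
    [control["clicks"], control["impressions"] - control["clicks"]],
]

# How surprising is this split if the ranking change made no difference at all?
_, p_value = fisher_exact(table, alternative="greater")
print(f"p-value = {p_value:.4f}")
```

If the measured result is weak, that is new data too, and it should be allowed to change your mind.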

Self-improvement: Your technical skills are the only thing holding you back. At Seismic you have the data, resources, and support to build and ship world-class data visualizations, natural language processing, machine learning, and more. Your ability to ship good code is the limiting factor, and continuous learning is the way to overcome that.

Impact

How to tell: At Seismic, data science is about helping our customers do their jobs efficiently. A good metric to use is that if people don’t change their behavior, you didn’t make an impact! Usually this means shipping solutions where the consumer cares (often this is related to money) and has control (the ability to change the situation). A surplus of one can sometimes make up for a shortage of the other, but if your solution is not something the customer cares about or can control, it won’t have an impact.

Get unstuck: Our problems are almost always ill-posed and have more than one solution. Finding the best one means avoiding local maxima and embracing creativity. When you find yourself working really hard to make marginal improvements, step back and ask if there might be a radically different way to solve the problem. Reach out to the rest of the team for help.

Follow: A secret of data science at Seismic is that we generally apply existing methods in novel ways instead of inventing entirely new ones. Part of the way we are able to move quickly and make a big impact is by following. Start any project with a review of the relevant literature, and try not to reinvent the wheel.

Humility

Being wrong is ok: Despite our best efforts, most of us are wrong most of the time. The trick is to find ways to validate your work and, more importantly, to find ways to invalidate it. Look for edge cases, and don’t be fooled by randomness. Be mindful of the difference between “the analysis is free of errors” and “the results of the analysis are true”.
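
To make “fooled by randomness” concrete, here is a small simulation where every input is pure noise generated for this example. Test enough meaningless features and one of them will usually look significant:

```python
# All data here is noise; any "signal" found is a fluke of multiple testing.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
outcome = rng.normal(size=200)

# Correlate 50 purely random "features" with the outcome and keep the best p-value.
best_p = min(pearsonr(rng.normal(size=200), outcome)[1] for _ in range(50))
print(f"best p-value among 50 pure-noise features: {best_p:.3f}")
```

This is exactly why validation on held-out data matters before believing a result.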

Easy > Hard: Science is hard. If you start by solving easy problems with simple methods, you can iterate more quickly and are much less likely to make mistakes. Also remember that the cost of complicated solutions isn’t just the initial research: they also drive up development time, architecture complexity, compute costs, and ongoing maintenance. In general, if a simple approach doesn’t do at least an ok job, it is unlikely that a complex approach will be able to do a good job.
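
A rough sketch of what “easy first” can look like in practice: before paying for a complex model, check how far a trivial baseline and a simple model get. The dataset and models below are stand-ins for illustration, not anything we ship:

```python
# Compare a trivial baseline against a simple, interpretable model first.
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    "always predict the majority class": DummyClassifier(strategy="most_frequent"),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
}

# If the simple model doesn't clearly beat the baseline, or already comes close
# enough for users, a more complicated model is unlikely to be worth its cost.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} mean accuracy")
```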

Dead ends: Most of your research will fail. Accepting that can be hard, but if you acknowledge it then you can plan for it. Try to identify dead ends as quickly as possible to avoid wasting time developing solutions that won’t work. Seek feedback about your work, and when you do identify a dead end, accept it with grace, document it, and try again!


I am a data scientist at Seismic Software and lecturer in the Halıcıoğlu Data Science Institute at UC San Diego. http://www.cjemmott.com/