How to think like a data scientist

Adrian Cypcar · Published in AzimoLabs · 6 min read · Dec 17, 2019

So, you want to do interesting things with data? That’s great, good choice. But before you run off and start building a machine learning model, there are some things you should know.

This article is about how to think like a data scientist. Practical skills are all very well, but if your thinking isn’t coherent, you will fall into a predictable set of very human traps. Your end product, even if it looks cool in a presentation, won’t be much use to anyone.

Thinking like a data scientist can help anybody at a data-driven company, whether you work in marketing, product development or finance. So, if you want to harness the power of data and avoid the mistakes that your peers are making, read on.

Critical thinking

To make the most of data, you need to master critical thinking. Critical thinking is the ability to analyse information and draw reasoned conclusions that lead to the best possible outcome.

In order to do this, we need to gather and evaluate information from as many different sources as we reasonably can. But we shouldn’t just absorb this information and assume that we are now smarter and better able to solve the problem. A good memory alone doesn’t make you smart.

Critical thinking is about connecting the dots. A good critical thinker can deduce consequences from pieces of information. They question ideas, assumptions and opinions rather than accepting them at face value. They identify and solve problems systematically by analysing facts rather than relying on intuition.

Critical thinkers use more than intuition

In summary, critical thinkers:

  • Understand the logical connections between ideas.
  • Identify, construct and evaluate arguments.
  • Detect inconsistencies and common mistakes in reasoning.
  • Solve problems systematically.
  • Identify the relevance and importance of ideas.
  • Reflect on the justification of their beliefs and values.

Try this exercise to assess your critical thinking:

Think about some information that you received recently and then answer the following questions:

  • Who said/wrote it?
    Is it someone you know? Someone in a position of authority? How does their authority affect your judgement? Does it matter who said/wrote it? Is that person credible?
  • What did they say/write?
    Did they give facts or opinions? Did they provide all the facts or were they selective? Is anything missing?
  • Where did they say/write it?
    Was it in public or in private? Did other people have a chance to respond and provide an alternative account?
  • When did they say/write it?
    Was it before, during or after an important event? Is timing important?
  • Why did they say/write it?
    Did they explain the reasoning behind their opinion?
  • How did they say/write it?
    Were they happy or sad, angry or indifferent? Did they write it or say it? Could you understand what was said?

Unconscious bias

Before we start analysing data, we need to take a look at our biases. Human beings like data that fits their current map of the world. Worse, they tend to disregard valid data that doesn’t.

It isn’t possible to account for all of our biases, but we must try our best. If something appeals to us because of our beliefs, political views or prior experiences, it should still be questioned.

How balanced is your judgement, really?

When we stumble upon data that we find useful to our argument, it is tempting to stop looking for further information. We must do the opposite. Charles Darwin could propose the theory of evolution only because he had spent many years trying to disprove his own ideas. This is the scientific method.

If we set out to disprove our thinking rather than merely support it, we are far more likely to find flaws in our logic. We can then use these flaws to strengthen our argument and preempt counter-arguments that might destroy our credibility in the boardroom, the laboratory or even the local bar.

Remember that “bad results” — the results that you didn’t expect — are also meaningful results.

For example, let’s say you’re running a marketing campaign to attract new customers to your product, and you want to assess whether your campaign worked.

In doing so, you find that the campaign failed on certain metrics. Other metrics, however, show limited success. You might be tempted to report only the successful data and skim over the other stuff.

It is imperative that you don’t, because bad results are valuable results. They teach us an important lesson about what not to do next time, or tell us how to tweak our formula. Knowing what doesn’t work is a key part of success.
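
To make this concrete, here is a minimal sketch of one common way to check each metric: a two-proportion z-test comparing the campaign group against a control group. It's written in Python, and the metric names and numbers are entirely made up for illustration; they aren't from any real campaign.

    from math import sqrt
    from statistics import NormalDist

    def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
        """Two-sided z-test for a difference between two conversion rates."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
        return z, p_value

    # Hypothetical results: report every metric, including the ones that "failed".
    metrics = {
        "sign-ups":        (230, 10_000, 210, 10_000),  # campaign vs control
        "first transfers": ( 95, 10_000, 120, 10_000),  # worse than control!
    }

    for name, (conv_c, n_c, conv_ctrl, n_ctrl) in metrics.items():
        z, p = two_proportion_z_test(conv_c, n_c, conv_ctrl, n_ctrl)
        print(f"{name}: campaign {conv_c / n_c:.2%} vs control {conv_ctrl / n_ctrl:.2%} "
              f"(z = {z:+.2f}, p = {p:.3f})")

Printing every metric is exactly the lesson above: a drop in "first transfers" is a finding to report, not something to skim over.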

It is a truism that we learn from failure. Nonetheless, human beings find failure so uncomfortable that we have an array of unconscious reflexes, such as denial, that prevent us from acknowledging it. If we are active and conscious about assessing our own work, we become stronger in ourselves and in the eyes of others. You can only pull the wool over people’s eyes for so long.

“Lies, damned lies and statistics”

Data can also be biased. Biased data fails to show the whole picture: it misses meaningful data points, or it comes from a sample that is not representative of the problem in question.

To counter this, make sure that you are gathering data appropriately. If, for instance, you need data about the average height of people in New York, it wouldn’t make sense to take your entire sample from local basketball teams. To get unbiased results, you would need a reasonably large, random selection of the entire population.

Selective sampling will skew your results. Cast the net wide.
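
As a toy illustration of that sampling mistake, here is a short Python sketch using a synthetic population. The numbers are invented; only the contrast between the two samples matters.

    import random
    from statistics import mean

    random.seed(42)

    # Synthetic population: a million "New Yorkers" with normally distributed heights.
    population = [random.gauss(170, 10) for _ in range(1_000_000)]

    # Biased sample: only the tallest people, as if we had polled basketball teams.
    basketball_teams = sorted(population)[-500:]

    # Unbiased sample: a simple random sample of the same size.
    random_sample = random.sample(population, 500)

    print(f"True population mean: {mean(population):.1f} cm")
    print(f"Biased sample mean:   {mean(basketball_teams):.1f} cm")  # far too high
    print(f"Random sample mean:   {mean(random_sample):.1f} cm")     # close to the truth

The biased sample overestimates the average height by tens of centimetres, while a random sample of the same size lands close to the truth.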

Make sure to question data sources before you use them. Remember also that corporations usually present data that support their own interests. Thirty people who say they look younger after using a particular moisturiser have probably not discovered the secret to eternal youth.

Don’t run before you can walk

When you’re building a new project, it’s OK to start small. If you need to validate early ideas, try something simple rather than reaching for advanced techniques. It’s better to get some early results and change direction than to struggle for weeks with a complex algorithm that might not work.

When comparing models that offer similar opportunities, follow Occam’s razor and choose the simplest one. You will save money and time. Also remember that more complex solutions increase long-term cost, as they have to be maintained throughout the lifecycle of a project. Work iteratively. It is better to finish something simple than not to finish anything at all.
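
If you want a feel for how that choice can be made explicit, here is a sketch using scikit-learn on synthetic data. The two models, the dataset and the 0.01 margin are all placeholders for whatever fits your own project.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in for a real dataset.
    X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

    simple = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1_000))
    complex_model = GradientBoostingClassifier(random_state=0)

    simple_score = cross_val_score(simple, X, y, cv=5).mean()
    complex_score = cross_val_score(complex_model, X, y, cv=5).mean()

    # Occam's razor: only accept extra complexity if it clearly pays for itself.
    MARGIN = 0.01  # arbitrary threshold; tune it to your own cost of maintenance
    winner = "complex" if complex_score - simple_score > MARGIN else "simple"

    print(f"simple:  {simple_score:.3f}")
    print(f"complex: {complex_score:.3f}")
    print(f"keep the {winner} model")

The margin encodes Occam’s razor as a rule: the complex model has to beat the simple one by more than the cost of maintaining it, otherwise the simple one wins by default.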

Know the bigger picture

If somebody asks you to do some analysis, make sure you have the full context. You need to fully understand the problem that the person is trying to solve and why. If you don’t know this, you won’t be able to choose the right data sets to analyse, or build the right end product. Don’t be afraid to ask lots of questions. It will save time and disappointment in the long run.

I hope this article will help you in your approach to working with data. Feel free to ask questions in the comments below.

Towards financial services available to all

We’re working throughout the company to create faster, cheaper and more accessible financial services all over the world, and these are some of the techniques that we’re using. There’s still a long way to go, and if you’d like to be part of that journey, check out our careers page.
