How can cognitive biases impact data analysis?

Jonas Fernandes
Blog Técnico QuintoAndar
8 min read · Jun 25, 2020

First off, I want to thank Julliane Alberigi and Gabriela Bassa for their input, discussions, and insights. This article is the result of our teamwork. Thanks also to Fernando Paiva for all the reviews.

At QuintoAndar, our challenge is to provide a unique experience for people renting or buying a property. We are always trying to understand the user’s journey in depth, including the needs and pains encountered throughout the process, and analyzing data helps us a lot in this mission. But which data should we analyze? Is it enough to look only at metrics, or can metrics alone fail to reveal deeper problems?

When we reflected on these points with some colleagues, a question arose:

How could we combine the knowledge obtained in the qualitative research developed by the Design and Product teams, with the quantitative data analysis developed by the Data Analytics team?

We aimed to further deepen the knowledge about users, and we shared the belief that qualitative and quantitative data should be treated with the same care and are complementary in the decision-making process.

We thought of some actions to promote this idea and saw an opportunity to spread knowledge about good practices in applying research methods, so that we can obtain better insights from our investigations.

When talking about good practices, a very interesting subject came up: cognitive bias. How can cognitive biases affect the quality of data and analysis, and how can we avoid them?

But what are cognitive biases anyway?

To obtain valid knowledge of phenomena, we apply the scientific method to ensure that our conclusions rest on an objective understanding of the observed effects. As data analysts (qualitative or quantitative), we do not apply the scientific method exactly as scientists do. Even so, we need to be careful in order to obtain the best results, since they will guide business decisions.

Cognitive biases are like shortcuts that our brain takes to make information processing and decision making easier. However, the simplifications involved in these shortcuts can cause errors of judgment and misinterpretation of the facts.

There is a very long list of biases, classified into types, some of which are described as particular cases of more general types. This makes it very hard to be exhaustive about the subject in a single article. A full discussion would require introducing concepts from logic, psychology, and even epistemology, which is beyond the scope of this article.

Here we list the biases we encounter most often in our experience. The intention is to define each of them in practical terms, with examples of how to identify them and recommendations on how to avoid them.

Induction bias

Induction bias occurs when the formulation of a question suggests the answer, whether indirectly or deliberately. For example, the question “What is the color of Napoleon’s white horse?” induces us to answer “white”, but what if the horse was brown or black?

In market research, blind tests are well known and often used for food products. Usually, two samples of a product are presented to the interviewee in neutral, identical packaging, and the person is asked to rate the attributes under investigation (such as sweetness and consistency).

In this case, the induction biases caused by brand, packaging, and product presentation are being controlled. Even the order in which the interviewee tastes each sample must be controlled. Otherwise, the interviewee could be induced to respond based on external factors, distorting the result of what we want to measure.

In the analysis of quantitative data, induction bias may be present in the formulation of the hypotheses to be tested, in the assumptions made, or even in the way the metrics and indicators are calculated.

To avoid inducing the analysis of quantitative data, we recommend understanding well what is being measured and how. When selecting a metric to test a hypothesis, consider the time frame it needs to show variation, whether it is sensitive enough for the effect to be observed, and whether it is robust, that is, whether it behaves consistently over time.
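One rough way to check whether a metric behaves consistently over time is to look at how much it fluctuates during a period in which nothing was changed. The sketch below uses the coefficient of variation (standard deviation divided by the mean) for that purpose. The weekly values and the 10% threshold are invented for illustration, and the function names are hypothetical; a real analysis would choose the baseline period and threshold according to the metric in question.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return stdev / mean, a unit-free measure of how noisy a metric is."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / avg


def is_robust(values, max_cv=0.10):
    """Treat a metric as robust if its baseline noise stays under max_cv.

    The 10% default is an arbitrary assumption, not an established rule.
    """
    return coefficient_of_variation(values) <= max_cv


# Hypothetical weekly values (in %) for two metrics during a baseline period.
stable_metric = [4.0, 4.1, 3.9, 4.0, 4.2, 3.8]
noisy_metric = [4.0, 6.5, 2.1, 5.8, 1.9, 3.7]

print("stable metric robust?", is_robust(stable_metric))
print("noisy metric robust? ", is_robust(noisy_metric))
```

Both series have the same average, but only the first one is steady enough for a small effect to stand out from its normal week-to-week variation.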

In the analysis of qualitative data, induction bias can occur when we include a value judgment in the question or restrict the respondent’s answer. Examples include using adjectives (such as easy or difficult, beautiful or ugly), giving examples that encourage a particular answer, or asking binary questions (whose answer is yes or no). When collecting qualitative data, it is therefore always worth giving interviewees space to say freely what they think, and exploring their answers without evaluating them.

Confirmation bias

We are often faced with the following situation: we have a position to defend and only then ask how to support it. This is a common argumentative practice, but it can lead to serious errors in data analysis. The tendency to take this shortcut, looking (only) for information that proves a personal argument or hypothesis, is called confirmation bias.

Confirmation bias appears when we select (consciously or not) data and information that corroborate the initial argument, or when we interpret ambiguous information in favor of what we are trying to affirm. It is a shortcut that seeks to confirm personal beliefs through data, and it can occur because we overestimate our beliefs.

In the analysis of quantitative data, confirmation bias can occur when we skip analytical steps, ignore counter-evidence (information that can refute our hypotheses), or do not follow the due process of formulating and testing hypotheses before drawing conclusions.

In qualitative data, it can occur when the interviewee’s statements are taken out of context, disregarding the interviewee’s history, previous experiences, contradictions, and the analysis segment to which they belong. Confirmation bias may also arise from the order of the interviews: we latch onto an interpretation of the evidence from the first interviews and start looking for the same evidence in the others.

To avoid confirmation bias, we recommend two things: make the criteria and assumptions of the analysis explicit, and distance yourself from the problem before completing the analysis. The second tip is quite simple, but it makes a difference: we are often so immersed in an investigation that we cannot imagine any perspective of analysis other than the one we started with.

Selection bias

When selecting a sample, that is, a set of observations for analysis, we have to ensure that it represents reality in all its plurality. To do this, we must understand which segments we need to observe and how they can affect the results of the study. If the sample is skewed at the time of selection, it will be impossible to make inferences about the entire population from it.

A common example is a questionnaire sent by a colleague or shared on a social network asking for answers. I’m sorry to say it, but this is a useless exercise: any conclusion drawn from it could at best describe the sample itself and should never be extrapolated to the population being studied. In most cases, this data collection would be biased because it would represent only the immediate group of people we have contact with. The social position of the person asking for answers (in cultural and demographic terms) already introduces a bias.

Now that we know we cannot and should not ask friends and colleagues to answer our surveys, how can we avoid selection bias?

The first step is to understand the sample size we can cover and the segments for which we want to observe results (for example, sex, age, family income). Then we check whether we can cover the different strata of the sample with a sufficient number of cases.

To choose the strata well, we must understand which variables can influence the phenomenon we want to observe. If we want to study hypertension in the population, for example, we have to consider that it can vary with the individual’s age. We must ensure that the sample covers the plurality of the population to be studied.
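The idea of covering each stratum with enough cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population, the age groups, the minimum of 25 cases per stratum, and the function names `stratified_sample` and `undersized_strata` are all invented for the example, and proportional allocation is just one of several possible allocation schemes.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw a sample whose strata keep roughly the population's proportions."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    sample = []
    for members in strata.values():
        # Proportional allocation; because of rounding, the total can differ
        # slightly from sample_size when the shares are not whole numbers.
        quota = max(1, round(sample_size * len(members) / len(population)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


def undersized_strata(sample, stratum_of, min_cases):
    """Return the strata that have fewer than min_cases observations."""
    counts = Counter(stratum_of(person) for person in sample)
    return {stratum: n for stratum, n in counts.items() if n < min_cases}


def by_age(person):
    return person["age_group"]


# Hypothetical population: 1,000 people in three invented age groups.
population = [
    {"id": i, "age_group": "18-39" if i < 500 else "40-59" if i < 800 else "60+"}
    for i in range(1000)
]

sample = stratified_sample(population, by_age, sample_size=100)
print(Counter(by_age(p) for p in sample))
print(undersized_strata(sample, by_age, min_cases=25))
```

With these invented numbers, the sample contains 50, 30, and 20 people per age group, and the check flags the "60+" group as having fewer than 25 cases. In a situation like that, we would either increase the sample or oversample the smaller stratum before drawing conclusions about it.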

In qualitative research, we must take even more care with the selection of respondents. In no way should we use our relationships or personal network to recruit respondents, or, even worse, the people who work with us on the same product. In addition to the most obvious biases that this can cause (personal relationships, shared knowledge, etc.), there are also broader social biases such as education, income, and social origin. Therefore, always remember the maxim: “You are not your user”.

More complex studies and specific profiles may require help from professional recruitment services. Do not hesitate to ask these professionals for help in defining the sample, since the difficulty of reaching some groups may otherwise make the study impossible.

Survivorship bias

If you have read this far, you may be wondering about the difference between these cognitive biases. Could a confirmation bias be understood as an induction bias or vice versa? Or even as a selection bias?

In some way, the different classifications of bias can be connected, either by direct implication between them or because one bias reinforces the other. In some cases, some can be said to be particular cases of others.

This is the case of survivorship bias, a particular type of selection bias. It consists of drawing conclusions based only on particular cases and disregarding all others. Often this shortcut leads us to generalize from a single case, or to conclude something because “all the people I know” do this or that.

Survivorship bias thus bases conclusions on a particular and subjective set of evidence, without taking a systematic sample and confronting this information with reality.

A famous example of survivorship bias comes from World War II. When evaluating the planes that returned to base after attacks, the initial conclusion was that the parts of the fuselage showing the most bullet holes were the most fragile and needed to be reinforced. However, this analysis disregarded the airplanes that did not return to base, so the sample was skewed.

The statistician Abraham Wald took survivorship bias into account in his analysis and concluded that the areas showing the most damage were precisely the ones where a plane could be hit and still return to base. It was the areas with little damage on the returning planes that needed reinforcement, because planes hit there tended not to come back.
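A small simulation can make this effect visible. The sketch below is a toy model, not a reconstruction of Wald's analysis: each plane takes a single hit in a random section, and the loss probabilities per section are invented for illustration. It compares where the hits are among all planes with where they are among the planes that returned.

```python
import random
from collections import Counter

# Hypothetical probability that a single hit in each section downs the plane.
LOSS_PROBABILITY = {"engine": 0.8, "fuselage": 0.1, "wings": 0.1}


def simulate_missions(n_planes=10_000, seed=42):
    """Return hit counts per section for all planes and for returned planes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sections = list(LOSS_PROBABILITY)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(n_planes):
        section = rng.choice(sections)  # hits are equally likely everywhere
        all_hits[section] += 1
        # The plane returns only if this hit does not bring it down.
        if rng.random() >= LOSS_PROBABILITY[section]:
            returned_hits[section] += 1
    return all_hits, returned_hits


def shares(counter):
    """Convert raw counts into the share of hits per section."""
    total = sum(counter.values())
    return {section: count / total for section, count in counter.items()}


all_hits, returned_hits = simulate_missions()
print("all planes:     ", shares(all_hits))
print("returned planes:", shares(returned_hits))
```

Among all planes, each section receives about a third of the hits. Among the planes that returned, engine hits are rare, not because engines were hit less often, but because planes hit there mostly did not come back. Looking only at the returned planes would point the reinforcement at the wrong sections.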

Therefore, when analyzing quantitative and qualitative data, we must understand which outliers can bias the sample, check whether the data covers the entire population, and examine the origin of the data, as well as the comparisons we make between segments (groups) of analysis.

Conclusion

In short, it is important to discuss how cognitive biases can affect the quality of your data, because they can consequently lead to wrong decisions. In addition to considering and controlling them throughout the analysis process, we must pay special attention to the moment when the data is collected or produced.
