Your Bias or Mine?

Ajinkya Bhanudas · Published in Analytics Vidhya · Sep 18, 2019

How strongly we are influenced by bias shapes our decisions and judgments. Let's look at what these biases do in analytics and machine learning.

  1. Confirmation bias: The analyst sets out with a predetermined assumption and keeps searching the data until that assumption can be proven, to the extent that the data gets unknowingly tweaked, for example by including or excluding certain variables. The best safeguard is to perform tests on a small, focused set of hypotheses decided in advance (see the hypothesis-testing sketch after this list).
  2. Selection bias: The sample under consideration is not a good reflection of the population; the data used to train the algorithm over-represents one group, so the model works better for that group at the expense of others. This error is common in surveys, where samples are effectively self-selected: a customer may want to take part in a survey precisely because they like the product it is about, which makes the results misleading. Proper sampling techniques remove much of this bias (the sampling sketch after this list contrasts a self-selected sample with a stratified one). Another point to note is avoiding false extrapolation, which leads to incorrect generalizations for certain segments of the population, for example an image recognition system trained only on a subset of a population (only a particular breed of dogs).
  3. Confounding variables and the bias they bring in: A confounding variable lies outside the framework/scope of the existing analytical model but influences both the explanatory and the dependent variable. If confounders are not accounted for, the analysis implies a cause-and-effect relationship that does not exist; correlation does not imply causation. A/B tests are a good way to validate such assumptions (the confounder sketch after this list simulates this effect).
  4. Interaction bias: The way we interact with an algorithm during training biases it. As an example, Google asked users to draw a shoe; most users drew a man's shoe, so the system did not learn that high heels are also shoes. This is disastrous in a production scenario, and a bias of this type rarely comes to light until the system is put to the test in exactly such situations.
  5. Latent bias: The algorithm learns an incorrect correlation between features and outcomes, such as inferring that a men's hair stylist must be a man simply because the opposite has not been observed in the data.
  6. The Bandwagon Effect: A psychological phenomenon in which people do something primarily because other people are doing it, without regard to their own judgment. It has wide implications but is most commonly seen in consumer behavior: the tendency to follow trends and fads occurs because people take information from others and want to conform. In analytics this leads to undesirable assumptions and methodologies being adopted, almost as if the data were being led towards an output. It is closely related to "herd behavior".
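
To make the point about confirmation bias concrete, here is a minimal, hypothetical sketch in Python (NumPy and SciPy assumed; none of this code is from the article). Every variable is pure noise, so any "significant" relationship a broad search turns up is a false positive, while a single pre-specified hypothesis stays honest.

```python
# Hypothetical sketch: why searching until "something proves the assumption"
# is dangerous. All variables are pure noise, so any hit is spurious.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_samples, n_features = 200, 50

X = rng.normal(size=(n_samples, n_features))   # candidate explanatory variables (noise)
y = rng.normal(size=n_samples)                 # outcome, unrelated to X (also noise)

# Confirmation-bias style search: test everything and keep whatever looks significant.
p_values = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(n_features)])
print("smallest p-value over 50 noise variables:", p_values.min())
# With 50 tests at alpha = 0.05 we expect ~2-3 spurious "hits" on average.

# Focused alternative: one pre-specified hypothesis, tested once.
r, p = stats.pearsonr(X[:, 0], y)
print("single pre-specified test p-value:", p)

# If several hypotheses really are needed, correct for multiple comparisons,
# e.g. a simple Bonferroni adjustment.
print("any hit after Bonferroni?", (p_values < 0.05 / n_features).any())
```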
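
For selection bias, the following sketch (again hypothetical, NumPy only; the group labels, sizes, and opt-in probabilities are made up) contrasts a self-selected sample, which over-represents one group, with a stratified sample that preserves the population proportions.

```python
# Hypothetical sketch: self-selected vs. stratified sampling.
import numpy as np

rng = np.random.default_rng(42)

# Population: 70% group A, 30% group B (e.g. two dog breeds in an image dataset).
population = np.array(["A"] * 7000 + ["B"] * 3000)

# Self-selected sample: members of group A are three times as likely to opt in.
opt_in_prob = np.where(population == "A", 0.9, 0.3)
selected = population[rng.random(population.size) < opt_in_prob]
print("self-selected sample, share of A:", np.mean(selected == "A"))  # well above 0.70

# Stratified sample: draw from each group in proportion to its population share.
sample_size = 1000
stratified = np.concatenate([
    rng.choice(population[population == g],
               size=int(sample_size * np.mean(population == g)),
               replace=False)
    for g in ["A", "B"]
])
print("stratified sample, share of A:", np.mean(stratified == "A"))   # ~0.70
```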

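For confounding, this last sketch (hypothetical, NumPy only) simulates a confounder z that drives both x and y: the two look strongly correlated even though neither causes the other, and adjusting for z, here by regressing it out of both variables and correlating the residuals, makes the association disappear.

```python
# Hypothetical sketch: a confounder z creates a spurious correlation between x and y.
import numpy as np

rng = np.random.default_rng(7)
n = 5000

z = rng.normal(size=n)                 # confounder, outside the model's scope
x = 2.0 * z + rng.normal(size=n)       # "explanatory" variable driven by z
y = -1.5 * z + rng.normal(size=n)      # "dependent" variable also driven by z

print("corr(x, y):", np.corrcoef(x, y)[0, 1])               # strong, but spurious

# Regress z out of both variables and correlate the residuals
# (a simple stand-in for "adjusting for the confounder").
x_res = x - np.polyval(np.polyfit(z, x, 1), z)
y_res = y - np.polyval(np.polyfit(z, y, 1), z)
print("corr(x, y | z):", np.corrcoef(x_res, y_res)[0, 1])   # near zero
```
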
There are many other forms of bias (e.g. cognitive biases), but the ones above are those you will come across most often in machine learning.


Ajinkya Bhanudas | AWS Associate Cloud Solutions Architect | Applied Data Science and Machine Learning