A Method to Measure Bias Caused By Preferential Feeds

Shadow Strike · Published in The Startup · 11 min read · Feb 18, 2021

Beauty is in the eye of the beholder. True, but what if you are seeing things in a different light, or not seeing the complete picture at all?

People with differently filtered information feeds have different views of the world

The internet age has bombarded us with information (and opinions). Today you will find almost every possible viewpoint on a subject somewhere on the internet, ranging from extremely positive to extremely negative, with some entirely orthogonal to both. However, there are only a few platforms (websites or apps) that most of us use to access that information, and, unfortunately, all of them run back-end algorithms that filter the information and present only a limited range of views on any subject. This is what is driving polarization, to the extent that it can now sway governments.

What needs to be studied is how preferential information feeds, such as personalized news, can create biased behaviour in individuals who are the same or similar in every respect except their feeds. There are many proposed methods for checking an individual's inclination or deviation from some norm, mostly questionnaire-based interviews. However, it is very difficult to isolate the real cause of a deviation: is it the personalized news, a documentary the person watched, or an incident from their past? This makes such studies non-replicable and merely empirical. They give only directional (read: correlational) predictions, not because the method is flawed, but because it is very difficult to set up ideal experimental conditions with truly similar individuals. Experimental sciences do not face this problem: a chemist studying how temperature affects a reaction can hold every other aspect of the experiment constant while varying only the temperature, because it is possible to create a set of test samples with identical properties in every parameter except the one being varied. This problem has plagued psychology experiments from the beginning, restricting them to empirical statistical analysis.

With the advent of machine learning and artificial intelligence algorithms, specifically neural-network-based machine learning, developers have tried to mimic how biological neural systems (the human brain) process data and generate inferences. Today there are neural-network models that not only correctly identify people in photographs but have also evolved into other areas, where they are used to generate images, text and even music.

Neural-network-based learning models can now learn from real-world data (numbers, images, sounds, text, etc.) and accomplish tasks that were previously thought to be ones only humans are efficient at. Since such models mimic the human thinking process so closely, we might use them to understand (and therefore predict) what a human will do given a set of stimuli. For a simple example, a model trained to recognize animals in pictures can be used to understand how a human will respond when shown an image of animals and asked to name them (which brings us very close to a Turing test). We don't need an exact replica of a human or a human mind to study (we have long studied rats and guinea pigs to derive psychological principles), only a human-mind equivalent.

Neural models, so utilized, are in effect blank human-mind replicas that can be trained and observed under perfect lab conditions. They also offer the advantage of creating virtually as many mind replicas as required, limited only by the available computing resources. To use this to understand how preferential news changes behaviour, we can create multiple neural models and train each of them on a different version of a preferential news feed, just as different human individuals would have received and read it. Once trained, the models can be studied for their differences, such as word associations.

To check whether this method of evaluating bias-trained, human-mimicking neural models can actually work, I first simplified the problem and examined how simpler models, built for numeral recognition, behave in a biased environment, since such models are easy to create and ample databases exist for training them. A simple numeral-recognition model learns the features that let it correctly associate an image of a numeral with the actual numeral by first going through numerous numeral images that have been correctly identified (by humans); that is, the model is trained on samples of handwritten numeral images labelled with the correct numeral. Models trained this way have reached accuracy rates above 99.5% when identifying handwritten numeral images they have not previously seen.

The setup for such an experiment requires a neural network model that can be trained to recognize the numeral in an image, a set of training images, and a set of testing images on which the trained models can be tested to check the variation in their predictions. Such an experiment should show that if the ratio of images used to train a model is varied preferentially, the model will show a preferential prediction towards the set of numerals whose ratios were higher. This generalizes to: different models trained on differently biased input information will make biased predictions on the same (but previously unseen) set of information.

To conduct this experiment, I used the publicly available MNIST database and trained multiple models, varying the training dataset. MNIST has roughly the same count of images for each numeral, with each image 28 x 28 pixels. To generate a bias, I created a subset of the database with a reduced count of images for certain numerals. For example, to bias the models against the number 2, I kept all images of the other numerals and reduced only the images of 2 to create a biased subset. For the neural network, I used a convolutional neural network (CNN) with a single hidden layer of 500 nodes (artificial neurons). As output, the model provides a set of 10 probabilities, one per numeral, from which the most probable numeral is picked. All trained models were tested on the same set of images so that their test results could be compared with one another. The model trained without any bias in the training data was considered "normal", while the models with biased training were considered "deviants". With the setup in place, the biased training experiments could be conducted; first, though, the biased sets of images had to be created. Since creating a biased training set means reducing the number of images, and to avoid dependency on any particular set of images, images were deleted at random until the desired count was reached.
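Below is a minimal sketch of this setup in Python using TensorFlow/Keras and its built-in MNIST loader. The convolution filter count, epoch count and the 90% reduction of numeral 2 are illustrative assumptions; only the 500-node hidden layer and the 10-probability output come from the description above.

```python
# Minimal sketch of the biased-training setup. Filter count, epochs and the
# 90% reduction of numeral 2 are illustrative assumptions, not the article's
# exact configuration.
import numpy as np
import tensorflow as tf

def biased_subset(x, y, biased_digits, keep_fraction, seed=0):
    """Randomly delete images of the biased numerals, keeping all others."""
    rng = np.random.default_rng(seed)
    keep = np.ones(len(y), dtype=bool)
    for d in biased_digits:
        idx = np.flatnonzero(y == d)
        n_drop = int(len(idx) * (1 - keep_fraction))
        keep[rng.choice(idx, size=n_drop, replace=False)] = False
    return x[keep], y[keep]

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Example: keep only 10% of the images of numeral 2 in the training set.
xb, yb = biased_subset(x_train, y_train, biased_digits=[2], keep_fraction=0.10)

# A small CNN with a single hidden layer of 500 nodes; the output layer gives
# 10 probabilities, one per numeral.
model = tf.keras.Sequential([
    tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(xb, yb, epochs=3, verbose=0)
predicted = model.predict(x_test).argmax(axis=1)  # most probable numeral per image
```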

If the set of images for each numeral can either be biased or not, then with 10 numerals there are 2¹⁰ − 1, or 1,023, different biased training sets. If instead each numeral's set is allowed three variations (no bias, medium bias, high bias), there are 3¹⁰ − 1, or 59,048, different biased training sets; with 4 variations (no bias, low bias, medium bias, high bias) the count reaches 1,048,575. The number of biased training sets matters because, for a given computational capacity, it translates directly into the total time required for the experiment (the time to train and test one model per biased training set). For example, if training and testing one model takes 5 minutes, then with two bias variations (biased or not) it takes 85 hours to train and test the 1,023 neural models; with 3 variations it takes around 205 days, and with 4 variations, 10 years!
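As a quick sanity check of these numbers, a few lines of Python reproduce the counts and running times quoted above:

```python
# Back-of-the-envelope check: with k bias variations per numeral and 10
# numerals, there are k**10 - 1 non-trivial biased training sets.
MINUTES_PER_MODEL = 5
for k in (2, 3, 4):
    n_sets = k ** 10 - 1
    hours = n_sets * MINUTES_PER_MODEL / 60
    print(f"{k} variations: {n_sets:>9,} sets, "
          f"~{hours:,.0f} hours (~{hours / 24:,.0f} days)")
# 2 variations:     1,023 sets, ~85 hours (~4 days)
# 3 variations:    59,048 sets, ~4,921 hours (~205 days)
# 4 variations: 1,048,575 sets, ~87,381 hours (~3,641 days, i.e. ~10 years)
```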

To overcome this, I decided to use training sets with only two variations per numeral, but to create multiple such collections at different levels of bias: one collection of 1,023 biased sets in which each numeral has either no bias or low bias, another 1,023 in which each numeral has either no bias or high bias, and so on. I kept 5 different bias levels (very high, high, medium, low, very low), which gave 5 × 1,023, or 5,115, different biased training sets (still requiring 425+ hours of experimentation).
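These 5,115 configurations can be enumerated as every non-empty subset of the 10 numerals crossed with the five bias levels. A sketch follows; the mapping of the named levels onto keep-fractions (1% to 75%) is my assumption, read off the experiment table below rather than stated in the text:

```python
# Enumerate the 1,023 non-empty subsets of numerals, once per bias level.
# The keep-fraction values are taken from the experiment table (1%...75%);
# their pairing with the five named levels is an assumption.
from itertools import combinations

digits = range(10)
subsets = [frozenset(c) for r in range(1, 11) for c in combinations(digits, r)]
assert len(subsets) == 2 ** 10 - 1  # 1,023 non-empty subsets

keep_fractions = {"very high": 0.01, "high": 0.10, "medium": 0.25,
                  "low": 0.50, "very low": 0.75}
configs = [(level, frac, s) for level, frac in keep_fractions.items()
           for s in subsets]
assert len(configs) == 5 * 1023  # 5,115 biased training sets
```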

As part of the experiment, each model trained on a biased training set was then evaluated on the same test set of numeral images.

In all, 5,686 setups were created (including some for checking trends; the base model also ended up being run in every setup). The details of the experiments and their outcomes are summarized in the table below.

Image variation   # of experiments   Successful   Failed
----------------  -----------------  -----------  -------
various                  566             515          51
1%                     1,024             563         461
10%                    1,024             935          89
25%                    1,024             987          37
50%                    1,024             952          72
75%                    1,024             931          93
Total                  5,686           4,883         803

Now, to interpret the results of the experiments. For each model, its accuracy and predictions were recorded for every numeral image. The predictions of each model were then compared with those of the base model (the model trained on all numeral images, without any bias). From this, the deviation of a model was calculated using the following formula:

where D is the deviation and N is the number of images
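The formula image itself has not survived in this copy. A plausible reconstruction consistent with that legend, assuming the deviation for each numeral i compares how many test images the deviant and base models assign to it, would be:

$$D_i = \frac{N_i^{\text{deviant}} - N_i^{\text{base}}}{N_i^{\text{base}}}$$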

This gave a set of 10 numbers for each model, each corresponding to the deviation from the base predictions for one numeral. That is the same as saying that each model's deviation from the base can be described by a 10-dimensional vector. Once we have a vector for each model, the models can be compared easily by computing an overall deviation score:
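The score formula is likewise missing here; a natural choice, assuming the Euclidean length of the 10-dimensional deviation vector, would be:

$$\text{deviation score} = \lVert \mathbf{D} \rVert_2 = \sqrt{\sum_{i=0}^{9} D_i^{2}}$$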

On analyzing the results, an interesting pattern emerged in the accuracy of the various models created to check the trend. It is clear that below a certain threshold (20% of images retained in this case, which means a bias of 80%) a model's predictions start deviating strongly. This is important to note, as it will help us later to define the method for measuring the bias.

Prediction accuracy of models as the % of a numeral's images in training varies from the normal
Prediction accuracy when 2 numeral images are varied

It is thus established that measurable deviations occur when the bias is more than 80%, so it is worth looking at the various models trained with 90% bias. Before jumping to that, we have to visualize a 10-D vector in 2-D (not 3-D, since it will be displayed on a screen, which is a 2-D surface). To make this easier, think of the vector as the handle of an umbrella. All (say 10) spokes of the umbrella sit at almost the same point when the umbrella is closed; when you open it, each spoke can be thought of as one dimension (a single-dimension vector), and its shadow on the ground is a 2-D projection. Applying this idea to the vector associated with each model, we find something like this:

“normal” model

This is how the base model looks in our visualization (notice how it resembles the shadow of an open umbrella!). To read it: the red dotted line is what a "normal" prediction looks like, the green area is the actual prediction, and the cross marks indicate the bias in training (which in the base model was none).
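This umbrella projection is essentially a radar (spider) chart. A minimal matplotlib sketch of the idea follows; the deviation values in it are made-up placeholders, not the article's data:

```python
# Radar-chart rendering in the spirit of the umbrella projection: the red
# dotted ring is the "normal" prediction, the green area the actual one.
# All values here are made-up placeholders, not the article's data.
import numpy as np
import matplotlib.pyplot as plt

deviations = np.array([0.02, 0.01, -0.35, -0.28, -0.31,
                       0.03, 0.12, -0.30, 0.08, 0.10])
normal = np.full(10, 0.1)            # equal prediction share per numeral
actual = normal * (1 + deviations)   # deviant model's prediction shares

angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
closed = np.concatenate([angles, angles[:1]])  # close the polygons

ax = plt.subplot(polar=True)
ax.plot(closed, np.concatenate([normal, normal[:1]]), "r--", label="normal")
ax.fill(closed, np.concatenate([actual, actual[:1]]), color="green",
        alpha=0.4, label="actual prediction")
ax.set_xticks(angles)
ax.set_xticklabels([str(d) for d in range(10)])
ax.legend(loc="lower right")
plt.show()
```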

Here are some of the biased models, visualized.

various biased models

The deviation from the normal is quite visible in biased models. You can find the visualization of all models here.

All models

Some of the interesting findings from these:

First, the obvious: the more numerals in which bias is introduced, the greater the magnitude of deviation from the normal. Interestingly, however, the deviation from normal shrinks once the bias is present in more than seven numerals. This happens because a negative bias in 9 numerals is similar to a positive bias in 1 numeral. Similar, but not the same, because the count of images in the training set is greatly reduced (due to the negative bias), so the overall accuracy of the model is low, giving rise to a higher deviation. (Comparing this with real life, we might deduce that an ignorant group is far better off than a biased group!)

How the models deviate from the normal model

Also, notice in the image below that bias in a particular set of numerals (here the bias was in 2, 3, 4 and 7) does not only bias the prediction of those numerals; it also affects the prediction of other numerals (the prediction bias appears not only in 2, 3, 4 and 7 but also in 6, 8 and 9), which is exactly what should happen in a real-world scenario. This is why neural network models are so helpful: they learn hidden relations and hence can predict the workings of the human mind more faithfully.

Biased model, showing the bias in more numerals than in training

Now, all of this gives us a good framework for how we can utilize neural models and their training to measure bias (a code sketch follows the list):

1) Find the threshold value at which bias starts to affect learning.

2) If the bias in the information is higher than that threshold, train a model with the biased information.

3) Compare the predictions of the biased model with those of the "normal" base model along a set of dimensions.

4) The measured difference is the bias due to the biased information.
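A compact sketch of these four steps as code; train_model and count_predictions are hypothetical placeholders for whatever training stack is used, and the 0.8 threshold comes from the MNIST experiments above:

```python
# Hypothetical outline of the four-step framework. train_model and
# count_predictions stand in for an actual training/evaluation stack.
import numpy as np

THRESHOLD = 0.8  # step 1: bias level above which learning measurably deviates

def measure_bias(bias_level, biased_data, base_counts, test_data,
                 train_model, count_predictions):
    """Steps 2-4: train on the biased feed and score its deviation from normal."""
    if bias_level < THRESHOLD:
        return None                                       # below threshold: no reliable signal
    deviant = train_model(biased_data)                    # step 2: train on biased information
    dev_counts = count_predictions(deviant, test_data)    # predictions per dimension
    deviation = (dev_counts - base_counts) / base_counts  # step 3: compare with "normal"
    return np.sqrt(np.sum(deviation ** 2))                # step 4: overall measured bias
```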

What we have established here is that we can use neural models as stand-ins for human learning, train them on differently biased information, and measure the bias levels resulting from that information. Applied to the information feeds we receive from the internet, this will help us determine the magnitude of the bias present in us!

Next, I have applied the above framework to find the bias generated by reading 8 years of news articles from two leading newspapers. Look out for my next post for that.
