Part 5: Misclassification and Data Bias

By Emanuel Țundrea, Ph.D. in Software Engineering, Emanuel University of Oradea, 12th November, 2020
Initially published in the proceedings of the International Technology, Education and Development Conference at

Without a doubt, we can assert that the last American elections set global news viewership at a new high. For at least one week, the news about elections in the USA topped all other topics, including the Covid-19 pandemic. Prior to the elections, there was no media report without at least one public-opinion poll presentation. However, after the elections, almost every news portal rushed to explain why the polls performed poorly.


Unfortunately, this is not happening only in politics. Medical agencies and national and local decision-makers who need models of Covid-19’s spread to set public policies, such as social distancing or shelter-in-place mandates, live in confusion because there is so much inherent disagreement between models. Why? Misclassification and data bias seem to be the keywords today.

Now, let’s get back to our topic. Unfortunately, the higher-ed digitalization process is not exempt from this trap, and it is our duty to secure ethical behavior. The first post of this series mentioned that AI engines require masses of data to train them, but then the challenge is to assess the data. How do we know that it is representative data? Researchers admit that in many real-life scenarios this is very hard to assert.¹ ² The Information Resources Management Association (USA) recognizes that “for high dimensional data containing big-data components, it is almost impossible to find out all the informative and relevant features from the raw data”.³

Even if a school community has done its best to create a database that can be considered balanced at one moment in time, how do we know that we are not going to meet an outlier that we have not met before? Should we allow this pattern matcher to make the final decision for this new scenario?

This challenge raises a new concern regarding the ethics of AI, known as data bias. Data bias comes from “the context, purpose, availability of adequate training and test data, optimization method used as well as from trade-offs between speed, accuracy, overfitting and overgeneralizing. […] Thus, the assumption of machine learning being free of bias is a false one, bias being a fundamental property of inductive learning systems”.

Data bias may happen in both directions: the fear that a student qualifies unfairly or that someone is discriminated against based on a machine’s error. Whose responsibility is it if the AI algorithm misclassifies a student? How much should a university rely on such an AI assistant?
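To make the "both directions" concrete, here is a minimal sketch of how one might count the two kinds of error separately for each group of students. The function name `error_rates_by_group`, the group labels, and the sample records are all hypothetical and invented for illustration; they are not drawn from any real admissions data.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute per-group error rates in both directions.

    `records` is an iterable of (group, actual, predicted) tuples, where
    `actual` and `predicted` are booleans (True = "qualifies").
    A false positive is a student who qualifies unfairly; a false
    negative is a student who is wrongly turned away.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, actual, predicted in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            # None means the group had no cases of that kind to measure.
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Hypothetical toy data: (group, actually qualifies, model says qualifies).
sample = [
    ("A", True, True), ("A", True, True), ("A", False, True), ("A", False, False),
    ("B", True, False), ("B", True, True), ("B", False, False), ("B", False, False),
]
rates = error_rates_by_group(sample)
```

In this toy data the model errs in opposite directions for the two groups: group A is over-admitted and group B is wrongly rejected. A gap like this between groups is the kind of signal that would deserve a closer look, although what counts as an acceptable gap is a policy question the code cannot answer.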

This problem is not new, but it has been amplified by the proliferation of AI agents today. Back in 1987, the British Medical Journal transparently acknowledged that discrimination against women and members of ethnic minorities had been found at a medical school in London.⁴ Misclassification has long been suspected, but such incidents lower trust in AI even more.


As AI agents become more ubiquitous in automated decision making, these systems must be built with careful attention to the types of bias that can result. This can be alleviated by what I call intentional transparency, so that the beneficiaries of these systems can trust them to produce unbiased results. Intentional transparency means both transparency at the level of the machine-learning algorithm and providing ways to backtrack through the stochastic engines and trace the processes that led the model to a misclassification. This is important to keep even my own biases from making their way into the AI systems I’m building. In many instances, the process behind developing the models we work with seems a little bit like the Wizard of Oz.
It is of primary importance to be able to open the AI black box and provide a clear explanation of how its engine works and how it generates the output.

Big tech companies recommend several practices to mitigate bias: bigger investments in gathering more relevant data, transparency about the algorithms, attracting more talent to grow the community (especially from non-profit organizations and open-source projects), and an independent audit of both the data and the machine-learning algorithm. Building systems that keep the “human-in-the-loop”, where the system makes recommendations or provides options that a person double-checks or chooses from, strengthens confidence in the decisions and also provides an important byproduct: it helps guard against the moral deskilling of the human decision-makers.
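One possible reading of the “human-in-the-loop” idea can be sketched in a few lines. In this sketch the model only ever produces a recommendation, and any recommendation below a confidence threshold is escalated for closer human review. The names `Recommendation` and `recommend`, the student identifiers, and the 0.9 threshold are assumptions made for illustration, not an established standard.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    student_id: str
    suggested_label: str
    confidence: float
    escalate: bool  # True = route to a person for closer review


def recommend(student_id, label_scores, threshold=0.9):
    """Turn model scores into a recommendation, never a final decision.

    `label_scores` maps each candidate label to the model's probability.
    The 0.9 default threshold is an arbitrary, hypothetical choice.
    """
    label, confidence = max(label_scores.items(), key=lambda kv: kv[1])
    return Recommendation(
        student_id=student_id,
        suggested_label=label,
        confidence=confidence,
        escalate=confidence < threshold,
    )


# Hypothetical usage: one confident case and one borderline case.
confident = recommend("s-001", {"admit": 0.97, "reject": 0.03})
borderline = recommend("s-002", {"admit": 0.55, "reject": 0.45})
```

The design choice worth noting is that even the confident case returns a suggestion and not a verdict; the person reviewing it stays responsible for the outcome, which is what keeps the human decision-maker's judgment in practice.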

Let us build the new AI capabilities with ethics in mind, so that they mitigate the legitimate fears and secure new systems that are a blessing for one of the most influential institutions in society: the university.

Food for thought:

  • What are your core values: moral, spiritual, ethical, personal, and career values?
  • What are two of the worst pitfalls you want to avoid as an educator managing data about your students?
  • What are some key power issues you are willing to give up and thus ensure that your work is transparent?
  • What type of data biases do you need to carefully address in your world?
  • Are you intentional in being accountable to a mentor and in growing the integrity of your peers and your AI system users?

[1] Mrutyunjaya Panda, Ajith Abraham, Aboul Ella Hassanien, Big Data Analytics: A Social Network Approach, Boca Raton, FL: Taylor & Francis, 2019.

[2] Bertrand Clarke, Ernest Fokoue, Hao Helen Zhang, Principles and Theory for Data Mining and Machine Learning, New York, NY: Springer, 2009.

[3] Information Resources Management Association, Deep Learning and Neural Networks: Concepts, Methodologies, Tools, and Applications, Hershey, PA: IGI Global, 2019, p. 655.

[4] S. Bhate, Prejudice Against Doctors and Students from Ethnic Minorities, British Medical Journal, vol. 294, no. 6575, p. 838, 28 March 1987.


