TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Please Make This AI Less Accurate

Demystifying the term “accuracy” in Data Science and Artificial Intelligence

Kate Minogue
Published in TDS Archive · 7 min read · Apr 16, 2024


Accuracy is one of those words that everyone intuitively assumes they understand and that most people believe is always better when it is higher.

With the rise in attention on Artificial Intelligence (AI) and increasing awareness of lapses in the reliability or accuracy of its outputs, it is important for more people to understand that data products, such as AI, don't follow the same rules of consistency or accuracy as other technologies.

The Confusion Matrix

To illustrate, let me introduce the concept of a “Confusion Matrix”. This will be very familiar to any Data Scientist who has built predictive models for classification purposes. It may be new to others, but I find that the concept, the methodology and the human/business interaction involved make a useful case study for understanding accuracy terminology in machine learning more broadly. It is a helpful visual tool for understanding both the nuance and the trade-offs in these terms.

Confusion Matrix template by the author

When we speak about total accuracy, we mean the number of correct predictions (the sum of the green boxes above) out of all predictions made (the sum of the four boxes above). This is where you may hear terms like “Our pregnancy test is 99% accurate”. It is talking about the accuracy of all test predictions, both those that say the user is pregnant and those that say the user is not.
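To make the arithmetic concrete, here is a minimal sketch using scikit-learn; the `y_true` and `y_pred` arrays are invented purely for illustration. Total accuracy is simply the diagonal of the confusion matrix (the green boxes) divided by the sum of all four cells.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical ground truth and predictions (1 = pregnant, 0 = not pregnant)
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Total accuracy = correct predictions (green boxes) / all predictions
total_accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Confusion matrix:\n{cm}")
print(f"Total accuracy: {total_accuracy:.0%}")          # 80%
print(f"Check: {accuracy_score(y_true, y_pred):.0%}")   # same number
```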

The nuance appears when you seek to understand which of the two remaining red boxes that “inaccurate” percentage sits in.

For rare events, you could achieve a very high accuracy by predicting that the event never happens (no model required). However, for different models and use cases the cost or risk associated with inaccuracy is not equal or consistent.
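As a toy illustration of that point (the numbers below are invented for the example), suppose only around 1% of cases are positive. A “model” that always predicts the negative class scores roughly 99% accuracy while catching none of the events:

```python
import numpy as np

# Hypothetical rare event: roughly 1% of 10,000 cases are positive
rng = np.random.default_rng(seed=0)
y_true = (rng.random(10_000) < 0.01).astype(int)

# A "model" that never predicts the event
y_pred = np.zeros_like(y_true)

accuracy = (y_pred == y_true).mean()
recall = y_pred[y_true == 1].mean()  # share of real events we caught

print(f"Accuracy: {accuracy:.1%}")   # roughly 99%
print(f"Recall:   {recall:.1%}")     # 0% - every real event is missed
```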

Put plainly, a lower-accuracy model may be that way intentionally, because you want to reduce how often you mis-predict in one particular direction. In doing this, you choose to compromise overall model accuracy.

Is it more risky to predict (or classify) that someone is pregnant and to be wrong, or the other way around?
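One common way this trade-off is made in practice is by moving the classification threshold. The sketch below, which uses scikit-learn on a synthetic dataset purely for illustration, lowers the threshold from the default 0.5 so that fewer positives are missed, typically at the cost of some overall accuracy:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary classification problem (illustrative only)
X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

for threshold in (0.5, 0.2):  # default vs. a deliberately lower cut-off
    y_pred = (proba >= threshold).astype(int)
    print(
        f"threshold={threshold}: "
        f"accuracy={accuracy_score(y_test, y_pred):.1%}, "
        f"recall={recall_score(y_test, y_pred):.1%}"
    )

# Lowering the threshold catches more true positives (higher recall),
# but usually produces more false positives, so total accuracy can drop.
```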
