4 Ways That Your Accurate Model May Not Be Good Enough

Kirk Borne
5 min read · Sep 14, 2021


Source: https://xkcd.com/1838/

When we were in school and were given a problem to solve, we usually stopped working on it as soon as we found the answer and recorded that answer on our paper. This might be a fair approach for elementary school assignments, but it does not serve us well in higher education or in life. Unfortunately, many people carry this learned behavior into adulthood, at the university and/or on the job. Consequently, they miss new opportunities for learning, discovery, recognition, and advancement.

In data science, we are trained to keep searching (at least, I hope that this is true for all of us) even after we find that first model from our data that appears to answer our business question accurately. Data Scientists should continue searching for a better solution, for at least four reasons, described below.

In this discussion please note that I am not advocating for “analysis paralysis”, where never-ending searches for new and better solutions are just an excuse (or a behavior pattern) that prevents one from making a final decision. Good leaders know when an answer is “good enough”. I learned this lesson early in my career, as I discussed in a previous article: “Top 10 Things I Did Wrong in My Career.”

Here are four reasons why the result of your analytics modeling might be correct (according to some accuracy metric), but it might not be the right answer:

1. Underfit

Your model may be underfit because you stopped modeling the data set too soon. Unless you are awesomely lucky, it is rare that your very first analytics model on a data set will be the best possible and most accurate model.
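
To make this concrete, here is a minimal sketch in Python (using scikit-learn and a synthetic, nonlinear data set that I invented purely for illustration) of what underfitting looks like: a straight-line model cannot follow curved structure, so its error is large even on the very data it was trained on, while a slightly richer model does much better.

```python
# A minimal sketch of underfitting on synthetic (illustrative) data:
# a straight line fit to curved data leaves a large error even on the training set.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)   # nonlinear ground truth

for degree in (1, 5):   # degree 1 underfits; degree 5 can follow the curvature
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    mse = mean_squared_error(y, model.predict(X))
    print(f"polynomial degree {degree}: training MSE = {mse:.3f}")
```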

2. Overfit

Your model may be overfit, due to over-zealous emphasis on modeling the training data as accurately as possible. Even a broken clock is right twice a day! The analytics model must be validated against an independent validation data set (test sample), and this validation must be repeated with each new improvement and iteration of your model. You can only demonstrate that you have found a minimum in the model’s MSE (mean squared error, or some other error metric) when “improved” models (with lower error on the training data) start to show higher MSE on the independent test sample. In other words, until you have observed this turn-around (local minimum) in the test-sample error curve for your sequence of models, you should keep searching your model space for improvements.

Source for graphic: http://scott.fortmann-roe.com/docs/BiasVariance.html
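As an illustration of that turn-around test, here is a minimal sketch (Python with scikit-learn; the synthetic data and the polynomial-degree “model space” are my own toy assumptions, not a prescription) that increases model complexity step by step while recording MSE on both the training data and an independent test sample. The training MSE keeps falling, while the test-sample MSE typically falls and then rises again; that reversal marks the local minimum described above.

```python
# A minimal sketch of the turn-around test: keep increasing model complexity
# while the error on the independent test sample is still falling, and stop
# once it starts to rise again.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=150)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

best_degree, best_test_mse = None, np.inf
for degree in range(1, 13):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))  # keeps falling
    test_mse = mean_squared_error(y_te, model.predict(X_te))   # typically falls, then rises
    if test_mse < best_test_mse:
        best_degree, best_test_mse = degree, test_mse
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")

print(f"Lowest test-sample MSE (the turn-around point) at degree {best_degree}")
```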

3. Bias

Your model may be biased. It may be “good enough” in some cases for you to find the local minimum in the MSE curve for the model that you are building, but did you find the global minimum? Can you prove it? It is one thing to build the model right, but quite another thing to build the right model. Mistaking a local minimum for the global minimum often becomes apparent only when you use different data variables in the model’s feature set, or a different algorithm, and discover further improvements in the predictive accuracy of your analytics model. There may be human, algorithmic, or technical bias built into your modeling process that overlooks these alternative model choices. One of the most common of these biases is confirmation bias. Using an evolutionary (Genetic Algorithm) approach is one way to avoid local minima and to improve your chances of finding the global minimum in your MSE curve.

Genetic Algorithms and 3 Genetic Operators (graphic by me)
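
For readers who want to see those three genetic operators in action, here is a minimal sketch of a genetic algorithm applied to feature-subset selection. It uses plain NumPy, and the fitness function is a made-up stand-in (in a real project it would be something like cross-validated model accuracy), so treat it as an illustration of selection, crossover, and mutation rather than a production recipe.

```python
# A minimal sketch of the three genetic operators (selection, crossover,
# mutation) applied to feature-subset selection. The fitness function is a
# hypothetical stand-in for cross-validated model accuracy.
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES, POP_SIZE, N_GENERATIONS = 12, 30, 40

def fitness(mask):
    # Hypothetical fitness: reward a specific "true" subset of features,
    # with a small penalty for using more features than necessary.
    true_subset = np.zeros(N_FEATURES, dtype=bool)
    true_subset[[1, 4, 7]] = True
    return (mask == true_subset).sum() - 0.1 * mask.sum()

# Each individual is a boolean mask over the candidate features.
population = rng.integers(0, 2, size=(POP_SIZE, N_FEATURES)).astype(bool)

for _ in range(N_GENERATIONS):
    scores = np.array([fitness(ind) for ind in population])
    # 1. Selection: keep the fitter half of the population as parents.
    parents = population[np.argsort(scores)[::-1][: POP_SIZE // 2]]
    children = []
    while len(children) < POP_SIZE - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, N_FEATURES)          # 2. Crossover: splice two parents.
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(N_FEATURES) < 0.05       # 3. Mutation: random bit flips.
        children.append(np.where(flip, ~child, child))
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print("Selected features:", np.flatnonzero(best))
```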

4. False Positive Paradox

Your model may be suffering from the false positive paradox. When the model’s false positive rate is greater than the rate of occurrence of positive instances in the data set (the prevalence), then false positives will outnumber true positives in the predictive model outputs. In this case, the paradox can be stated this way: “the majority of instances that have been identified as having the condition will in fact not have the condition.” This can be quite serious if the condition is a cancer diagnosis for a cancer-free patient or a terrorist-related arrest of an innocent airline passenger.
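
A quick back-of-the-envelope calculation shows how easily this happens. The numbers below are purely illustrative (1% prevalence, 95% sensitivity, 5% false positive rate; not taken from any real screening program), but they demonstrate the paradox: the flagged population is dominated by false positives.

```python
# A worked example of the false positive paradox with illustrative numbers:
# a rare condition plus a modest false positive rate means false alarms
# outnumber true detections.
population = 100_000
prevalence = 0.01           # 1% of instances truly have the condition
sensitivity = 0.95          # true positive rate of the model
false_positive_rate = 0.05  # 5% of negatives are flagged as positive

positives = population * prevalence        # 1,000 true cases
negatives = population - positives         # 99,000 non-cases

true_positives = sensitivity * positives              # 950
false_positives = false_positive_rate * negatives     # 4,950

precision = true_positives / (true_positives + false_positives)
print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"Precision: {precision:.2f} -> most flagged instances do NOT have the condition")
```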

Because much has already been written about the false positive paradox and its appearance in unbalanced data sets (data sets with far more instances of the control class than of the tested class), we won’t delve into those issues here. However, we mention the false positive paradox for another reason. If your analytics model only tests for the rare class, you may too quickly find that the model has very few false negatives: a high percentage of the instances that have the rare condition are correctly labeled by the model as “having the rare condition” (true positives), compared to a small rate of such instances being incorrectly labeled as “not having the rare condition” (false negatives). In that case, if you base your model accuracy estimation (and premature analytics modeling termination) on the relatively low false negative rate (which might be appropriate in some circumstances), then you may miss the existence of many false positives and the presence of the false positive paradox in your analytics experiment. If appropriate, in such cases you should use a more complete accuracy estimator, such as a ROC curve or the F-score.
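
Here is a minimal sketch (Python with scikit-learn, on a synthetic unbalanced data set of my own construction) of how recall alone can look reassuring while precision, the F-score, and the ROC AUC tell a more complete story.

```python
# A minimal sketch, on synthetic unbalanced data, of why a low false-negative
# rate (high recall) alone can be misleading: precision and the F-score expose
# the false positives that recall hides.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score, f1_score, roc_auc_score

X, y = make_classification(n_samples=20_000, weights=[0.99, 0.01],
                           n_informative=3, random_state=0)   # ~1% rare class
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
scores = clf.predict_proba(X_te)[:, 1]

print(f"Recall (few false negatives):      {recall_score(y_te, pred):.2f}")
print(f"Precision (how many flags are real): {precision_score(y_te, pred):.2f}")
print(f"F1-score:                          {f1_score(y_te, pred):.2f}")
print(f"ROC AUC:                           {roc_auc_score(y_te, scores):.2f}")
```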

Remember: we will most likely encounter the False Positive Paradox in our models when the false positive error rate is greater than the true rate of occurrence of the condition being tested in our experimental sample.

The False Positive Paradox (graphic by me)

Summary

To avoid premature analytics model termination, we need to verify: (1) that our validation data set (test sample) is truly independent; (2) that another model (or even an ensemble of models) converges (or not) to the same solution; (3) that we have explored the model’s complex error space for either the global minimum or the best minimum within the time and resource constraints of our project; and (4) that we know when to stop because the model is “good enough”, so that we avoid analysis paralysis. The 80–20 rule applies to the last two points: approximately “80% of the value is achieved with the first 20% of effort, and the other 20% of the value is achieved with 4X additional effort.”

Always, in order to validate our initially accurate model or to find a better model, we need to look beyond that initial model and dig a little deeper into our piles of data, machine learning, and linear algebra.

Follow me on Twitter at @KirkDBorne

Learn more about my freelance consulting / training business: Data Leadership Group LLC

See what we are doing at AI startup DataPrime.ai

