Baselines in Machine Learning: Navigating Complexity Through Simplicity

Amogh Mahapatra
4 min read · Oct 21, 2023


Fake Reviews. House Prices. Humility.

In one of our previous expositions, we discussed the philosophy of the bias-variance dilemma and its significant impact on practical decision-making in machine learning. Today, we’ll explore how to leverage this fundamental dilemma to our advantage.

Two bright young women passionately believed in market distribution’s role in upholding societal equity. Consequently, they embarked on a mission to detect fraudulent product reviews on e-commerce websites. Fueled by the large language model revolution, they built a 1TB model to identify these deceitful reviews. The model seemingly performed impressively, identifying over 90% of the fraudulent data with an error rate below 5%. Then one of them, the more skeptical of the two, asked herself: is this model doing any better than simply memorizing the most common phrases used in fraudulent reviews, such as “10 things that make you attractive” or “Save Save Save more than before”? Is it capable of capturing linguistic nuance? A simple test revealed its limitations: it mistakenly classified the phrase “I am Sparta, and now you know my name” as an authentic product critique, simply because it deviated from standard review language.

This scenario highlights a pivotal dilemma for every practitioner: which model class should one choose? At its core, this conundrum underscores the balance between overly simplistic (and thus biased) models and overly intricate ones that risk overfitting. In today’s world, you will almost always choose an extremely complex model. Yet there is a lot of practical wisdom in building a simple baseline model (like the phrase-matching one described above) on the side. A baseline model, by its nature, introduces us directly to this tradeoff, and understanding its performance offers us a mathematical anchor to navigate the choppy waters of complexity.
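
To make that concrete, here is a minimal sketch of such a phrase-matching baseline. The phrase list, threshold, and example reviews are made up for illustration; they are not from the model described above.

```python
# A minimal phrase-matching baseline for flagging fraudulent reviews.
# The suspicious-phrase list and threshold are illustrative assumptions.

SUSPICIOUS_PHRASES = [
    "10 things that make you attractive",
    "save save save",
    "best deal of your life",
]

def is_fraudulent(review: str, threshold: int = 1) -> bool:
    """Flag a review if it contains at least `threshold` known spammy phrases."""
    text = review.lower()
    hits = sum(phrase in text for phrase in SUSPICIOUS_PHRASES)
    return hits >= threshold

print(is_fraudulent("Save Save Save more than before!"))        # True
print(is_fraudulent("I am Sparta, and now you know my name."))  # False
```

If a 1TB model barely beats something like this on held-out data, that is a strong signal the extra complexity is not yet paying for itself.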

Deciphering the Bias-Variance Tradeoff with Baselines

Bias and variance are two sources of errors in models. Mathematically:

  • Total Error = Bias² + Variance + Irreducible Error

A model exhibiting high bias often oversimplifies the data, leading to consistent inaccuracies in predictions. Conversely, high variance emerges from over-complicated models that closely follow every data point, including noise. These models might excel with training data but falter with unfamiliar data.
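
To see the decomposition in action, here is a hedged numerical sketch: it repeatedly refits a high-bias model (a constant) and a high-variance model (a degree-9 polynomial) on noisy synthetic data, then estimates the bias² and variance of their predictions at a single test point. The data-generating function, noise level, and polynomial degrees are all invented for illustration.

```python
# Estimate bias^2 and variance at one test point by refitting models on many
# resampled training sets. All data here is simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_fn = np.sin            # the "true" signal we pretend to know
x_test, noise_sd = 1.5, 0.3

def predictions_at_test_point(degree, n_trials=500, n_points=20):
    """Refit a polynomial of the given degree on fresh noisy samples and
    collect its predictions at x_test."""
    preds = []
    for _ in range(n_trials):
        x = rng.uniform(0, np.pi, n_points)
        y = true_fn(x) + rng.normal(0, noise_sd, n_points)
        coeffs = np.polyfit(x, y, degree)      # degree 0 == predict the mean
        preds.append(np.polyval(coeffs, x_test))
    return np.array(preds)

for degree in (0, 9):
    p = predictions_at_test_point(degree)
    bias_sq = (p.mean() - true_fn(x_test)) ** 2
    variance = p.var()
    print(f"degree={degree}: bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}")
```

In a run like this, the constant model typically shows large bias and tiny variance, while the flexible fit flips that pattern, which is exactly the tradeoff the formula above describes.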

This is where baseline models come in. Most baselines, by their simple nature, offer a straightforward view of the problem at hand.

The Baseline Journey

Navigating machine learning without a baseline is like setting out on a voyage without a compass. This starting point becomes our reference:

  • Sanity anchor: As sailors rely on the compass, machine learning practitioners turn to the baseline as a sanity check. If our intricate models drift astray, this anchor warns us to recheck our course. By understanding its metrics and tradeoffs, we can judge if venturing into the dense forests of complexity is worth our while.
  • Deeper understanding: For instance, say I have a stock price prediction model that is 80% accurate. Is it simply predicting the moving average? Is it simply predicting the last day’s price? How does it do on a day of market chaos? Building a handful of simple stock prediction baselines can help us understand what is truly going on with the complex model’s performance; a minimal sketch of two such baselines follows this list.
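
Here is a minimal sketch of the last-day and moving-average baselines mentioned above; the prices and window size are made up for illustration.

```python
# Two naive stock-price baselines: "predict yesterday's price" and
# "predict the recent moving average". Prices are invented for illustration.
import numpy as np

prices = np.array([100.0, 101.5, 99.8, 102.3, 103.1, 102.7, 104.0])

def last_value_baseline(history):
    """Tomorrow's prediction is simply today's price."""
    return history[-1]

def moving_average_baseline(history, window=3):
    """Tomorrow's prediction is the mean of the last `window` prices."""
    return history[-window:].mean()

# Walk forward through the series and compare both baselines with reality.
for t in range(3, len(prices)):
    history, actual = prices[:t], prices[t]
    print(f"day {t}: last={last_value_baseline(history):.1f}, "
          f"ma={moving_average_baseline(history):.1f}, actual={actual:.1f}")
```

If the complex model’s errors look a lot like the last-value baseline’s errors, it is probably not learning much beyond “tomorrow looks like today.”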

The Art of Designing Good Baselines

The baseline isn’t simply the “least” complicated model. Rather, it is the common-sense, low-cost thing one could do to move the needle.

Here I cover a few examples of choosing good baselines:

  • Regression Context: We could have an extremely complex regression model to predict house prices. It is hence worth investigating: is our model simply predicting the average or median of all house prices in the town? Is it doing better than a simple linear regression? Is it merely averaging the houses in the neighborhood without much nuance, or predicting a single value for an entire low-density area code?
  • Classification Setting: When dealing with imbalanced classification problems where the majority class massively dominates the minority class, such as determining email types (spam or not), predicting every email as not-spam will yield an error rate equal to the proportion of the minority class. An intelligent model should surpass this minimum threshold to be taken seriously (a sketch of the regression and classification baselines appears after this list).
  • Reinforcement Learning Context: In the context of reward shaping, we know that large intelligent agents like ChatGPT utilize extremely complex reward model architectures, yet it is worth comparing them against, say, a simple neural network, as a yardstick to understand the risk-reward tradeoffs.
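
Assuming scikit-learn is available, the sketch below uses its DummyRegressor and DummyClassifier to stand in for the regression and classification baselines above: predicting the mean or median house price, and labeling every email as not-spam. The data is synthetic and every number is invented for illustration.

```python
# Baselines via scikit-learn's dummy estimators, on synthetic data.
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

rng = np.random.default_rng(42)

# Regression: synthetic house prices (in thousands) with one informative feature.
X_houses = rng.normal(size=(200, 5))
y_prices = 300 + 50 * X_houses[:, 0] + rng.normal(0, 20, 200)
mean_baseline = DummyRegressor(strategy="mean").fit(X_houses, y_prices)
median_baseline = DummyRegressor(strategy="median").fit(X_houses, y_prices)
print("mean-price baseline R^2:  ", mean_baseline.score(X_houses, y_prices))    # ~0 by construction
print("median-price baseline R^2:", median_baseline.score(X_houses, y_prices))

# Classification: ~95% not-spam vs ~5% spam, labels assigned at random.
X_mail = rng.normal(size=(1000, 10))
y_spam = (rng.random(1000) < 0.05).astype(int)
majority_baseline = DummyClassifier(strategy="most_frequent").fit(X_mail, y_spam)
print("always-not-spam accuracy: ", majority_baseline.score(X_mail, y_spam))    # ~0.95
```

Any serious house-price or spam model should clear these numbers by a comfortable margin before its complexity is worth defending.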

Ever heard the saying, “You have to crawl before you can walk”? That’s what baseline models teach us. Whether we’re learning a new skill, embarking on a new journey, or diving into an intricate project, establishing a starting point or “baseline” can be transformative. It grounds us, provides direction, and keeps expectations realistic.

Amidst ever-evolving complexities, there’s wisdom in simplicity. Accepting ourselves, understanding our foundational strengths and weaknesses, and navigating the world with an awareness of our baseline reality can lead to more informed, grounded decisions.

So, the next time you hear “baseline,” remember it’s not just another tech buzzword, but a philosophy worth embracing.
