Inductive Bias In Machine Learning

Sanjithkumar
6 min read · Feb 5, 2024



In the intricate realm of machine learning, the concept of inductive bias serves as a fundamental pillar, shaping the very essence of how models interpret and generalize from data. At its core, inductive bias refers to the set of assumptions, constraints, or prior knowledge encoded into a learning algorithm, guiding it to favor certain hypotheses over others. This crucial aspect plays a pivotal role in the model’s ability to make predictions, handle uncertainty, and adapt to diverse datasets.

To comprehend the significance of inductive bias, one must first grasp the inherent challenges that machine learning algorithms face when presented with vast and complex datasets. In the absence of any guiding principles, models might struggle to discern patterns, leading to overfitting or underfitting. Overfitting occurs when a model learns the training data too well, capturing noise and anomalies but failing to generalize effectively to new, unseen data. Conversely, underfitting occurs when a model is too simplistic, lacking the capacity to capture the underlying patterns in the data.

Inductive bias acts as a guiding light in the darkness of uncertainty, steering machine learning models away from the pitfalls of overfitting and underfitting. It serves as a set of preferences or predispositions that allow the algorithm to make assumptions about the nature of the data and, in turn, make more informed predictions. These biases can be explicit, such as pre-defined rules or constraints, or implicit, arising from the architectural choices and hyperparameters of the learning algorithm.

Before going into any detail, you might need a basic understanding of two related concepts: restrictive bias and preference bias.

Restrictive and Preference Bias:

1. Restrictive Bias:

  • Restrictive bias refers to the limitations or constraints imposed on a machine learning model, guiding it to prefer certain hypotheses over others based on the model’s architecture or structure.
  • Consider a linear regression model with the restrictive bias that only allows linear relationships between input features. In this case, the hypothesis space is restricted to linear functions, and the model will favor linear patterns in the data.

2. Preference Bias:

  • Preference bias involves favoring certain hypotheses over others based on simplicity, interpretability, or prior probabilities, regardless of the model’s architecture.
  • Imagine a decision tree model with a preference bias for shorter and simpler trees. This bias encourages the model to choose decision boundaries that involve fewer splits and are easier to interpret (a short code sketch after this section illustrates both kinds of bias).

Relationship to Inductive Bias:

  • Interplay between Restrictive and Preference Bias: In practice, restrictive and preference biases often coexist within a machine learning model. The structural constraints of a model, such as its architecture and choice of features, contribute to restrictive bias, guiding the learning process. Simultaneously, preference bias influences the model’s choices among hypotheses within the allowed space based on simplicity or other criteria.
  • Balancing Act: The relationship between these biases is a delicate balance. Restrictive bias provides a foundation by limiting the hypothesis space, and preference bias acts as a guiding force within that space. For instance, a convolutional neural network (CNN) might have a structural bias for spatial relationships (restrictive bias) and a preference bias for simpler features that contribute to better generalization.
  • Impact on Generalization: Together, these biases significantly influence the model’s generalization capabilities. The right combination can lead to models that are not only effective on training data but also generalize well to new, unseen data. Striking the appropriate balance is crucial for avoiding overfitting or underfitting scenarios and ensuring the model’s robustness.
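To make the two kinds of bias above more concrete, here is a minimal Python sketch using scikit-learn and made-up toy data. The specific models and parameters are my own illustration, not the only way to encode these biases: a linear model whose hypothesis space contains only linear functions (restrictive bias), and a decision tree whose cost-complexity pruning parameter nudges it toward smaller trees (preference bias).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)   # toy nonlinear data

# Restrictive bias: the hypothesis space contains only linear functions of x,
# no matter what the data looks like.
linear_model = LinearRegression().fit(X, y)

# Preference bias: the space of possible trees is huge, but cost-complexity
# pruning (ccp_alpha) makes the learner prefer smaller, simpler trees.
simple_tree = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)
free_tree   = DecisionTreeRegressor(random_state=0).fit(X, y)

print("linear coefficients: ", linear_model.coef_)
print("pruned tree leaves:  ", simple_tree.get_n_leaves())
print("unpruned tree leaves:", free_tree.get_n_leaves())
```

Running this, the pruned tree typically ends up with far fewer leaves than the unpruned one, while the linear model never leaves its (restricted) space of straight lines at all.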

Now, to understand this more clearly: it is possible to form a dozen hypotheses from just a few observations. This is an important property of inductive reasoning: valid observations may lead to different hypotheses, and some of them can be false. For example, observing from the Earth, you might assume that all stars are white just by looking at the sky. Even though this is not true, it is a perfectly reasonable assumption to make from that observation.

For this example, one could instead adopt the simplest hypothesis, that "there are stars in the sky", rather than complicating things beyond the observation. I know all this sounds vague… but what does it have to do with machine learning?

In most machine learning tasks, we deal with some subset of observations (samples) and our goal is to create a generalization based on them. We also want our generalization to be valid for new unseen data. In other words, we want to draw a general rule that works for the whole population of samples based on a limited sample subset.

So we have some set of observations and a set of hypotheses that can be induced from those observations. The set of observations is our data, and the set of hypotheses consists of ML algorithms with all the possible parameters that can be learned from this data. Each such model can describe the training data, yet they can produce significantly different results on new, unseen data.

[Figure: different models trained on the same data can produce very different predictions on unseen samples. Credit: Gleg]

As you can see in the illustration above, different models trained on the same fixed training data tend to diverge when inferring on unseen data, which can lead to very different predictions.

There is an infinite set of hypotheses for a finite set of samples. For example, consider observations of two points of some single-variable function. It is possible to fit a single linear model as well as an infinite number of periodic or polynomial functions that pass exactly through the observations. Given the data, all of those functions are valid hypotheses that perfectly align with the observations, and with no additional assumptions, choosing one over another is like making a random guess.

[Figure: a linear function and many more complex functions all fitting the same two observed points. Credit: Gleg]

Now let's evaluate our hypotheses on a new, unseen data sample x2, and it turns out that most of the complicated functions are inaccurate. The linear function, however, appears to be quite accurate, which may already feel familiar from a bias-variance tradeoff perspective.

[Figure: the same hypotheses evaluated at the new sample x2; most of the complex functions miss it, while the linear one comes close. Credit: Gleg]
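Here is a small numpy sketch of this two-point example (the numbers are made up). A linear hypothesis and a "line plus sine" hypothesis both fit the two observations exactly, yet disagree strongly on a new sample:

```python
import numpy as np

# Two observed points of some unknown single-variable function (made-up data).
x1, y1 = 0.0, 1.0
x2, y2 = 1.0, 3.0

# Hypothesis 1: the unique linear function through both points.
slope = (y2 - y1) / (x2 - x1)
def linear(x):
    return y1 + slope * (x - x1)

# Hypothesis 2: the same line plus a sine term that vanishes at x1 and x2,
# so it also fits both observations exactly, for any amplitude A.
A = 5.0
def periodic(x):
    return linear(x) + A * np.sin(np.pi * (x - x1) / (x2 - x1))

# Both hypotheses agree (up to floating point) on the training points...
for x in (x1, x2):
    print(x, linear(x), periodic(x))

# ...but disagree strongly on a new, unseen sample.
x_new = 0.5
print(x_new, linear(x_new), periodic(x_new))   # 2.0 vs. 7.0
```

Nothing in the two observations alone tells us which hypothesis to trust; only an extra assumption about the data can break the tie.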

The prioritization of some hypotheses (a restriction of the hypothesis space) is an inductive bias: the model is biased toward some group of hypotheses (a preference bias). For the previous example, one can choose a linear model based on some prior knowledge about the data and thus prioritize linear generalization.

Why Is Inductive Bias Important?

As the previous example shows, choosing the right inductive bias leads to better generalization, especially in a low-data setting. The less training data we have, the stronger the inductive bias should be to help the model generalize well. In a rich-data setting, however, it may be preferable to avoid strong inductive biases and let the model search through the hypothesis space more freely.

[Figure: effect of inductive bias strength in low-data versus rich-data settings. Credit: Gleg]

In a low-data setting, the right inductive bias may help the model find a good optimum, but in a rich-data setting it may impose constraints that harm generalization.
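One rough way to see this effect in code is to treat the regularization strength of ridge regression as the strength of an inductive bias toward small weights. The setup below is my own toy illustration, not taken from the article:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_data(n):
    # Toy data: 10 features, only the first two actually matter.
    X = rng.uniform(-3, 3, size=(n, 10))
    true_w = np.zeros(10)
    true_w[:2] = [1.5, -2.0]
    y = X @ true_w + rng.normal(scale=1.0, size=n)
    return X, y

X_test, y_test = make_data(2000)
for n_train in (15, 2000):                  # low-data vs. rich-data setting
    X_tr, y_tr = make_data(n_train)
    for alpha in (0.01, 10.0):              # weak vs. strong bias toward small weights
        model = Ridge(alpha=alpha).fit(X_tr, y_tr)
        err = mean_squared_error(y_test, model.predict(X_test))
        print(f"n_train={n_train:5d}  alpha={alpha:5.2f}  test MSE={err:.2f}")

# Typically the strongly regularized (more biased) model helps most when
# n_train is small, while with plenty of data the weaker bias does at least as well.
```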

How do we choose a model for the task at hand? Usually, the answer is something like: use a CNN for images, use an RNN for sequential data, and so on. At the same time, it is possible to use an RNN for images or a CNN for sequential data. The reason to prefer the former is that those models carry inductive biases suited to the data. Choosing a model with the right bias boosts the chances of finding a better generalization with less data, and that is always desirable. It may be tempting to think that there is some optimal bias that always helps a model generalize well, but according to the "no free lunch" theorem, no such universal bias exists. That is why, for every particular problem, we should use specific algorithms and biases.
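As a rough illustration of how an architectural bias restricts the hypothesis space, compare the parameter counts of a fully connected layer and a small convolution on a 32x32 single-channel image (a PyTorch sketch, assuming torch is installed):

```python
import torch.nn as nn

# A fully connected layer lets every output pixel depend on every input pixel.
fc = nn.Linear(32 * 32, 32 * 32)

# A convolution assumes locality and translation invariance: each output pixel
# depends only on a shared 3x3 neighborhood of the input.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print("fully connected parameters:", n_params(fc))    # 1,049,600
print("convolution parameters:   ", n_params(conv))   # 10
```

The drastically smaller, spatially structured hypothesis space is exactly the kind of inductive bias that makes CNNs data-efficient on images.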

In the next blog we will build on this topic and look into the inductive biases of various machine learning and deep learning models. Until then, I hope you found this useful.

Credit: Inductive Bias in CNN (an article that gave me a great deal of understanding)
