Curve Fitting and the Bias-Variance Trade-off in Machine Learning
Striking the Balance Between Flexibility and Interpretability
Curve fitting, the art of constructing a curve (or, in higher dimensions, a hyperplane) that best fits a series of scattered data points, is a crucial process in data analysis. The resulting mathematical model serves various purposes: acting as a visual aid, smoothing out noise, reducing data while preserving essential information, and facilitating data imputation. It is also valuable for summarizing relationships among variables and for predicting outcomes beyond the observed data.
Uses of Fitted Curves:
- Data Visualization: Fitted curves aid in visualizing the general trend of real-world observations.
- Data Smoothing: Values predicted by the fitted curve smooth out the original data points, reducing noise.
- Data Reduction: Storing only the fitted function, rather than every data point, preserves the essential trend in compact form.
- Imputation: Fitted curves help infer values where data is missing through statistical interpolation.
- Extrapolation: Beyond the observed data range, fitted curves enable prediction or forecasting.
- Outlier Detection: Deviations from the fitted curve highlight potential outliers in the data.
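Several of these uses fit in one short sketch using NumPy's `polyfit` (the quadratic trend and noise level are illustrative assumptions, not from the text):

```python
import numpy as np

# Synthetic data: noisy observations of an assumed quadratic trend
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 25)
y = 0.5 * x**2 - 2 * x + 1 + rng.normal(scale=2.0, size=x.size)

# Fit a degree-2 polynomial by least squares; the three stored
# coefficients are also the "data reduction" summary of the trend
curve = np.poly1d(np.polyfit(x, y, deg=2))

smoothed = curve(x)        # data smoothing: de-noised values at observed x
imputed = curve(3.7)       # imputation: estimate at an unobserved point
forecast = curve(12.0)     # extrapolation: prediction beyond the data range
residuals = y - smoothed   # large residuals flag potential outliers
```

Plotting `curve` over a fine grid alongside the raw points would cover the visualization use as well.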
Over-fitting and Under-fitting:
In mathematical modeling, overfitting occurs when a model corresponds too closely to one particular dataset and therefore fails to predict additional data reliably: it memorizes noise rather than learning to generalize, resulting in low bias but high variance. Such a model typically contains more parameters than the data can justify, so it captures noise rather than the underlying trend. Conversely, underfitting happens when a model is too simple to capture the underlying structure of the data, leading to poor predictive performance: high bias but low variance.
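The contrast is easy to reproduce: fitting polynomials of increasing degree to the same small sample drives the training error down, while the error on held-out points need not follow. A minimal sketch (the sine ground truth, sample sizes, and degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def truth(x):
    """Assumed underlying function the noisy samples come from."""
    return np.sin(x)

x_train = np.sort(rng.uniform(0, 2 * np.pi, 20))
y_train = truth(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0, 2 * np.pi, 100)   # held-out points on the same range
y_test = truth(x_test)

errors = {}
for degree in (1, 4, 12):   # rigid -> balanced -> highly flexible
    fit = np.poly1d(np.polyfit(x_train, y_train, degree))
    errors[degree] = (np.mean((fit(x_train) - y_train) ** 2),   # train MSE
                      np.mean((fit(x_test) - y_test) ** 2))     # test MSE
    print(degree, errors[degree])
```

Degree 1 underfits (both errors stay high); a very high degree drives training error toward zero while doing no better, and often worse, on the held-out grid.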
The Prediction Errors:
- Bias Error: The difference between the model’s average prediction and the actual value being predicted.
- Variance Error: The sensitivity of the model’s prediction to small fluctuations in the training set.
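For squared-error loss these two components, plus irreducible noise, add up exactly. Writing f for the true function, f-hat for the fitted model, and sigma-squared for the noise variance:

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^{2}}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^{2}\right]}_{\text{variance}}
  + \sigma^{2}
```

The expectations are taken over random draws of the training set, which is why variance measures sensitivity to those draws.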
Different Combinations of Bias-Variance:
- Low-Bias, High-Variance (Overfitting): Inconsistent predictions that are accurate on average.
- High-Bias, Low-Variance (Underfitting): Consistent but inaccurate predictions on average.
- Low-Bias, Low-Variance: An ideal model but challenging to achieve due to the bias-variance trade-off.
- High-Bias, High-Variance: Inconsistent and inaccurate predictions, reflecting an undesirable model.
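These combinations can be measured directly: refit the same model class on many freshly drawn training sets and examine its predictions at a single point. A simulation sketch (the sine truth, noise level, and polynomial degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def truth(x):
    """Assumed underlying function."""
    return np.sin(x)

x_grid = np.linspace(0, 2 * np.pi, 15)   # fixed design points
x0 = 2.0                                  # where we measure bias and variance

def predictions_at_x0(degree, n_datasets=200):
    """Fit a polynomial of the given degree to many resampled training sets."""
    preds = []
    for _ in range(n_datasets):
        y = truth(x_grid) + rng.normal(scale=0.3, size=x_grid.size)
        preds.append(np.poly1d(np.polyfit(x_grid, y, degree))(x0))
    return np.array(preds)

results = {}
for degree in (1, 10):   # rigid vs. flexible model class
    p = predictions_at_x0(degree)
    bias_sq = (p.mean() - truth(x0)) ** 2   # squared average miss
    variance = p.var()                      # spread across training sets
    results[degree] = (bias_sq, variance)
    print(degree, results[degree])
```

The rigid line misses sin(2) by roughly the same amount every time (high bias, low variance); the degree-10 fit tracks the truth on average but swings with each new sample (low bias, higher variance).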
Bias-Variance Trade-off
Achieving a balance between bias and variance is crucial when building a machine learning model. This optimal balance, known as the bias-variance trade-off, requires finding a sweet spot to avoid ‘overfitting’ or ‘underfitting’. Techniques such as cross-validation, regularization, and ensemble methods contribute to achieving this delicate equilibrium.
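Regularization, for instance, restrains a flexible model by penalizing large coefficients. A closed-form ridge-regression sketch (the cubic data, degree-12 basis, and penalty strength are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic noisy data from an assumed cubic trend
x = np.linspace(-1, 1, 30)
y = x**3 - 0.5 * x + rng.normal(scale=0.1, size=x.size)

# A deliberately over-flexible basis: monomials up to degree 12
X = np.vander(x, N=13, increasing=True)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X'X + lam*I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_plain = ridge_fit(X, y, lam=0.0)   # ordinary least squares
w_ridge = ridge_fit(X, y, lam=1.0)   # penalized fit

# The penalty shrinks the coefficients, trading a little bias for less variance
print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
```

Cross-validation would then choose the penalty strength `lam` by comparing held-out error across candidate values.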
The Interpretability and Flexibility Trade-offs:
Linear models offer simplicity and interpretability but may underfit, while non-linear models provide flexibility but may overfit and are harder to interpret. Finding the sweet spot between interpretability and flexibility is essential to creating a well-balanced model.
In essence, ‘curve fitting’ is about navigating the delicate balance between flexibility and interpretability to create models that capture underlying patterns without being swayed by noise. The bias-variance trade-off serves as a guiding principle in this pursuit, emphasizing the need for models that generalize well while avoiding over-complexity.
Glossary of Key Terms Used in This Blog:
- Hyperplane: A subspace whose dimension is one less than that of the n-dimensional feature space.
- Imputation: The process of replacing missing data with substituted values through statistical estimation.
- Extrapolation: Estimating values beyond the original observation range, based on the fitted relationship between variables.