Machine Learning for Classification: Quadratic Discriminant Analysis vs Logistic Regression

Sarthak Arora
Published in Analytics Vidhya
6 min read · Jul 8, 2023

Welcome to the ultimate face-off in the realm of machine learning! In one corner, we have Quadratic Discriminant Analysis (QDA), a method that sounds like it’s straight out of a sci-fi movie.

And in the other corner, we have Logistic Regression, the tried and true champion of classification.

Get ready to witness the clash of these titans as we dive into the captivating world of machine learning for classification. Prepare to be enthralled as we unravel the strengths, weaknesses, and real-world applications of QDA and Logistic Regression. Let the battle begin!

How did I hear about this lad ‘QDA’?

While exploring the realm of Applied Machine Learning, I stumbled upon a treasure trove of knowledge called “Introduction to Statistical Learning with Applications in R” (Now in Python).

ISLP: Download it for free here

As I delved into its pages, I encountered a term that piqued my curiosity: Quadratic Discriminant Analysis (QDA). Surprisingly, QDA seemed to fly under the radar, rarely mentioned in the same breath as popular models like Decision Trees and Logistic Regression when it comes to classification tasks.

I was curious to see how much ChatGPT knows about this topic, and that’s when I had an awesome chat with it [linked at the end of this blog]. Let me share my insights about this algorithm and how it compares with Logistic Regression!

I’m assuming that we’re all aware of what classification models are and how Logistic Regression works.

What is QDA?

QDA is a captivating algorithm that infuses the world of classification with its quadratic flair. It dances beyond linear boundaries, fearlessly embracing non-linear relationships. By modelling class-conditional distributions with quadratic functions, QDA paints a vivid picture of the intricate curves and contours that define our data.

Don’t be fooled by its elegance; QDA wields a powerful arsenal. Estimating separate covariance matrices for each class, it adapts to the unique characteristics and variability within different classes, making it a formidable ally in diverse datasets.

Quadratic Discriminant Analysis (on the right) works well with data that has varying covariances across classes (source: scikit-learn documentation)
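To see that "separate covariance matrix per class" idea in action, here is a minimal sketch using scikit-learn's `QuadraticDiscriminantAnalysis` on synthetic data, where the two classes are deliberately given different covariance structures (the data and parameters here are made up for illustration):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two classes with deliberately different covariance structures
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=200)
X1 = rng.multivariate_normal([3, 3], [[2.0, 1.5], [1.5, 2.0]], size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# store_covariance=True keeps the fitted per-class covariance matrices around
qda = QuadraticDiscriminantAnalysis(store_covariance=True)
qda.fit(X, y)

# One covariance matrix is estimated for each class
print(len(qda.covariance_))          # 2
print(qda.predict([[0, 0], [3, 3]]))
```

Because each class gets its own covariance estimate, the resulting decision boundary is quadratic rather than a straight line.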

To truly understand the magic behind Quadratic Discriminant Analysis (QDA), we must delve into its underlying assumptions. Like any method, QDA operates within a specific framework to ensure accurate and reliable results. So, let’s explore the key assumptions that shape the world of QDA.

  1. Multivariate Normality: QDA assumes that the predictor variables within each class follow a multivariate normal distribution. This means that the feature values for each class form a bell-shaped distribution in a higher-dimensional space. While this assumption may appear stringent, it allows QDA to capture the statistical properties of the data in a meaningful way.
  2. Class-Specific Covariance Matrices: QDA goes a step further by assuming that each class has its own covariance matrix. This implies that the spread and shape of the feature values can vary across different classes. By estimating separate covariance matrices, QDA accommodates the unique characteristics and variability within each class, enabling more flexible modelling.
  3. Independence of Observations: QDA assumes that the observations within each class are independent of each other. In other words, the feature values for one observation do not depend on or influence the feature values of other observations within the same class. This assumption simplifies the modelling process and ensures that each observation contributes independently to the classification.
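As a rough sketch of assumptions 1 and 2, here is how the class-specific means and covariance matrices that QDA relies on can be estimated directly with NumPy (the data below is synthetic and the variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(42)
# Assumption 1: each class is multivariate normal
X_a = rng.multivariate_normal([0, 0], [[1.0, 0.2], [0.2, 1.0]], size=500)
X_b = rng.multivariate_normal([3, 1], [[0.5, -0.3], [-0.3, 2.0]], size=500)

# Assumption 2: a separate mean and covariance is estimated per class
mu_a, cov_a = X_a.mean(axis=0), np.cov(X_a, rowvar=False)
mu_b, cov_b = X_b.mean(axis=0), np.cov(X_b, rowvar=False)

print(np.round(cov_a, 2))  # close to [[1.0, 0.2], [0.2, 1.0]]
print(np.round(cov_b, 2))  # close to [[0.5, -0.3], [-0.3, 2.0]]
```

Linear Discriminant Analysis, by contrast, would pool these into a single shared covariance matrix, which is exactly what QDA relaxes.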

QDA vs Logistic Regression

In the realm of classification, Quadratic Discriminant Analysis (QDA) and Logistic Regression have established themselves as prominent contenders. While QDA captivates with its elegance and flexible modelling, Logistic Regression stands tall as a tried-and-true method.

Now, let’s delve into the clash between these titans and explore how they stack up against each other, including the realm of polynomial logistic regression:

  1. Decision Boundaries:
    QDA and Logistic Regression differ in their approach to decision boundaries. QDA, with its ability to embrace non-linear relationships, can capture intricate curves and contours in the data. It models class-conditional distributions using quadratic functions, offering greater flexibility in separating classes.
    In contrast, Logistic Regression assumes a linear decision boundary, relying on a transformation of the linear predictor using the logistic function.
    However, by employing polynomial logistic regression, Logistic Regression can also incorporate non-linear decision boundaries, thus bridging the gap to some extent.
  2. Assumptions:
    While both QDA and Logistic Regression make assumptions, they differ in their nature. QDA assumes multivariate normality within each class and distinct covariance matrices for each class.
    In contrast, Logistic Regression assumes linearity in the log-odds of class probabilities. Polynomial logistic regression extends Logistic Regression to handle non-linear relationships by introducing polynomial terms, relaxing the assumption of linearity.
  3. Model Complexity and Interpretability:
    Logistic Regression, including polynomial logistic regression, generally exhibits simpler models compared to QDA. It produces linear or polynomial coefficients that directly relate to the features’ influence on the log-odds of class membership. This simplicity enhances interpretability, making it easier to understand the impact of predictors.
    QDA, on the other hand, estimates class-specific covariance matrices, which can be more complex and challenging to interpret. However, QDA’s flexibility allows it to capture complex relationships and provide a better fit in certain scenarios.
  4. Dataset Size and Feature Relationships:
    The choice between QDA and Logistic Regression, including polynomial logistic regression, can also depend on the dataset size and the nature of feature relationships.
    QDA tends to perform well with smaller datasets when its assumptions are reasonably met.
    In contrast, Logistic Regression, particularly polynomial logistic regression, can handle larger datasets effectively and capture non-linear relationships. Additionally, Logistic Regression can accommodate categorical variables more naturally through appropriate encoding techniques.
  5. Practical Considerations:
    Practical considerations play a crucial role in model selection.
    Logistic Regression, including polynomial logistic regression, is widely used and well-understood. It enjoys extensive support in various software packages and offers readily available implementations.
    QDA, while less commonly used, has its own niche in specific applications, particularly when the assumptions hold. It is essential to consider the computational resources required for both methods, as QDA may be more computationally demanding due to estimating covariance matrices.
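The five points above can be tried out directly. Here is a sketch comparing plain Logistic Regression, polynomial Logistic Regression, and QDA on a non-linearly separable toy dataset (`make_moons` is just a convenient stand-in; accuracies will vary with the data):

```python
from sklearn.datasets import make_moons
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two classes with a clearly non-linear boundary
X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(),
    "Polynomial Logistic Regression": make_pipeline(
        PolynomialFeatures(degree=3), LogisticRegression(max_iter=1000)
    ),
    "QDA": QuadraticDiscriminantAnalysis(),
}

scores = {}
for name, model in models.items():
    scores[name] = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {scores[name]:.3f}")
```

On data like this, the linear model is typically the weakest, while polynomial Logistic Regression and QDA both bend their boundaries around the curves, which mirrors the trade-offs discussed above.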

Conclusion

In the end, I would say that you should give QDA a try. I feel that Logistic Regression might win in most cases. But, remember that in the field of machine learning, there is rarely a one-size-fits-all “clear winner” for all scenarios. It’s important to choose the method that best aligns with the specific requirements and characteristics of your classification problem, taking into account factors such as interpretability, performance, and assumptions.


Link to my chatGPT conversation about QDA and Logistic Regression — https://shorturl.at/oMSV8

You can read more about Discriminant Analysis if you're interested in understanding the theory. Personally, though, I'm more drawn to applied ML and data science.

If you’re here, do give me a follow on Medium and let’s connect on Linkedin to chat more about Data Science!
