Analysis of Variance (ANOVA) in Data Science and Analytics! 📊

Sarowar Jahan Saurav
4 min read · May 11, 2024

Welcome, fellow data enthusiasts, to a captivating journey into the realm of Analysis of Variance (ANOVA)! 🚀 In this blog, we’ll dive deep into the world of statistical analysis and explore how ANOVA can unlock hidden insights within your data. So buckle up, put on your statistical thinking caps, and let’s embark on this exciting adventure together! 🎢

What is ANOVA and Why Should You Care? 🤔

ANOVA, short for Analysis of Variance, is a powerful statistical technique that allows you to compare the means of two or more groups (most often three or more, since a t-test already covers the two-group case). 📈 It works by comparing the variation between the group means with the variation inside each group. It helps us answer questions like: Are there any significant differences between the means of these groups? Does our independent variable have an effect on the dependent variable? ANOVA holds the key to uncovering these answers and more!
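To make the two-group case concrete, here is a minimal sketch, assuming two small illustrative samples (the numbers are made up for the demo). With exactly two groups, the one-way ANOVA F-statistic equals the square of the equal-variance t-statistic, and the two p-values coincide.

```python
from scipy.stats import f_oneway, ttest_ind

# Illustrative samples (made-up numbers, not real measurements)
group_a = [15, 20, 25, 30, 35]
group_b = [10, 15, 20, 25, 30]

# One-way ANOVA on the two groups
f_stat, p_anova = f_oneway(group_a, group_b)

# Student's t-test; ttest_ind assumes equal variances by default
t_stat, p_ttest = ttest_ind(group_a, group_b)

print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
print(f"p (ANOVA) = {p_anova:.4f}, p (t-test) = {p_ttest:.4f}")
```

With three or more groups there is no single t-statistic to square, which is where ANOVA earns its keep.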

The Three Varieties of ANOVA

✨ One-Way ANOVA: Imagine you have multiple groups, such as different pricing strategies for a product. One-Way ANOVA comes to the rescue by determining whether these groups have significantly different means. It’s like having a taste test to find out which flavor reigns supreme! 🍦

✨ Two-Way ANOVA: Now let’s take it up a notch! Two-Way ANOVA lets us examine the effect of each of two independent variables on the dependent variable, as well as the interaction between them, meaning whether the effect of one factor depends on the level of the other. It’s like mixing different ingredients together and discovering how they influence the final dish’s flavor! 🍳🌶️

✨ MANOVA (Multivariate ANOVA): Imagine you have multiple dependent variables. MANOVA steps in to examine whether there are significant differences among the groups across all these variables simultaneously. It’s like analyzing multiple dimensions of a problem and gaining a holistic understanding! 🌍🔍
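Since MANOVA gets no code of its own in this post, here is a minimal sketch of what it could look like with statsmodels. The column names (group, y1, y2) and all the values are hypothetical, invented purely for illustration.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: three groups, two dependent variables (y1, y2)
df = pd.DataFrame({
    "group": ["g1"] * 5 + ["g2"] * 5 + ["g3"] * 5,
    "y1": [4.1, 3.8, 4.5, 4.0, 4.3,
           5.2, 5.0, 5.6, 4.9, 5.3,
           6.1, 6.4, 5.9, 6.3, 6.0],
    "y2": [2.0, 2.4, 1.9, 2.2, 2.1,
           2.9, 2.5, 3.1, 2.7, 2.8,
           2.1, 2.6, 2.3, 2.0, 2.4],
})

# Both dependent variables go on the left-hand side of the formula
manova = MANOVA.from_formula("y1 + y2 ~ group", data=df)
mv_result = manova.mv_test()

# Reports Wilks' lambda, Pillai's trace, Hotelling-Lawley trace
# and Roy's greatest root for each term in the model
print(mv_result)
```

The row for the group term is the one to read: it tests whether the groups differ on the two outcomes taken together.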

Statistical Magic 🔮

Let’s get hands-on! In this section, we’ll explore the step-by-step process of conducting ANOVA. From formulating hypotheses to performing post-hoc tests, we’ll leave no stone unturned. We’ll journey through the mystical world of p-values, F-statistics, and effect sizes, demystifying their significance along the way. 🧙‍♀️📚
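Before the code, here is a compact sketch of what a one-way ANOVA computes. The notation is an assumption introduced for illustration: k groups, n_i observations in group i, N observations in total, group means \bar{x}_i and grand mean \bar{x}.

```latex
% Hypotheses
H_0:\ \mu_1 = \mu_2 = \dots = \mu_k
\qquad
H_1:\ \text{at least one } \mu_i \text{ differs from the others}

% Sums of squares
SS_{\text{between}} = \sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x})^2
\qquad
SS_{\text{within}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2

% Mean squares and the F-statistic
MS_{\text{between}} = \frac{SS_{\text{between}}}{k - 1}
\qquad
MS_{\text{within}} = \frac{SS_{\text{within}}}{N - k}
\qquad
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}

% Effect size (eta squared)
\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{between}} + SS_{\text{within}}}
```

A large F means the group means are spread out more than the noise inside the groups would explain. The p-value is the probability of an F at least this large if H_0 were true.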

Code for One-Way ANOVA

import pandas as pd
import numpy as np
from scipy.stats import f_oneway

# Sample data
data = {
    'Group1': [15, 20, 25, 30, 35],
    'Group2': [10, 15, 20, 25, 30],
    'Group3': [25, 30, 35, 40, 45],
    'Group4': [12, 15, 10, 11, 13],
    'Group5': [9, 8, 7, 6, 10],
    'Group6': [18, 19, 21, 17, 16],
    'Group7': [5, 7, 9, 6, 8],
    'Group8': [14, 11, 10, 12, 13],
    'Group9': [20, 22, 19, 18, 21],
    'Group10': [7, 8, 9, 6, 10],
    'Group11': [13, 15, 16, 12, 14],
    'Group12': [9, 12, 10, 11, 8],
    'Group13': [16, 17, 15, 14, 19]
}

# Creating DataFrame
df = pd.DataFrame(data)

# ANOVA calculation: pass every column of the DataFrame as one group
f_statistic, p_value = f_oneway(*(df[col] for col in df.columns))

# Degrees of freedom
df_between = len(df.columns) - 1
df_within = len(df.values.flatten()) - len(df.columns)

# Sum of squares between groups: the group size times the squared deviation
# of each group mean from the grand mean (every group here has 5 observations)
n_per_group = len(df)
grand_mean = df.values.mean()
ss_between = n_per_group * sum((df[col].mean() - grand_mean)**2 for col in df.columns)

# Sum of squares within groups
ss_within = sum([sum((df[col] - df[col].mean())**2) for col in df.columns])

# Mean squares
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F-statistic computed by hand; it should match the value returned by f_oneway
f_statistic_manual = ms_between / ms_within

# Printing ANOVA table
print("F-statistic:", f_statistic)
print("p-value:", p_value)
print("-------------------------------------")
print("Analysis of Variance (ANOVA) Table:")
print("-------------------------------------")
print("Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F-statistic | p-value")
print("---------------------------------------------------------------------------------------------")
print(f"Between Groups | {ss_between:.2f} | {df_between} | {ms_between:.2f} | {f_statistic:.2f} | {p_value:.4f}")
print(f"Within Groups | {ss_within:.2f} | {df_within} | {ms_within:.2f} |")
print("---------------------------------------------------------------------------------------------")

Output:

F-statistic: 18.996606509332086
p-value: 6.226010140718324e-15
-------------------------------------
Analysis of Variance (ANOVA) Table:
-------------------------------------
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F-statistic | p-value
---------------------------------------------------------------------------------------------
Between Groups | 3789.38 | 12 | 315.78 | 19.00 | 0.0000
Within Groups | 864.40 | 52 | 16.62 |
---------------------------------------------------------------------------------------------
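A significant F only says that at least one group mean differs; it does not say which one, or by how much. As a rough sketch of the two usual follow-ups (an effect size and a post-hoc test), the snippet below uses only the first three groups from the sample data to keep things short. Eta squared and Tukey's HSD are my choice of follow-up here, not the only options.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# First three groups from the sample data, kept small for readability
groups = {
    "Group1": [15, 20, 25, 30, 35],
    "Group2": [10, 15, 20, 25, 30],
    "Group3": [25, 30, 35, 40, 45],
}

# Flatten into one array of values plus a matching array of group labels
values = np.concatenate([np.asarray(v, dtype=float) for v in groups.values()])
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])

# Effect size: share of the total variation explained by group membership
grand_mean = values.mean()
ss_between = sum(len(v) * (np.mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum(((np.asarray(v) - np.mean(v)) ** 2).sum() for v in groups.values())
eta_squared = ss_between / (ss_between + ss_within)
print(f"eta squared: {eta_squared:.4f}")

# Post-hoc test: Tukey's HSD compares every pair of groups
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
```

The Tukey summary lists one row per pair of groups, with a reject column showing which pairs differ at the chosen alpha.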

Code for Two-Way ANOVA

import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Sample data table
data = {
    'Factor_A': ['A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A3', 'A3', 'A3', 'A3', 'A3',
                 'A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A3', 'A3', 'A3', 'A3', 'A3',
                 'A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A3', 'A3', 'A3', 'A3', 'A3'],
    'Factor_B': ['B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1', 'B1',
                 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2', 'B2',
                 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3', 'B3'],
    'Response': [40, 34, 36, 24, 49, 16, 25, 13, 13, 10, 31, 27, 14, 25, 34,
                 31, 46, 38, 36, 23, 17, 31, 29, 38, 16, 15, 20, 17, 33, 45,
                 47, 36, 30, 30, 44, 35, 24, 23, 40, 44, 36, 40, 49, 47, 15]
}

# Creating DataFrame
df = pd.DataFrame(data)

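# Two-way ANOVA: main effects of both factors plus their interaction
formula = 'Response ~ C(Factor_A) + C(Factor_B) + C(Factor_A):C(Factor_B)'
model = ols(formula, data=df).fit()
# anova_lm uses Type I (sequential) sums of squares by default, which is
# fine here because the design is balanced (5 observations in every cell)
anova_table = anova_lm(model)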

# Printing the sample data table
print("Sample Data Table:")
print(df)
print()

# Printing ANOVA table
print("Two-way ANOVA Table:")
print(anova_table)
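Reading the two-way table takes a little practice, so here is a small sketch of one way to do it. It assumes a tiny hypothetical 2x2 dataset (made-up numbers) and a conventional alpha of 0.05, pulls the p-value for each term out of the table, and prints the cell means that help explain an interaction.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical balanced 2x2 design with 3 observations per cell
df = pd.DataFrame({
    "Factor_A": ["A1"] * 6 + ["A2"] * 6,
    "Factor_B": (["B1"] * 3 + ["B2"] * 3) * 2,
    "Response": [20, 22, 19, 30, 31, 29, 25, 27, 26, 24, 23, 25],
})

# A * B expands to both main effects plus the A:B interaction
model = ols("Response ~ C(Factor_A) * C(Factor_B)", data=df).fit()
table = anova_lm(model)

# One p-value per term; the Residual row has none, so drop it
alpha = 0.05  # assumed significance level
for term, p in table["PR(>F)"].drop("Residual").items():
    verdict = "significant" if p < alpha else "not significant"
    print(f"{term}: p = {p:.4f} ({verdict})")

# Cell means: if the B1-to-B2 change differs between A1 and A2,
# that pattern is what the interaction term is picking up
cell_means = df.pivot_table(index="Factor_A", columns="Factor_B",
                            values="Response", aggfunc="mean")
print(cell_means)
```

When the interaction term is significant, the main effects are usually best interpreted through the cell means rather than on their own.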

Real-World Applications and Case Studies 🌟

ANOVA isn’t just theoretical wizardry; it has practical applications in many domains. Data scientists and analysts use it in fields like psychology, marketing, healthcare, and beyond, wherever the question is whether an outcome differs across groups, such as the pricing strategies mentioned earlier.

Remember, ANOVA is not just a statistical tool; it’s a gateway to uncovering valuable insights and driving informed decision-making. So go forth, analyze, discover, and let the magic of ANOVA guide you to data-driven success! ✨🔍📊
