Eh-F Tests: Run ANOVA Easier with *args

Aneesh Kodali
Published in Analytics Vidhya
4 min read · Oct 15, 2019

If you’ve had to run an ANOVA test to compare means of multiple groups, you might have wondered if there’s a way to do so without manually entering each group’s data. I’ll show you a way to do so through a simple example.

Let’s create a dataframe from thin air:

import pandas as pd
import numpy as np
data = pd.DataFrame(data=[['A', 'B', 'C', 'D']*3]).T
data.columns=['Group']
data['Value'] = np.random.randint(1,10,size=len(data))

As you can see, we have 4 groups. Don’t worry if you have different numbers in the Value column, since they are generated randomly. What’s important later on is that you get the same test results from the two methods that I compare. We’d like to test whether the means of ‘Value’ are the same across all groups or whether at least one group has a statistically significantly different mean. Naturally, we’d conduct an F-test to do so and would probably write Python code as such:

import scipy.stats as stats
stats.f_oneway(
data.loc[data['Group']=="A", 'Value']
, data.loc[data['Group']=="B", 'Value']
, data.loc[data['Group']=="C", 'Value']
, data.loc[data['Group']=="D", 'Value']
)
F_onewayResult(statistic=0.27946127946127947, pvalue=0.8388488338536153)

Now, this code may not seem that difficult to write. But imagine if you had more groups in your data (I once did an F-test for all 30 teams in the NBA). You’d have to write a line for each group (like I did above with 4 groups). What if there was a way to write out a test that could more easily accommodate more groups?

ANSWER: You can use *args to make your life easier. You may have seen *args or **kwargs in function descriptions. Both serve a similar purpose: *args lets a function accept a variable number of positional arguments, and **kwargs lets it accept a variable number of keyword arguments. Let’s quickly see them in use with simple examples. I won’t use **kwargs for our F-test application but, since the two work so similarly, I figured why not include an example for it.

Let’s create a function in which we make a food dish with *args:

def make_a_dish(dish_name, ingredient_1, *ingredients):
    print(f"I will make {dish_name}")
    print(f"Start with {ingredient_1}")
    for ingredient in ingredients:
        print(f"Add {ingredient}")

make_a_dish("ice cream", "ice", "cream", "sprinkles", "whipped cream", "cherry")
I will make ice cream
Start with ice
Add cream
Add sprinkles
Add whipped cream
Add cherry

The function takes dish_name and ingredient_1 as required arguments, plus any number of additional optional ingredients (*ingredients), including none at all. Inside the function, these additional ingredients are collected into a tuple, which we can iterate through element by element. You could then use the function again to make a different dish and specify a different number of ingredients. The beauty of using *args is that you don’t have to rewrite the function to handle a different number of arguments, so you can leave the original function alone.
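To make the tuple behavior concrete, here is a minimal sketch. The helper name dish_steps is hypothetical (it is not part of the original example); it mirrors make_a_dish but returns the steps as a list so the results are easy to inspect.

```python
def dish_steps(dish_name, ingredient_1, *ingredients):
    """Return the recipe steps as a list instead of printing them."""
    # Extra positional arguments arrive packed in a tuple.
    assert isinstance(ingredients, tuple)
    steps = [f"I will make {dish_name}", f"Start with {ingredient_1}"]
    for ingredient in ingredients:
        steps.append(f"Add {ingredient}")
    return steps


# Zero extra ingredients: the tuple is simply empty.
toast = dish_steps("toast", "bread")
# Three extra ingredients: same function, no changes needed.
salad = dish_steps("salad", "lettuce", "tomato", "cucumber", "dressing")

print(toast)
print(salad)
```

The same function handles both calls; only the length of the ingredients tuple changes.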

Let’s quickly see **kwargs in use for a function I could use to make a dating profile:

def introduce_yourself(name, **facts):
    print(f"My name is {name}")
    for key, value in facts.items():
        print(f"My {key} is {value}")

introduce_yourself(name="Aneesh", sign="Aquarius", age=27, height="6 ft")
My name is Aneesh
My sign is Aquarius
My age is 27
My height is 6 ft

Like I said, **kwargs behaves similarly to *args, except that you specify names along with their values. I enter a name argument followed by however many name-value pairs I want. Those pairs are collected into a dictionary as key-value pairs, which is why I iterate through them using .items(). I can specify as many or as few (once again, even 0) pairs as I like.
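As a small illustration of the dictionary behavior, the sketch below uses a hypothetical helper, profile_lines, that mirrors introduce_yourself but returns its lines. It also shows the reverse direction, which the example above does not cover: an existing dict can be unpacked into keyword arguments with **.

```python
def profile_lines(name, **facts):
    """Return the profile as a list of lines instead of printing them."""
    # Extra keyword arguments arrive packed in a dict.
    assert isinstance(facts, dict)
    lines = [f"My name is {name}"]
    for key, value in facts.items():
        lines.append(f"My {key} is {value}")
    return lines


# No extra facts: the dict is empty.
short = profile_lines(name="Aneesh")

# An existing dict can be unpacked into keyword arguments with **.
details = {"sign": "Aquarius", "age": 27}
full = profile_lines("Aneesh", **details)

print(short)
print(full)
```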

Now, back to our F-test. We can use the concept of *args to create our groups of values on demand in one line. See below:

import scipy.stats as stats
stats.f_oneway(
    *(data.loc[data['Group']==group, 'Value']
      for group in data['Group'].unique())
)
F_onewayResult(statistic=0.27946127946127947, pvalue=0.8388488338536153)

Think of it this way: stats.f_oneway(*args) is equivalent to stats.f_oneway(arg1, arg2, …, argN). In this case, each arg is a subset of our data corresponding to each group.
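The equivalence can be checked with a toy function. In this sketch, group_sizes is a made-up stand-in for any function that accepts *args; it is not related to SciPy.

```python
def group_sizes(*groups):
    """Return how many values each positional argument holds."""
    return [len(group) for group in groups]


a, b, c = [1, 2, 3], [4, 5], [6]

# Writing each argument out by hand...
explicit = group_sizes(a, b, c)
# ...unpacking a list of groups...
from_list = group_sizes(*[a, b, c])
# ...or unpacking a generator expression all produce the same call.
from_generator = group_sizes(*(g for g in (a, b, c)))

print(explicit, from_list, from_generator)
```

All three calls hand the function the same three arguments, so the results match.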

stats.f_oneway() is just like any other function that accepts a variable number of arguments, where each argument holds the data for one group. The generator expression inside the call yields one item per group (much like the *ingredients example from above), and each item is a pandas Series of Value values for that group. The leading * unpacks those items into separate arguments. So whether you have 4 or 30 or 100 groups, the same code passes each group as its own ‘argument’ to the F-test function.
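To see why a function written with *args scales to any number of groups, here is a rough pure-Python sketch of the one-way ANOVA F statistic. It is for illustration only and is not SciPy's implementation; the function name f_statistic and the rows data are made up for this example, and no p-value is computed.

```python
from collections import defaultdict


def f_statistic(*samples):
    """One-way ANOVA F statistic for any number of groups (illustrative only)."""
    k = len(samples)                       # number of groups
    n = sum(len(s) for s in samples)       # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up (group, value) rows standing in for the dataframe.
rows = [("A", 1), ("A", 2), ("A", 3),
        ("B", 2), ("B", 3), ("B", 4),
        ("C", 6), ("C", 7), ("C", 8)]

grouped = defaultdict(list)
for group, value in rows:
    grouped[group].append(value)

# One argument per group, however many groups there are.
f_all = f_statistic(*grouped.values())
# The same call with every group typed out by hand.
f_manual = f_statistic(grouped["A"], grouped["B"], grouped["C"])

print(f_all, f_manual)
```

Adding a fourth or a thirtieth group to rows would not require touching the call that uses *grouped.values(), whereas the hand-written call would need a new argument each time.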
