Q#110: College football expenses by conference
Note: I believe it should be free and readily available for everyone to gain value from data, hence I pledge to keep this series free regardless of how large it grows.
Given the following dataset on college sports operating expenses and this dataset that classifies a subset of schools into their respective football conference, show the school in each conference that has the highest football program expense, along with the average football program cost for all schools in the conference. Your output should contain the following fields:
- Conference
- Average conference spend on football
- College with highest spend
- Football spend of college with highest spend
TRY IT YOURSELF
ANSWER
Step-by-Step Analysis
1. Calculate the Average Football Program Spend by Conference
To find the average football program cost for each conference, we will group the data by Conference
and calculate the mean of the Football Total Expenses
.
# Calculate average football expenses per conference
avg_conference_spend = df_combined.groupby('Conference')['Football Total Expenses'].mean().reset_index()
avg_conference_spend.columns = ['Conference', 'Average Football Spend']
2. Find the College with the Highest Football Spend in Each Conference
Next, we will identify the school in each conference that spends the most on their football program.
# Find the college with the highest football spend in each conference
max_spend_school = df_combined.loc[df_combined.groupby('Conference')['Football Total Expenses'].idxmax()][['Conference', 'School', 'Football Total Expenses']]
max_spend_school.columns = ['Conference', 'College with Highest Spend', 'Football Spend of College with Highest Spend']
3. Combine the Results
We’ll merge the two results to create a final dataset that contains the conference name, the average football spend, the school with the highest spend, and the spend amount.
# Merge the average spend and the highest spend data
final_df = pd.merge(avg_conference_spend, max_spend_school, on='Conference')
# Sort by conference for readability
final_df = final_df.sort_values(by='Conference').reset_index(drop=True)
Plug: Checkout all my digital products on Gumroad here. Please purchase ONLY if you have the means to do so. Use code: MEDSUB to get a 10% discount!
Earn $25 and 4.60% APY for FREE through my referral at SoFi Bank Here