An Exploration of UFC Data — Part 4: Linear Regression

Richard O'Brien
Aug 6, 2018

Do “winners” land more strikes than “losers”?

Null Hypothesis: There is no difference in the number of strikes landed between winners and losers.

Alternative Hypothesis: There is a significant difference in the number of strikes landed between winners and losers. Under the UFC scoring system, strikes landed count toward the score for each round. Regardless of how the fight ended, we would therefore expect “winners” to land more strikes than “losers”, because strikes landed largely influence the scoring of the bout and could increase the probability of winning (although this is untested as of now).

Building a Function to Plot a Linear Regression Model
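A minimal sketch of what such a function could look like in base R. The helper name `plot_linear_model` and the simulated data are assumptions for illustration; only the data frame name `winners` and the column names `Fighter_1_Strikes` and `Fighter_2_Strikes` come from the model summary printed further down.

```r
# Simulated stand-in for the `winners` data frame (assumption: one row per
# fight, winner strikes in Fighter_1_Strikes, loser strikes in
# Fighter_2_Strikes). Replace this with the real UFC data.
set.seed(42)
loser_strikes <- rpois(500, lambda = 21)
winners <- data.frame(
  Fighter_2_Strikes = loser_strikes,
  Fighter_1_Strikes = pmax(0, round(15 + 0.9 * loser_strikes + rnorm(500, sd = 10)))
)

# Hypothetical helper: fit y ~ x, draw the scatter plot with the fitted
# line on top, and return the model so it can be summarised afterwards.
plot_linear_model <- function(data, x, y) {
  model <- lm(reformulate(x, response = y), data = data)
  plot(data[[x]], data[[y]],
       xlab = x, ylab = y, main = paste(y, "vs", x),
       pch = 16, col = "grey40")
  abline(model, col = "red", lwd = 2)
  invisible(model)
}

model <- plot_linear_model(winners, "Fighter_2_Strikes", "Fighter_1_Strikes")
summary(model)
```

Returning the fitted model invisibly keeps the plotting call quiet while still letting `summary()` be applied to the result.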

Function Output

Printing a summary of the linear regression model

Call:
lm(formula = Fighter_1_Strikes ~ Fighter_2_Strikes, data = winners)

Residuals:
    Min      1Q  Median      3Q     Max
-61.345 -12.496  -3.562   9.357  91.698

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       15.495658   0.134395   115.3   <2e-16 ***
Fighter_2_Strikes  0.943139   0.004716   200.0   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.15 on 45256 degrees of freedom
Multiple R-squared:  0.4691,    Adjusted R-squared:  0.4691
F-statistic: 3.999e+04 on 1 and 45256 DF,  p-value: < 2.2e-16

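Based upon the coefficient for Fighter_2_Strikes (“losers”), each additional strike landed by the “loser” is associated with a predicted increase of ~0.94 strikes landed by the “winner”. The intercept of ~15.5 is the predicted number of strikes for a “winner” whose opponent lands none, and the R-squared of 0.47 means the model explains just under half of the variance in winners' strikes.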
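To make the coefficients concrete, here is a small worked example that plugs loser strike counts into the fitted line. The two coefficients are copied from the summary above; the helper name `predict_winner_strikes` is hypothetical.

```r
# Coefficients copied from the printed model summary.
intercept <- 15.495658
slope <- 0.943139

# Hypothetical helper: predicted winner strikes for a given loser total,
# i.e. intercept + slope * loser_strikes.
predict_winner_strikes <- function(loser_strikes) {
  intercept + slope * loser_strikes
}

# Predictions for losers landing 0, 20 and 21 strikes.
predictions <- predict_winner_strikes(c(0, 20, 21))
predictions

# Moving from 20 to 21 loser strikes raises the prediction by the slope.
predictions[3] - predictions[2]
```

Plugging in the losers' mean of 21.15845 reported later in the article gives about 35.45, which is the winners' mean. That is expected, since a least-squares line passes through the point of means.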

Plotting the Assumptions

Assumption Plots Interpretation

Looking at the plots above, we can see that the residuals are not normally distributed, do not have a mean of zero, and do not have constant variance. The data had already been normalized in the code. Given that the assumptions appear to be violated, it might be useful to perform a transformation on the data.
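For reference, the standard diagnostic plots for a linear model can be drawn in base R as in the sketch below. The data here is simulated and deliberately skewed (an assumption, not the UFC data), and the `log1p` refit at the end is only one possible transformation, not necessarily the one that suits the real data.

```r
set.seed(1)
# Simulated, right-skewed stand-in data (assumption, not the UFC data).
x <- rpois(400, lambda = 21)
y <- round(x * rlnorm(400, meanlog = 0.4, sdlog = 0.5))
fit <- lm(y ~ x)

# plot() on an lm object draws the usual diagnostics:
# 1 = residuals vs fitted, 2 = normal Q-Q,
# 3 = scale-location, 5 = residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(fit, which = c(1, 2, 3, 5))
par(op)

# Numeric companion to the Q-Q plot. shapiro.test() accepts between
# 3 and 5000 values, so a larger data set would need to be sampled first.
shapiro <- shapiro.test(residuals(fit))
shapiro$p.value

# One candidate transformation: refit on the log1p scale and re-check.
fit_log <- lm(log1p(y) ~ log1p(x))
op <- par(mfrow = c(2, 2))
plot(fit_log, which = c(1, 2, 3, 5))
par(op)
```

A funnel shape in the residuals vs fitted panel points to non-constant variance, and points bending away from the line in the Q-Q panel point to non-normal residuals.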

Answering whether winners land more strikes than losers

To test whether winners land more strikes than losers, we cannot rely on the linear regression model alone. The assumption plots show that the model's assumptions have been violated, so a non-parametric test is used to answer this question instead.

Kruskal-Wallis rank sum test

data:  Fighter_1_Strikes by Fighter_2_Strikes
Kruskal-Wallis chi-squared = 24809, df = 88, p-value < 2.2e-16

"Winners mean strikes landed"
35.45101
"Winners SD strikes landed"
26.28305
"Losers mean strikes landed"
21.15845
"Losers SD strikes landed"
19.08679
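The "data:" line of the output above is consistent with a formula call of the following shape. This is a sketch on simulated data (an assumption); the column names come from the output, and the names of the summary vector are made up for illustration.

```r
set.seed(7)
# Simulated stand-in for the `winners` data frame (assumption).
loser <- rpois(300, lambda = 21)
winners <- data.frame(
  Fighter_2_Strikes = loser,
  Fighter_1_Strikes = loser + rpois(300, lambda = 14)
)

# Kruskal-Wallis test of winner strikes, grouped by each distinct
# loser strike count. The degrees of freedom equal the number of
# distinct groups minus one.
kw <- kruskal.test(Fighter_1_Strikes ~ Fighter_2_Strikes, data = winners)
kw

# Descriptive statistics for both columns.
strike_stats <- c(
  winners_mean = mean(winners$Fighter_1_Strikes),
  winners_sd   = sd(winners$Fighter_1_Strikes),
  losers_mean  = mean(winners$Fighter_2_Strikes),
  losers_sd    = sd(winners$Fighter_2_Strikes)
)
strike_stats
```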

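The Kruskal-Wallis test is significant (p < 2.2e-16). Note that it compares rank distributions rather than means, and that, as specified, it groups winners' strikes by each distinct loser strike count (df = 88, so 89 groups). It therefore shows that the number of strikes winners land varies with the number their opponents land. The descriptive statistics answer the original question more directly: winners land ~14 more strikes on average than losers (35.45 vs. 21.16).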
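Because each fight contributes one winner and one loser, a paired non-parametric test is another way to compare the two directly. The sketch below uses a Wilcoxon signed-rank test on simulated data. Both the data and the choice of test are assumptions for illustration; this is a swapped-in technique, not the test the article ran.

```r
set.seed(7)
# Simulated stand-in for the `winners` data frame (assumption).
loser <- rpois(300, lambda = 21)
winners <- data.frame(
  Fighter_2_Strikes = loser,
  Fighter_1_Strikes = loser + rpois(300, lambda = 14)
)

# Paired, one-sided Wilcoxon signed-rank test: do winners tend to land
# more strikes than the loser of the same fight? exact = FALSE requests
# the normal approximation, which is the usual choice when the strike
# counts contain ties.
paired_test <- wilcox.test(
  winners$Fighter_1_Strikes,
  winners$Fighter_2_Strikes,
  paired = TRUE,
  alternative = "greater",
  exact = FALSE
)
paired_test

# Mean per-fight difference in strikes landed (winner minus loser).
mean_difference <- mean(winners$Fighter_1_Strikes - winners$Fighter_2_Strikes)
mean_difference
```

Pairing by fight compares each winner with their own opponent, which matches how the question is posed.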
