An Exploration of UFC Data — Part 3-b: Preliminary Data Analysis

Descriptive statistics on the whole dataset

To understand what questions we might be able to formulate, it is helpful to take a quick glance at the dataset as a whole. Perhaps we can answer some simple questions based upon the descriptive statistics.

Many of the interesting questions revolve around how winners and losers differ in strikes landed, takedowns, submissions, and passes. Later questions, such as whether winners can be predicted, can be addressed once we have a general sense of the data.

The “describe” function in the Hmisc package simplifies computing descriptive statistics for all the variables of interest in my dataset. I purposely left out the fighters' names.
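As a rough sketch of that step, the call might look like the following. The toy data frame `fights`, its values, and the name columns `Fighter_1_Name` and `Fighter_2_Name` are assumptions made for illustration; only the statistic column names and the `winners` object name are taken from the output shown in this post.

```r
# Toy stand-in for the real data. The values and the two fighter-name
# columns are made up for illustration (hypothetical names).
fights <- data.frame(
  Fighter_1_Name    = c("A", "B", "C", "D"),
  Fighter_2_Name    = c("E", "F", "G", "H"),
  Fighter_1_Strikes = c(31, 53, 15, 76),
  Fighter_2_Strikes = c(17, 33, 6, 52),
  Fighter_1_TDs     = c(0, 1, 2, 4),
  Fighter_2_TDs     = c(0, 0, 1, 2),
  stringsAsFactors  = FALSE
)

# Drop the name columns so only the variables of interest are described.
name_cols <- c("Fighter_1_Name", "Fighter_2_Name")
winners <- fights[, !(names(fights) %in% name_cols), drop = FALSE]

# describe() reports n, missing, distinct, and further statistics per variable.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(winners))
} else {
  print(summary(winners))  # base R fallback when Hmisc is not installed
}
```

Dropping the columns before the call, rather than ignoring them in the output, keeps the report limited to the numeric and categorical variables.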

Output

winners
11  Variables      46249  Observations
--------------------------------------------------------------------
Fighter_1_Strikes
--------------------------------------------------------------------
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50
   46249        0      169        1    37.52    31.96        3        6       15       31
     .75      .90      .95
      53       76       93
lowest :   0   1   2   3   4, highest: 206 209 220 225 238
--------------------------------------------------------------------
Fighter_2_Strikes
--------------------------------------------------------------------
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50 
   46249        0      126    0.999    22.64    22.75        0        1        6       17
     .75      .90      .95
      33       52       65
lowest :   0   1   2   3   4, highest: 143 150 160 166 176
--------------------------------------------------------------------
Fighter_1_TDs
--------------------------------------------------------------------
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50 
   46249        0       17    0.912     1.55    1.986        0        0        0        1
     .75      .90      .95
       2        4        6

Value          0     1     2     3     4     5     6     7     8     9    10    11    12
Frequency  19420 10143  5909  3894  2476  1869  1067   612   314   240   137    84    56
Proportion 0.420 0.219 0.128 0.084 0.054 0.040 0.023 0.013 0.007 0.005 0.003 0.002 0.001

Value         13    14    16    21
Frequency      1    17     2     8
Proportion 0.000 0.000 0.000 0.000
--------------------------------------------------------------------
Fighter_2_TDs
--------------------------------------------------------------------

       n  missing distinct     Info     Mean      Gmd
   46249        0       11    0.723   0.6506   0.9873

Value          0     1     2     3     4     5     6     7     8     9    11
Frequency  29836  9287  3684  1701   973   437   193    77    19    28    14
Proportion 0.645 0.201 0.080 0.037 0.021 0.009 0.004 0.002 0.000 0.001 0.000
--------------------------------------------------------------------
Fighter_1_SUB
--------------------------------------------------------------------
       n  missing distinct     Info     Mean      Gmd 
   46249        0       11    0.735   0.6049    0.891

Value          0     1     2     3     4     5     6     7     8     9    10
Frequency  29235 10367  4154  1543   526   183   129    77     4     1    30
Proportion 0.632 0.224 0.090 0.033 0.011 0.004 0.003 0.002 0.000 0.000 0.001
--------------------------------------------------------------------
Fighter_2_SUB
--------------------------------------------------------------------
       n  missing distinct     Info     Mean      Gmd 
   46249        0        9    0.456   0.3003   0.5207

Value          0     1     2     3     4     5     6     7     9
Frequency  37707  5278  2028   741   270   164    24    23    14
Proportion 0.815 0.114 0.044 0.016 0.006 0.004 0.001 0.000 0.000
--------------------------------------------------------------------
Fighter_1_PASS
--------------------------------------------------------------------
       n  missing distinct     Info     Mean      Gmd 
   46249        0       21    0.902    1.782    2.409
lowest :  0  1  2  3  4, highest: 16 17 19 20 26
--------------------------------------------------------------------
Fighter_2_PASS
--------------------------------------------------------------------
       n  missing distinct     Info     Mean      Gmd  
   46249        0       12    0.566   0.4651   0.7788

Value          0     1     2     3     4     5     6     7     8     9    10    11
Frequency  34939  6247  2595  1194   629   266   185    91    60    15    26     2
Proportion 0.755 0.135 0.056 0.026 0.014 0.006 0.004 0.002 0.001 0.000 0.001 0.000
--------------------------------------------------------------------
Round
--------------------------------------------------------------------
       n  missing distinct 
   46248        1        5

Value          1     2     3     4     5
Frequency  14867  7834 21650   304  1593
Proportion 0.321 0.169 0.468 0.007 0.034
--------------------------------------------------------------------
Time
--------------------------------------------------------------------
       n  missing distinct 
   46249        0      348
lowest : 0:22  0:24  0:27  0:50  1:00 , highest: 0:02  11:34 15:00 7:00  9:00 
--------------------------------------------------------------------
Fight Ending Methodology
--------------------------------------------------------------------
       n  missing distinct 
   45970      279        6

Value      Decision       DQ   KO/TKO    S-DEC      SUB    U-DEC
Frequency        20       95    16655     4001     9710    15489
Proportion    0.000    0.002    0.362    0.087    0.211    0.337
--------------------------------------------------------------------
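
In the Time summary above, the lowest and highest values are not in numeric order (0:02 is listed among the highest), which suggests the column holds m:ss text rather than numbers. One possible way to make it comparable is to convert it to seconds. The helper name `to_seconds` and the short example vector are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical helper: convert "m:ss" strings to elapsed seconds.
to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(
    parts,
    function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
    numeric(1)
  )
}

# A few time strings taken from the output above, used as a small example.
times <- c("0:22", "15:00", "7:00", "0:02")
seconds <- to_seconds(times)

# Once numeric, the values sort in true chronological order.
print(sort(seconds))
```

A numeric version of Time could then be summarized with quantiles like the strike counts, instead of only a count of distinct values.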

Interpretation

Coming soon