NCAA Football Rankings Using PageRank Adaptations

Patrick Wilson
Nov 9, 2014

I believe the current system of determining College Football Rankings is broken, and Google’s PageRank may be the key to fixing the system.

The current system involves a lot of strange metrics and some arbitrary decision-making by a committee that watches game film. Condoleezza Rice is among the members of this heralded group.

I wanted to create a ranking system that leaves all of this behind and simply looks at the bare facts. The only data I wanted to consider was wins and losses. The current system takes margin of victory, home-field advantage, and many other metrics into account. There are just too many intricacies of life to account for all of these things accurately and still end up with an effective system. Teams react differently in a blowout: second stringers may go in, game strategy may change, etc.

To combat all this, I was curious to see what would happen if you calculated the rankings solely based on a team's wins and losses. A team is a single unit, and I believe that no individual player's or coach's decisions should affect that bottom line.

My Original Idea: Total wins of teams who you beat (TWINS)

This ranking is a simple one-hop calculation: your team's score is the sum of the wins of every team you beat. For example:

The Michigan Wolverines (as of Nov 9th) have beaten Appalachian State (4–5), Miami of Ohio (2–8), Penn State (5–4), Indiana (3–6), and Northwestern (3–6). This yields a TWINS score of 4+2+5+3+3 = 17, good enough for 56th overall.
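A minimal Python sketch of the one-hop TWINS score, fed with the Michigan records listed above. The function name `twins_score` and the dict layout (opponent name mapped to a (wins, losses) pair) are illustrative assumptions, not the code used to produce these rankings.

```python
def twins_score(beaten_records):
    """One-hop TWINS: sum the win totals of every opponent this team has beaten."""
    return sum(wins for wins, _losses in beaten_records.values())


# Michigan's victims as of Nov 9, each with that opponent's (wins, losses) record.
michigan_beaten = {
    "Appalachian State": (4, 5),
    "Miami of Ohio": (2, 8),
    "Penn State": (5, 4),
    "Indiana": (3, 6),
    "Northwestern": (3, 6),
}

print(twins_score(michigan_beaten))  # 17
```

Only the opponents' win totals feed the score; their losses are carried along purely for readability.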

The top 15 after running this calculation are as follows:

These ratings are interesting and honestly pretty similar to the current rankings, with a few major exceptions, such as UCLA ranking unusually high. I checked out their schedule, and this is driven by wins over two great teams: ASU and Arizona.

For my Michigan State friends out there, the Spartans are ranked 32nd with this implementation. However, a victory against OSU last night would have put them at 12th overall.

I'm interested to see what you all think of this. Variations could include looking at multiple hops instead of just one. One drawback of this algorithm is that the number of games played becomes a determining factor, which skews mid-season rankings; this could easily be normalized by games played if need be.
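The games-played normalization could look something like the sketch below: divide a team's TWINS total by its number of games played. Both teams here are hypothetical, and `twins` and `normalized_twins` are illustrative names. The point is that raw TWINS rewards the team with more games on the schedule, while the per-game version can flip the order.

```python
def twins(beaten_wins):
    """Raw TWINS: sum of the win totals of every opponent beaten."""
    return sum(beaten_wins)


def normalized_twins(beaten_wins, games_played):
    """TWINS per game played, so an extra game on the schedule isn't a free boost."""
    if games_played == 0:
        return 0.0
    return twins(beaten_wins) / games_played


# Hypothetical teams: each list holds the win totals of the opponents that team beat.
team_a = {"beaten_wins": [6, 4], "games_played": 4}
team_b = {"beaten_wins": [3, 3, 3, 3], "games_played": 5}

print(twins(team_a["beaten_wins"]), twins(team_b["beaten_wins"]))  # 10 12
print(normalized_twins(**team_a), normalized_twins(**team_b))  # 2.5 2.4
```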

This line of thought led me to a second method, derived from PageRank, that I believe surfaces some really interesting information.

My Second Idea: PageRank with losses as edges in a directed graph (PARK)

PageRank is an algorithm named after Larry Page of Google, used to determine the importance of each webpage relative to all the other pages in a set. It assigns each page a score based on its incoming links: every page that links to it passes along a share of its own score, split evenly across that page's outgoing links, so a link from a high-scoring page counts for more than a link from a low-scoring one.
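For reference, the standard (normalized) PageRank update is shown below, where d is the damping factor (0.85 is the commonly cited default), N is the number of pages, B(p) is the set of pages linking to page p, and L(q) is the number of outgoing links on page q.

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}
```

In the football adaptation described here, a page becomes a team and a link becomes a loss, pointing from the losing team to the team that beat it.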

Having taken a few classes that touch on information retrieval, classification, and machine learning, I was HYPE to test out a variation of this algorithm tailored to the College Football Ranking problem.

My solution treats teams as nodes and each game played as a directed edge in the graph, running from the loser to the winner. Margin of victory is intentionally ignored, for the reasons mentioned above. The parameters include a damping factor that, in this context, represents the likelihood that a favored team will lose, so adjusting this parameter will slightly alter the results.
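One way to hold that graph in Python is an adjacency list keyed by team, where each entry lists the teams it lost to. This is a minimal sketch, assuming games arrive as (winner, loser) pairs; `build_loss_graph` is a hypothetical helper, and the sample results are only the few mentioned in this post.

```python
from collections import defaultdict


def build_loss_graph(games):
    """Map each team to the teams it lost to, so edges run loser -> winner.

    `games` is a list of (winner, loser) pairs; margin of victory is ignored.
    """
    graph = defaultdict(list)
    for winner, loser in games:
        graph[loser].append(winner)
        graph.setdefault(winner, [])  # make sure winners also appear as nodes
    return dict(graph)


games = [("UCLA", "ASU"), ("UCLA", "Arizona"), ("Ohio State", "Michigan State")]
loss_graph = build_loss_graph(games)
print(loss_graph["ASU"])   # ['UCLA']
print(loss_graph["UCLA"])  # []
```

A team with an empty list has no losses in the data, which is exactly the undefeated case that needs special handling once scores start flowing along these edges.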

In the context of football, every team has a score, and every team that a given team loses to gets an equal portion of that score. So say you beat 7 teams: the better those teams are, and the fewer losses they have (so their score is split among fewer winners), the higher your score will be!

All in all, this algorithm takes into consideration EVERY game played over the course of the entire season. Because schools schedule out-of-conference games, there are no isolated pools, and every team's score is affected by every other team in one way or another. The only manipulation was adding a self-loop edge for each undefeated team, so that those teams would receive the benefit of their own lack of losses; without it, an undefeated team would have no outgoing edges and nowhere for its score to go.
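Putting the pieces together, here is a minimal power-iteration sketch of PARK, assuming games are given as (winner, loser) pairs. The function name `park_rank`, the damping value of 0.85 (the commonly cited PageRank default, not necessarily the value behind the rankings here), the fixed 100 iterations, and the four-team season are all illustrative assumptions.

```python
def park_rank(games, damping=0.85, iterations=100):
    """PageRank over a loss graph: each loss is an edge from loser to winner.

    damping: assumed probability that a team's score flows along its loss edges.
    Undefeated teams get a self-loop so they keep the benefit of their own score.
    """
    teams = sorted({team for game in games for team in game})
    losses_to = {team: [] for team in teams}
    for winner, loser in games:
        losses_to[loser].append(winner)
    for team in teams:
        if not losses_to[team]:  # undefeated: add a self-loop edge
            losses_to[team].append(team)

    n = len(teams)
    score = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Every team starts each round with the same small baseline share.
        new_score = {team: (1.0 - damping) / n for team in teams}
        for team, winners in losses_to.items():
            # Split this team's damped score equally among the teams it lost to.
            share = damping * score[team] / len(winners)
            for winner in winners:
                new_score[winner] += share
        score = new_score
    return score


# Hypothetical four-team season, listed as (winner, loser) pairs.
games = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
scores = park_rank(games)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'B', 'C', 'D']
```

Because every team ends up with at least one outgoing edge (a real loss or the undefeated self-loop), the scores keep summing to 1 on every iteration, so no renormalization step is needed.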

The top 15 after running this calculation are as follows:

This was really what I was looking for: which teams are extremely underrated or overrated by today's system?

UCLA… With wins over ASU and Arizona, my implementation says this two-loss team deserves a spot in the four-team playoff. A truly interesting result. Meanwhile, there is no love for Notre Dame, especially after last night's loss.

The Big Ten also gets NO LOVE from this system, and probably rightly so, considering our out-of-conference play.

Michigan State is ranked: 20th (ahead of Ohio State)

Ohio State is ranked: 22nd

Michigan is ranked: 66th #goblue

I cranked out this algorithm and post this morning, so it is by no means polished, but there is some really cool potential here for how we view existing ranking systems. Should all these intangibles be taken into consideration? Or should we just let wins speak for themselves?
