## Rank Your Own Damn Teams

football poll bcs

Sat Oct 31 13:46:11 -0700 2009

I think almost all college football fans would agree that the BCS is, if nothing else, a strange system. There are human polls, computer rankings, & lots and lots of money. After the craziness that put Iowa at the top of the computer rankings (which I agree with, I have to say), I started reading about some of these mysterious computers. Frankly, some of them are odd: they start with preseason "power rankings," which are basically guesses (the Billingsley model actually punishes teams for not living up to how he guessed they would perform). I wondered whether a computer program could produce reasonable rankings without any initial seeding.

The basic formula came to me in a dream and I coded it up this morning. I use two variables to calculate my rankings:

1. Scoring Defense - the average number of points allowed in a game
2. Scoring Offense - the average number of points scored in a game
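As a minimal sketch, assuming each completed game is reduced to the points a team scored and allowed (the `Game` struct and both method names are hypothetical, not taken from the author's script), the two per-game averages are:

```ruby
# Hypothetical per-game record: points a team scored and allowed.
Game = Struct.new(:points_scored, :points_allowed)

# Scoring offense: average points scored per game (higher is better).
def scoring_offense(games)
  games.sum(&:points_scored).to_f / games.length
end

# Scoring defense: average points allowed per game (lower is better).
def scoring_defense(games)
  games.sum(&:points_allowed).to_f / games.length
end

games = [Game.new(35, 10), Game.new(21, 14), Game.new(28, 3)]
puts scoring_offense(games) # 84 / 3 = 28.0
puts scoring_defense(games) # 27 / 3 = 9.0
```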

Teams are then ranked separately on each of these two statistics, and the two positions are combined into a single score. As an example, Texas ranks #1 in scoring offense and #9 in scoring defense. So, to calculate their combined score, we'd do the following:

`((120 - 0) + (120 - 8)) / 2`

120 is the total number of teams, and we use 0 and 8 (rather than 1 and 9) because positions are counted from zero, the way arrays are indexed on computers. This gives Texas a combined score of 116, the highest in the country. Teams are then sorted by their combined score, and a team's quality score is its zero-indexed position in that list, so Texas's quality score is 0. Lower quality scores are better.
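Here is a minimal sketch of the whole quality-score step, assuming the input is a hash of `name => [scoring_offense, scoring_defense]` (a hypothetical format; the author's script keeps these in per-team hashes). Positions are zero-indexed, as in the Texas example, and teams with the same combined score share a position.

```ruby
# Hypothetical input: team name => [scoring_offense, scoring_defense].
def quality_scores(stats)
  total = stats.length

  # Zero-indexed positions: most points scored and fewest points allowed get 0.
  offense_pos = stats.keys.sort_by { |t| -stats[t][0] }.each_with_index.to_h
  defense_pos = stats.keys.sort_by { |t| stats[t][1] }.each_with_index.to_h

  # Combined score: average of (total - position) on each side, integer division.
  combined = stats.keys.map { |t|
    [t, ((total - offense_pos[t]) + (total - defense_pos[t])) / 2]
  }.to_h

  # Quality score: position after sorting by combined score, highest first.
  # Tied teams share a position; lower is better.
  quality = {}
  previous = nil
  rank = 0
  combined.sort_by { |_, score| -score }.each_with_index do |(team, score), i|
    rank = i unless score == previous
    quality[team] = rank
    previous = score
  end

  [combined, quality]
end

# Texas: #1 in offense (position 0) and #9 in defense (position 8) of 120 teams.
texas = ((120 - 0) + (120 - 8)) / 2
puts texas # => 116
```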

So, having calculated a team's quality, we have three components that determine a ranking:

1. Overall Quality - the total number of teams minus the quality score calculated above.
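2. Strength of Schedule - the total number of teams minus the average quality score of the team's opponents (an FCS opponent counts as the worst possible, with a quality score equal to the total number of teams).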
3. Win/Loss Ranking - the weirdest of the three statistics. For each game a school plays, we calculate a quality differential: the absolute difference b/t the two teams' quality scores. We then adjust the differential in a few cases. For an upset win (beating a team whose quality score is more than 20 points lower, i.e. a better team), it's divided by 1.1, which increases the credit for the win; road wins get a slight boost as well. For an upset loss (losing to a team whose quality score is more than 20 points higher, i.e. a worse team), it's multiplied by 1.1, which slightly shrinks the amount subtracted for the loss. After adjustment, the total number of teams minus the adjusted differential is added to the ranking for a win and subtracted from it for a loss. A game against an FCS team counts for nothing.
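The three components can be sketched as plain methods. This is a minimal sketch assuming a team's quality score and its opponents' quality scores are already known, with `nil` standing in for an FCS opponent (a hypothetical convention). The method names are illustrative, and the road-win boost is left out because the post doesn't give its size.

```ruby
UPSET_COUNT = 20
UPSET_FACTOR = 1.1

# 1. Overall quality: total number of teams minus the team's quality score.
def overall_quality(total_teams, quality)
  total_teams - quality
end

# 2. Strength of schedule: total teams minus the average opponent quality score.
#    An FCS opponent (nil here) counts as the worst possible score, total_teams.
def strength_of_schedule(total_teams, opponent_qualities)
  scores = opponent_qualities.map { |q| q || total_teams }
  total_teams - scores.sum.to_f / scores.length
end

# 3. One game's contribution to the win/loss ranking (road-win boost omitted).
def game_contribution(total_teams, quality, opponent_quality, win)
  return 0.0 if opponent_quality.nil? # FCS games are worth nothing

  diff = (quality - opponent_quality).abs.to_f
  if opponent_quality < quality
    # Opponent is better: an upset win shrinks the differential.
    diff /= UPSET_FACTOR if win && diff > UPSET_COUNT
  elsif !win && diff > UPSET_COUNT
    # Opponent is worse: an upset loss grows the differential.
    diff *= UPSET_FACTOR
  end

  value = (total_teams - diff).abs
  win ? value : -value
end
```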

After calculating these three values, we weight them. I multiply the overall quality by 1.25, the strength of schedule by 1.5, and the win/loss ranking by 0.5, add the results, and multiply the sum by the team's winning percentage. I reduced the weight of the win/loss ranking because very weird things happen when all three are weighted equally, esp. w/r/t teams that have good win/loss records against evenly matched teams but who aren't very high quality themselves. Idaho is the perfect example of this: they have a quality score of 65 and mostly play teams with scores b/t 70 and 100. These wins look good to our ranking calculations but not when you factor in strength of schedule. Therefore, we have to weight the strength of schedule high and the win/loss ranking low (but, again, the numbers give weird results w/o the win/loss ranking in the calculations).
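A minimal sketch of the final combination, using the post's weights; the keyword names are illustrative, and the inputs are assumed to be the three components already computed:

```ruby
QUALITY_FACTOR = 1.25
STRENGTH_OF_SCHEDULE_FACTOR = 1.5
RANKING_FACTOR = 0.5

# Weighted sum of the three components, scaled by winning percentage.
def final_score(overall_quality:, strength_of_schedule:, win_loss_ranking:, wins:, losses:)
  weighted = overall_quality * QUALITY_FACTOR +
             strength_of_schedule * STRENGTH_OF_SCHEDULE_FACTOR +
             win_loss_ranking * RANKING_FACTOR
  weighted * (wins.to_f / (wins + losses))
end

# Hypothetical inputs, not taken from the real rankings:
puts final_score(overall_quality: 120, strength_of_schedule: 60.0,
                 win_loss_ranking: 400.0, wins: 6, losses: 2) # => 330.0
```

Because the whole sum is scaled by winning percentage, a loss costs a team a share of all three components, not just the win/loss ranking.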

I also settled on these weights because, after exploring different combinations, I found that these values produced the most sensible Top 25, both at the top and at the bottom. Under other combinations, teams like Troy and Idaho would rank higher than teams like Miami and Navy, which have played slightly higher quality teams overall.

Here are the Week 6 Top 25 rankings from my program:

1. Iowa 8-0 (580.919421487603)
2. Alabama 8-0 (537.909090909091)
3. Florida 7-0 (445.373376623377)
4. Cincinnati 7-0 (440.402597402597)
5. Texas 7-0 (429.279220779221)
6. Texas Christian 7-0 (416.37012987013)
7. Boise State 7-0 (415.217532467532)
8. Pittsburgh 7-1 (386.907102272727)
9. Georgia Tech 7-1 (376.955965909091)
10. Southern California 6-1 (361.207235621521)
11. Louisiana State 6-1 (338.557513914657)
12. Oregon 6-1 (308.897031539889)
13. Houston 6-1 (302.898903693709)
14. Utah 6-1 (280.207792207792)
15. Penn State 7-1 (278.800852272727)
16. West Virginia 6-1 (271.038033395176)
17. Virginia Tech 5-2 (269.174860853432)
18. Central Michigan 7-1 (256.328267045455)
19. Notre Dame 5-2 (256.321892393321)
20. South Carolina 6-2 (249.480681818182)
21. Oklahoma State 6-1 (242.455658627087)
22. Brigham Young 6-2 (230.034375)
23. Ohio State 6-2 (224.322443181818)
24. Miami (Florida) 5-2 (222.708719851577)
25. Navy 6-2 (217.938920454545)

Also, here's the Ruby code if you want to play around with the weights:

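```ruby
require 'uri'
require 'net/http'

# A win over a better team, or a loss to a worse one, is an upset when the
# two quality scores differ by more than UPSET_COUNT.
UPSET_COUNT = 20
UPSET_FACTOR = 1.1

# Weights for the three ranking components:
QUALITY_FACTOR = 1.25
RANKING_FACTOR = 0.5
STRENGTH_OF_SCHEDULE_FACTOR = 1.5

# Fetch a URL, following up to `limit` redirects.
def fetch(uri_str, limit = 10)
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  response = Net::HTTP.get_response(URI.parse(uri_str))

  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end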

# Parse one team's schedule table into a record hash.
def parse_record(team_table_html)
  record = {
    'points_scored'  => 0,
    'points_allowed' => 0,
    'schedule'       => [],
    'win'            => 0,
    'loss'           => 0
  }

  html_lines = team_table_html.split(/\n/)

  # Remove the lines we don't need (the table's opening lines and the last line):
  html_lines.shift(2)
  html_lines.pop

  # Parse out the name, dropping tags and the conference label:
  record['name'] = html_lines.shift.gsub(/<(.|\n)*?>/, "").sub(/\s\([A-Z]+\)/, "").chomp
  record['name'] = record['name'].sub(/ \((Big 12|Big Ten|Pac 10|Big East|Sun Belt|Independent)\)/, "")

  html_lines.each do |line|
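    # Turn each cell into a tab-separated field, then strip the remaining tags.
    # Assumes every cell opens with a bare <td> once the align attribute is removed.
    # Non-bang gsub throughout: gsub! returns nil when nothing is replaced.
    line = line.gsub(' align="right"', "").gsub("<td>", "\t").gsub(/<(.|\n)*?>/, "")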

    line_array = line.split(/\t/)

    # Only completed games, marked W or L, count:
    next unless %w[W L].include?(line_array[4])

    record['points_scored']  += line_array[5].to_f
    record['points_allowed'] += line_array[6].to_f

    game_hash = {
      'name'  => line_array[3].sub(/^\*/, "").chomp,
      'away?' => line_array[2].chomp == "@",
      'win?'  => line_array[4].chomp == "W"
    }

    if game_hash['win?']
      record['win'] += 1
    else
      record['loss'] += 1
    end

    record['schedule'].push(game_hash)
  end

  # Per-game averages:
  record['scoring_offense'] = record['points_scored'] / record['schedule'].length
  record['scoring_defense'] = record['points_allowed'] / record['schedule'].length

  record
end

# Split the schedule page into one chunk per team table and parse each one.
def parse_html(rankings_html)
  team_hash = {}

  teams_html = rankings_html.split(/^ /)
  teams_html.shift # everything before the first team's table

  teams_html.each do |team_table_html|
    team_hash_entry = parse_record(team_table_html)
    team_hash[team_hash_entry['name']] = team_hash_entry
  end

  team_hash
end

# Rank teams on scoring offense and scoring defense, then combine the two
# positions into a single quality score (lower is better).
def calculate_quality(team_hash)
  quality_hash = {}
  total_teams = team_hash.length

  offense_quality_hash = {}
  defense_quality_hash = {}

  team_hash.each_pair do |name, team|
    offense_quality_hash[name] = team['scoring_offense']
    defense_quality_hash[name] = team['scoring_defense']
  end

  # Sort by points scored, from highest to lowest:
  offense_quality_array = offense_quality_hash.sort { |a, b| b[1] <=> a[1] }

  # Sort by points allowed, from lowest to highest:
  defense_quality_array = defense_quality_hash.sort { |a, b| a[1] <=> b[1] }

  puts "Offense Quality Rankings:"
  offense_quality_array.each_with_index do |(team_name, points), i|
    team_hash[team_name]["offense_quality"] = i
    puts "#{i + 1}. #{team_name} #{points}"
  end

  puts "\n\n"

  puts "Defense Quality Rankings:"
  defense_quality_array.each_with_index do |(team_name, points), i|
    team_hash[team_name]["defense_quality"] = i
    puts "#{i + 1}. #{team_name} #{points}"

    # Different methods of calculating overall quality:

    # Sadly, this gives the best results, but it isn't allowed, as it uses average margin of victory:
    # quality_hash[team_name] = team_hash[team_name]['scoring_offense'] - team_hash[team_name]['scoring_defense']

    # quality_hash[team_name] = (team_hash[team_name]["defense_quality"] + team_hash[team_name]["offense_quality"]) / 2

    # Average of (total teams - position) for offense and defense, using integer division:
    quality_hash[team_name] = ((total_teams - team_hash[team_name]["offense_quality"]) +
                               (total_teams - team_hash[team_name]["defense_quality"])) / 2
  end

  puts "\n\n"

  puts "Overall Quality Rankings:"

  quality_array = quality_hash.sort { |a, b| b[1] <=> a[1] }

  # Assign each team its zero-indexed position, highest combined score first.
  # Tied teams share a position (e.g. 0, 1, 1, 3).
  previous_value = nil
  rank = 0
  quality_array.each_with_index do |(team_name, team_quality), i|
    rank = i unless team_quality == previous_value
    team_hash[team_name]["quality"] = rank
    previous_value = team_quality

    puts "#{rank + 1}. #{team_name} #{team_quality}"
  end

  puts "\n\n"

  team_hash
end

# Combine overall quality, strength of schedule, and the win/loss ranking
# into one weighted score per team, sorted from best to worst.
def calculate_rankings(team_hash)
  rankings_hash = {}
  total_teams = team_hash.length

  team_hash.each_pair do |name, team|
    # Formula: every game is worth (total_teams - adjusted quality differential).
    #  For a win: the ranking is increased by that amount
    #  For a loss: the ranking is decreased by that amount
    strength_of_schedule = 0
    ranking = 0

    puts "#{name} #{team['win']}-#{team['loss']} (#{team['quality']})"

    team["schedule"].each do |game|
      # If a team plays an FCS squad, we rank the opponent last in terms of quality:
      fcs = !team_hash.key?(game['name'])
      opponent_quality = fcs ? total_teams : team_hash[game['name']]['quality']

      strength_of_schedule += opponent_quality

      quality_differential = (team["quality"] - opponent_quality).abs

      if fcs
        # Punish teams mercilessly for playing an FCS squad (the game ends up worth 0):
        quality_differential = total_teams
      elsif opponent_quality < team["quality"]
        # Opponent is better: a win when the gap exceeds UPSET_COUNT is an upset.
        quality_differential /= UPSET_FACTOR if game["win?"] && quality_differential > UPSET_COUNT
      else
        # Opponent is worse: a loss when the gap exceeds UPSET_COUNT is an upset.
        quality_differential *= UPSET_FACTOR if !game["win?"] && quality_differential > UPSET_COUNT
      end

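      # Give a boost for a road win. The size of the boost is an assumption here:
      # it reuses UPSET_FACTOR, and FCS games stay worth nothing.
      quality_differential /= UPSET_FACTOR if game["away?"] && game["win?"] && !fcs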

      quality_differential = (total_teams - quality_differential).abs

      if game["win?"]
        ranking += quality_differential
      else
        ranking -= quality_differential
      end

      puts "\t#{game['name']}(#{opponent_quality}) #{quality_differential} (#{ranking})"
    end

    games_played = team["win"] + team["loss"]
    strength_of_schedule = total_teams - (strength_of_schedule.to_f / games_played)
    win_percentage = team["win"].to_f / games_played

    # Add strength of schedule, ranking, & quality of team. Multiply by win percentage:
    rankings_hash[name] = ((strength_of_schedule * STRENGTH_OF_SCHEDULE_FACTOR) +
                           (ranking * RANKING_FACTOR) +
                           ((total_teams - team["quality"]) * QUALITY_FACTOR)) * win_percentage
  end

  rankings_hash.sort { |a, b| b[1] <=> a[1] }
end

# Use a local copy of the schedule page if there is one; otherwise download it.
if File.exist?("Sked2009.htm")
  rankings_html = File.read("Sked2009.htm")
else
  rankings_html = fetch('http://www.jhowell.net/cf/scores/Sked2009.htm').body
end

@team_hash = parse_html(rankings_html)
@team_hash = calculate_quality(@team_hash)
@rankings_array = calculate_rankings(@team_hash)

puts "\n\n"

@rankings_array.each_with_index do |(name, score), i|
  puts "#{i + 1}. #{name} #{@team_hash[name]['win']}-#{@team_hash[name]['loss']} (#{score})"
end
```