Rank Your Own Damn Teams

football poll bcs

Sat Oct 31 13:46:11 -0700 2009

I think almost all college football fans would agree that the BCS is, if nothing else, a strange system. There are human polls, computer rankings, and lots and lots of money. After the craziness that put Iowa at the top of the computer rankings (which, I have to say, I agree with), I started reading about some of these mysterious computers. Frankly, some of them are odd: they start with preseason "power rankings," which are basically guesses (the Billingsley model actually punishes teams for failing to perform the way he guessed they would). I wanted to know whether a computer program could produce reasonable rankings without any initial seeding.

The basic formula came to me in a dream and I coded it up this morning. I use two variables to calculate my rankings:

  1. Scoring Defense - the average number of points allowed in a game
  2. Scoring Offense - the average number of points scored in a game
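To make the two inputs concrete, here is a minimal sketch of the per-game averages. It assumes a team's completed games are available as [points_scored, points_allowed] pairs; the helper name `scoring_averages` and the sample scores are hypothetical, not taken from the real 2009 data.

```ruby
# Per-game scoring averages from a list of [points_scored, points_allowed]
# pairs, one pair per completed game.
def scoring_averages(games)
  # An empty season returns zeros (an assumption of this sketch).
  return [0.0, 0.0] if games.empty?

  offense = games.sum { |scored, _allowed| scored }.to_f / games.length
  defense = games.sum { |_scored, allowed| allowed }.to_f / games.length
  [offense, defense]  # [scoring offense, scoring defense]
end

# Hypothetical three-game season: 35-10, 21-24, 42-7.
offense, defense = scoring_averages([[35, 10], [21, 24], [42, 7]])
puts offense.round(2)  # => 32.67
puts defense.round(2)  # => 13.67
```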

Teams are then ranked on each of these two statistics, and a total quality score is assigned based on the average of the two rankings. For example, Texas ranks #1 in scoring offense and #9 in scoring defense, so their average is calculated as follows:

((120-0) + (120-8)) / 2

Here 120 is the total number of teams, and the ranks appear as 0 and 8 rather than 1 and 9 because the program counts positions from zero, the way arrays are indexed. This gives Texas an average of 116, the highest in the country. Teams are then ordered by this average, so Texas's quality score is 0 (the lower the quality score, the better).
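The quality calculation can be sketched in a few lines of Ruby. This is an illustrative sketch rather than the script itself: the helper names `quality_average` and `quality_scores` are hypothetical, positions are assumed to be zero-based as described above, and tied averages are assumed to share a quality score.

```ruby
# Average of a team's offense and defense positions, flipped so higher is
# better. Positions are zero-based (0 = best); integer division as in the script.
def quality_average(total_teams, offense_index, defense_index)
  ((total_teams - offense_index) + (total_teams - defense_index)) / 2
end

# Order teams by that average (highest first). A team's quality score is its
# zero-based position in that order; tied averages share the same score.
def quality_scores(averages)
  scores = {}
  previous_avg = nil
  rank = 0
  averages.sort_by { |_name, avg| -avg }.each_with_index do |(name, avg), i|
    rank = i unless avg == previous_avg
    scores[name] = rank
    previous_avg = avg
  end
  scores
end

# Texas: #1 in scoring offense (index 0), #9 in scoring defense (index 8).
puts quality_average(120, 0, 8)  # => 116

# Hypothetical averages for four teams: A gets 0, B and C both get 1, D gets 3.
scores = quality_scores({ "A" => 116, "B" => 110, "C" => 110, "D" => 100 })
puts scores["D"]  # => 3
```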

So, having calculated a team's quality, we have three components that determine a ranking:

  1. Overall Quality - the total number of teams minus the quality score calculated above.
  2. Strength of Schedule - the total number of teams minus the average quality score of a team's opponents (an FCS opponent counts as the worst possible opponent).
  3. Win/Loss Ranking - the weirdest of the three statistics. For each game a school plays, we calculate a quality differential: the absolute difference between the two teams' quality scores. We then adjust that differential. It shrinks slightly (divided by 1.1) for an upset win, defined as beating a team whose quality score is more than 20 points lower (i.e., a much better team), and shrinks again for a road win; it grows slightly (multiplied by 1.1) for an upset loss, defined as losing to a team whose quality score is more than 20 points higher. Games against FCS teams get the maximum differential. Finally, the total number of teams minus the adjusted differential is added to the ranking for a win and subtracted for a loss.
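Here is a minimal sketch of the per-game win/loss value and the strength-of-schedule figure, written to follow the logic of `calculate_rankings` in the script. The method names `game_value` and `strength_of_schedule` are hypothetical, and quality scores are assumed to be computed already (0 is the best team).

```ruby
UPSET_COUNT = 20       # quality-score gap beyond which a result is an upset
UPSET_FACTOR = 1.1     # divides the gap for upset wins, multiplies it for upset losses
ROAD_WIN_FACTOR = 1.1  # divides the gap for road wins

# Value one game adds to (win) or subtracts from (loss) the win/loss ranking.
def game_value(total_teams, team_quality, opponent_quality, win:, away:, fcs: false)
  differential = if fcs
                   total_teams.to_f  # FCS opponents get the maximum differential
                 else
                   (team_quality - opponent_quality).abs.to_f
                 end

  if !fcs && differential > UPSET_COUNT
    if opponent_quality < team_quality && win
      differential /= UPSET_FACTOR   # upset win over a much better team
    elsif opponent_quality >= team_quality && !win
      differential *= UPSET_FACTOR   # upset loss to a much worse team
    end
  end

  differential /= ROAD_WIN_FACTOR if away && win  # road-win boost

  value = (total_teams - differential).abs  # .abs as in the script
  win ? value : -value
end

# Strength of schedule: total teams minus the average opponent quality score.
def strength_of_schedule(total_teams, opponent_qualities)
  total_teams - opponent_qualities.sum.to_f / opponent_qualities.length
end

puts game_value(120, 10, 40, win: true, away: false)   # => 90.0
puts game_value(120, 10, 12, win: false, away: false)  # => -118.0
puts strength_of_schedule(120, [10, 40, 70])           # => 80.0
```

Note the shape of the formula: because the value depends on the absolute differential, a game between evenly matched teams moves the ranking the most, in either direction.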

After calculating these three values, we weight them: I multiply the overall quality by 1.25, the strength of schedule by 1.5, and the win/loss ranking by 0.5, and the weighted sum is then multiplied by the team's winning percentage. I reduced the weight of the win/loss ranking because very weird things happen when the three are treated equally, especially with teams that have good records against evenly matched opponents but aren't very good themselves. Idaho is the perfect example: they have a quality score of 65 and play mostly teams with scores between 70 and 100. Those wins look good to the win/loss ranking, but not once you factor in strength of schedule. Therefore, we have to weight strength of schedule high and the win/loss ranking low (but, again, leaving the win/loss ranking out of the calculation entirely also gives weird results).
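The weighting step can be sketched as follows. This sketch assumes the three components have already been computed for a team; `final_score` is a hypothetical helper name, and the weights are the ones listed above.

```ruby
QUALITY_FACTOR = 1.25
RANKING_FACTOR = 0.5
STRENGTH_OF_SCHEDULE_FACTOR = 1.5

# Weighted sum of the three components, scaled by winning percentage.
# `quality` is the team's quality score (0 is best), so it is flipped
# into "total teams minus quality" before weighting.
def final_score(total_teams:, quality:, strength_of_schedule:, ranking:, wins:, losses:)
  weighted = strength_of_schedule * STRENGTH_OF_SCHEDULE_FACTOR +
             ranking * RANKING_FACTOR +
             (total_teams - quality) * QUALITY_FACTOR
  weighted * (wins.to_f / (wins + losses))
end

# Hypothetical inputs for an undefeated team and a 6-2 team:
puts final_score(total_teams: 120, quality: 0, strength_of_schedule: 80.0,
                 ranking: 500.0, wins: 8, losses: 0)   # => 520.0
puts final_score(total_teams: 120, quality: 20, strength_of_schedule: 60.0,
                 ranking: 200.0, wins: 6, losses: 2)   # => 236.25
```

Because the winning percentage multiplies everything, a loss costs a team twice: once inside the win/loss ranking and again in the final scaling.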

I settled on these weights because, after exploring different combinations, I found that they produce the best Top 25 at both the top and the bottom. Under other combinations, teams like Troy and Idaho ranked higher than teams like Miami and Navy, who have played slightly better opponents overall.

Here are the Week 6 Top 25 rankings from my program:

  1. Iowa 8-0 (580.919421487603)
  2. Alabama 8-0 (537.909090909091)
  3. Florida 7-0 (445.373376623377)
  4. Cincinnati 7-0 (440.402597402597)
  5. Texas 7-0 (429.279220779221)
  6. Texas Christian 7-0 (416.37012987013)
  7. Boise State 7-0 (415.217532467532)
  8. Pittsburgh 7-1 (386.907102272727)
  9. Georgia Tech 7-1 (376.955965909091)
  10. Southern California 6-1 (361.207235621521)
  11. Louisiana State 6-1 (338.557513914657)
  12. Oregon 6-1 (308.897031539889)
  13. Houston 6-1 (302.898903693709)
  14. Utah 6-1 (280.207792207792)
  15. Penn State 7-1 (278.800852272727)
  16. West Virginia 6-1 (271.038033395176)
  17. Virginia Tech 5-2 (269.174860853432)
  18. Central Michigan 7-1 (256.328267045455)
  19. Notre Dame 5-2 (256.321892393321)
  20. South Carolina 6-2 (249.480681818182)
  21. Oklahoma State 6-1 (242.455658627087)
  22. Brigham Young 6-2 (230.034375)
  23. Ohio State 6-2 (224.322443181818)
  24. Miami (Florida) 5-2 (222.708719851577)
  25. Navy 6-2 (217.938920454545)

Also, here's the Ruby code if you want to play around with the weights:

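# A result counts as an upset when the quality scores differ by more than this:
UPSET_COUNT = 20
# Divides the quality differential for upset wins, multiplies it for upset losses:
UPSET_FACTOR = 1.1
# Divides the quality differential for road wins:
ROAD_WIN_FACTOR = 1.1

# Weights applied to the three ranking components:
QUALITY_FACTOR = 1.25
RANKING_FACTOR = 0.5
STRENGTH_OF_SCHEDULE_FACTOR = 1.5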


require 'uri'
require 'net/http'

# Fetch a URL, following at most `limit` redirects:
def fetch(uri_str, limit = 10)
    raise ArgumentError, 'HTTP redirect too deep' if limit == 0

    response = Net::HTTP.get_response(URI.parse(uri_str))

    case response
    when Net::HTTPSuccess     then response
    when Net::HTTPRedirection then fetch(response['location'], limit - 1)
    else
        response.error!
    end
end

# Parse one team's schedule table into a record hash:
def parse_record(team_table_html)
    record = {}
    record['points_scored'] = 0
    record['points_allowed'] = 0
    record['schedule'] = []
    record['win'] = 0
    record['loss'] = 0

    html_lines = team_table_html.split(/\n/)

    # Remove the lines we don't need (the first two lines and the last line):
    html_lines.shift(2)
    html_lines.pop

    # Parse out the name, dropping the conference in parentheses:
    record['name'] = html_lines.shift.gsub(/<(.|\n)*?>/, "").sub(/\s\([A-Z]+\)/, "").chomp
    record['name'] = record['name'].sub(/ \((Big 12|Big Ten|Pac 10|Big East|Sun Belt|Independent)\)/, "")

    html_lines.each do |line|
        # Turn cell boundaries into tabs, then strip the remaining HTML tags.
        # gsub (not gsub!) so lines without tags don't become nil:
        line = line.gsub(" align=\"right\"", "").gsub("", "\t").gsub(/<(.|\n)*?>/, "")
        line_array = line.split(/\t/)

        # Skip rows that aren't completed games (no W or L):
        next if line_array[4] != "W" && line_array[4] != "L"

        record['points_scored'] += line_array[5].to_f
        record['points_allowed'] += line_array[6].to_f

        game_hash = {}
        game_hash['name'] = line_array[3].sub(/^\*/, "").chomp
        game_hash['away?'] = (line_array[2].chomp == "@")
        game_hash['win?'] = (line_array[4].chomp == "W")

        if game_hash['win?']
            record['win'] += 1
        else
            record['loss'] += 1
        end

        record['schedule'].push(game_hash)
    end

    record['scoring_offense'] = record['points_scored'] / record['schedule'].length
    record['scoring_defense'] = record['points_allowed'] / record['schedule'].length

    record
end

# Split the schedule page into one table per team and parse each one:
def parse_html(rankings_html)
    team_hash = {}

    teams_html = rankings_html.split(//)

    # Discard everything before the first team table:
    teams_html.shift

    teams_html.each do |team_table_html|
        team_hash_entry = parse_record(team_table_html)
        team_hash[team_hash_entry['name']] = team_hash_entry
    end

    team_hash
end

# Rank teams by scoring offense and scoring defense, then combine the two
# positions into an overall quality score (0 is best):
def calculate_quality(team_hash)
    total_teams = team_hash.length

    offense_quality_hash = {}
    defense_quality_hash = {}

    team_hash.each_pair do |name, team|
        offense_quality_hash[name] = team['scoring_offense']
        defense_quality_hash[name] = team['scoring_defense']
    end

    # Sort by points scored, from highest to lowest:
    offense_quality_array = offense_quality_hash.sort {|a,b| b[1]<=>a[1]}

    # Sort by points allowed, from lowest to highest:
    defense_quality_array = defense_quality_hash.sort {|a,b| a[1]<=>b[1]}

    quality_hash = {}

    puts "Offense Quality Rankings:"

    offense_quality_array.each_index do |i|
        team_name = offense_quality_array[i][0]
        team_hash[team_name]["offense_quality"] = i
        puts "#{i+1}. #{team_name} #{offense_quality_array[i][1]}"
    end

    puts "\n\n"

    puts "Defense Quality Rankings:"

    defense_quality_array.each_index do |i|
        team_name = defense_quality_array[i][0]
        team_hash[team_name]["defense_quality"] = i
        puts "#{i+1}. #{team_name} #{defense_quality_array[i][1]}"

        # Different methods of calculating overall quality:

        # Sadly, this gives the best results, but it isn't allowed, as it uses average margin of victory:
        #quality_hash[team_name] = team_hash[team_name]['scoring_offense'] - team_hash[team_name]['scoring_defense']

        #quality_hash[team_name] = (team_hash[team_name]["defense_quality"] + team_hash[team_name]["offense_quality"]) / 2

        quality_hash[team_name] = ((total_teams - team_hash[team_name]["offense_quality"]) + (total_teams - team_hash[team_name]["defense_quality"])) / 2
    end

    puts "\n\n"

    puts "Overall Quality Rankings:"

    # Sort by average, from highest to lowest; a team's position is its quality score:
    quality_array = quality_hash.sort {|a,b| b[1]<=>a[1]}

    previous_value = nil
    rank = 0
    quality_array.each_index do |i|
        team_name = quality_array[i][0]
        team_quality = quality_array[i][1].to_i

        # Tied teams share the quality score of the first team in the tie:
        rank = i unless team_quality == previous_value

        team_hash[team_name]["quality"] = rank
        previous_value = team_quality

        puts "#{rank+1}. #{team_name} #{quality_array[i][1]}"
    end

    puts "\n\n"

    team_hash
end

# Combine quality, strength of schedule, and the win/loss ranking into a final score:
def calculate_rankings(team_hash)
    rankings_hash = {}
    total_teams = team_hash.length

    team_hash.each_pair do |name, team|
        # Formula for each game:
        #  For a win: the ranking increases by (total teams - adjusted quality differential)
        #  For a loss: the ranking decreases by (total teams - adjusted quality differential)
        strength_of_schedule = 0
        ranking = 0

        puts "#{name} #{team['win']}-#{team['loss']} (#{team['quality']})"

        team["schedule"].each do |game|
            # If a team plays an FCS squad, we rank the opponent last in terms of quality:
            if team_hash.has_key? game['name']
                opponent_quality = team_hash[game["name"]]["quality"]
                fcs = false
            else
                opponent_quality = total_teams
                fcs = true
            end

            strength_of_schedule += opponent_quality

            quality_differential = (team["quality"] - opponent_quality).abs

            if fcs
                # Punish teams mercilessly for playing an FCS squad:
                quality_differential = total_teams
            elsif opponent_quality < team["quality"]
                # Upset win over a much better team (lower quality score):
                if game["win?"] && quality_differential > UPSET_COUNT
                    quality_differential /= UPSET_FACTOR
                end
            else
                # Upset loss to a much worse team (higher quality score):
                if !game["win?"] && quality_differential > UPSET_COUNT
                    quality_differential *= UPSET_FACTOR
                end
            end

            # Give a boost for a road win:
            if game["away?"] && game["win?"]
                quality_differential /= ROAD_WIN_FACTOR
            end

            quality_differential = (total_teams - quality_differential).abs

            if game["win?"]
                ranking += quality_differential
            else
                ranking -= quality_differential
            end

            puts "\t#{game["name"]}(#{opponent_quality}) #{quality_differential} (#{ranking})"
        end

        strength_of_schedule = total_teams - (strength_of_schedule.to_f / (team["win"] + team["loss"]))

        # Add strength of schedule, ranking, and quality of team, then multiply by win percentage:
        rankings_hash[name] = ((strength_of_schedule * STRENGTH_OF_SCHEDULE_FACTOR) + (ranking * RANKING_FACTOR) + ((total_teams - team['quality']) * QUALITY_FACTOR)) * (team['win'].to_f / (team['win'] + team['loss']))
    end

    rankings_hash.sort {|a,b| b[1]<=>a[1]}
end

# Use a local copy of the schedule page if there is one; otherwise download it:
if File.exist? "Sked2009.htm"
    rankings_html = File.read("Sked2009.htm")
else
    rankings_html = fetch('http://www.jhowell.net/cf/scores/Sked2009.htm').body
end

@team_hash = parse_html(rankings_html)
@team_hash = calculate_quality(@team_hash)
@rankings_array = calculate_rankings(@team_hash)

puts "\n\n"

@rankings_array.each_with_index do |(name, score), i|
    puts "#{i+1}. #{name} #{@team_hash[name]['win']}-#{@team_hash[name]['loss']} (#{score})"
end