Rank Your Own Damn Teams

football poll bcs

Sat Oct 31 13:46:11 -0700 2009

I think almost all college football fans would agree that the BCS is, if nothing else, a strange system. There are human polls, computer rankings, and lots and lots of money. After the craziness that put Iowa at the top of the computers (which, I have to say, I agree with), I started reading about some of these mysterious computers. Frankly, some of them are odd: they start with preseason "power rankings," which are basically guesses (the Billingsley model actually punishes teams for not performing the way he guessed they would). I wondered whether it was possible to write a program that didn't involve any initial seeding and could still produce reasonable rankings.

The basic formula came to me in a dream and I coded it up this morning. I use two variables to calculate my rankings:

  1. Scoring Defense - the average number of points allowed in a game
  2. Scoring Offense - the average number of points scored in a game

Teams are then ranked on each of these two statistics, and a total quality score is assigned based on the average of the two ranks. As an example, Texas ranks #1 in scoring offense and #9 in scoring defense. So, to calculate their total quality score, we'd do the following:

((120-0) + (120-8)) / 2

120 is the total number of teams, and the ranks show up as 0 and 8 rather than 1 and 9 b/c arrays are indexed from zero. Anyway, this gives them an average score of 116, which is the highest in the country. Teams are then ordered by this average, so Texas's quality score is a 0 (which is good: the lower the quality score, the better).
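As a quick check on the arithmetic above, here is a minimal Ruby sketch. The helper name `raw_quality` is made up for illustration and doesn't appear in the program further down; it assumes zero-based rank indices, as described.

```ruby
# Hypothetical helper, for illustration only: averages a team's inverted
# offensive and defensive ranks. Indices are zero-based (0 = best).
def raw_quality(offense_index, defense_index, total_teams)
  ((total_teams - offense_index) + (total_teams - defense_index)) / 2
end

# Texas: #1 in scoring offense (index 0), #9 in scoring defense (index 8)
puts raw_quality(0, 8, 120)  # prints 116
```

Sorting every team by this average, highest first, and taking each team's position in that list gives the quality score.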

So, having calculated a team's quality, we have three components that determine a ranking:

  1. Overall Quality - the total number of teams minus the quality score calculated above.
  2. Strength of Schedule - the total number of teams minus the average opponent quality.
  3. Win/Loss Ranking - the weirdest of the three statistics. For each game a school plays, we calculate a quality differential: the absolute difference b/t the two teams' quality scores. We then adjust this score up slightly for a road win or an upset win (defined as beating a team whose quality score is more than 20 points better) and down slightly for an upset loss (defined as losing to a team whose quality score is more than 20 points worse). After adjustment, the difference between the total number of teams and this adjusted quality differential is added to the ranking in the event of a win and subtracted in the event of a loss.
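To make the win/loss step a bit more concrete, here is a rough sketch of how a single game might be scored. The method name and keyword arguments are hypothetical, and the two 1.1 factors are placeholder guesses for "slightly"; only the 20-point upset threshold comes from the description above.

```ruby
# Hypothetical sketch of one game's contribution to the win/loss ranking.
# Quality scores are ranks, so a lower number means a better team.
# upset_factor and road_win_factor are assumed values, not taken from the post.
def game_contribution(team_quality, opponent_quality, win:, away:, total_teams:,
                      upset_threshold: 20, upset_factor: 1.1, road_win_factor: 1.1)
  differential = (team_quality - opponent_quality).abs.to_f
  opponent_is_better = opponent_quality < team_quality

  if differential > upset_threshold
    if win && opponent_is_better
      differential /= upset_factor   # upset win: shrink the differential
    elsif !win && !opponent_is_better
      differential *= upset_factor   # upset loss: grow the differential
    end
  end

  # Road win: shrink the differential a little more
  differential /= road_win_factor if win && away

  value = (total_teams - differential).abs
  win ? value : -value
end

# A home win over a team two spots lower in quality:
puts game_contribution(10, 12, win: true, away: false, total_teams: 120)  # prints 118.0
```

Summing these values over a team's schedule gives its win/loss ranking. Note that a smaller differential means a bigger value, which is why wins over evenly matched teams count for so much.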

After having calculated these three values, we weight them. I multiply the overall quality by 1.25, the strength of schedule by 1.5, and the win/loss ranking by 0.5. I turned the win/loss ranking down b/c very weird things happen when you treat the three equally, esp. w/r/t teams that have good win/loss records against evenly matched teams but aren't very high quality themselves. Idaho is the perfect example of this: they have a quality score of 65 and primarily play teams with scores b/t 70 and 100. These wins look good to the win/loss calculation but not once you factor in strength of schedule. Therefore, we have to weight the strength of schedule high and the win/loss ranking low (but the numbers give weird results w/o the win/loss ranking in the calculations at all). Finally, the weighted sum is multiplied by the team's winning percentage.
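Here is a small sketch of how the three weighted pieces combine. The method name `weighted_score` and its argument names are hypothetical; the weights are the ones given above, and the final multiplication by winning percentage follows the last line of `calculate_rankings` in the program below. The inputs in the example call are invented numbers, not any real team's.

```ruby
# Hypothetical sketch of the final score for one team.
#   quality              - the team's quality score (rank, 0 = best)
#   strength_of_schedule - total teams minus average opponent quality
#   win_loss_ranking     - sum of the per-game win/loss values
def weighted_score(quality, strength_of_schedule, win_loss_ranking, wins, losses,
                   total_teams: 120)
  quality_component  = (total_teams - quality) * 1.25
  schedule_component = strength_of_schedule * 1.5
  win_loss_component = win_loss_ranking * 0.5
  win_percentage     = wins.to_f / (wins + losses)

  (quality_component + schedule_component + win_loss_component) * win_percentage
end

# Invented inputs: a top-quality 8-0 team with a middling schedule
puts weighted_score(0, 60.0, 400.0, 8, 0)  # prints 440.0
```

Because the whole sum is scaled by winning percentage, the same numbers with a 4-4 record would come out to half as much.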

I've also settled on these weighting values because after exploring different combinations of weights, I found that these values produce the best Top 25 both at the bottom and the top. Under different combinations, teams like Troy and Idaho would appear higher ranked than teams like Miami and Navy, who have played slightly higher quality teams overall.

Here are the Week 6 Top 25 rankings from my program:

  1. Iowa 8-0 (580.919421487603)
  2. Alabama 8-0 (537.909090909091)
  3. Florida 7-0 (445.373376623377)
  4. Cincinnati 7-0 (440.402597402597)
  5. Texas 7-0 (429.279220779221)
  6. Texas Christian 7-0 (416.37012987013)
  7. Boise State 7-0 (415.217532467532)
  8. Pittsburgh 7-1 (386.907102272727)
  9. Georgia Tech 7-1 (376.955965909091)
  10. Southern California 6-1 (361.207235621521)
  11. Louisiana State 6-1 (338.557513914657)
  12. Oregon 6-1 (308.897031539889)
  13. Houston 6-1 (302.898903693709)
  14. Utah 6-1 (280.207792207792)
  15. Penn State 7-1 (278.800852272727)
  16. West Virginia 6-1 (271.038033395176)
  17. Virginia Tech 5-2 (269.174860853432)
  18. Central Michigan 7-1 (256.328267045455)
  19. Notre Dame 5-2 (256.321892393321)
  20. South Carolina 6-2 (249.480681818182)
  21. Oklahoma State 6-1 (242.455658627087)
  22. Brigham Young 6-2 (230.034375)
  23. Ohio State 6-2 (224.322443181818)
  24. Miami (Florida) 5-2 (222.708719851577)
  25. Navy 6-2 (217.938920454545)

Also, here's the Ruby code if you want to play around with the weights:



def fetch(uri_str, limit = 10)
  require 'uri'
  require 'net/http'

  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  response = Net::HTTP.get_response(URI.parse(uri_str))

  # Follow redirects until we get a page or run out of attempts:
  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end


def parse_record(team_table_html)
    record = {}
    record['points_scored'] = 0
    record['points_allowed'] = 0
    record['schedule'] = []
    record['win'] = 0
    record['loss'] = 0

    html_lines = team_table_html.split(/\n/)

    # Remove the lines we don't need (last line, opening of table line):
    junk = html_lines.shift
    junk = html_lines.shift
    junk = html_lines.pop
    junk = ""

    # Parse out the name:
    record['name'] = html_lines.shift.gsub(/<(.|\n)*?>/,"").sub(/\s\([A-Z]+\)/, "").chomp
    record['name'] = record['name'].sub(" (Big 12)","").sub(" (Big Ten)","").sub(" (Pac 10)","").sub(" (Big East)","").sub(" (Sun Belt)","").sub(" (Independent)","")

    html_lines.each do |line|
        # gsub (not gsub!) so a line with no tags left doesn't come back as nil:
        line = line.gsub(" align=\"right\"", "").gsub("", "\t").gsub(/<(.|\n)*?>/,"")
        line_array = line.split(/\t/)

        # Skip games that haven't been played yet:
        if line_array[4] != "W" and line_array[4] != "L"
            next
        end

        record['points_scored'] += line_array[5].to_f
        record['points_allowed'] += line_array[6].to_f

        game_hash = {}
        game_hash['name'] = line_array[3].sub(/^\*/,"").chomp
        game_hash['away?'] = (line_array[2].chomp == "@")
        game_hash['win?'] = (line_array[4].chomp == "W")

        if game_hash['win?']
            record['win'] += 1
        else
            record['loss'] += 1
        end
        record['schedule'] << game_hash
    end

    record['scoring_offense'] = record['points_scored'] / record['schedule'].length
    record['scoring_defense'] = record['points_allowed'] / record['schedule'].length
    record
end


def parse_html(rankings_html)
    team_hash = {}
    teams_html = rankings_html.split(//)
    junk = teams_html.shift

    # Build a hash of team records, keyed by team name:
    teams_html.each do |team_table_html|
        team_hash_entry = parse_record(team_table_html)
        team_hash[team_hash_entry['name']] = team_hash_entry
    end
    team_hash
end


def calculate_quality(team_hash)
    total_teams = team_hash.length
    offense_quality_hash = {}
    defense_quality_hash = {}
    team_hash.each_pair do |name, team|
        offense_quality_hash[name] = team['scoring_offense']
        defense_quality_hash[name] = team['scoring_defense']
    end

    # Sort by points scored, from highest to lowest:
    offense_quality_array = offense_quality_hash.sort {|a,b| b[1]<=>a[1]}
    # Sort by points allowed, from lowest to highest:
    defense_quality_array = defense_quality_hash.sort {|a,b| a[1]<=>b[1]}

    quality_hash = {}
    puts "Offense Quality Rankings:"
    offense_quality_array.each_index do |i|
        team_name = offense_quality_array[i][0]
        team_hash[team_name]["offense_quality"] = i
        puts "#{i+1}. #{offense_quality_array[i][0]} #{offense_quality_array[i][1]}"
    end
    puts "\n\n"

    puts "Defense Quality Rankings:"
    defense_quality_array.each_index do |i|
        team_name = defense_quality_array[i][0]
        team_hash[team_name]["defense_quality"] = i
        puts "#{i+1}. #{defense_quality_array[i][0]} #{defense_quality_array[i][1]}"

        # Different methods of calculating overall quality:
        # Sadly, this gives the best results, but it isn't allowed, as it uses average margin of victory:
        #quality_hash[team_name] = team_hash[team_name]['scoring_offense'] - team_hash[team_name]['scoring_defense']
        #quality_hash[team_name] = (team_hash[team_name]["defense_quality"] + team_hash[team_name]["offense_quality"]) / 2
        quality_hash[team_name] = ((total_teams - team_hash[team_name]["offense_quality"]) + (total_teams - team_hash[team_name]["defense_quality"])) / 2
    end

    puts "\n\n"
    puts "Overall Quality Rankings:"
    quality_array = quality_hash.sort {|a,b| b[1]<=>a[1]}
    #quality_array = quality_hash.sort {|a,b| a[1]<=>b[1]}
    previous_value = -1
    quality_array.each_index do |i|
        team_name = quality_array[i][0]
        team_quality = quality_array[i][1].to_i

        # A team tied with the one before it moves up one spot:
        rank = i
        if team_quality == previous_value
            rank -= 1
        end
        team_hash[team_name]["quality"] = rank
        previous_value = team_quality
        puts "#{rank+1}. #{team_name} #{quality_array[i][1]}"
    end
    puts "\n\n"
    team_hash
end


# Weights and upset threshold as described above:
QUALITY_FACTOR = 1.25
STRENGTH_OF_SCHEDULE_FACTOR = 1.5
RANKING_FACTOR = 0.5
UPSET_COUNT = 20
# NOTE: placeholder values for the "slight" adjustments. These two are
# assumptions, not the values used to produce the rankings above:
UPSET_FACTOR = 1.1
ROAD_WIN_FACTOR = 1.1

def calculate_rankings(team_hash)
    rankings_hash = {}
    total_teams = team_hash.length
    team_hash.each_pair do |name, team|
        # Formula:
        #  For a win: value is increased by inverse quality differential
        #  For a loss: value is decreased by inverse quality differential
        strength_of_schedule = 0
        ranking = 0
        puts "#{name} #{team['win']}-#{team['loss']} (#{team['quality']})"
        team["schedule"].each do |game|
            # If a team plays an FCS squad, we rank them as the last, in terms of quality
            if team_hash.has_key? game['name']
                opponent_quality = team_hash[game["name"]]["quality"]
                fcs = false
            else
                opponent_quality = total_teams
                fcs = true
            end
            strength_of_schedule += opponent_quality
            quality_differential = (team["quality"] - opponent_quality).abs

            if fcs
                # Punish teams mercilessly for playing an FCS squad:
                quality_differential = total_teams
            else
                if opponent_quality < team["quality"]
                    # Upset win:
                    if game["win?"] && quality_differential > UPSET_COUNT
                        quality_differential /= UPSET_FACTOR
                    end
                else
                    # Upset loss:
                    if !game["win?"] && quality_differential > UPSET_COUNT
                        quality_differential *= UPSET_FACTOR
                    end
                end
            end

            # Give a boost for a road win:
            if game["away?"] && game["win?"]
                quality_differential /= ROAD_WIN_FACTOR
            end

            quality_differential = (total_teams - quality_differential).abs
            if game["win?"]
                ranking += quality_differential
            else
                ranking -= quality_differential
            end
            puts "\t#{game["name"]}(#{opponent_quality}) #{quality_differential} (#{ranking})"
        end

        strength_of_schedule = total_teams - (strength_of_schedule.to_f / (team["win"] + team["loss"]))

        # Add strength of schedule, ranking, & quality of team.  Multiply by win percentage:
        rankings_hash[name] = ((strength_of_schedule * STRENGTH_OF_SCHEDULE_FACTOR) + (ranking * RANKING_FACTOR) + ((total_teams - team['quality']) * QUALITY_FACTOR)) * (team['win'].to_f / (team['win'] + team['loss']))
    end

    # Highest score first:
    rankings_hash.sort {|a,b| b[1]<=>a[1]}
end


# Use a local copy of the scores page if we have one; otherwise download it:
if File.exist? "Sked2009.htm"
    rankings_html = IO.readlines("Sked2009.htm").join()
else
    rankings_html = fetch('http://www.jhowell.net/cf/scores/Sked2009.htm').body
end

@team_hash = parse_html(rankings_html)
@team_hash = calculate_quality(@team_hash)
@rankings_array = calculate_rankings(@team_hash)

puts "\n\n"

@rankings_array.each_index do |i|
    puts "#{i+1}. #{@rankings_array[i][0]} #{@team_hash[@rankings_array[i][0]]['win']}-#{@team_hash[@rankings_array[i][0]]['loss']} (#{@rankings_array[i][1]})"
end
