I think almost all college football fans would agree that the BCS is, if nothing else, a strange system: there are human polls, computer rankings, and lots and lots of money. After the craziness that put Iowa at the top of the computer rankings (which, I have to say, I agree with), I started reading about some of these mysterious computers. Frankly, some of them are odd, because they start from preseason "power rankings," which are basically guesses (the Billingsley model actually punishes teams for not performing the way he guessed they would). I wanted to know whether a computer program could produce reasonable rankings without any initial seeding.
The basic formula came to me in a dream and I coded it up this morning. I use two variables to calculate my rankings:
- Scoring Defense - the average number of points allowed in a game
- Scoring Offense - the average number of points scored in a game
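To make the two statistics concrete, here is a minimal Ruby sketch that averages one team's season. The `scoring_averages` helper and the `:scored`/`:allowed` game hashes are hypothetical stand-ins rather than the structures the full program uses, and the scores are made up.

```ruby
# Hypothetical helper: returns [scoring offense, scoring defense] for a list of games,
# where each game is a hash with :scored and :allowed point totals.
def scoring_averages(games)
  return [0.0, 0.0] if games.empty?

  scored = games.sum { |g| g[:scored] }.to_f
  allowed = games.sum { |g| g[:allowed] }.to_f
  [scored / games.length, allowed / games.length]
end

# Invented three-game season:
games = [
  { scored: 42, allowed: 10 },
  { scored: 35, allowed: 21 },
  { scored: 28, allowed: 14 }
]
offense, defense = scoring_averages(games)
puts "Scoring offense: #{offense}, scoring defense: #{defense}"
```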
Teams are then ranked separately on each of these two statistics, and a total quality score is assigned based on the average of a team's two rank positions. As an example, Texas ranks #1 in scoring offense and #9 in scoring defense. So, to calculate their total quality score, we'd do the following:
((120-0) + (120-8)) / 2
120 is the total number of teams, and the ranks show up as 0 and 8 because arrays on computers are indexed from zero. This gives Texas an average of 116, the highest in the country. When every team is sorted by this average from highest to lowest, a team's zero-based position in that list is its quality score, so Texas's quality score is 0 (which is good; the lower the quality score, the better).
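The rank-averaging step can be sketched in a few lines of Ruby. This is a simplified illustration, not the program's own code: `rank_average` and `quality_scores` are hypothetical helpers, the four teams are invented, and teams with equal averages simply share a quality score, which simplifies how the full program handles ties.

```ruby
# Average of the two inverted, zero-based rank positions (integer division).
def rank_average(offense_index, defense_index, total_teams)
  ((total_teams - offense_index) + (total_teams - defense_index)) / 2
end

# stats: team name => { offense: points scored per game, defense: points allowed per game }
def quality_scores(stats)
  total = stats.length
  # Zero-based positions: most points scored first, fewest points allowed first.
  offense_index = stats.keys.sort_by { |t| -stats[t][:offense] }.each_with_index.to_h
  defense_index = stats.keys.sort_by { |t| stats[t][:defense] }.each_with_index.to_h
  averages = stats.keys.to_h { |t| [t, rank_average(offense_index[t], defense_index[t], total)] }
  # Quality score = number of teams with a strictly higher average (0 = best).
  averages.to_h { |t, avg| [t, averages.values.count { |v| v > avg }] }
end

puts rank_average(0, 8, 120) # Texas: #1 offense, #9 defense; prints 116

stats = {
  "A" => { offense: 40.0, defense: 10.0 },
  "B" => { offense: 35.0, defense: 24.0 },
  "C" => { offense: 20.0, defense: 14.0 },
  "D" => { offense: 10.0, defense: 30.0 }
}
p quality_scores(stats)
```

In this toy field, B and C tie on their averaged rank, so they share the score 1 and D drops to 3.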
So, having calculated a team's quality, we have three components that determine a ranking:
- Overall Quality - the total number of teams minus the quality score calculated above.
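- Strength of Schedule - the total number of teams minus the average quality score of a team's opponents (an FCS opponent counts as the worst possible quality, equal to the total number of teams).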
- Win/Loss Ranking - the weirdest of the three statistics. For each game a school plays, we calculate a quality differential: the absolute difference between the two teams' quality scores. We then adjust this differential slightly: it is divided by 1.1 for a road win and for an upset win (beating a team whose quality score is more than 20 points lower, i.e., a much better team), and multiplied by 1.1 for an upset loss (losing to a team whose quality score is more than 20 points higher). After adjustment, the total number of teams minus the adjusted differential is added to the ranking for a win and subtracted for a loss. Games against FCS teams are given a differential equal to the total number of teams, so a home win over an FCS squad is worth nothing.
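Here is a minimal Ruby sketch of the per-game win/loss value and the strength-of-schedule average, pulled out into standalone functions so the adjustments are easier to follow. The function and keyword names are hypothetical; the constants mirror the ones at the top of the full listing, and the example season (a team with quality score 50) is invented.

```ruby
UPSET_COUNT = 20
UPSET_FACTOR = 1.1
ROAD_WIN_FACTOR = 1.1

# Signed contribution of one game to a team's win/loss ranking.
# Quality scores follow the post's convention: lower is better.
def game_value(team_quality, opponent_quality, win:, away:, total_teams:, fcs: false)
  differential = (team_quality - opponent_quality).abs.to_f
  if fcs
    # FCS opponents get the maximum differential, so a home win is worth nothing.
    differential = total_teams.to_f
  elsif opponent_quality < team_quality
    # Opponent is better: beating it by more than UPSET_COUNT spots is an upset win.
    differential /= UPSET_FACTOR if win && differential > UPSET_COUNT
  elsif !win && differential > UPSET_COUNT
    # Opponent is worse: losing to it by more than UPSET_COUNT spots is an upset loss.
    differential *= UPSET_FACTOR
  end
  differential /= ROAD_WIN_FACTOR if away && win
  value = (total_teams - differential).abs
  win ? value : -value
end

# Total number of teams minus the average opponent quality score.
def strength_of_schedule(opponent_qualities, total_teams)
  total_teams - opponent_qualities.sum.to_f / opponent_qualities.length
end

season = [
  game_value(50, 55, win: true, away: false, total_teams: 120),             # evenly matched home win
  game_value(50, 20, win: true, away: true, total_teams: 120),              # road upset win
  game_value(50, 80, win: false, away: false, total_teams: 120),            # upset loss
  game_value(50, 120, win: true, away: false, total_teams: 120, fcs: true)  # home win over FCS
]
puts "Win/loss ranking: #{season.sum.round(2)}"
puts "Strength of schedule: #{strength_of_schedule([55, 20, 80, 120], 120)}"
```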
After calculating these three values, we weight them: I multiply the overall quality by 1.25, the strength of schedule by 1.5, and the win/loss ranking by 0.5, add them up, and multiply the sum by the team's winning percentage. I reduced the weight of the win/loss ranking because very weird things happen when the three are treated equally, especially with teams that have good records against evenly matched opponents but aren't very high quality themselves. Idaho is the perfect example: they have a quality score of 65 and mostly play teams with scores between 70 and 100. Those wins look good to the win/loss ranking, but not once you factor in strength of schedule. Therefore, we have to weight strength of schedule heavily and the win/loss ranking lightly (though, again, leaving the win/loss ranking out entirely gives weird results).
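The weighting step, including the final multiplication by winning percentage, can be sketched as one Ruby function. The `final_score` name and its argument order are hypothetical; the weights are the ones given above, and the inputs in the example are invented.

```ruby
QUALITY_FACTOR = 1.25
RANKING_FACTOR = 0.5
STRENGTH_OF_SCHEDULE_FACTOR = 1.5

# Weighted sum of the three components, scaled by winning percentage.
# quality is the team's quality score (0 = best); the other components are precomputed.
def final_score(quality, strength_of_schedule, ranking, wins, losses, total_teams)
  weighted = strength_of_schedule * STRENGTH_OF_SCHEDULE_FACTOR +
             ranking * RANKING_FACTOR +
             (total_teams - quality) * QUALITY_FACTOR
  weighted * wins.to_f / (wins + losses)
end

puts final_score(0, 100.0, 500.0, 8, 0, 120) # undefeated: prints 550.0
puts final_score(0, 100.0, 500.0, 7, 1, 120) # same inputs, one loss: prints 481.25
```

Because the winning percentage multiplies the whole sum, each loss costs a team a fixed fraction of everything else it has earned, on top of the points it subtracts from the win/loss ranking.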
I settled on these weights after exploring different combinations, because these values produced the best Top 25 at both the top and the bottom. Under other combinations, teams like Troy and Idaho ranked higher than teams like Miami and Navy, who have played slightly higher quality opponents overall.
Here are the Week 6 Top 25 rankings from my program:
- Iowa 8-0 (580.919421487603)
- Alabama 8-0 (537.909090909091)
- Florida 7-0 (445.373376623377)
- Cincinnati 7-0 (440.402597402597)
- Texas 7-0 (429.279220779221)
- Texas Christian 7-0 (416.37012987013)
- Boise State 7-0 (415.217532467532)
- Pittsburgh 7-1 (386.907102272727)
- Georgia Tech 7-1 (376.955965909091)
- Southern California 6-1 (361.207235621521)
- Louisiana State 6-1 (338.557513914657)
- Oregon 6-1 (308.897031539889)
- Houston 6-1 (302.898903693709)
- Utah 6-1 (280.207792207792)
- Penn State 7-1 (278.800852272727)
- West Virginia 6-1 (271.038033395176)
- Virginia Tech 5-2 (269.174860853432)
- Central Michigan 7-1 (256.328267045455)
- Notre Dame 5-2 (256.321892393321)
- South Carolina 6-2 (249.480681818182)
- Oklahoma State 6-1 (242.455658627087)
- Brigham Young 6-2 (230.034375)
- Ohio State 6-2 (224.322443181818)
- Miami (Florida) 5-2 (222.708719851577)
- Navy 6-2 (217.938920454545)
Also, here's the Ruby code if you want to play around with the weights:
```ruby
require 'uri'
require 'net/http'

UPSET_COUNT = 20
UPSET_FACTOR = 1.1
ROAD_WIN_FACTOR = 1.1
QUALITY_FACTOR = 1.25
RANKING_FACTOR = 0.5
STRENGTH_OF_SCHEDULE_FACTOR = 1.5

# Fetch a URL, following up to `limit` redirects.
def fetch(uri_str, limit = 10)
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  response = Net::HTTP.get_response(URI.parse(uri_str))
  case response
  when Net::HTTPSuccess then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else response.error!
  end
end

# Parse one team's schedule table into a record hash.
def parse_record(team_table_html)
  record = {}
  record['points_scored'] = 0
  record['points_allowed'] = 0
  record['schedule'] = []
  record['win'] = 0
  record['loss'] = 0

  html_lines = team_table_html.split(/\n/)
  # Remove the lines we don't need (opening of table lines, last line):
  html_lines.shift
  html_lines.shift
  html_lines.pop

  # Parse out the name, dropping conference tags:
  record['name'] = html_lines.shift.gsub(/<(.|\n)*?>/, "").sub(/\s\([A-Z]+\)/, "").chomp
  record['name'] = record['name'].sub(" (Big 12)", "").sub(" (Big Ten)", "").sub(" (Pac 10)", "").sub(" (Big East)", "").sub(" (Sun Belt)", "").sub(" (Independent)", "")

  html_lines.each do |line|
    # NOTE: the cell-boundary tag that should be replaced with a tab is missing from
    # this listing; as written, gsub("", "\t") inserts a tab between every character.
    line = line.gsub(" align=\"right\"", "").gsub("", "\t").gsub(/<(.|\n)*?>/, "")
    line_array = line.split(/\t/)
    # Skip rows that aren't completed games:
    next if line_array[4] != "W" && line_array[4] != "L"

    record['points_scored'] += line_array[5].to_f
    record['points_allowed'] += line_array[6].to_f

    game_hash = {}
    game_hash['name'] = line_array[3].sub(/^\*/, "").chomp
    game_hash['away?'] = (line_array[2].chomp == "@")
    game_hash['win?'] = (line_array[4].chomp == "W")
    if game_hash['win?']
      record['win'] += 1
    else
      record['loss'] += 1
    end
    record['schedule'].push(game_hash)
  end

  record['scoring_offense'] = record['points_scored'] / record['schedule'].length
  record['scoring_defense'] = record['points_allowed'] / record['schedule'].length
  record
end

# Split the schedule page into per-team tables and parse each one.
def parse_html(rankings_html)
  team_hash = {}
  teams_html = rankings_html.split(/^ /)
  teams_html.shift # discard everything before the first team table
  teams_html.each do |team_table_html|
    team_hash_entry = parse_record(team_table_html)
    team_hash[team_hash_entry['name']] = team_hash_entry
  end
  team_hash
end

# Rank teams by scoring offense and defense, then derive each team's quality score.
def calculate_quality(team_hash)
  total_teams = team_hash.length
  offense_quality_hash = {}
  defense_quality_hash = {}
  team_hash.each_pair do |name, team|
    offense_quality_hash[name] = team['scoring_offense']
    defense_quality_hash[name] = team['scoring_defense']
  end

  # Sort by points scored, from highest to lowest:
  offense_quality_array = offense_quality_hash.sort { |a, b| b[1] <=> a[1] }
  # Sort by points allowed, from lowest to highest:
  defense_quality_array = defense_quality_hash.sort { |a, b| a[1] <=> b[1] }

  puts "Offense Quality Rankings:"
  offense_quality_array.each_with_index do |(team_name, points), i|
    team_hash[team_name]["offense_quality"] = i
    puts "#{i + 1}. #{team_name} #{points}"
  end
  puts "\n\n"

  quality_hash = {}
  puts "Defense Quality Rankings:"
  defense_quality_array.each_with_index do |(team_name, points), i|
    team_hash[team_name]["defense_quality"] = i
    puts "#{i + 1}. #{team_name} #{points}"
    # Different methods of calculating overall quality:
    # Sadly, this gives the best results, but it isn't allowed, as it uses average margin of victory:
    # quality_hash[team_name] = team_hash[team_name]['scoring_offense'] - team_hash[team_name]['scoring_defense']
    # quality_hash[team_name] = (team_hash[team_name]["defense_quality"] + team_hash[team_name]["offense_quality"]) / 2
    quality_hash[team_name] = ((total_teams - team_hash[team_name]["offense_quality"]) +
                               (total_teams - team_hash[team_name]["defense_quality"])) / 2
  end
  puts "\n\n"

  puts "Overall Quality Rankings:"
  # Highest average first; a team's index in this list is its quality score.
  quality_array = quality_hash.sort { |a, b| b[1] <=> a[1] }
  previous_value = -1
  quality_array.each_with_index do |(team_name, average), i|
    # A team tied with the team before it gets the previous index.
    i -= 1 if average == previous_value
    team_hash[team_name]["quality"] = i
    previous_value = average
    puts "#{i + 1}. #{team_name} #{average}"
  end
  puts "\n\n"
  team_hash
end

# Combine quality, strength of schedule, and win/loss ranking into a final score.
def calculate_rankings(team_hash)
  rankings_hash = {}
  total_teams = team_hash.length

  team_hash.each_pair do |name, team|
    # Formula:
    # For a win: value is increased by the inverse quality differential
    # For a loss: value is decreased by the inverse quality differential
    strength_of_schedule = 0
    ranking = 0
    puts "#{name} #{team['win']}-#{team['loss']} (#{team['quality']})"

    team["schedule"].each do |game|
      # If a team plays an FCS squad, we rank the opponent as the last, in terms of quality:
      if team_hash.key?(game['name'])
        opponent_quality = team_hash[game["name"]]["quality"]
        fcs = false
      else
        opponent_quality = total_teams
        fcs = true
      end
      strength_of_schedule += opponent_quality

      quality_differential = (team["quality"] - opponent_quality).abs
      if fcs
        # Punish teams mercilessly for playing an FCS squad:
        quality_differential = total_teams
      elsif opponent_quality < team["quality"]
        # Upset win over a much better team:
        quality_differential /= UPSET_FACTOR if game["win?"] && quality_differential > UPSET_COUNT
      elsif !game["win?"] && quality_differential > UPSET_COUNT
        # Upset loss to a much worse team:
        quality_differential *= UPSET_FACTOR
      end

      # Give a boost for a road win:
      quality_differential /= ROAD_WIN_FACTOR if game["away?"] && game["win?"]

      quality_differential = (total_teams - quality_differential).abs
      if game["win?"]
        ranking += quality_differential
      else
        ranking -= quality_differential
      end
      puts "\t#{game["name"]}(#{opponent_quality}) #{quality_differential} (#{ranking})"
    end

    strength_of_schedule = total_teams - (strength_of_schedule.to_f / (team["win"] + team["loss"]))
    # Add strength of schedule, ranking, & quality of team. Multiply by win percentage:
    rankings_hash[name] = ((strength_of_schedule * STRENGTH_OF_SCHEDULE_FACTOR) +
                           (ranking * RANKING_FACTOR) +
                           ((total_teams - team['quality']) * QUALITY_FACTOR)) *
                          (team['win'].to_f / (team['win'] + team['loss']))
  end

  rankings_hash.sort { |a, b| b[1] <=> a[1] }
end

if File.exist?("Sked2009.htm")
  rankings_html = File.read("Sked2009.htm")
else
  rankings_html = fetch('http://www.jhowell.net/cf/scores/Sked2009.htm').body
end

@team_hash = parse_html(rankings_html)
@team_hash = calculate_quality(@team_hash)
@rankings_array = calculate_rankings(@team_hash)

puts "\n\n"
@rankings_array.each_with_index do |(team_name, score), i|
  team = @team_hash[team_name]
  puts "#{i + 1}. #{team_name} #{team['win']}-#{team['loss']} (#{score})"
end
```