America, statisticians and the world at large have had a pretty crappy week. What better week, then, to introduce my overly simplistic statistical model for predicting the outcome of American football games: TeBOW!
TrueSkill (extended) Based On Wins.
The model takes only the outcomes of games that have already happened and uses them to calculate the rating and consistency of a given team. This allows us to do two things: first, we can power rank the teams based on their games so far, and second, we can make predictions about upcoming games. Every week until the end of the season I will publish the power rankings on a Monday, and then the predictions on a Thursday.
TeBOW is so called not only because Tim Tebow is a meme and I’m addicted to those page views, but also because the model completely ignores any potentially relevant information about the performance of the team, such as pass yardage or interceptions. All TeBOW cares about is wins, no matter what, and I think this is fair to his legacy.
As this is the first of the posts I will briefly explain how TeBOW calculates rankings and win percentages, and then I will get into those juicy picks! If you don’t care about all that nerd stuff then skip ahead to TeBOW’s Week 10 Picks.
What the heck is TeBOW?
TeBOW is based on TrueSkill, a system developed by Microsoft for the multiplayer game Halo. They were looking for a way to infer the skill of players online, which would help them create fair matchups, i.e. matchups in which both teams have a similar probability of winning.
The system is a generalisation of Elo, which FiveThirtyEight uses to predict NFL games in a similar way to us. Elo was designed for chess games, and represents the skill of a player by a single number, then for a matchup of two people the difference between these two numbers will spit out the probabilities of either player winning. Then based on whichever of them did win, the Elo score is updated. That is an overly simplistic explanation of what is going on, so for more see here.
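To make the Elo description above concrete, here is a minimal sketch of the standard Elo formulas. The 400-point scale is the conventional chess value, and the K-factor of 20 is purely an assumption for illustration; neither number is taken from TeBOW or from FiveThirtyEight's model.

```python
def elo_expected(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0):
    """Return both ratings after one game.

    k (an assumed value here) controls how far a single result moves the ratings.
    """
    expected_a = elo_expected(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    # Whatever one side gains, the other loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    favourite, underdog = 1600.0, 1400.0
    print("P(favourite wins):", elo_expected(favourite, underdog))
    print("Ratings after an upset:", elo_update(favourite, underdog, a_won=False))
```

Note that each player is described by a single number, and the size of the update depends only on the rating gap and on k.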
Elo has several drawbacks, which follow from its simplifying assumptions:
- Elo assumes (for mathematical simplicity) that every player is equally consistent, i.e. that any two players’ performances will deviate similarly from their skill in a given game.
- Elo does not allow for uncertainty in the skill rating: the model assumes that the Elo given is absolutely accurate, and any deviation is put down to performance deviating from skill.
TrueSkill, on the other hand, overcomes these hurdles by modelling the uncertainty we have in a player’s skill, which can be different for different players. For example, if a team has played many games and has generally found its level, losing to the teams TrueSkill thinks it should lose to and beating the teams TrueSkill thinks it should beat, then this team will have a low deviation in performance. If one week a team beats the Patriots and the next week loses to the Browns, it will have a large deviation in performance, and TrueSkill is able to model that.
The performance of a team on a given day is then drawn from that team’s TrueSkill distribution, and the performances of the two teams in turn feed another distribution that gives the probability of winning or losing. After the game, the skills and skill deviations of both teams are updated by a complicated mess of equations, ready for the next matchup.
Anyway, that was a lot of words, so let me illustrate a bit more with a graph:
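For the curious, that "complicated mess of equations" looks roughly like the sketch below in its simplest form: two teams, no ties. It follows the published TrueSkill update for a win/loss result. It is not TeBOW's actual code: the default `mu`, `sigma` and `BETA` values are TrueSkill's conventional defaults and are assumed here, and the `Rating`, `win_probability` and `update` names are made up for illustration. The small amount of extra variance TrueSkill normally adds before each game is left out.

```python
import math
from dataclasses import dataclass

BETA = 25.0 / 6.0  # game-to-game performance noise (conventional default, assumed)


@dataclass(frozen=True)
class Rating:
    mu: float = 25.0            # most likely skill
    sigma: float = 25.0 / 3.0   # uncertainty about that skill


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def win_probability(a: Rating, b: Rating, beta: float = BETA) -> float:
    """P(a's performance exceeds b's) when both performances are Gaussian."""
    c = math.sqrt(2.0 * beta**2 + a.sigma**2 + b.sigma**2)
    return _cdf((a.mu - b.mu) / c)


def update(winner: Rating, loser: Rating, beta: float = BETA):
    """Return the (winner, loser) ratings after a decisive result, ties ignored."""
    c2 = 2.0 * beta**2 + winner.sigma**2 + loser.sigma**2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)   # large when the result was a surprise
    w = v * (v + t)         # how much uncertainty the result removes (0 < w < 1)
    new_winner = Rating(
        winner.mu + winner.sigma**2 / c * v,
        winner.sigma * math.sqrt(1.0 - winner.sigma**2 / c2 * w),
    )
    new_loser = Rating(
        loser.mu - loser.sigma**2 / c * v,
        loser.sigma * math.sqrt(1.0 - loser.sigma**2 / c2 * w),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    team_a, team_b = Rating(), Rating()
    print("Before:", win_probability(team_a, team_b))
    team_a, team_b = update(winner=team_a, loser=team_b)
    print("After: ", win_probability(team_a, team_b))
```

Two things fall out of this form. A team with a large `sigma` moves further after each result than a team whose rating is already well pinned down, and an upset (a negative `t`) moves both teams more than an expected result does.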
These are the TeBOW rankings for the 2015 season for the Carolina Panthers, the New England Patriots, the Denver Broncos, and the Cleveland Browns. I want to go over a few features of this graph to show you that it is tapping into some kind of truth. [quick explanation of the graph: the solid line for a team indicates the most likely value of the TeBOW, while the translucent fill around that line indicates the uncertainty in knowledge and form about the TeBOW, so if you have two teams within each other’s fill, they are likely to have an evenly matched game]
First up, I want to point out that TeBOW uses the ratings from the end of the previous season to rank teams going into the next season, which is why the initial rankings go 1. DEN/NE, 3. CAR, 4. CLE. That is simply a rough comparison of their 2014 records. In fact, the extension we have made to TrueSkill is that over the offseason we allow the model to increase the variance and regress the TeBOW ratings slightly to the mean. We find this improves our ability to predict games whose outcomes the model has not seen.
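As a rough sketch of what such an offseason step could look like: the function name, the regression fraction and the amount of added variance below are all hypothetical placeholders, not the values TeBOW actually uses.

```python
import math


def offseason_carryover(
    mu: float,
    sigma: float,
    league_mean: float = 25.0,  # assumed mean rating to regress towards
    regress: float = 0.25,      # hypothetical fraction of the gap to the mean that is removed
    extra_sigma: float = 2.0,   # hypothetical extra uncertainty added over the offseason
):
    """Carry a team's end-of-season rating into the next season."""
    # Pull the rating part of the way back towards the league mean.
    new_mu = league_mean + (1.0 - regress) * (mu - league_mean)
    # Independent uncertainties add in variance, not in standard deviation.
    new_sigma = math.sqrt(sigma**2 + extra_sigma**2)
    return new_mu, new_sigma


if __name__ == "__main__":
    print(offseason_carryover(mu=33.0, sigma=3.0))
```

The wider `sigma` means the first few games of a new season move a team's rating more than late-season games do, which is one plausible reason a step like this would help on games the model has not seen.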
We then see that in the first 7 games, DEN/NE/CAR are all undefeated, and so their skills grow accordingly. Denver is the first to lose, in their 8th game (to the Colts, who had an unimpressive season), and they lose again in the next game, which leads to a decent dip. New England then has a large dip in their 11th game (look between the two purple horizontal lines), which is accompanied by a significant Denver gain: this is the week 12 matchup where Denver upset the odds and beat the Patriots, giving them their first loss. As we get to the end of the season we see the Patriots, ravaged by injuries and losses, regress to a TrueSkill below the eventual champions, the Broncos. Then the actual Super Bowl matchup, CAR/DEN, is well within the margin of error, which I’d say is fair for what the game ended up being.
It is also important to note that TeBOW is fairly confident that any of those three teams would be able to beat the Browns at any point during the season, which is a useful sanity check on the model.
What’s wrong with TeBOW?
Oh heck, a lot. I feel inclined to explain this because I will be giving predictions below, and if you were to overestimate your confidence in these predictions you might do something silly like bet money on them, and then I’d be responsible for potentially losing you money. Feel free to do that, but please do it after taking these grains of salt:
- We completely ignore actual game performance – this model sees one thing from the outcome of a game: win or loss. Nothing like “oh man it came down to the wire, there was a missed field goal but they looked good all game!” which is useful information that the bookmakers have. A better model (which I am working on, and which the bookmakers probably have) would include this information.
- We ignore everything else too – A lot of football comes down to what happens between games: injury reports, free agent transfers, roster decisions. TeBOW doesn’t see any of that.
- We ignore even more stuff than all that – Home/Away, weather, playoff implications, all ignored. All TeBOW sees is WINS.
TeBOW’s Week 10 Picks
Here you go, let’s get picking!
| Matchup | Win probability |
|---|---|
| Cleveland Browns @ Baltimore Ravens | CLE: 23.1% / BAL: 76.9% |
| Chicago Bears @ Tampa Bay Buccaneers | CHI: 51.0% / TB: 49.0% |
| Atlanta Falcons @ Philadelphia Eagles | ATL: 58.8% / PHI: 41.2% |
| Los Angeles Rams @ New York Jets | LA: 46.2% / NYJ: 53.8% |
| Green Bay Packers @ Tennessee Titans | GB: 70.0% / TEN: 30.0% |
| Denver Broncos @ New Orleans Saints | DEN: 60.2% / NO: 39.8% |
| Kansas City Chiefs @ Carolina Panthers | KC: 65.3% / CAR: 34.7% |
| Houston Texans @ Jacksonville Jaguars | HOU: 80.5% / JAC: 19.5% |
| Minnesota Vikings @ Washington | MIN: 51.3% / WAS: 48.7% |
| Miami Dolphins @ San Diego Chargers | MIA: 56.1% / SD: 43.9% |
| Dallas Cowboys @ Pittsburgh Steelers | DAL: 56.3% / PIT: 43.7% |
| San Francisco 49ers @ Arizona Cardinals | SF: 20.7% / ARI: 79.3% |
| Seattle Seahawks @ New England Patriots | SEA: 46.0% / NE: 54.0% |
| Cincinnati Bengals @ New York Giants | CIN: 53.8% / NYG: 46.2% |
Which of these are stupid as heck? Is this the week for the Browns’ first win? Let me know below or at @JoeyMFaulkner