I’m back to give some more uninformed picks! I’m currently in my office trying to get my code to recognise the large-scale structure of the universe (which is easier than it sounds, but I’m finding it harder than it probably is), so I don’t quite have the time to go over last week’s picks. They seemed to do alright; my only worry was that my desire for the model to work was making me support teams I didn’t like in the hope that the status quo would be preserved. TeBOW has turned me into a monster.
This week I have added the capacity for the model to simulate the rest of the season, which means that I can start to give percentage chances for teams to get to the playoffs. Very literally, I am coding these features minutes before I put them up here, so if something weird happens then blame me, but also include a bit of pity in your scorn. I had to get this out before NO/CAR! The battle of the “should be in playoffs but pretty unlucky” teams.
Anyway, enough of the foreshadowing, let’s go for the 1000th power rankings you’ve read this week!
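For the curious, simulating the rest of a season can be sketched roughly like this. It is a minimal sketch and not the actual TeBOW code: the function name, the `win_prob` callback and the "top N teams by wins" playoff rule are all assumptions made for illustration (real NFL seeding uses divisions and tie-breakers, which are ignored here).

```python
import random

def simulate_playoff_odds(current_wins, remaining_games, win_prob,
                          n_spots, n_sims=10000, seed=0):
    """Estimate each team's chance of finishing in the top `n_spots` by wins.

    current_wins    : dict of team -> wins so far
    remaining_games : list of (home, away) pairs still to be played
    win_prob        : hypothetical function (home, away) -> P(home wins)
    """
    rng = random.Random(seed)
    made_it = {team: 0 for team in current_wins}
    for _ in range(n_sims):
        wins = dict(current_wins)
        # Play out every remaining game with a weighted coin flip.
        for home, away in remaining_games:
            if rng.random() < win_prob(home, away):
                wins[home] += 1
            else:
                wins[away] += 1
        # Assumption: playoff spots go to the teams with the most wins,
        # with ties broken at random.
        ranked = sorted(wins, key=lambda t: (wins[t], rng.random()),
                        reverse=True)
        for team in ranked[:n_spots]:
            made_it[team] += 1
    return {team: made_it[team] / n_sims for team in current_wins}
```

The percentage for each team is just the fraction of simulated seasons in which it ended up in a playoff spot, so more simulations give steadier numbers.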
America, statisticians and the world at large have had a pretty crappy week. What better week, then, to introduce my overly simplistic statistical model that attempts to predict the outcome of American Football games, TeBOW!
TrueSkill (extended) Based On Wins.
The model takes only the outcomes of games that have already happened and uses them to calculate the rating and consistency of a given team. This allows us to do two things: first, we can power rank the teams based on their games so far, and second, we can make predictions about the games still to come. Every week until the end of the season I will publish the power rankings on a Monday, and then the predictions on a Thursday.
TeBOW is so-called as not only is Tim Tebow a meme and I’m addicted to those page views, but also the model completely ignores any potentially relevant information about the performance of the team, pass yardage, interceptions, etc. All TeBOW cares about is wins no matter what, and I think this is fair to his legacy.
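To give a flavour of how a rating and a consistency turn into a prediction, here is a minimal sketch. It is not TeBOW's actual code: it assumes each team's strength is a Gaussian with a mean (the rating, `mu`) and a spread (the consistency, `sigma`), and it leaves out the extra per-game performance noise term that full TrueSkill includes. The function and parameter names are made up for illustration.

```python
import math

def win_probability(mu_a, sigma_a, mu_b, sigma_b):
    """P(team A beats team B) when both strengths are independent Gaussians.

    Assumption: the rating difference is Gaussian with mean mu_a - mu_b
    and variance sigma_a^2 + sigma_b^2; A wins when the difference is > 0.
    """
    diff = mu_a - mu_b
    spread = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    # Standard normal CDF evaluated at diff / spread, written with erf.
    return 0.5 * (1.0 + math.erf(diff / (spread * math.sqrt(2.0))))
```

Two evenly rated teams come out at exactly 50/50, and the less consistent the teams are (larger `sigma`), the closer any matchup drifts back towards a coin flip.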
[Typical disclaimer: I’m British and I just like making graphs, I don’t know as much about NFL as my wild assertions might imply. I’ve played fantasy football for one year now and I nearly got beat by someone who drafted Aaron Rodgers and all kickers, so take this advice with a large helping of salt]
It is with a heavy heart that I am about to reveal the basis of my fantasy draft strategy to the 13 other members of the Edinburgh nerds fantasy football league. My squad ‘THE LEGION OF BABY BOOM’ had a troubled season last year: I picked Eddie Lacy with the second pick of the draft, and he dropped from 230 points in the 2014 season to 120 points in 2015. I also held out until the later rounds to take a Quarterback, picking Sam Bradford and Teddy Bridgewater in successive rounds. I actually remember taking Teddy and seeing pick after pick not taking Bradford, thinking “God what losers, I’m going to get both of them! #1, let’s go boomers!”. Subsequently I had a circus show at Quarterback, at various points starting Josh McCown, Brian Hoyer, etc. If you don’t have context for anything I’ve said above and I’m just naming random millionaires, then let it be known that every name I said above played as if they were deliberately trying to disappoint me. I was not the victor of “VONTASY MILLERBALL”.
Anyway, the 2015 season was a clear sign to me that I am not a great NFL scout. Going on pure feeling again is going to get me embarrassed, especially since I spend far too long each day reading about the NFL to lose so badly again. So I decided to use what I have (a huge dataset of NFL players and a love of scattergraphs and histograms) to try and override my awful instincts on draft day.
What I’ve got: The fantasy record of every player playing in the NFL from 2000-2015
What I’m going to do with it: Dump a load of graphs which attempt to make the readers of this blog win their fantasy league*, GUARANTEED**
*Assuming NFL.com Classic scoring
**The attempt is guaranteed, nothing else
My gut instinct tells me that NFL running backs are some of the most poorly treated athletes in the world of sport. The rules against hurting running backs are significantly less strict than those for wide receivers or quarterbacks, which leads to a significantly larger number of career-ending injuries. Teams know the fragility of running backs and are less inclined to offer them guaranteed money on their contract (money which will be paid even if a player cannot keep playing due to an injury), which significantly lowers the career earnings of an unlucky running back. Further to that, coaches treat running backs as expendable due to the simplicity of their task and will often drop an injured one for a healthier model, of which, due to the pyramid scheme nature of the NFL, there will always be one available. Given enough data and enough time I would like to prove all of the above is true.
However, for this post I want to show that a running back’s age affects their ability to play in the league. It is not only that a running back becomes less likely to get a job as they get older; simply being on the wrong side of 30 will dramatically reduce their chance of having one.
What I have: A database of all players currently active in the league, and a historic database of all drafted players.
What I’m going to do with it: See that running backs over 30 are disproportionately cut from NFL rosters compared to other skill positions.
Do you think if you flipped a coin in a mint, it would show heads more than tails? Imagine if we set up a small coin-stadium in or adjacent to the mint where the coin was made, where other coins would sit around watching the coin get flipped. Say we flipped the coin outside of the stadium first a bunch of times and showed that it was relatively 50/50 whether it was going to be heads or tails, but then we went back to this mint-stadium and flipped the coin 3,879 times, and it turned up heads 2,219 times. With a simple statistical test, you can show that the probability of a 50/50 coin giving this result in the stadium is 0.000000000256%.
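If you want to try a test like this yourself, here is a minimal sketch using only the Python standard library. It assumes a one-sided test (how likely is a fair coin to show at least this many heads?) and computes both the normal-approximation z-score and the exact binomial tail. The exact number you get depends on which test you pick (one-sided or two-sided, exact or approximate), so it need not reproduce the figure quoted above.

```python
import math

flips = 3879   # number of flips in the mint-stadium
heads = 2219   # number of heads observed

# Normal approximation: how many standard deviations above 50/50 are we?
mean = flips * 0.5
sd = math.sqrt(flips * 0.5 * 0.5)
z_score = (heads - mean) / sd

# Exact one-sided tail: P(at least `heads` heads) for a fair coin.
# Python integers are arbitrary precision, so the counts are exact
# and only the final division is rounded to a float.
tail_count = sum(math.comb(flips, k) for k in range(heads, flips + 1))
p_value = tail_count / 2 ** flips

print(f"z-score: {z_score:.2f}")
print(f"one-sided p-value: {p_value:.3g}")
```

The z-score is the "number of sigma" figure that physicists like to quote, which makes it easy to compare against other results.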
Football is not a coin. However every team – no matter how good or bad – plays 16 games in the regular season: 8 at their own stadium and 8 at an opponent’s stadium, so a good team will play at home as much as a bad team will. Yet when you run through the stats, the ‘home field advantage’, i.e. that the home team are more likely to win than the away team, is more statistically significant () than the detection of the Higgs boson ().
What I’ve got: 14 years of regular season NFL data (2000-2014) – a few thousand games, half a million plays.
What I’m going to do with it: Try and find which bits of a football game are affected by ‘home field advantage’ in a (fairly) rigorous manner.
[Disclaimer: I’m British and trying to talk about the NFL, so it’s pretty likely I’m going to sound like either an idiot or an alien while trying to describe what’s going on here. My only request is that you send abuse using the anonymous field at the bottom, which goes straight to my email, instead of the comment box, which everyone can see]
Imagine walking down the street and someone with a clipboard and a bored expression asks you the question “How many glasses of water did you have in the last week?”. You probably don’t really know the answer, and the person asking doesn’t really give too much of a crap. Maybe you could guess at any number between 30 and 40 glasses of water with an equal amount of belief, but you have to choose a number – are you just as likely to choose 32 as 35?
Maybe not. And in the NFL, when the guy with the ball gets tackled or stopped at the end of a run and the officials only get a few seconds and a compromised view to decide where it stops, will every yard line be treated equally?
What I’ve got: A spreadsheet containing every single play run in the NFL from 2000-2014 (500,000 in all)
What I’m going to do with it: Show that the referees subconsciously change the outcome of a play based on where the painted lines are on a field, and subsequently show that it doesn’t matter.
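As a rough idea of what such a check could look like, here is a minimal sketch. It assumes the painted lines fall on every fifth yard and that, with no bias, a play would be equally likely to end on any yard line; that uniform baseline is a simplifying assumption (real plays cluster for plenty of football reasons), and the function name and input format are made up for illustration.

```python
import math

def painted_line_excess(spots, line_spacing=5):
    """Compare how often plays are spotted on a painted line with a
    uniform baseline.

    spots        : list of integer yard lines where each play was spotted
    line_spacing : assumed distance in yards between painted lines

    Returns (observed count on a line, expected count, z-score).
    """
    n = len(spots)
    on_line = sum(1 for yard in spots if yard % line_spacing == 0)
    # Under the uniform assumption, 1 in `line_spacing` spots is on a line.
    expected_share = 1 / line_spacing
    expected = n * expected_share
    sd = math.sqrt(n * expected_share * (1 - expected_share))
    return on_line, expected, (on_line - expected) / sd
```

A z-score well above zero would mean the ball ends up on a painted line more often than the baseline predicts; whether that is down to the officials or to the baseline being too naive is a separate question.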