Tuesday, September 6, 2011

Expected Win Ratio: New Statistic

Baseball is a game of numbers, but aren't most sports? Bill James invented a metric called the Pythagorean Expectation, named for its resemblance to the Pythagorean Theorem. The equation says that if we take the number of runs a team scored, square it, and divide by the sum of runs scored squared and runs allowed squared, we get an estimate of that team's winning percentage.
So all in all it looks a little like this: [Runs For]^2 / ([Runs For]^2 + [Runs Against]^2).
Or, more neatly, with RF for runs for and RA for runs against:

Win% = RF^2 / (RF^2 + RA^2)

Sabermetricians have since refined the equation and found that the exponent should actually be around 1.86. There are many sabermetric articles about how well this estimates the winning percentage of a baseball team. Some teams are really good at beating the odds, meaning they win more games than expected. In 2011, the San Francisco Giants are the best team at this: they have won 6 more games than their expected winning percentage would suggest. All of this has been examined before, so why am I blogging about it? Because, what if we borrowed this idea for other sports? Could we predict the winning percentage of your favorite football team?
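
To make the idea concrete, here is a minimal Python sketch of the calculation described above. The function names and the run totals in the example are my own placeholders (the numbers are made up, not real team data), and the exponent is left as a parameter so either 2 or the refined 1.86 can be used.

```python
def pythagorean_expectation(scored, allowed, exponent=2.0):
    """Expected winning percentage from runs (or points) scored and allowed."""
    if scored < 0 or allowed < 0:
        raise ValueError("scored and allowed must be non-negative")
    if scored == 0 and allowed == 0:
        raise ValueError("at least one of scored/allowed must be positive")
    s = scored ** exponent
    a = allowed ** exponent
    return s / (s + a)


def wins_above_expected(actual_wins, games, scored, allowed, exponent=2.0):
    """How many more games a team won than the formula expects."""
    expected_wins = games * pythagorean_expectation(scored, allowed, exponent)
    return actual_wins - expected_wins


# Hypothetical team: 700 runs scored, 650 allowed, 90 wins in 162 games.
pct = pythagorean_expectation(700, 650, exponent=1.86)
extra = wins_above_expected(90, 162, 700, 650, exponent=1.86)
print(f"expected winning percentage: {pct:.3f}")
print(f"wins above expectation: {extra:+.1f}")
```

A positive result from wins_above_expected is what "beating the odds" means here: the team has won more games than its run totals would suggest.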

Using a similar equation, but with Points For and Points Against, we can start with:

[Points For]^2 / ([Points For]^2 + [Points Against]^2).
Now we minimize the squared error, [Expected Winning % - Actual Winning %]^2, summed over every team season. Our goal is to make that sum as small as possible by adjusting the expected winning percentage. Of course, the only thing in the equation that isn't fixed by the data is the exponent. If we optimize over it, we arrive at a value of 2.50.
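
For anyone who wants to try the fit themselves, here is a rough Python sketch of that optimization. It is not the exact procedure I used: it simply tries every exponent on a grid and keeps the one with the smallest sum of squared errors. The function names, the grid bounds, and the four "seasons" are all assumptions for illustration; the toy numbers are invented, so the exponent it prints says nothing about the real NFL data.

```python
def expected_win_pct(points_for, points_against, exponent):
    """Pythagorean-style expected winning percentage."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def sum_squared_error(exponent, seasons):
    """Sum of (expected - actual)^2 over all team seasons.

    seasons: iterable of (points_for, points_against, actual_win_pct).
    """
    return sum(
        (expected_win_pct(pf, pa, exponent) - actual) ** 2
        for pf, pa, actual in seasons
    )


def fit_exponent(seasons, low=1.0, high=4.0, step=0.01):
    """Grid search: return the exponent with the lowest summed squared error."""
    steps = int(round((high - low) / step))
    candidates = [low + i * step for i in range(steps + 1)]
    return min(candidates, key=lambda e: sum_squared_error(e, seasons))


# Made-up team seasons, for illustration only:
# (points for, points against, actual winning percentage)
example_seasons = [
    (400, 300, 0.688),
    (350, 350, 0.500),
    (280, 390, 0.313),
    (450, 250, 0.813),
]

best = fit_exponent(example_seasons)
print(f"best exponent on the toy data: {best:.2f}")
```

A grid search is crude but easy to check by eye; a proper one-dimensional minimizer would do the same job faster on a large data set.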

So, using the last 51 years of NFL data, a team's overall record can be estimated from the equation above with an exponent of 2.50.

This is a plot of all the data. As we would expect, it is very close to a straight line, which means our estimated winning percentage is very close to the actual winning percentage. There are a few minor flaws. The projection will rarely give a 0-16 season, but teams actually do that. Even then, as you can see, the projection had such a team winning only about 10% of its games, so the miss is still small.

So what? Why does this matter, you may ask. How does this equation affect anything? It actually means a lot. As I continue to examine predictors and estimates, we can now put a value on a team's outcome. If we can predict (within a certain error) a team's Points For and Points Against, we can predict how many games that team is going to win that season.
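
As a sketch of what that could look like in practice, the Python below turns a points projection with a plus-or-minus margin into a range of expected wins. The 16-game season and the 2.50 exponent come from this post; the function names, the way the margin is applied (worst case: fewer points for and more against; best case: the reverse), and the projected totals are my own assumptions for illustration.

```python
def expected_win_pct(points_for, points_against, exponent=2.5):
    """Pythagorean-style expected winning percentage for football."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def projected_wins(points_for, points_against, margin=0.0, games=16, exponent=2.5):
    """Return (low, mid, high) expected wins for a projection with +/- margin."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    if points_for - margin <= 0 or points_against - margin <= 0:
        raise ValueError("margin is too large for these point totals")
    low = games * expected_win_pct(points_for - margin, points_against + margin, exponent)
    mid = games * expected_win_pct(points_for, points_against, exponent)
    high = games * expected_win_pct(points_for + margin, points_against - margin, exponent)
    return low, mid, high


# Hypothetical projection: 400 points for, 320 against, each give or take 20.
low, mid, high = projected_wins(400, 320, margin=20)
print(f"expected wins: {mid:.1f} (range {low:.1f} to {high:.1f})")
```

Because the expected winning percentage rises with Points For and falls with Points Against, pushing both projections to their unfavorable and favorable extremes gives a simple lower and upper bound on wins.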

Now, if we can use points to get wins, we could use other stats to get points. How many points is a particular position responsible for? If we add Quarterback B, who is expected to throw for X yards, we can attribute his X yards to Y wins, almost in a linear fashion. So as I continue to expand the football statistics, I will refer back to this expected winning ratio.

1 comment:

V for ..... said...

Few questions:
- how good of a predictor is this mid-season? what kind of sample sizes do you need to be able to make reasonable estimates on the overall performance?
- given two teams, you have two estimates for their win ratio, can you create a probability estimate that team a defeats team b? what kind of accuracy would this predictor have?