Friday, May 25, 2012


Recently, I read an article about how Human Resources departments use numbers to screen candidates and thus reduce the time it takes to hire someone. If you are interested in the full article, you can read it here. Since the movie Moneyball came out, there has been a push in the media to emphasize the importance of data analysis. Here are some important notes: the book Moneyball came out in 2003 (goes to show Americans really don’t like to read), and the following statistics are over 100 years old: Runs, RBI, Home Runs, Errors, Batting Average, and Earned Run Average. I make this note because people have been using statistics to look at baseball players for 100 years. The media fails to mention this. We have statistics for almost everything, but that doesn’t mean they are good statistics. In the movie, Jonah Hill’s character makes the all-important comment: “We are finding an Island of Misfit Toys”. His character emphasizes finding new statistics that are actually better predictors of the desired outcome (in his case, wins). As much as data analysis is a science, it is an art as well. Data scientists (and statisticians) need to look outside of the issue at hand and attempt to create new statistics that will predict the outcome better. It is not simply a matter of finding numbers that fit and calling it a day.
                This brings me to my next point: numbers never lie, but the people who tell them do. Here are some facts: I have been unemployed for 36 hours, and our league-mate John has jumped up 10 points into 5th place. Hopefully, by the end of this post, I will have taught you not to take either of these at face value.
                Yes, I have been unemployed for 36 hours, which is indeed a fact. However, before you begin to send your condolences, I should tell you that in 72 hours I will be employed by a new company. I failed to apply context to the situation. I didn’t tell you something I knew: outside of the range I provided, I will actually be employed. This is a common trick, where the person presenting the statistic narrows the sample to include only the information that is convenient for them. Here is another example of this nature. The New York Rangers have never lost a game seven at home. Hopefully you asked how many times that has happened [it’s 4, btw]. See, always think about “what’s outside the box”.
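                To put a number on why a 4-for-4 record should not impress you much, here is a minimal Python sketch. It assumes each game seven is an independent coin flip (a 50% chance of winning), which is my simplification for illustration and not a real estimate of the Rangers' odds. The function name chance_of_perfect_record is made up for this example.

```python
from fractions import Fraction


def chance_of_perfect_record(games: int, win_prob: Fraction = Fraction(1, 2)) -> Fraction:
    """Probability of winning every one of `games` independent games.

    Assumption (illustrative only): every game is won with the same
    probability `win_prob`, independently of the others.
    """
    return win_prob ** games


if __name__ == "__main__":
    # Compare a tiny sample (4 games) with larger ones.
    for games in (4, 10, 20):
        p = chance_of_perfect_record(games)
        print(f"{games} games: {p} chance of a perfect record by luck alone")
```

                Under that coin-flip assumption, a perfect record over 4 games happens 1 time in 16 by luck alone, so "never lost" says very little until you know how many games are behind it. The same claim over 20 games would be a much stronger statement.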
                John, my good friend [he also helped edit this post, so hopefully he didn’t delete this section], sent out an email that went something like this: “Guys, I have gained 10 points overnight and am now in 5th place. Fear me, I’m the best [insert wrestling taunt]”. Now John, like a good taunter, left out a few key pieces of information. He was in 5th place last week! He also had four pitchers pitching that night, so it wasn’t a surprise to see him go up 30ish strikeouts and gain some wins. It is not a shocker that he went up 10 points. He is also only in 5th; it’s not like he is winning the league. John failed to tell us what happened before: he had actually just regained his position!
                See, in life we are rarely given false numbers. They are just selected to fit the author’s point. Sure, the unemployment rate may be declining; however, it may just be that people stopped applying for unemployment! So the next time you open the newspaper or read news online, think about how statistics are being applied, and whether there are statistics outside of that bound.
