Friday, November 23, 2012

If Nate Silver is a Wizard

If Nate Silver is a wizard, I'm calling out all the squibs. To clarify, that is a loaded statement, so don't panic if you do not get all of it at once.

A few weeks ago, Nate Silver appeared on several television shows speaking the gospel of science. Since a lot of people can't explain math, someone created the website "Is Nate Silver a witch?". Wizard, witch or whatever, Mr. Silver has a knack for being very correct when it comes to predicting and forecasting. His recent book explains the troubles and joys of forecasting. In The Signal and the Noise, Silver complains about fakers, posers or, as I like to call them, squibs.

A squib (apparently a bad word to say in the Harry Potter universe) is a wizard born without powers. I'll adapt this for my argument and define a squib as a half-wizard / half-human: basically a weak wizard who walks amongst the muggles thinking he/she holds the strongest magical power. I'm calling those people out.



I recently read this article that attempts to create journalism from pasted-together statistics. If one were to make it to the bottom of the article, he/she would read "There is no statistical proof that Fergie time applies specifically to Man Utd." This, of course, comes after the author spoke highly of all the statistics she had compiled, even citing the leader in sports data analysis, Opta. Novel statistics like this one, along with "On rainy days he is five for six when batting against a blonde pitcher on Tuesday", ruin the magic. When one searches large data, it is quite easy to find patterns that tell a story, but there is a difference between correlation and causation. A squib finds statistics to fit the story instead of letting the statistics tell the story. This subtle difference is actually not subtle at all.

Statisticians, Data Scientists and Mathematicians are not wizards; they are scientists. Statisticians (used loosely to cover all those previously mentioned) use math to explain trends and make predictions. They are the modern physicists, chemists and biologists. Like physicists before them, statisticians are figuring out 'why the apple falls from the tree'. These solutions are repeatable and can be applied with the right wizard, er, statistical training. That training is not something to take lightly.

Squibs claim to be wizards but are merely muggles pretending to be wizards. They tell the world it will rain tomorrow, but nothing in the modeling says it should. They are statisticians who leave extra variables in their models because it 'looks cool'. They don't check the assumptions when applying linear regression; they don't even make sure their errors are normally distributed. These are the people who use statistics without the theory. They don't use training sets, but test their theories in production. These squibs are so set in their ways that they will make crazy predictions with no real science. Squibs are those friends who have no cooking experience but claim to make the best pumpkin pie; they are the runner on the track with the speed dome to help with wind resistance. Squibs drive the '92 Civic with the aftermarket spoiler; a squib never played football but has £200 spikes. Everyone knows a squib in one way or another, but I'm calling out the statistical squibs.

Why hate the squibs? Squibs preach words they don't know and methods they never truly learned. Like running, statistics requires the proper training. If one were to just try to run a marathon, they surely would pull a muscle; bad statistics are the pulled muscle. Sports commentaries are pulled muscles too, explaining to the audience that it is risky to go for it on fourth and one when actually the odds are equal.

I'm not hating on muggle-borns (people born without wizard powers who acquire them). These are the people who respect statistics, the people who know a 2:45 marathon is impressive even though they have never run a marathon. They know Nate Silver is actually not a wizard, but a smart man whose results have been proven to work. These people respect statistics and don't claim to make outrageous predictions like "Using my model the GDP will go up 150%", where it says so because of "wavy hands". Muggle-borns know that people who make failed predictions have to explain their errors. They know that when they hear 'it's going to be a close election' but the polls show a huge blowout, the pundits need to explain their error.



Thursday, October 11, 2012

A Presidential Sport Debate | What if?

During the 9pm EST hour there were several things on television, including the Yankees & Orioles game, the Vice Presidential debate and the [last] season premiere of Jersey Shore, just to list a few. While reading some things on social media, I saw two independent statements: "Politics are boring me, I'm switching to baseball" and "Presidential Debates had 10.5 million tweets, the Vice-Presidential Debates had 3.5". This made me think: what if we elected the owner of a baseball team to the Oval Office (oh wait, we kinda did with George Bush, but I digress)?

Why I would vote for Baseball Owner

North American Free Trade Agreement [NAFTA] : Have you heard of the Toronto Blue Jays? They are part of Major League Baseball. Trades happen between the Blue Jays and all other teams at no extra cost. Players are often sent to hell, er, traded to the Blue Jays, and neither team is forced to pay extra.

Immigration Policy - Roughly 28% of baseball players are of Hispanic origin. Clearly Major League owners understand that Hispanic people provide great talents to this country and have adapted. 28% is a nice number. Sure, we may want 100% and not care, but hey, these guys are trying, so give them a break.

Cuba - 18 players are currently Cuban, and so is Yoenis Cespedes. The Managers are either really good at smuggling these people over, or have a pretty good relationship with Cuba.

Patriotism - Running a sport that is known as "America's Pastime" clearly shows that the managers love America.

Laws - Contracts are legally binding, so the owners are really good at delegating laws [perfect exercise for the Supreme Court]

Striking and Union rights - It's been nearly 20 years since the last MLB strike, which is longer than the NFL, NHL or NBA can say. MLB has learned how to negotiate between its employees and its cash flow.

Welfare - Pittsburgh Pirates, need I say more?

Foreign Policy - Japanese players, we are best buds with Asia

Senior Citizens - First off, most games are still broadcast on the radio, thus old people LOVE baseball teams. Have you heard of Chipper Jones? Dude's mad old, but the Braves still loved him. All the 65+ will recall those Babe Ruth days, and remember how awesome baseball is. They also hire senior citizens like Davey Johnson.

Racism? Jackie Robinson broke the colour barrier before the American public did.

Spying on others - Have you seen those aerial views from those blimps?

Women - Softball?

Ok Kinda Serious :
See, a baseball owner has many approaches to take to win the World Series. They can spend a ton of money and win a ton of titles. No matter how much you hate the Yankees, you can respect them. Think about how awesome America would be if we spent a lot more. Sure, ticket prices would be a little more [aka taxes], but we would be the best baseball team (umm, country) in the world. Or we could Moneyball it by cutting the fat from players who cost a ton of money. We can sell all the expensive programs to richer countries, then prosper and make the playoffs when our slightly cheaper shit makes it just as far. Owners know how to budget and do things for the greater public (i.e. building new stadiums). Several stadiums also do "pups in the park", meaning they love animal rights. They have giant televisions, which shows that they are pro-technology and know how to regulate the FCC. A baseball team plays 49.6% of the year, so they are used to high-pressure situations for a long period of time. They also have adopted advanced metrics faster than any other sport.

All in all, when I go to the ballot box this November I am voting for baseball, and nothing else.

Friday, May 25, 2012

Moneyball(ed)


Recently, I read an article about how Human Resources departments use numbers to screen candidates and thus reduce the time it takes to hire someone. If you are interested in the full article, you can read it here. Since the movie Moneyball came out, there has been a push in the media to emphasize the importance of data analysis. Here are some important notes: the book Moneyball came out in 2003 (goes to show Americans really don’t like to read), and the following statistics are over 100 years old: Runs, RBI, Home Runs, Errors, Batting Average, and Earned Run Average. I make this note because people have been using statistics for 100 years to look at baseball players. The media fails to mention this. We have statistics for almost everything, but it doesn’t mean they are good statistics. In the movie, Jonah Hill’s character makes the all-important comment: “We are finding an Island of Misfit Toys”. His character emphasizes finding new statistics that are actually better predictors of the desired outcome (in his case, wins). As much as data analysis is a science, it is an art as well. Data scientists (and statisticians) need to look outside of the issue and attempt to create new statistics that will predict the outcome better. It is not simply finding numbers that fit and calling it a day.
This brings me to my next point: numbers never lie, but the people that tell them do. Here are some facts: I have been unemployed for 36 hours, and our league-mate John has jumped up 10 points into 5th place. Hopefully by the end of this post I will have taught you not to take either of these at face value.
Yes, I have been unemployed for 36 hours, which is indeed a fact. However, before you begin to send your condolences, I should tell you that in 72 hours I will be employed by a new company. I failed to apply context to the situation. I didn’t tell you something I knew: that outside of the range I provided, I will actually be employed. In this kind of problem, statisticians typically shrink their sample to include only the information that is convenient for them. Here is another example of this nature. The New York Rangers have never lost a game seven at home. Hopefully you asked how many times that has happened [it’s 4, btw]. See, always think about “what’s outside the box”.
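To put a number on why a perfect record over four games is thin evidence, here is a minimal Python sketch. The per-game win probabilities are assumptions picked for illustration, not measured rates for the Rangers or anyone else.

```python
def chance_of_perfect_record(win_prob: float, games: int) -> float:
    """Probability of winning every one of `games` independent games."""
    return win_prob ** games


# Hypothetical per-game home win probabilities (assumed, not measured).
for p in (0.5, 0.6):
    print(f"win probability {p}: {chance_of_perfect_record(p, 4):.4f}")
```

Under these assumptions, even a pure coin-flip team goes 4-0 about 6% of the time, so "never lost" says much less than it sounds like.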
John, my good friend [he also helped edit this post, so hopefully he didn’t delete this section], sent out an email that went something like this: “Guys I have gained 10 points overnight, and am now in 5th place, fear me, I’m the best [insert wrestling taunt]”. Now John, like a good taunter, left out a few pieces of key information. He was in 5th place last week! He also had four pitchers pitching that night, so it wasn’t a surprise to see him go up 30ish strikeouts and gain some wins. It is not a shocker that he went up 10 points. He is also in 5th; it’s not like he is winning the league. John failed to tell us what happened before: that he actually just regained his position!
See, in life we are rarely given false numbers. They are just applied to suit the author’s point. Sure, the unemployment rate may be declining; however, it may just be that people stopped applying for unemployment! So the next time you open the newspaper, er, read news online, think about how statistics are being applied, and whether there are statistics outside of that bound.

Friday, May 11, 2012

It's all in a bad day

Hi readers! We have some [somewhat] exciting news. We have been legally claimed, and are working on getting our own domain name. As for the blog, continue to read!

When I was stressed in grad school, I made a promise to myself to read books once I finished. Well, on Tuesday I turned in my last paper (for now, long story) and then opened up a new book. Hopefully by reading more, these posts will become more coherent, but I make no promises. I am currently reading How Fantasy Sports Explains the World [link]. The book tells stories and relates them to fantasy sports. I really like the style of writing, and figured I would borrow that idea, since not everyone knows sports, but everyone likes a good story.

I've never been a huge fan of Yelp; there are several reasons for this. Yelp tends to overplay or underplay the actual service of the establishment. People use Yelp to either love something or hate something, and it rarely has consistency. I feel as though Yelp truly only has two ratings, a 5 or a 1. Another important part of this story is my obsession with coffee. My compulsive addictive behaviour, er, close friend Dave got into espresso about a year ago, and since then we have become snobs about it. We travel to hell on earth, er, Arlington to try new shops, and we started grinding our own beans. If we lived in the burbs, I'm pretty sure we would have tried growing our own beans. The moral of the story is, I take my coffee very seriously.

So one Friday, not too long ago, I travelled to Chinatown Coffee. It’s a young, hip place where all the baristas ride bikes and have tattoos (that's how you can tell if a coffee shop is legit, btw). I ordered my drink (which shockingly was on the menu: a cortado, for those wondering) and asked for whole milk. The barista's bar was full of drinks and he was calling them out like we were at bingo. I heard "non-fat cortado", so to avoid taking someone else's drink I asked the barista if this was whole milk or non-fat. His response: "I don't know, it's not poisonous so just drink it". Now sure, it was 8am, and this guy clearly raged the night before, but that is no way to talk to a customer. After I told a few of my friends, some suggested that I write a review on Yelp. I protested, because it was really only one barista, and it wouldn't be fair to the entire shop. So instead, I did nothing but boycott that place... until today.

As I mentioned, I like to read in the mornings in coffee shops. So I walk into the place and sure enough, Baldy McBaldster is there, with his fashionably tattooed arm and hipster glasses. All I can think is, well, I'm screwed. I order a cappuccino and await the results. While I'm standing there, a girl goes up to the bar and asks "Is this skinny? Because it says whole". I see it coming ("Oh no, here we go"), but to my surprise, he is nice. "Oh, I think someone already took that drink, let me make you another one". Wait, did that just happen? This guy was supposed to be a dick, and he was nice. Then my drink is up; he looks me in the eye and says "I'm going to spit in your drink", er, "Large Cap, right? It's coming right up". Sure enough, he gave me a larger size, we fist pumped and we chatted about the shop.

In life, we all have bad experiences, bad days and really bad luck. However, as in sports, if your player goes down with an injury, or goes 0-5, or gives up 14 runs, you can't give up on them. The Washington Capitals have faced elimination twice this season, and I refuse to give up on them. I had a bet with another friend that the Heat would make the finals, and his bet was that the Bulls would make the finals. Despite the unlikelihood of that happening once Derrick Rose went down, he still stuck with them. He even tweeted me that they had (prior to them losing last night) a 3% chance of making it. He stood by them despite the bad luck they had. Our fantasy players are going to have bad days, bad weeks and, for some, bad months. However, if you truly bought into your system, don't give up on them. This applies to your team too: just because they had a rough start (cough, the Phillies) doesn't mean they can’t cash that bad luck out into something good. Sure, you aren't going to win a million dollars over this, but just like the good luck well dries up, the bad luck well does also. Had I given up on Chinatown Coffee, I wouldn't have had the great experience I did this morning.

Thursday, April 5, 2012

This Week in Sports

So, I've been quiet for some time, mainly because my writing skills are terrible, er, I've been busy. [Note: 24 year old entrepreneur seeking editor at cost of hugs.] Over the weekend I typically watch over 5 hours of sports, and normally just complain to my friends, who somehow put up with it. So instead, I'm going to do some reviews on the series of tubes, er, internet. I'll try to recap what happened, and as always throw in one non-sport related thing so that the five readers that actually read this thing get some diversity.

WrestleMania 28


Although I had not watched wrestling since I was 12 or so, a close group of my friends are really into wrestling. One of said friends held a gathering to watch this event. Despite it being very fake, er, scripted, the event is still a joy to watch. I was surprised how many commercials there were considering this was pay-per-view, but I've heard it's a recession so I guess I understand. If you are really interested in seeing who won, jump to here. What I found interesting was my obsession with the statistics side of this. How long does it actually take to count to three? And I'm talking long M-I-S-S-I-S-S-I-P-P-I. I'm pretty sure I got up for a tasty beverage and got back before the actual count was done (maybe he forgot what came after two). Another thing is, how often do they just count to one? Is it just on the first pin? Also, what's the point of submissions? They are the inside-the-park home run of wrestling. Everyone wants to see them, but rarely does the runner round third (lazy athletes). Moral of the story: I'd watch again, but likely not do this thing weekly. I have priorities.

Spring Training:
Most people say Spring Training does not matter, and they have a point. I was watching the Braves play the Phillies this weekend, and considering how often these two teams play, I thought it would be an interesting outcome. If there is one thing I learned from Spring Training, it's the massive difference between the minor leagues and the majors. Most losses in spring training come from a minor leaguer giving up runs.

140 Characters to win
@Basketball - LeBron James changed the NBA [fact]. Since he left, the Cavs [likely to get Sullinger] are 36-98; they were 272-138 with him.

@Baseball - Opening Day is Today; and Miami Lost last night

@Football - The Saints are without a hope, and the new uniforms are boring.

@Hockey - The Capitals and Buffalo are doing their best not to make the playoffs. Capitals could still win the Southeast.

@Soccer - When John Henry took over the team, he was concerned with the depth of the team. http://goo.gl/HoyOi , He saw it coming.

@SoccerUS - Red Bull looking strong, and what is wrong with DC United.

@Wrestling - I'm not sure if John Cena smelled what the Rock was cooking, but he surely does now.

@Notsports - When I voted this week, only two positions of the 8 had someone running. Gotta love DC voting.

Friday, March 30, 2012

The Lottery : Waste of Time


I've been known to hate, er, not like a lot of things in my days, but I truly hate the lottery. The old building I used to work in had a small mom and pop shop where a lot of people lined up for lottery tickets. When I first started I used to think I was in Soviet Russia waiting for bread, but then I realized it was worse. People were waiting in line in the hopes that this capitalist market would allow them to become a rich citizen. Don't get me wrong, I love a nice rags to riches story (hence my love for the films Blank Check and Pretty Woman). Lottery is Italian for "you're screwed"; ok, that's not factually correct, but it is surprisingly close. Just because the pay-out is higher does not mean your odds increase. I also want to be frank with the people that just started playing: what is the difference between $10 million (the lowest amount) and $640 million? Sure, it would be cool to have a plaque on your wall saying "I won the world's richest lottery". But in reality, we are not really going to be able to tell the difference between $10 million and $640 million, yet now that it is so high, everyone has to play.

It's more than just the odds that make me hate the lottery; it's really the entire thing. Picking your favourite cat's foot number, throwing darts at the stock page, picking numbers because you had 13 single dollars in your pocket. All of these lead people to believe that they will win the jackpot. I'll do my best to describe some neat things about ways to make money and how to lose it quickly. The first set of graphs shows the odds of winning the lottery. If the background is red, the machine hit at least once, and if there are multiple white spots, that is how many times you hit. I also put a graph in to show you how many times I hit at each money level. I "played" the lottery 1,755,625 times. (It's the maximum size my PC allowed.) A quick sidebar on this issue. When I learn a computer programming language, I try to solve everyday problems using the computer. When I learned R, my first program simulated the lottery. I had a while loop where the computer would keep drawing until it hit the lottery; needless to say, my computer crashed and it was tough explaining to IT why this happened.
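For anyone who wants to try this at home, here is a minimal Python sketch of that kind of simulation. It is not the original R program. The game format (5 white balls from 1-56 plus one Mega Ball from 1-46) and the prize tiers are my assumptions, chosen to line up with the dollar amounts in the table. It plays a fixed number of tickets instead of looping until the jackpot hits, which is the part that crashed the computer.

```python
import random
from collections import Counter

# Assumed game format: 5 white balls from 1-56 plus 1 Mega Ball from 1-46.
WHITE_MAX, MEGA_MAX = 56, 46

# Assumed prize tiers keyed by (white balls matched, Mega Ball matched).
PRIZES = {
    (5, True): "jackpot",
    (5, False): 250_000,
    (4, True): 10_000,
    (4, False): 150,
    (3, True): 150,
    (2, True): 10,
    (3, False): 7,
    (1, True): 3,
    (0, True): 2,
}


def draw(rng):
    """Pick 5 distinct white balls and 1 Mega Ball."""
    white = frozenset(rng.sample(range(1, WHITE_MAX + 1), 5))
    return white, rng.randint(1, MEGA_MAX)


def simulate(n_tickets, seed=0):
    """Play n_tickets random tickets against one winning draw; count hits per prize."""
    rng = random.Random(seed)
    winning_white, winning_mega = draw(rng)
    hits = Counter()
    for _ in range(n_tickets):
        white, mega = draw(rng)
        prize = PRIZES.get((len(white & winning_white), mega == winning_mega))
        if prize is not None:
            hits[prize] += 1
    return hits


if __name__ == "__main__":
    print(simulate(100_000))
```

Capping the loop at a ticket count keeps the run time predictable, no matter how unlucky the simulated player is.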



Amount of Winnings    Total hits after 1.7 million tries
$640 Mill             0
$250,000              1
$10,000               2
$150                  2
$10                   2147
$7                    5806
$3                    12416
$2                    23575


As you can see from both the table and the graphs, not a lot of people are going to win this weekend. But I'm sure those people who bought tickets are like "But it only takes 1 ticket to win, and I'm going to win". I say that every time I approach an attractive celebrity: it only takes one hello for her to go home with me. Needless to say, I'm 0/100 on Rachel McAdams (one day though, one day). A little while ago I saw the film "He's Just Not That Into You", featuring a list of somewhat famous actors and actresses. The main thing I learned was that you need to act as the rule, and not the exception. If I went around thinking everything was going to go my way all the time, then I wouldn't be blogging! Let's just admit it folks, the lottery is just not that into you, so don't expect it to call you back tomorrow morning. Nor can you call it, because you have the wrong numbers.

I'm sure a bunch of you are thinking, "Listen PatchStats, it's only (insert any value you spent on the lottery), I think I can live without this money." Below is a bar chart of playing the lottery. The first bar is what you can expect from playing the lottery: (Probability*Value) summed over all possible outcomes. As for keeping the money in your pocket, that's the second bar.
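As a rough cross-check on that first bar, here is a small Python sketch that uses the hit counts from the simulation table earlier in this post to work out the average winnings per ticket. It assumes a $1 ticket (the dollar discussed in this post) and is an empirical average from one simulated run, not the exact probability-weighted value.

```python
# Hit counts copied from the simulation table in this post.
TICKETS_PLAYED = 1_755_625
HITS = {
    640_000_000: 0,
    250_000: 1,
    10_000: 2,
    150: 2,
    10: 2147,
    7: 5806,
    3: 12416,
    2: 23575,
}


def average_return_per_ticket(hits, tickets):
    """Total simulated winnings divided by the number of tickets played."""
    return sum(prize * count for prize, count in hits.items()) / tickets


avg = average_return_per_ticket(HITS, TICKETS_PLAYED)
print(f"average winnings per $1 ticket: ${avg:.2f}")
```

In that run the simulated player got back roughly 24 cents for every dollar spent.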

Let's think about what we could do if we kept that dollar. You could tip your waiter/waitress more, you could eat a McDouble (which for some reason is cheaper than both a hamburger and a cheeseburger, despite having more of both), or a hard taco (for some reason the soft ones are $1.19, damn you for knowing your market). You can even eat out on LivingSocial for $1. I did a quick search on Craigslist to see what I could find for $1; apparently you can get Capitals tickets [wow, have the Capitals really stunk this year], a new pair of trainers, and even a smart car and thirty minutes of driving credit [ok, that last one was free, but hey, you get my point]. So basically, you are screwed. It's like that scene in Titanic where Rose doesn't hold on to Jack's hand, except this time Rose threw Jack off the top of the boat with a lead weight tied around his leg. All gambling kinda sucks though: to reach the Mega Millions dollar amount, it would take an estimated 25 million hands of blackjack, assuming you played fairly and smart and won 49.5% of the time. That would take somewhere around 289 days, assuming you played 1 hand per second (which of course is not likely, but neither is winning the lottery). So the next time you debate getting that guacamole that is an extra $1.00, just say "Yes. At least I didn't play the lottery".
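The 289-day figure is just a unit conversion, which can be sketched in a few lines of Python. The 25 million hands and the one-hand-per-second pace are the estimates from the paragraph above; nothing here models the blackjack itself.

```python
HANDS_NEEDED = 25_000_000        # estimate quoted in the post
HANDS_PER_SECOND = 1             # the post's (admittedly unrealistic) pace
SECONDS_PER_DAY = 24 * 60 * 60

days = HANDS_NEEDED / HANDS_PER_SECOND / SECONDS_PER_DAY
print(f"{days:.1f} days of nonstop play")
```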

Sunday, March 4, 2012

The £50 million Minor League Decision: What to do with Torres

Introduction:
For those of you who have followed the company blog for a while, you know I try to diversify the topics I write about. And I’ll do my best with this soccer post. I love soccer, and have begun to really focus my efforts on this emerging sport. But don’t let me get ahead of myself.
I have a friend from Peru with whom I discuss a lot of soccer. Despite his belief that statistics will ruin the sport, I still do my best to use stats to predict the outcome of games. This friend and I once had a discussion about why I believe professional soccer needs a legitimate minor league system. My biggest argument was Fernando “I hate to score” Torres, who has struggled to score in his last 24 starts. From a professional soccer player who cost 50 million British Pounds Sterling, a club expects a little more in the way of results. Let's take a look at a graph.
Stats / Graphs:




This represents Torres’s Goals 3-game Average and his Shots 3-game Average[1]. Now if we look around his 78 mark we see a pretty steep drop-off from his normal pace. This is because 80 appearances ago Torres got injured. At the time, Torres was with Liverpool, who played him again very quickly. The team was looking for anything they could get. The team was in a downfall, trying to scrape points. But what we learn from hindsight is that sometimes players need to “gain” their confidence. Think of it this way: if you were going to train for a marathon, you wouldn’t start your first practice against Steve Prefontaine. That is the problem professional soccer has: it starts players against the best competition, when you need to ease them back. Even famous American athletes go through rehab by playing against lower competition; even Derek Jeter did this.[2] And this despite the fact that Derek Jeter has cost the Yankees $220,159,364 (over some years, but still).
Ok, back to the graph. When you look at this, you see that Torres’s shots did not really change, but he isn’t thinking about shooting, he is just taking the shots. His “touch” is rushed, and he isn’t taking his time with the shots; he is just hoping the ball goes in. This is typical of someone desperate for success [I mean, even the cast of the Jersey Shore shoots until they hit]. Let's break this down by the numbers.
        Goals       Shots
Post    0.178082    2.328767
Pre     0.5         2.504348
Above is a table showing the average number of goals per game that Torres has scored, broken down into Pre-Injury and Post-Injury; the next column is Shots. See something different? There is a much larger gap in Goals than there is in Shots. Shots seem to be about the same, whereas his goals seem to have dropped. Let's get nerdy, and drop the real statistics.
The difference in Goals has a p-value of .001, whereas the difference in Shots has a p-value of .6; all in all this means there is no statistically significant difference in the number of shots per game he is taking. Rather, his conversion of those shots to goals has declined.
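For readers who want to run this kind of comparison themselves, here is a minimal Python sketch of a permutation test on the difference in means. This is a swapped-in technique for illustration: the post does not say which test produced the p-values above, and the per-game numbers below are made up, not Torres's actual match log.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels and see how often chance alone matches the gap.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical goals per game, before and after an injury (made-up data).
pre_injury = [1, 0, 1, 2, 0, 1, 0, 1, 0, 1]
post_injury = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(permutation_p_value(pre_injury, post_injury))
```

A small p-value says the gap between the two groups would rarely show up if the pre and post labels were assigned at random.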
Conclusion:
Why is this so concerning? It’s because a £50 million player should not be sitting on your bench, and should be getting called up by his national team. I spoke to several soccer statisticians while at the MIT Sloan Sports Conference, and most of them agreed that we need a time-to-shoot metric. I assume this statistic would look somewhat normally distributed. People that rush the shot will generally miss, and those that overthink the shot will generally miss as well; it's finding that golden middle that will allow you to score.
Back to the main question and point: if sports are such a “feeling it” art, then we need to make sure a player gets back on pace, even if that means lowering his level of competition. It is very likely that Chelsea is going to lose money on Torres, but they can stop the bleeding. He has talent, he has fitness, but he lacks the confidence to put the goals away, and until he regains that it will be a dry spell for the Spaniard. Hopefully the new Chelsea manager can fix things.


[1] In soccer, we get the complaint about all the noise. To avoid this, we take a moving average; it's quite common in time series.
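A 3-game moving average like the one in footnote [1] can be sketched in Python as follows. It assumes a trailing window (each value averages the current game and the two before it), which may not match how the original graph was built, and the goal counts are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 games produce no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical goals per game (made-up data).
goals = [1, 0, 2, 0, 0, 1]
print(moving_average(goals))
```

A wider window smooths the line more, but it also reacts more slowly to a real change such as an injury.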

Wednesday, January 18, 2012

Beyond a Boxscore



One of my favourite blogs is an American baseball blog that attempts to explain the information that can be found beyond a box score. For all those non-baseball fans, a box score generally lists what a player did throughout the game. It typically includes a different box for pitchers and batters.[1] I’ll try to avoid boring you with history, but it adds an important factor to this conversation. The boxscore was invented to let people know how their teams were doing while they were on the road. So when the Boston Braves played the Philadelphia Athletics, a reporter would send the information home to the Boston Globe to let readers know. This history loops back in because, for a while, different reporters would [naturally] report different statistics. They reported batting averages, how many hits a batter got, whether he walked and so on. What this did was, well, create a market for statistics. As the process continued, newspapers began to report the same statistics: a unified thing known as a boxscore. If a player was substituted, he would be added under the position that he took over.
Boxscores changed the way we looked at the game; they gave those of us not at the game a way to look at how well a player was doing. They allowed us to quantify the portion a player was contributing to a win against the opponent. So the blog mentioned above tried to think beyond the basic statistics. Bill James [for all you moneyballers] did the same thing. He decided to take a look at what these statistics meant, and how well they measured a player's contribution to a win. Well, lo and behold, he (as did others) found out that those statistics did an average job, but not the best job, of describing the outcome of the game.
Here is the problem with soccer (football for those abroad). Soccer doesn’t have a boxscore, and people only care about a few things. They want to know how well their team is doing in the league [as presented by a table], who has the most goals, and how many clean sheets a team/keeper has. ESPN [by now you know my thoughts on them] decided to create their own version of a boxscore, but with a soccer twist. They gave the following statistics: saves, goals, shots (on goal), Time of Possession, yellow cards, red cards, free kicks, offsides and fouls. Here is the kicker: not one of those statistics is a useful predictor of the outcome. That is, if you model points (3 for a win, 1 for a draw and zero for a loss), you find that not one of them predicts the points a team receives. I started to take note of this as I watched Newcastle United play its games. Newcastle United, currently sixth in the table, has been outshot in all but three of its games. [2] What we need is something that will reflect a basic understanding (leave the advanced stuff for nerds like us). How about a player-by-player boxscore, something that describes turnovers forced, shots allowed, breakaways, defenders drawn, TOP (per player), minutes played (per player), tackles won, tackles lost and headers? Which team has allowed the most headers this season? All of these speak to important ways to play the game. Liverpool (bias, be aware) could use this information to decide when to play Andy Carroll over Luis Suarez. If a team allows a lot of breakaways, move your faster players up front. If the team plays a conservative D, bring in Carroll and work on set pieces. You would be amazed that these statistics are out there, but few people are recording them.
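To make "model points" a little more concrete, here is a minimal Python sketch that converts results into league points and checks how strongly one boxscore statistic tracks them. It uses a plain Pearson correlation as a simpler stand-in for a proper model, and the match log is hypothetical, not Newcastle's actual season.

```python
from math import sqrt

# League points as described above: 3 for a win, 1 for a draw, 0 for a loss.
POINTS = {"W": 3, "D": 1, "L": 0}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical match log (made-up data): result and shots taken per game.
results = ["W", "L", "D", "W", "L", "W"]
shots = [9, 15, 11, 8, 14, 10]

points = [POINTS[r] for r in results]
print(f"correlation between shots and points: {pearson(shots, points):.2f}")
```

Running the same check for each boxscore column is a quick first pass at which statistics are worth keeping before building anything fancier.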



[1] Baseball’s sister (or brother we are gender neutral here) sport cricket, also has something similar.

[2] Of the three games in which they weren’t outshot, they lost two.

Thursday, January 5, 2012

Weather| It is an odd Thing.

We look outside, ask Siri or just guess what the weather will be each day. Rarely do we look at it on a year-to-year basis, wondering what the temperature was. I did an interesting analysis for Washington DC over the past year, and it offers a unique look at how to look at data.

Year to Year and Averages


As you can see from the graph above, the past two years have been a little above average. We have not had a record-breaking day this year, although we did have one last year.

Last year it rained 37 out of the 127 days, meaning we had rain 29% of the time (days speaking) last year.
This year it rained 51 out of the 127 days, meaning we had rain 40% of the time (days speaking) this year.
Rain is rain or snow.

Is Weather Data Normal?
Weather needs to be normal for a lot of the analysis we use. If we want to say that a day was a record high, or that global warming is increasing the temperature, then we need the data to be normal.

As you can see, the weather data for Jan 5 look normal. With a [nerd alert] Shapiro-Wilk p-value of .19, we cannot reject normality, so we can treat the data as normally distributed.

What this allows us to do, is to see if we have had any statistically significant days since 1936. We actually have!

High Points
1950 62 Degrees
1997 60 Degrees
2009 58 Degrees

Low Points
1968 18 Degrees
1958 18 Degrees
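One simple way to flag unusual days like these, once the data are treated as normal, is to look for temperatures more than about two standard deviations from the mean. The Python sketch below is an illustration under that assumption, not necessarily the method used for the lists above, and the temperatures in it are made up.

```python
from statistics import mean, stdev


def unusual_years(temps_by_year, z_cutoff=1.96):
    """Return {year: z-score} for years beyond z_cutoff standard deviations."""
    temps = list(temps_by_year.values())
    mu, sigma = mean(temps), stdev(temps)
    return {
        year: round((temp - mu) / sigma, 2)
        for year, temp in temps_by_year.items()
        if abs(temp - mu) / sigma > z_cutoff
    }


# Hypothetical Jan 5 highs in degrees (made-up data, not the DC record).
jan5_highs = {
    1990: 40, 1991: 42, 1992: 38, 1993: 41, 1994: 39, 1995: 40,
    1996: 43, 1997: 37, 1998: 41, 1999: 39, 2000: 70,
}
print(unusual_years(jan5_highs))
```

The 1.96 cutoff corresponds to the usual two-sided 5% level for a normal distribution, which is why the normality check matters first.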

Is the Earth Getting Warmer?
Always a debatable topic, and I'm not going to side really on it. I'll just throw some data at you.

2011 out of 127 days 110 of them were above average on temperature
2010 out of 127 days 100 of them were above average on temperature


However, as we can see above, the two different years do not look that different. The p-value is a sad .486. So we can conclude that there is not a statistical difference from the temperatures before.


Now if we subset pre-1997 and post-1997, we get something a little more likely to differ. Here is the thing though: that difference is not statistically significant either, with a p-value of .188. So we conclude that the temperatures in Washington DC have remained around their average since 1936, and that extreme temperatures were not common.

Here is what Jan 5th has looked like since 1936