Monday, October 28, 2013

Ignoring the evidence

I've been around data for a few years. We've seen each other at the bar every once in a while, and a few times data took me home. That's the kind of relationship I have with data; it takes me home and kicks me out in the morning. I think we all kind of have this problem with data; it's there when it wants to be there and not there when you need it most. I've run into this problem, and I've kind of thought of it as ignoring the evidence. Data won't always be there to wish us a good night's sleep, or to tell us when the next train is coming, or to help us get promoted; data is either large or absent, and very rarely is it in between.

We often get so jaded by data that when it finally shows its pretty/handsome face we'll take whatever it is selling. Hell, data could sell us Keystone Light and we would probably drink it. Then data leaves the next morning and we feel the need to not trust it anymore. The thing is, data is data; it does what it wants. It's up to us (as humans [or robots who read the blog]) to gauge what to make of data. We can't marry data; we can only hope that it will give us the information we need to marry an actual idea. Data is nonpartisan, it's gender neutral, it's all the things we want from data and nothing more. Data won't tell you whether to buy Coke or Pepsi; it will tell you which is cheaper, which did better in a survey of 100 people, or which one sells better. Data can't create the art that comes from the human mind; it can only point us in the proper direction. Think about creating a song purely from data; it would suck, like truly suck. I'm not talking about taking bits and pieces of actual music that was voted awesome and mashing them together. I'm talking about rules like: for every 100 men in a troop brigade, play a G note; for every 100 women, play an E chord; and so on. The music would not sound good.

The thing to remember about data is that it's not the decision maker; it's the guidance for making the decision. Data tells us what we should do, not what we are going to do; "they are probabilities, not actuals". I emphasize this because you can't ignore data; it's just the wrong thing to do. You should, at the very least, take the data with a grain of salt and know what to expect next. Turning a blind eye to data is simply a bad idea. Imagine if you knew you were allergic to peanuts and ate a handful of peanuts anyway. In this case, the data is your allergy to peanuts, and the decision was to actually eat peanuts. I'm sure most people are thinking "Oh, that is pretty dumb", but the truth is we still do it anyway. We ignore the obvious and pursue our blind agendas, hoping things will work out for the best. If you want to see what life without data would look like, blindfold yourself, plug your ears and see if you can get out of your house/apartment; that is what it is like to ignore the evidence.

Sunday, October 27, 2013

P of x given y

Hello to the 12 of you that actually read the blog. In the coming weeks I'll be moving over to a sports scoring and stats company for amateur sports. For those that have read the blog before, you know that my passion lies very much in the direction of sports. I think it's a great opportunity and I'm excited to be 'The Data guy / Data scientist'. For more about the company go here

What does this mean for the blog? Well, of course there is the obvious stuff: I won't be posting my findings here (but I may post them on their blog). I will also probably tone down the sports talk, which is probably for the better. I'm going to take an approach of stats in everyday life. I've always been big on saying P of X given Y (detailed below), so that is going to be my new focus.

P of X given Y. (or A given B)

We basically run into this all the time; it's just that Y is generally pretty complicated. Let me give you an example. I'm at a coffee shop, I have $5 in my pocket, and they have 5 different coffee selections. The Y here has two parts: one is that I have $5, and the other is the selections. So given those conditions, which coffee am I going to purchase (event X)? This is, of course, conditional probability, the quantity at the heart of Bayes' theorem, and the one that we base ALL of our life decisions on.
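As a rough sketch of what P(X | Y) means in the coffee example, the snippet below estimates it by counting outcomes in a small purchase log. The log, the coffee names, and the `conditional_probability` helper are all made up for illustration; none of it is real data.

```python
from fractions import Fraction

# Hypothetical purchase log: (cash_in_pocket, coffee_bought).
# Every row is invented for illustration.
purchases = [
    (5, "drip"),
    (5, "drip"),
    (5, "latte"),
    (5, "drip"),
    (10, "latte"),
    (10, "mocha"),
    (10, "latte"),
    (3, "drip"),
]


def conditional_probability(log, coffee, cash):
    """Estimate P(coffee | cash) as count(coffee and cash) / count(cash)."""
    # Keep only the rows where the condition Y (cash in pocket) holds.
    given = [bought for money, bought in log if money == cash]
    if not given:
        raise ValueError("condition Y never occurs in the log")
    return Fraction(given.count(coffee), len(given))


p_drip_given_5 = conditional_probability(purchases, "drip", 5)
p_latte_given_10 = conditional_probability(purchases, "latte", 10)
print(p_drip_given_5, p_latte_given_10)
```

The point is only that the condition Y narrows down which rows count; the probability of X is then measured inside that narrower world.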

Given my current condition Y, I've decided to pack my bags and move to a brand new city with brand new challenges. Now that Y is pretty complicated and more than you would like to read, but I chose X because, given the conditions of Y, it gave me (or so I hope) the highest reward. We choose X every day, and we don't know the optimal outcome, but we update our priors and try again. It goes from small things like trying a new type of coffee to taking a new route to work.

People complain that statistics is difficult and confusing, but what they don't realize is that we do these calculations daily. If I go to this dinner, what will my outcome be? Well, your conditions are the date, the restaurant and a few others you probably haven't thought of. Sometimes we'll use X's to change our conditions (maybe buy some roses or choose a fancier place). The thing is, we are constantly trying to optimize what we do; from our hairstyles to our apartments, we are attempting to maximize our rewards. Whether they be subtle (building better friendships) or obvious (getting a raise at work), we are trying to optimize our lives. The core of that optimization is P(X|Y).

The Lesson:

We don't know, nor will we ever know, what would have provided the best result. The only thing we can do is look at our conditions Y and choose the X with the best expected value given Y.
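To make "best expected value" a little more concrete, here is a minimal sketch that picks the action X with the highest expected reward under one fixed condition Y. The actions, probabilities, and rewards are all invented for illustration, and `expected_value` and `best_action` are hypothetical helper names.

```python
# Hypothetical actions X under one fixed condition Y. Each action maps to a
# list of (probability, reward) pairs; all numbers are made up.
outcomes_given_y = {
    "stay put": [(0.9, 1.0), (0.1, 0.0)],
    "move to a new city": [(0.5, 4.0), (0.5, -1.0)],
    "buy a lottery ticket": [(0.001, 100.0), (0.999, -1.0)],
}


def expected_value(outcomes):
    """Probability-weighted average reward of one action."""
    return sum(p * reward for p, reward in outcomes)


def best_action(actions):
    """Return the name of the action with the highest expected value."""
    return max(actions, key=lambda name: expected_value(actions[name]))


choice = best_action(outcomes_given_y)
print(choice)
```

With these made-up numbers the riskier move wins on expected value even though it can go badly, and the lottery ticket loses even though its best case is the biggest.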

"Until the day when God shall deign to reveal the future to man, all human wisdom is summed up in these two words,--'Wait and hope'." ~ Dumas

We can only do what is the statistically smartest thing to do. Other times we want to fight those odds, to prove that we can overcome them. Everyone wants to be the one that 'beat the odds'; it's why the lottery is so popular.