Sunday, March 17, 2013

Can you picture this?

A few weeks back I attended a nerd mecca: MIT's Sloan Sports Conference. While the exact details of the conference are pretty irrelevant, there were a few talks that really sparked my interest. One was about social media, while another was about data visualization. Most people love statistics, but few really know the difference between a Poisson and an Exponential distribution. It is often the goal of the statistician to tell a story about the data using fancy math and pretty pictures. If I showed you a picture of a bar packed full of people and another bar with no one in it, you would probably be able to deduce which bar was getting more sales. As Data Scientists, Statisticians and Analysts, our job is to find the data and then explain it. I was also interested in data visualization because I felt it could help me in my non-sports day job [learn more about it here]. As always, if you like what you see, send us a comment at patch.stats[at] gmail [dot] com; we would be happy to help out.

This video is awesome and will explain a few things. 

It is more than just pictures.

We have a saying in the small group I work in: "If it is in Excel, I don't trust it." Sure, we are a little pretentious about our graphs, but a graph says a lot about the user behind the data. Think of it this way: using Excel is like spraying Axe all over your body. It may put a thin layer of scent over the teenager, but it does not make him any smoother or better looking to his date.

The first example here is plotted using statistical software. Don't worry, you are not supposed to understand what is going on; that is the point. A good eye can probably only handle five colours, and maybe five shapes, and the graph below asks for far more than that. It looks like a four year old got hold of the crayon box and just started scribbling. Simple plots are for just that: plotting a small amount of data. There are indeed several other ways I could have presented this; I could have created 12 different plots with maybe 4 teams on each plot. The point is, in an era of Large Information [my buzzword killer for big data], we want to see ALL the data, not some of it.

So Little Red Riding Hood had a p-value of .01

Unless you took a statistics class, or a business class trying to sell itself as a statistics class, you probably have no idea what a p-value is, nor whether .01 is good or not. I'll give you a hint: it normally means something is broken, but sometimes we want things to be broken :). With numbers and hipsters came the infographic, and boy are they great. The infographic tells the story, with great typeface and clever spacing. An infographic tells you not to trust wolves in the woods, and that if you are an egg, you shouldn't be sitting on a wall. What an infographic does not tell you is why the bears were eating porridge but the blonde girl wasn't. Below is an example of an infographic. You should pick up really quickly that Manchester United is a very good thing [I left out that they were a football team].

I see the data but I do not feel the data

I often find that when I give a presentation there is always a 'what if' question. So if we scored two more goals, what would our probability of making the playoffs be? While I do have web apps for that (thanks to some fancy coding and awesome software), I'll leave a more user-friendly version online for everyone. People want the ability to drag, drop and dictate. As a user, you can select the graph you want and click print to make it a PDF, or simply an image to put in your corporate PowerPoint. People love choice. It's the reason why so many iPhone cases exist: they all serve the same purpose, but each one represents the freedom we have to present ourselves. What is also nice about these web apps is that they can give a default picture. While I've worked with bright people who love to play with the data, I've also worked with my fair share of 'hit me with the facts' types who just want to know as soon as possible what the data is telling them.

I wrote two scripts that let you see the power of choice. I highly suggest that you view the test version of this on my webpage. This graph allows you to choose which teams you want to view, and will automatically update by pulling data down from a data source such as Hadoop, Hive or other databases.

Another web app version is also pasted below

For full readable use, view the full version here

Friday, March 1, 2013

Nom Nom Nom Oreo

So after sitting in on the Sloan Sports Conference talk about ticket analytics, I began to realize my obsession with markets and learning when to buy and sell the most useless objects. There has been a lot of debate recently about whether dynamic pricing is great for consumers or burns them in the long run. I've seen a lot of the consumer side of the dynamic pricing story. My friends always have the rule 'Last one to show up brings Fritos or Double Stuf's'. As someone who is often late, I've come to learn the price of Oreo Double Stuf's.

The Amazon Prime
I come from the naive belief that with my Amazon Prime, everything must be cheapest on Amazon. This is not a post bashing Amazon, because I love them dearly, but not everything is cheapest on the Internet. I bought an Aeropress the other day, and with my regular coupon at Bed Bath and Beyond, I actually saved some money. However, this did not stop me from attempting to Subscribe and Save on Oreos. I was trolling around their supply when I came across my first real understanding of Oreo pricing. I bought a 4-pack for $9.50 [with free shipping]. At the time I was very new to the Double Stuf market, but for some reason this seemed good. Sure enough, 3 days later, my box arrived with my newly delivered treats.

The Run Out
Like all good things, those Oreos were consumed in my apartment, and we needed to restock. I went back to Amazon, went to click 'reorder' and noticed that the prices were much, much higher. Being the Data Scientist I am, I closed the browser, cleared my cookies and even opened a new incognito window. All this effort was a waste, because suddenly the price was around $18 for the 4-pack, or around $4.50 a pack. Again, I was quite naive about the market, so I decided to investigate.

The Grocery Store / Mega Store Competition
Thankfully, I live near both a Super Target and a grocery store. They both have their perks, and both surely have their downsides. However, as I began to pick up these treats, I noticed a few odd things. At Giant (the grocery store), the price really depended on what was on sale. The sale price was a really good $2.50, but the regular price was much higher, around $4 even. Target, on the other hand, always seemed to have the same price, set at $3. I've been conducting my ermmm 'Research' for quite a while now, and I've never seen a sale on the Double Stuf's at Target.

So It Begs the Question: Where Do I Go?
Let's use some elementary math and a few situational assumptions to play my favorite game: where do I buy my Double Stuf's? Let's assume you don't read those garbage flyers that come in the snail mail, and you don't know if a sale is going on right now. However, you have noticed that Oreos are generally on sale one week a month, and which week it is seems completely random. That makes the chance of walking in during a sale roughly 1 in 4.

Oreo at Giant:
P(Sale) * (Oreo Price on Sale) = (1/4) * $2.50 = $0.625
P(No Sale) * (Oreo Regular Price) = (3/4) * $4 = $3
Total Expected Cost at Giant = $0.625 + $3 = $3.625

Oreo at Target = $3.
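
If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the same comparison. The function names are my own invention for illustration, and the 1-in-4 sale probability is just the rough "one random week a month" assumption from above, not measured data. The break-even helper is an extra: it asks how often Giant would have to run the sale before it matched Target's flat price.

```python
def expected_price(sale_price, regular_price, p_sale):
    """Expected price paid when a sale is running with probability p_sale."""
    return p_sale * sale_price + (1 - p_sale) * regular_price


def break_even_sale_probability(sale_price, regular_price, flat_price):
    """Sale probability at which the variable-price store matches the flat price."""
    return (regular_price - flat_price) / (regular_price - sale_price)


# Prices from the post; the 1/4 sale probability is an assumption
# (one random sale week in a roughly four-week month).
giant = expected_price(sale_price=2.50, regular_price=4.00, p_sale=1 / 4)
target = 3.00  # Target's price never seemed to change

print(f"Giant expected price: ${giant:.3f}")
print(f"Target price:         ${target:.2f}")
print(f"Giant break-even sale probability: "
      f"{break_even_sale_probability(2.50, 4.00, 3.00):.3f}")
```

Under these numbers, Giant would need to have the Oreos on sale about two weeks out of three before a blind trip there beat Target.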

The key to understanding this is that if we don't know when a sale is going to occur, we are at the mercy of the inflated price. On a blind trip, Target's flat $3 beats Giant's expected $3.625. I'm not saying that we should go out and build the Kayak for Oreo demand, but even the Oreo market has variable pricing.