Sunday, March 17, 2013

Can you picture this?

A few weeks back I attended a nerd mecca: the MIT Sloan Sports Analytics Conference. While the exact details of the conference are pretty irrelevant, there were a few talks that really sparked my interest. One was about social media, while another was about data visualization. Most people love statistics, but few really know the difference between a Poisson and an Exponential distribution. It is often the goal of the statistician to tell a story about the data using fancy math and pretty pictures. If I showed you a picture of a bar packed full of people and another bar with no one in it, you would probably be able to deduce which bar was getting more sales. As Data Scientists, Statisticians and Analysts, our job is to find the data and then explain it. I was also interested in data visualization because I felt it could help me in my non-sports day job [learn more about it here]. As always, if you like what you see, send us a comment at patch.stats[at] gmail [dot] com; we would be happy to help out.

This video is awesome and will explain a few things. 

It is more than just pictures.

We have a saying in the small group I work in: "If it is in Excel, I don't trust it." Sure, we are a little pretentious about our graphs, but a graph says a lot about the user behind the data. Think of it this way: using Excel is like a teenager spraying Axe all over his body. It may put a thin layer of scent over him, but it does not make him any smoother or better looking to his date.

The first example here is plotted using statistical software. Don't worry, you are not supposed to understand what is going on; that is the point. A good eye can probably only handle five colours, and maybe five shapes, which is more than I can provide in the graph below. It looks like a four-year-old got hold of the crayon box and just started scribbling. Simple plots are for just that: plotting a small amount of data. There are indeed several other ways I could have presented this; I could have created 12 different plots with maybe 4 teams on each plot. The point is, in an era of Large Information [my buzzword killer for big data], we want to see ALL the data, not some of it.
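To make the "12 plots with 4 teams each" alternative concrete, here is a minimal Python sketch of splitting a long team list into groups of four, one group per plot. The team names are placeholders I made up (48 of them, so the split gives 12 plots), and the plotting itself is left out.

```python
def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical team labels, not real data.
teams = [f"Team {n}" for n in range(1, 49)]

# One panel (small plot) per group of four teams.
panels = chunk(teams, 4)

for number, panel in enumerate(panels, start=1):
    print(f"Plot {number}: {', '.join(panel)}")
```

Each panel stays within the five-colour limit, but you pay for it by having to compare across a dozen pictures.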

So Little Red Riding Hood had a p-value of .01

Unless you took a statistics class, or a business class trying to sell itself as a statistics class, you probably have no idea what a p-value is, or whether a value of .01 is good or bad. I'll give you a hint: it normally means something is broken, but sometimes we want things to be broken :). With numbers and hipsters came the infographic, and boy are they great. The infographic tells the story, with great typeface and clever spacing. An infographic tells you not to trust wolves in the woods, and that if you are an egg, you shouldn't be sitting on a wall. What an infographic does not tell you is why the bears were eating porridge but the blonde girl wasn't. Below is an example of an infographic; you should pick up really quickly that Manchester United is a very good thing [I left out that they were a football team].
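For the curious, here is a rough Python sketch of where a p-value comes from. It uses a made-up coin-flip example (16 heads in 20 flips, with "the coin is fair" as the assumption being tested); the function name and the numbers are mine and purely illustrative. The p-value is the chance of seeing a result at least this extreme if the assumption were true.

```python
from math import comb


def one_sided_p_value(successes, trials, null_prob=0.5):
    """P(X >= successes) when X ~ Binomial(trials, null_prob)."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Made-up example: 16 heads in 20 flips of a supposedly fair coin.
p = one_sided_p_value(16, 20)
print(f"p-value: {p:.4f}")
```

Here the p-value comes out below .01, which is the "something is broken" signal: a fair coin would rarely produce this many heads.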

I see the data but I do not feel the data

I often find that when I give a presentation there is always a 'what if' question. So if we scored two more goals, what would our probability of making the playoffs be? While I do have web apps for that (thanks to some fancy coding and awesome software), I'll leave a more user-friendly version online for everyone. People want the ability to drag, drop and dictate. As a user, one can select the graph they want and click print to make it a PDF, or simply an image to put in his/her corporate PowerPoint. People love choice; it's the reason why so many iPhone cases exist. They all serve the same purpose, but each one represents the freedom we have to present ourselves. What is also nice about these web apps is that they can give a default picture. While I've worked with bright people who love to play with the data, I've also worked with my fair share of 'hit me with the facts' types who just want to know as soon as possible what the data is telling them.
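To show what a 'what if' calculation can look like, here is a toy Python sketch, not a real playoff model. Everything in it is an assumption of mine: a win is worth 3 points and anything else 0, every remaining game is won independently with the same probability, a fixed points total gets you into the playoffs, and the two extra goals would have turned one earlier loss into a win.

```python
from math import comb


def playoff_probability(points_now, games_left, win_prob, points_needed):
    """Chance of reaching points_needed, assuming 3 points per win,
    0 otherwise, and independent games with the same win probability."""
    total = 0.0
    for wins in range(games_left + 1):
        if points_now + 3 * wins >= points_needed:
            total += (
                comb(games_left, wins)
                * win_prob**wins
                * (1 - win_prob) ** (games_left - wins)
            )
    return total


# All numbers below are made up for illustration.
baseline = playoff_probability(
    points_now=40, games_left=10, win_prob=0.5, points_needed=58
)

# What if two more goals had turned an earlier loss into a win (+3 points)?
what_if = playoff_probability(
    points_now=43, games_left=10, win_prob=0.5, points_needed=58
)

print(f"baseline: {baseline:.3f}, what if: {what_if:.3f}")
```

The useful part is not the numbers themselves but that the person asking can change an input and watch the answer move.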

I wrote two scripts that let you see the power of choice. I highly suggest that you view the test version of this on my webpage. This graph allows you to choose which teams you want to view, and will automatically update by pulling data down from a data source such as Hadoop, Hive or other databases.

Another web app version is also pasted below.

For a fully readable view, see the full version here.
