Saturday, April 17, 2010

Processing and... processing.

Pick o' the Post: The Odyssey by Symphony X on The Odyssey

This is about as epic as a song can get.  Yes, more epic than Rush - gimme a break... this is Symphony X.  This song is a 24-minute musical rendition of Odysseus' adventures in The Odyssey.  I don't really know the story that well, to be honest, but I did get the privilege of experiencing this (entire) song performed live in Atlanta.  That is something I will never forget.  I was there to hear them perform their latest album, Paradise Lost, which is a rendition of the classic epic poem.  The entire album is an amazing feat, but the title track is pretty cool for metal and non-metal audiences alike.

So, I got my hands dirty with Processing last night, and I'm glad I did.  I was following along with a post from a blog I follow, and I'm really excited to get some more experience here.  Processing feels like a bridge away from working strictly within R or Stata or Open Office for plotting solutions.  It feels much more open to the imagination.  The difference is that there's quite a bit more programming involved in developing graphics in Processing than there is in R or Stata, and certainly Open Office.

I haven't figured out how to export images yet - actually, I ran across it once, but I'm too lazy to figure that out right now.  So, these are screenshots of the graphics I created with the tutorial.  First, I'll briefly explain the data and the analysis that's going on.  Jer sent a request out on Twitter asking any interested followers to tweet a random number (from their head).

He put the 225 human-generated random numbers into this (publicly available) Google spreadsheet.  The tutorial works with data stored, more generally, in a remote location, like a Google spreadsheet.  He cites not having to change file paths when data moves around on your own system as a good reason to try to keep data in a centralized location ("centralized" is relative).  I can relate to that sentiment... with much (if not all) of our data stored on our servers in the office, I know exactly where the data I want is at any particular moment, and I don't have to fumble around trying to find it before loading it into memory for analysis.

Anyway, from here Jer leads readers through a number of methods to analyze the data.  Below is row after row of machine-generated random numbers, and one of those rows is the human-generated data.  Each column is a number 1-99; each machine-generated row represents 225 random numbers, the same size as the human-generated set.  The brightness of the ellipse indicates how many times that number is present in that set of 225 numbers.
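To make the counting step concrete, here is a minimal sketch in Python (not Processing, and not Jer's actual code).  The function names, the 0-255 brightness scale, and the use of seeds are my own assumptions for illustration; the row size is a parameter.

```python
import random
from collections import Counter

def count_occurrences(numbers, lo=1, hi=99):
    """Return a list where index i holds how often (lo + i) appears in numbers."""
    tally = Counter(numbers)
    return [tally[n] for n in range(lo, hi + 1)]

def to_brightness(counts, max_count):
    """Scale each count to a 0-255 brightness (assumed scale; 0 means absent)."""
    if max_count == 0:
        return [0 for _ in counts]
    return [round(255 * c / max_count) for c in counts]

def machine_row(size, seed=None):
    """Generate one row of machine-made random numbers between 1 and 99."""
    rng = random.Random(seed)
    return [rng.randint(1, 99) for _ in range(size)]

# Six machine-generated rows, each counted per number 1-99.
rows = [count_occurrences(machine_row(size=225, seed=s)) for s in range(6)]

# Use one shared peak so brightness is comparable across rows.
peak = max(max(r) for r in rows)
brightness_rows = [to_brightness(r, peak) for r in rows]
```

Scaling every row against the same peak is a design choice: it keeps a bright dot in one row comparable to a bright dot in another.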

So, this is a crude, first round of analysis.  Obviously, it's damn near impossible to visually pick out the human-generated row.  It's the 37th row from the bottom of the image (36th row from the top).  I adjusted some parameters to highlight our dataset of interest - it's approximately twice as bright as the other rows in this image.  I also summarized some of the next plots into one graphic below this one, which shows the increased visual definition that we can get from a bar graph, with color gradients added to emphasize various bits of data.

Before I go any further with the tutorial, I should mention that Jer emphasized the idea of perspective.  This is why we first observed the bright/dull points, then the bar graphs, which we then applied color to.  Comparing this bottom-most graph to 6 rows of machine-generated numbers, our human-generated data starts to look a little outlier-ish.  Our data is the top row of the next image.

It occurred to me that an extra step in manipulating this graphic might make this "outlier-ish" observation clearer.  By ordering the bars based on their height (color), we should be able to get a better idea of what the difference is here.  Without this additional adjustment, the observer is left to "calculate," in some sense, their own ordering in order to compare the rows.
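The reordering idea is simple enough to sketch.  This is Python rather than Processing, the function name is hypothetical, and the sample counts are made up for illustration (they are not the tweeted data).

```python
def sort_bars(counts, lo=1):
    """Pair each count with its number, then order tallest bar first.

    Python's sort is stable, so numbers with equal counts stay in
    ascending numeric order.
    """
    labelled = [(lo + i, c) for i, c in enumerate(counts)]
    return sorted(labelled, key=lambda pair: pair[1], reverse=True)

# Made-up counts for the numbers 1 through 4.
ordered = sort_bars([1, 3, 3, 0])
```

Keeping the number alongside its count means the bars can still be labelled after sorting, so you can see which numbers ended up at the tall end.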

Now, the dissimilarity is a bit clearer.  There are at least a few numbers that our human subjects seem to pick a bit more often than chance would suggest.  Jer continues on to display two more visual representations of this same data to try to find some pattern.  First, he uses a grid with color gradients.  The first row is 1-10; the second, 11-20; and so on.

Then he displays the same grid, but shows the numbers themselves (colored with a gradient) instead of the squares.
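The grid layout described above (1-10 in the first row, 11-20 in the second) comes down to a bit of integer arithmetic.  A small sketch in Python, with a hypothetical function name and zero-based row and column indices as my own assumption:

```python
def grid_cell(n, columns=10):
    """Map a number (starting at 1) to a zero-based (row, column) cell."""
    if n < 1:
        raise ValueError("numbers start at 1")
    return ((n - 1) // columns, (n - 1) % columns)

first = grid_cell(1)     # top-left cell
tenth = grid_cell(10)    # end of the first row
eleventh = grid_cell(11) # start of the second row
```

Subtracting 1 before dividing is what keeps 10 on the first row instead of bumping it down to the second.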

Aside from the Douglas Adams effect (#42), as Jer points out, his 225 followers seem to have chosen numbers ending in 7 quite a bit more often than we might expect.  He wonders whether there is something about the number 7 that seems "more random" to us (or, less generically, to his 225 participants).  Interesting, though.
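One way to check the "ending in 7" hunch is to tally last digits and compare against what a uniform pick from 1-99 would give.  This is a rough sketch in Python with hypothetical function names, and the sample list is made up for illustration (it is not the tweeted data).

```python
def last_digit_counts(numbers):
    """Return a 10-item list: how many numbers end in each digit 0-9."""
    counts = [0] * 10
    for n in numbers:
        counts[n % 10] += 1
    return counts

def expected_share(digit, lo=1, hi=99):
    """Share of the integers lo..hi that end in the given digit."""
    matching = sum(1 for n in range(lo, hi + 1) if n % 10 == digit)
    return matching / (hi - lo + 1)

# Made-up sample, not the real submissions.
sample = [7, 17, 27, 42, 3]
observed_sevens = last_digit_counts(sample)[7] / len(sample)
uniform_sevens = expected_share(7)  # 10 of the 99 numbers end in 7
```

If the observed share sits well above the uniform share on the real data, that supports the pattern Jer noticed, though a proper test would need more than eyeballing two fractions.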

I'm glad I made it through this (my first) Processing tutorial.  I'm looking forward to applying these tools and concepts to the unique data that Booklamp affords my imagination.

Have a wonderfully data-filled day.
